Commit Graph

92 Commits

Author SHA1 Message Date
doctorpangloss
a9a0f96408 Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-09-22 14:29:50 -07:00
rattus128
e42682b24e
Reduce Peak WAN inference VRAM usage (#9898)
* flux: Do the xq and xk ropes one at a time

This was doing independendent interleaved tensor math on the q and k
tensors, leading to the holding of more than the minimum intermediates
in VRAM. On a bad day, it would VRAM OOM on xk intermediates.

Do everything q and then everything k, so torch can garbage collect
all of qs intermediates before k allocates its intermediates.

This reduces peak VRAM usage for some WAN2.2 inferences (at least).

* wan: Optimize qkv intermediates on attention

As commented. The former logic computed independent pieces of QKV in
parallel which help more inference intermediates in VRAM spiking
VRAM usage. Fully roping Q and garbage collecting the intermediates
before touching K reduces the peak inference VRAM usage.
2025-09-16 19:21:14 -04:00
Jedrzej Kosinski
d7f40442f9
Enable Runtime Selection of Attention Functions (#9639)
* Looking into a @wrap_attn decorator to look for 'optimized_attention_override' entry in transformer_options

* Created logging code for this branch so that it can be used to track down all the code paths where transformer_options would need to be added

* Fix memory usage issue with inspect

* Made WAN attention receive transformer_options, test node added to wan to test out attention override later

* Added **kwargs to all attention functions so transformer_options could potentially be passed through

* Make sure wrap_attn doesn't make itself recurse infinitely, attempt to load SageAttention and FlashAttention if not enabled so that they can be marked as available or not, create registry for available attention

* Turn off attention logging for now, make AttentionOverrideTestNode have a dropdown with available attention (this is a test node only)

* Make flux work with optimized_attention_override

* Add logs to verify optimized_attention_override is passed all the way into attention function

* Make Qwen work with optimized_attention_override

* Made hidream work with optimized_attention_override

* Made wan patches_replace work with optimized_attention_override

* Made SD3 work with optimized_attention_override

* Made HunyuanVideo work with optimized_attention_override

* Made Mochi work with optimized_attention_override

* Made LTX work with optimized_attention_override

* Made StableAudio work with optimized_attention_override

* Made optimized_attention_override work with ACE Step

* Made Hunyuan3D work with optimized_attention_override

* Make CosmosPredict2 work with optimized_attention_override

* Made CosmosVideo work with optimized_attention_override

* Made Omnigen 2 work with optimized_attention_override

* Made StableCascade work with optimized_attention_override

* Made AuraFlow work with optimized_attention_override

* Made Lumina work with optimized_attention_override

* Made Chroma work with optimized_attention_override

* Made SVD work with optimized_attention_override

* Fix WanI2VCrossAttention so that it expects to receive transformer_options

* Fixed Wan2.1 Fun Camera transformer_options passthrough

* Fixed WAN 2.1 VACE transformer_options passthrough

* Add optimized to get_attention_function

* Disable attention logs for now

* Remove attention logging code

* Remove _register_core_attention_functions, as we wouldn't want someone to call that, just in case

* Satisfy ruff

* Remove AttentionOverrideTest node, that's something to cook up for later
2025-09-12 18:07:38 -04:00
doctorpangloss
179c2d35c8 Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-09-03 12:04:32 -07:00
comfyanonymous
e3018c2a5a
uso -> uxo/uno as requested. (#9688) 2025-09-02 16:12:07 -04:00
comfyanonymous
3412d53b1d
USO style reference. (#9677)
Load the projector.safetensors file with the ModelPatchLoader node and use
the siglip_vision_patch14_384.safetensors "clip vision" model and the
USOStyleReferenceNode.
2025-09-02 15:36:22 -04:00
comfyanonymous
27e067ce50
Implement the USO subject identity lora. (#9674)
Use the lora with FluxContextMultiReferenceLatentMethod node set to "uso"
and a ReferenceLatent node with the reference image.
2025-09-01 18:54:02 -04:00
comfyanonymous
b5ac6ed7ce
Fixes to make controlnet type models work on qwen edit and kontext. (#9581) 2025-08-27 15:26:28 -04:00
doctorpangloss
443bb45eaf Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-08-26 13:59:45 -07:00
Jedrzej Kosinski
fc247150fe
Implement EasyCache and Invent LazyCache (#9496)
* Attempting a universal implementation of EasyCache, starting with flux as test; I screwed up the math a bit, but when I set it just right it works.

* Fixed math to make threshold work as expected, refactored code to use EasyCacheHolder instead of a dict wrapped by object

* Use sigmas from transformer_options instead of timesteps to be compatible with a greater amount of models, make end_percent work

* Make log statement when not skipping useful, preparing for per-cond caching

* Added DIFFUSION_MODEL wrapper around forward function for wan model

* Add subsampling for heuristic inputs

* Add subsampling to output_prev (output_prev_subsampled now)

* Properly consider conds in EasyCache logic

* Created SuperEasyCache to test what happens if caching and reuse is moved outside the scope of conds, added PREDICT_NOISE wrapper to facilitate this test

* Change max reuse_threshold to 3.0

* Mark EasyCache/SuperEasyCache as experimental (beta)

* Make Lumina2 compatible with EasyCache

* Add EasyCache support for Qwen Image

* Fix missing comma, curse you Cursor

* Add EasyCache support to AceStep

* Add EasyCache support to Chroma

* Added EasyCache support to Cosmos Predict t2i

* Make EasyCache not crash with Cosmos Predict ImagToVideo latents, but does not work well at all

* Add EasyCache support to hidream

* Added EasyCache support to hunyuan video

* Added EasyCache support to hunyuan3d

* Added EasyCache support to LTXV (not very good, but does not crash)

* Implemented EasyCache for aura_flow

* Renamed SuperEasyCache to LazyCache, hardcoded subsample_factor to 8 on nodes

* Eatra logging when verbose is true for EasyCache
2025-08-22 22:41:08 -04:00
doctorpangloss
dfc47e0611 Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-08-22 13:24:52 -07:00
comfyanonymous
c308a8840a
Add FluxKontextMultiReferenceLatentMethod node. (#9356)
This node is only useful if someone trains the kontext model to properly
use multiple reference images via the index method.

The default is the offset method which feeds the multiple images like if
they were stitched together as one. This method works with the current
flux kontext model.
2025-08-15 15:50:39 -04:00
doctorpangloss
a7aff3565b Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-06-26 16:57:25 -07:00
comfyanonymous
ef5266b1c1
Support Flux Kontext Dev model. (#8679) 2025-06-26 11:28:41 -04:00
doctorpangloss
f6339e8115 Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-06-24 12:52:54 -07:00
comfyanonymous
f7fb193712
Small flux optimization. (#8611) 2025-06-20 05:37:32 -04:00
comfyanonymous
7e9267fa77
Make flux controlnet work with sd3 text enc. (#8599) 2025-06-19 18:50:05 -04:00
doctorpangloss
1d901e32eb Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-06-17 13:20:31 -07:00
doctorpangloss
82388d51a2 Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-06-17 10:35:10 -07:00
comfyanonymous
8a4ff747bd
Fix mistake in last commit. (#8496)
* Move to right place.
2025-06-11 15:13:29 -04:00
comfyanonymous
af1eb58be8
Fix black images on some flux models in fp16. (#8495) 2025-06-11 15:09:11 -04:00
comfyanonymous
4248b1618f
Let chroma TE work on regular flux. (#8429) 2025-06-05 10:07:17 -04:00
doctorpangloss
040a324346 Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-03-29 15:57:24 -07:00
comfyanonymous
3b19fc76e3 Allow disabling pe in flux code for some other models. 2025-03-18 05:09:25 -04:00
comfyanonymous
e8e990d6b8 Cleanup code. 2025-03-16 06:29:12 -04:00
comfyanonymous
9aac21f894 Fix issues with new hunyuan img2vid model and bumb version to v0.3.26 2025-03-09 05:07:22 -04:00
comfyanonymous
7395b0c0d1 Support new hunyuan video i2v model.
Use the new "v2 (replace)" guidance type in HunyuanImageToVideo and set
image_interleave to 4 on the "Text Encode Hunyuan Video" node.
2025-03-08 20:34:47 -05:00
doctorpangloss
693038738a Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-02-24 09:39:26 -08:00
HishamC
b124256817
Fix for running via DirectML (#6542)
* Fix for running via DirectML

Fix DirectML empty image generation issue with Flux1. add CPU fallback for unsupported path. Verified the model works on AMD GPUs

* fix formating

* update casual mask calculation
2025-02-11 17:11:32 -05:00
doctorpangloss
a3452f6e6a Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-01-28 13:45:51 -08:00
comfyanonymous
fb2ad645a3 Add FluxDisableGuidance node to disable using the guidance embed. 2025-01-20 14:50:24 -05:00
doctorpangloss
631d9e44c6 Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-01-16 09:58:02 -08:00
comfyanonymous
31831e6ef1 Code refactor. 2025-01-16 07:23:54 -05:00
comfyanonymous
6320d05696 Slightly lower hunyuan video memory usage. 2025-01-16 00:23:01 -05:00
comfyanonymous
b7572b2f87 Fix and enforce no trailing whitespace. 2024-12-31 03:16:37 -05:00
doctorpangloss
7655be873c Updates to support Hunyuan Video 2024-12-25 22:39:12 -08:00
doctorpangloss
0fd407ae87 Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2024-12-24 16:48:03 -08:00
comfyanonymous
bda1482a27 Basic Hunyuan Video model support. 2024-12-16 19:35:40 -05:00
Raphael Walker
61b50720d0
Add support for attention masking in Flux (#5942)
* fix attention OOM in xformers

* allow passing attention mask in flux attention

* allow an attn_mask in flux

* attn masks can be done using replace patches instead of a separate dict

* fix return types

* fix return order

* enumerate

* patch the right keys

* arg names

* fix a silly bug

* fix xformers masks

* replace match with if, elif, else

* mask with image_ref_size

* remove unused import

* remove unused import 2

* fix pytorch/xformers attention

This corrects a weird inconsistency with skip_reshape.
It also allows masks of various shapes to be passed, which will be
automtically expanded (in a memory-efficient way) to a size that is
compatible with xformers or pytorch sdpa respectively.

* fix mask shapes
2024-12-16 18:21:17 -05:00
Chenlei Hu
0fd4e6c778
Lint unused import (#5973)
* Lint unused import

* nit

* Remove unused imports

* revert fix_torch import

* nit
2024-12-09 15:24:39 -05:00
doctorpangloss
f39b8dfebc Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2024-11-22 15:50:23 -08:00
comfyanonymous
8f0009aad0 Support new flux model variants. 2024-11-21 08:38:23 -05:00
doctorpangloss
264d84db39 Fix Pylint warnings 2024-11-18 14:10:58 -08:00
doctorpangloss
c0f072ee0f Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2024-11-18 13:12:31 -08:00
comfyanonymous
8ebf2d8831 Add block replace transformer_options to flux. 2024-11-12 08:00:39 -05:00
contentis
69694f40b3
fix dynamic shape export (#5490) 2024-11-04 14:59:28 -05:00
doctorpangloss
a8d8bff548 Improve support for torch compilation and sage attention 2024-10-29 19:22:26 -07:00
doctorpangloss
8512f361fe Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2024-10-14 15:26:27 -07:00
City
7d1c420d19 Flux torch.compile fix (#5082) 2024-10-09 09:47:46 -07:00
doctorpangloss
bbe2ed330c Memory management and compilation improvements
- Experimental support for sage attention on Linux
 - Diffusers loader now supports model indices
 - Transformers model management now aligns with updates to ComfyUI
 - Flux layers correctly use unbind
 - Add float8 support for model loading in more places
 - Experimental quantization approaches from Quanto and torchao
 - Model upscaling interacts with memory management better

This update also disables ROCm testing because it isn't reliable enough
on consumer hardware. ROCm is not really supported by the 7600.
2024-10-09 09:13:47 -07:00