Commit Graph

2533 Commits

Author SHA1 Message Date
comfyanonymous
2c2aa409b0
Log message for cudnn disable on AMD. (#10418) 2025-10-20 15:43:24 -04:00
patientx
657a7872ab
Merge branch 'comfyanonymous:master' into master 2025-10-19 15:20:17 +03:00
comfyanonymous
b4f30bd408
Pytorch is stupid. (#10398) 2025-10-19 01:25:35 -04:00
comfyanonymous
dad076aee6
Speed up chroma radiance. (#10395) 2025-10-18 23:19:52 -04:00
comfyanonymous
0cf33953a7
Fix batch size above 1 giving bad output in chroma radiance. (#10394) 2025-10-18 23:15:34 -04:00
comfyanonymous
5b80addafd
Turn off cuda malloc by default when --fast autotune is turned on. (#10393) 2025-10-18 22:35:46 -04:00
comfyanonymous
9da397ea2f
Disable torch compiler for cast_bias_weight function (#10384)
* Disable torch compiler for cast_bias_weight function

* Fix torch compile.
2025-10-17 20:03:28 -04:00
patientx
76dde47dbb
Merge branch 'comfyanonymous:master' into master 2025-10-18 00:05:02 +03:00
comfyanonymous
b1293d50ef
workaround also works on cudnn 91200 (#10375) 2025-10-16 19:59:56 -04:00
comfyanonymous
19b466160c
Workaround for nvidia issue where VAE uses 3x more memory on torch 2.9 (#10373) 2025-10-16 18:16:03 -04:00
patientx
7b0643ada1
Merge branch 'comfyanonymous:master' into master 2025-10-16 16:40:50 +03:00
Faych
afa8a24fe1
refactor: Replace manual patches merging with merge_nested_dicts (#10360) 2025-10-15 17:16:09 -07:00
Jedrzej Kosinski
493b81e48f
Fix order of inputs nested merge_nested_dicts (#10362) 2025-10-15 16:47:26 -07:00
patientx
26589a3a0b
Merge branch 'comfyanonymous:master' into master 2025-10-15 12:18:21 +03:00
comfyanonymous
1c10b33f9b
gfx942 doesn't support fp8 operations. (#10348) 2025-10-15 00:21:11 -04:00
comfyanonymous
3374e900d0
Faster workflow cancelling. (#10301) 2025-10-13 23:43:53 -04:00
comfyanonymous
dfff7e5332
Better memory estimation for the SD/Flux VAE on AMD. (#10334) 2025-10-13 22:37:19 -04:00
comfyanonymous
e4ea393666
Fix loading old stable diffusion ckpt files on newer numpy. (#10333) 2025-10-13 22:18:58 -04:00
comfyanonymous
c8674bc6e9
Enable RDNA4 pytorch attention on ROCm 7.0 and up. (#10332) 2025-10-13 21:19:03 -04:00
patientx
eae7a58e60
Merge branch 'comfyanonymous:master' into master 2025-10-14 02:07:30 +03:00
rattus128
95ca2e56c8
WAN2.2: Fix cache VRAM leak on error (#10308)
Same change pattern as 7e8dd275c2
applied to WAN2.2

If this suffers an exception (such as a VRAM OOM) it will leave the
encode() and decode() methods, which skips the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
to this object, which is in turn referencing large tensors from the
failed execution.

The feature cache is currently set up as a class variable on the
encoder/decoder; however, the encode and decode functions always clear
it on both entry and exit of normal execution.

The design intent is likely that this is usable as a streaming encoder
where the input comes in batches, but the functions as they are today
don't support that.

So simplify by bringing the cache back to a local variable, so that if
it does VRAM OOM the cache itself is properly garbage collected when
the encode()/decode() functions disappear from the stack.
2025-10-13 15:23:11 -04:00
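A minimal sketch of the pattern described in the commit above, with hypothetical names (not the actual WAN encoder code): keeping the feature cache as a local variable means an exception inside encode() drops the only reference, so the cached tensors become collectable instead of staying alive through a class attribute.

```python
import torch

class WanStyleEncoder:
    """Illustrative only; not the real WAN encoder."""

    # Before the fix: a cache stored on the class/instance survives an
    # exception inside encode(), keeping large tensors alive after a VRAM OOM.
    # feat_cache = {}

    def encode(self, frames: torch.Tensor) -> torch.Tensor:
        # After the fix: the cache is a local variable. If encode() raises
        # (e.g. an OOM), the stack frame goes away, the cached tensors become
        # unreachable, and they can be garbage collected.
        feat_cache = {}
        out = []
        for i, frame in enumerate(frames):
            feat_cache[i] = frame * 2.0  # stand-in for the real feature math
            out.append(feat_cache[i])
        return torch.stack(out)

print(WanStyleEncoder().encode(torch.randn(4, 3, 8, 8)).shape)
```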
comfyanonymous
e693e4db6a
Always set diffusion model to eval() mode. (#10331) 2025-10-13 14:57:27 -04:00
patientx
fa7942933b
Merge branch 'comfyanonymous:master' into master 2025-10-12 13:56:39 +03:00
comfyanonymous
a125cd84b0
Improve AMD performance. (#10302)
I honestly have no idea why this improves things but it does.
2025-10-12 00:28:01 -04:00
comfyanonymous
84e9ce32c6
Implement the mmaudio VAE. (#10300) 2025-10-11 22:57:23 -04:00
patientx
aa6afacc01
Merge branch 'comfyanonymous:master' into master 2025-10-10 02:25:35 +03:00
comfyanonymous
f1dd6e50f8
Fix bug with applying loras on fp8 scaled without fp8 ops. (#10279) 2025-10-09 19:02:40 -04:00
patientx
3553ce45e5
Merge branch 'comfyanonymous:master' into master 2025-10-09 23:40:21 +03:00
comfyanonymous
139addd53c
More surgical fix for #10267 (#10276) 2025-10-09 16:37:35 -04:00
patientx
77fc639ed2
Merge branch 'comfyanonymous:master' into master 2025-10-09 15:55:12 +03:00
comfyanonymous
6e59934089
Refactor model sampling sigmas code. (#10250) 2025-10-08 17:49:02 -04:00
patientx
2502069447
Merge branch 'comfyanonymous:master' into master 2025-10-07 14:02:12 +03:00
comfyanonymous
8aea746212
Implement gemma 3 as a text encoder. (#10241)
Not useful yet.
2025-10-06 22:08:08 -04:00
patientx
e5c08bb5c2
Merge branch 'comfyanonymous:master' into master 2025-10-06 00:08:39 +03:00
comfyanonymous
195e0b0639
Remove useless code. (#10223) 2025-10-05 15:41:19 -04:00
patientx
c3a59c8e40
Merge branch 'comfyanonymous:master' into master 2025-10-04 00:51:31 +03:00
Finn-Hecker
93d859cfaa
Fix type annotation syntax in MotionEncoder_tc __init__ (#10186)
## Summary
Fixed incorrect type hint syntax in `MotionEncoder_tc.__init__()` parameter list.

## Changes
- Line 647: Changed `num_heads=int` to `num_heads: int` 
- This corrects the parameter annotation from a default value assignment to proper type hint syntax

## Details
The parameter was using assignment syntax (`=`) instead of type annotation syntax (`:`), which would incorrectly set the default value to the `int` class itself rather than annotating the expected type.
2025-10-03 14:32:19 -07:00
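For illustration, the difference the commit describes, shown with a simplified stand-in class (not the real MotionEncoder_tc):

```python
class Before:
    # '=' makes the *default value* the int class itself; the parameter is
    # untyped, and a caller that omits it gets num_heads == int.
    def __init__(self, hidden_dim, num_heads=int):
        self.hidden_dim = hidden_dim
        self.num_heads = num_heads

class After:
    # ':' is a type annotation; num_heads is declared as an int and no longer
    # carries a bogus default value.
    def __init__(self, hidden_dim, num_heads: int):
        self.hidden_dim = hidden_dim
        self.num_heads = num_heads

print(Before(64).num_heads)    # <class 'int'> -- the bug
print(After(64, 8).num_heads)  # 8
```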
patientx
603dfa1a65
Merge branch 'comfyanonymous:master' into master 2025-10-02 14:06:08 +03:00
rattus128
4965c0e2ac
WAN: Fix cache VRAM leak on error (#10141)
If this suffers an exception (such as a VRAM OOM) it will leave the
encode() and decode() methods, which skips the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
to this object, which is in turn referencing large tensors from the
failed execution.

The feature cache is currently set up as a class variable on the
encoder/decoder; however, the encode and decode functions always clear
it on both entry and exit of normal execution.

The design intent is likely that this is usable as a streaming encoder
where the input comes in batches, but the functions as they are today
don't support that.

So simplify by bringing the cache back to a local variable, so that if
it does VRAM OOM the cache itself is properly garbage collected when
the encode()/decode() functions disappear from the stack.
2025-10-01 18:42:16 -04:00
rattus128
911331c06c
sd: fix VAE tiled fallback VRAM leak (#10139)
When the VAE catches a VRAM OOM, it launches the fallback logic
straight from the exception context.

Python, however, keeps references to the entire call stack that caused
the exception, including any local variables, for the sake of exception
reporting and debugging. In the case of tensors, this can hold on to
references to GBs of VRAM and prevent the allocator from freeing them.

So dump the except context completely before going back to the VAE
via the tiler, by getting out of the except block with nothing but
a flag.

This greatly increases the reliability of the tiler fallback,
especially on low-VRAM cards: with the bug, if the leak happened to
hold more than the headroom needed for a single tile, the tiler
fallback would OOM and fail the workflow.
2025-10-01 18:40:28 -04:00
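A minimal sketch of the flag pattern this commit describes, with hypothetical stand-in functions (not the actual VAE code): leaving the `except` block before running the fallback lets Python drop the traceback and its references to the tensors from the failed attempt.

```python
import torch

def decode_full(latents: torch.Tensor) -> torch.Tensor:
    # Stand-in for the real decode; pretend large inputs exhaust VRAM.
    if latents.numel() > 1_000_000:
        raise torch.cuda.OutOfMemoryError("simulated OOM")
    return latents * 2.0

def decode_tiled(latents: torch.Tensor) -> torch.Tensor:
    # Stand-in for the tiled fallback: decode in small chunks.
    return torch.cat([decode_full(chunk) for chunk in latents.split(4)])

def decode_with_fallback(latents: torch.Tensor) -> torch.Tensor:
    oom = False
    try:
        return decode_full(latents)
    except torch.cuda.OutOfMemoryError:
        # Don't run the fallback here: while the except block is active,
        # the traceback keeps the failed call stack (and its tensors) alive.
        oom = True
    # Outside the except block the traceback is dropped, so the memory held
    # by the failed attempt can be freed before the tiler needs headroom.
    if oom:
        return decode_tiled(latents)

print(decode_with_fallback(torch.randn(64, 128, 128)).shape)
```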
comfyanonymous
a6f83a4a1a
Support the new hunyuan vae. (#10150) 2025-10-01 17:19:13 -04:00
patientx
21dc67a0b9
Merge branch 'comfyanonymous:master' into master 2025-09-28 04:38:19 +03:00
rattus128
653ceab414
Reduce Peak WAN inference VRAM usage - part II (#10062)
* flux: math: Use addcmul_ to avoid an expensive VRAM intermediate

The rope computation can be the VRAM peak, and allocating this
intermediate for the addition result before the original is released
can OOM. Use addcmul_ instead.

* wan: Delete the self attention before cross attention

This saves VRAM when the cross attention and FFN are the VRAM peak.
2025-09-27 18:14:16 -04:00
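A small illustration of the in-place fusion mentioned in the first bullet, using generic tensors rather than the actual flux rope code: `addcmul_` accumulates `b * c` directly into an existing tensor instead of allocating separate intermediates for the product and the sum.

```python
import torch

a = torch.randn(4, 1024)
b = torch.randn(4, 1024)
c = torch.randn(4, 1024)

# Naive form: allocates an intermediate for b * c and another for the sum,
# which at the VRAM peak can be the difference between fitting and OOMing.
out_naive = a + b * c

# Fused, in-place form: the result is accumulated directly into the tensor.
out_fused = a.clone()
out_fused.addcmul_(b, c)

assert torch.allclose(out_naive, out_fused)
```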
patientx
7e6b077cd7
Merge branch 'comfyanonymous:master' into master 2025-09-27 14:25:54 +03:00
Jedrzej Kosinski
196954ab8c
Add 'input_cond' and 'input_uncond' to the args dictionary passed into sampler_cfg_function (#10044) 2025-09-26 19:55:03 -07:00
comfyanonymous
1e098d6132
Don't add template to qwen2.5vl when template is in prompt. (#10043)
Make the hunyuan image refiner template_end 36.
2025-09-26 18:34:17 -04:00
patientx
258da26c98
Merge branch 'comfyanonymous:master' into master 2025-09-25 15:08:16 +03:00
Guy Niv
c8d2117f02
Fix memory leak by properly detaching model finalizer (#9979)
When unloading models in load_models_gpu(), the model finalizer was not
being explicitly detached, leading to a memory leak. This caused memory
consumption to increase linearly over time as models were repeatedly
loaded and unloaded.

This change prevents orphaned finalizer references from accumulating in
memory during model switching operations.
2025-09-24 22:35:12 -04:00
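A sketch of the general pattern under the assumption of a `weakref.finalize`-style finalizer (hypothetical names; the real ComfyUI model-management code is not shown here): finalize objects stay registered until they run or are detached, so detaching them on deliberate unload keeps them from piling up.

```python
import weakref

class Model:
    pass

def _cleanup(name: str) -> None:
    print(f"finalizer ran for {name}")

loaded = []

def load_model(name: str) -> None:
    model = Model()
    fin = weakref.finalize(model, _cleanup, name)
    loaded.append((model, fin))

def unload_all() -> None:
    while loaded:
        model, fin = loaded.pop()
        # Explicitly detach the finalizer when unloading on purpose; otherwise
        # still-registered finalize objects (and whatever they reference)
        # accumulate across repeated load/unload cycles.
        fin.detach()

load_model("unet")
unload_all()
```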
comfyanonymous
fccab99ec0
Fix issue with .view() in HuMo. (#10014) 2025-09-24 20:09:42 -04:00
patientx
64aa08cf53
Merge branch 'comfyanonymous:master' into master 2025-09-23 00:01:32 +03:00
comfyanonymous
1fee8827cb
Support for qwen edit plus model. Use the new TextEncodeQwenImageEditPlus. (#9986) 2025-09-22 16:49:48 -04:00
Rando717
5dcd8d2428
Add files via upload
Uploaded nvcuda.zluda_get_nightly_flag.py to get nightly flag info inside the batch file
2025-09-21 19:31:18 +02:00
patientx
9d2b926f56
Merge branch 'comfyanonymous:master' into master 2025-09-21 16:57:32 +03:00
comfyanonymous
d1d9eb94b1
Lower wan memory estimation value a bit. (#9964)
Previous PR reduced the peak memory requirement.
2025-09-20 22:09:35 -04:00
Kohaku-Blueleaf
7be2b49b6b
Fix LoRA Trainer bugs with FP8 models. (#9854)
* Fix adapter weight init

* Fix fp8 model training

* Avoid inference tensor
2025-09-20 21:24:48 -04:00
patientx
c62e820d45
Merge branch 'comfyanonymous:master' into master 2025-09-20 01:51:06 +03:00
comfyanonymous
e8df53b764
Update WanAnimateToVideo to more easily extend videos. (#9959) 2025-09-19 18:48:56 -04:00
comfyanonymous
dc95b6acc0
Basic WIP support for the wan animate model. (#9939) 2025-09-19 03:07:17 -04:00
comfyanonymous
24b0fce099
Do padding of audio embed in model for humo for more flexibility. (#9935) 2025-09-18 19:54:16 -04:00
DELUXA
8d6653fca6
Enable fp8 ops by default on gfx1200 (#9926) 2025-09-18 19:50:37 -04:00
patientx
50e281dc6d
Merge branch 'comfyanonymous:master' into master 2025-09-18 03:02:08 +03:00
comfyanonymous
dd611a7700
Support the HuMo 17B model. (#9912) 2025-09-17 18:39:24 -04:00
patientx
a8b63b21fe
Merge branch 'comfyanonymous:master' into master 2025-09-17 12:29:45 +03:00
comfyanonymous
9288c78fc5
Support the HuMo model. (#9903) 2025-09-17 00:12:48 -04:00
rattus128
e42682b24e
Reduce Peak WAN inference VRAM usage (#9898)
* flux: Do the xq and xk ropes one at a time

This was doing independent, interleaved tensor math on the q and k
tensors, holding more than the minimum number of intermediates in
VRAM. On a bad day, it would VRAM OOM on xk intermediates.

Do everything for q and then everything for k, so torch can garbage
collect all of q's intermediates before k allocates its own.

This reduces peak VRAM usage for some WAN2.2 inferences (at least).

* wan: Optimize qkv intermediates on attention

As commented. The former logic computed independent pieces of QKV in
parallel, which held more inference intermediates in VRAM and spiked
VRAM usage. Fully roping Q and garbage collecting the intermediates
before touching K reduces the peak inference VRAM usage.
2025-09-16 19:21:14 -04:00
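A schematic of the ordering change in the first bullet, using a simplified generic RoPE rather than the actual flux math: finishing all of q's rope work before starting k lets the allocator reclaim q's intermediates instead of holding both sets at the peak.

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rope_interleaved(q, k, cos, sin):
    # Interleaved: partial results for q and k are alive at the same time,
    # raising the peak number of intermediates held in VRAM.
    q_rot = q * cos
    k_rot = k * cos
    q_rot = q_rot + rotate_half(q) * sin
    k_rot = k_rot + rotate_half(k) * sin
    return q_rot, k_rot

def rope_sequential(q, k, cos, sin):
    # Sequential: finish q completely so its intermediates can be collected
    # before any of k's intermediates are allocated.
    q_rot = q * cos + rotate_half(q) * sin
    k_rot = k * cos + rotate_half(k) * sin
    return q_rot, k_rot

q, k = torch.randn(2, 8, 64), torch.randn(2, 8, 64)
cos, sin = torch.randn(8, 64), torch.randn(8, 64)
a = rope_interleaved(q, k, cos, sin)
b = rope_sequential(q, k, cos, sin)
assert all(torch.allclose(x, y) for x, y in zip(a, b))
```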
patientx
9cdd9e38d2
Merge branch 'comfyanonymous:master' into master 2025-09-17 00:30:44 +03:00
comfyanonymous
a39ac59c3e
Add encoder part of whisper large v3 as an audio encoder model. (#9894)
Not useful yet but some models use it.
2025-09-16 01:19:50 -04:00
blepping
1a85483da1
Fix depending on asserts to raise an exception in BatchedBrownianTree and Flash attn module (#9884)
Correctly handle the case where w0 is passed by kwargs in BatchedBrownianTree
2025-09-15 20:05:03 -04:00
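A sketch of the general fix pattern with a toy class (not the actual BatchedBrownianTree): asserts are stripped under `python -O`, so validation that must fail loudly should raise an explicit exception, and an argument that may arrive positionally or by keyword should be read from both places.

```python
class BrownianTreeish:
    """Toy stand-in, not the real BatchedBrownianTree."""

    def __init__(self, t0, t1, *args, **kwargs):
        # w0 may arrive positionally or as a keyword argument.
        w0 = args[0] if args else kwargs.get("w0", 0.0)
        # An assert would disappear under `python -O`; validation that must
        # raise should use an explicit exception instead.
        if t0 > t1:
            raise ValueError("t0 must be <= t1")
        self.t0, self.t1, self.w0 = t0, t1, w0

print(BrownianTreeish(0.0, 1.0, w0=0.5).w0)  # 0.5
print(BrownianTreeish(0.0, 1.0, 0.25).w0)    # 0.25
```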
patientx
a08c1e1613
Merge branch 'comfyanonymous:master' into master 2025-09-16 01:22:18 +03:00
comfyanonymous
47a9cde5d3
Support the omnigen2 umo lora. (#9886) 2025-09-15 18:10:55 -04:00
patientx
09bd37e843
Merge branch 'comfyanonymous:master' into master 2025-09-14 13:23:58 +03:00
Jedrzej Kosinski
f228367c5e
Make ModuleNotFoundError ImportError instead (#9850) 2025-09-13 21:34:21 -04:00
patientx
78c0630849
Merge branch 'comfyanonymous:master' into master 2025-09-14 03:43:29 +03:00
comfyanonymous
80b7c9455b
Changes to the previous radiance commit. (#9851) 2025-09-13 18:03:34 -04:00
blepping
c1297f4eb3
Add support for Chroma Radiance (#9682)
* Initial Chroma Radiance support

* Minor Chroma Radiance cleanups

* Update Radiance nodes to ensure latents/images are on the intermediate device

* Fix Chroma Radiance memory estimation.

* Increase Chroma Radiance memory usage factor

* Increase Chroma Radiance memory usage factor once again

* Ensure images are multiples of 16 for Chroma Radiance
Add batch dimension and fix channels when necessary in ChromaRadianceImageToLatent node

* Tile Chroma Radiance NeRF to reduce memory consumption, update memory usage factor

* Update Radiance to support conv nerf final head type.

* Allow setting NeRF embedder dtype for Radiance
Bump Radiance nerf tile size to 32
Support EasyCache/LazyCache on Radiance (maybe)

* Add ChromaRadianceStubVAE node

* Crop Radiance image inputs to multiples of 16 instead of erroring to be in line with existing VAE behavior

* Convert Chroma Radiance nodes to V3 schema.

* Add ChromaRadianceOptions node and backend support.
Cleanups/refactoring to reduce code duplication with Chroma.

* Fix overriding the NeRF embedder dtype for Chroma Radiance

* Minor Chroma Radiance cleanups

* Move Chroma Radiance to its own directory in ldm
Minor code cleanups and tooltip improvements

* Fix Chroma Radiance embedder dtype overriding

* Remove Radiance dynamic nerf_embedder dtype override feature

* Unbork Radiance NeRF embedder init

* Remove Chroma Radiance image conversion and stub VAE nodes
Add a chroma_radiance option to the VAELoader builtin node which uses comfy.sd.PixelspaceConversionVAE
Add a PixelspaceConversionVAE to comfy.sd for converting BHWC 0..1 <-> BCHW -1..1
2025-09-13 17:58:43 -04:00
Kimbing Ng
e5e70636e7
Remove single quote pattern to avoid wrong matches (#9842) 2025-09-13 16:59:19 -04:00
patientx
596049a855
Merge branch 'comfyanonymous:master' into master 2025-09-13 14:38:42 +03:00
comfyanonymous
29bf807b0e
Cleanup. (#9838) 2025-09-12 21:57:04 -04:00
Jukka Seppänen
2559dee492
Support wav2vec base models (#9637)
* Support wav2vec base models

* trim trailing whitespace

* Do interpolation after
2025-09-12 21:52:58 -04:00
comfyanonymous
a3b04de700
Hunyuan refiner vae now works with tiled. (#9836) 2025-09-12 19:46:46 -04:00
patientx
42a2c109ec
Merge branch 'comfyanonymous:master' into master 2025-09-13 01:16:39 +03:00
Jedrzej Kosinski
d7f40442f9
Enable Runtime Selection of Attention Functions (#9639)
* Looking into a @wrap_attn decorator to look for 'optimized_attention_override' entry in transformer_options

* Created logging code for this branch so that it can be used to track down all the code paths where transformer_options would need to be added

* Fix memory usage issue with inspect

* Made WAN attention receive transformer_options, test node added to wan to test out attention override later

* Added **kwargs to all attention functions so transformer_options could potentially be passed through

* Make sure wrap_attn doesn't make itself recurse infinitely, attempt to load SageAttention and FlashAttention if not enabled so that they can be marked as available or not, create registry for available attention

* Turn off attention logging for now, make AttentionOverrideTestNode have a dropdown with available attention (this is a test node only)

* Make flux work with optimized_attention_override

* Add logs to verify optimized_attention_override is passed all the way into attention function

* Make Qwen work with optimized_attention_override

* Made hidream work with optimized_attention_override

* Made wan patches_replace work with optimized_attention_override

* Made SD3 work with optimized_attention_override

* Made HunyuanVideo work with optimized_attention_override

* Made Mochi work with optimized_attention_override

* Made LTX work with optimized_attention_override

* Made StableAudio work with optimized_attention_override

* Made optimized_attention_override work with ACE Step

* Made Hunyuan3D work with optimized_attention_override

* Make CosmosPredict2 work with optimized_attention_override

* Made CosmosVideo work with optimized_attention_override

* Made Omnigen 2 work with optimized_attention_override

* Made StableCascade work with optimized_attention_override

* Made AuraFlow work with optimized_attention_override

* Made Lumina work with optimized_attention_override

* Made Chroma work with optimized_attention_override

* Made SVD work with optimized_attention_override

* Fix WanI2VCrossAttention so that it expects to receive transformer_options

* Fixed Wan2.1 Fun Camera transformer_options passthrough

* Fixed WAN 2.1 VACE transformer_options passthrough

* Add optimized to get_attention_function

* Disable attention logs for now

* Remove attention logging code

* Remove _register_core_attention_functions, as we wouldn't want someone to call that, just in case

* Satisfy ruff

* Remove AttentionOverrideTest node, that's something to cook up for later
2025-09-12 18:07:38 -04:00
comfyanonymous
b149e2e1e3
Better way of doing the generator for the hunyuan image noise aug. (#9834) 2025-09-12 17:53:15 -04:00
comfyanonymous
7757d5a657
Set default hunyuan refiner shift to 4.0 (#9833) 2025-09-12 16:40:12 -04:00
comfyanonymous
e600520f8a
Fix hunyuan refiner blownout colors at noise aug less than 0.25 (#9832) 2025-09-12 16:35:34 -04:00
patientx
39a0d246ee
Merge branch 'comfyanonymous:master' into master 2025-09-12 23:24:35 +03:00
comfyanonymous
fd2b820ec2
Add noise augmentation to hunyuan image refiner. (#9831)
This was missing and should help with colors being blown out.
2025-09-12 16:03:08 -04:00
patientx
4c5915d5cb
Merge branch 'comfyanonymous:master' into master 2025-09-12 09:29:27 +03:00
comfyanonymous
33bd9ed9cb
Implement hunyuan image refiner model. (#9817) 2025-09-12 00:43:20 -04:00
comfyanonymous
18de0b2830
Fast preview for hunyuan image. (#9814) 2025-09-11 19:33:02 -04:00
patientx
aae8c1486f
Merge pull request #297 from Rando717/Rando717-zluda.py
zluda.py "Expanded gfx identifier, lowercase gpu search, detect Triton version"
2025-09-11 20:35:35 +03:00
patientx
06fe8754d2
Merge branch 'comfyanonymous:master' into master 2025-09-11 13:46:42 +03:00
comfyanonymous
e01e99d075
Support hunyuan image distilled model. (#9807) 2025-09-10 23:17:34 -04:00
patientx
666b2e05fa
Merge branch 'comfyanonymous:master' into master 2025-09-10 10:47:09 +03:00
comfyanonymous
543888d3d8
Fix lowvram issue with hunyuan image vae. (#9794) 2025-09-10 02:15:34 -04:00
comfyanonymous
85e34643f8
Support hunyuan image 2.1 regular model. (#9792) 2025-09-10 02:05:07 -04:00
comfyanonymous
5c33872e2f
Fix issue on old torch. (#9791) 2025-09-10 00:23:47 -04:00
comfyanonymous
b288fb0db8
Small refactor of some vae code. (#9787) 2025-09-09 18:09:56 -04:00
Rando717
4057f2984c
Update zluda.py (MEM_BUS_WIDTH#3)
Lowercase the lookup inside MEM_BUS_WIDTH, just in case of inconsistent casing on Radeon Pro (PRO) GPU names.

Also fixed (lowercased) the "Triton device properties" lookup inside MEM_BUS_WIDTH.
2025-09-09 20:04:20 +02:00
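An illustration of the lowercased lookup with a hypothetical excerpt of the table (not the full zluda.py list): normalizing the reported device name before indexing makes "Radeon PRO ..." and "Radeon Pro ..." hit the same entry instead of falling through to the default.

```python
MEM_BUS_WIDTH = {
    "amd radeon rx 5700": 256,      # hypothetical excerpt of the table
    "amd radeon pro w6800": 256,
}

def bus_width_for(device_name: str, default: int = 128) -> int:
    # Lowercase the reported name so casing differences ("PRO" vs "Pro")
    # don't cause a fall-through to the default width.
    return MEM_BUS_WIDTH.get(device_name.lower(), default)

print(bus_width_for("AMD Radeon PRO W6800"))  # 256 instead of the 128 default
```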
Rando717
13ba6a8a8d
Update zluda.py (cleanup print Triton version)
Compacted; no exception handling, and silent if no version string is found
2025-09-09 19:30:54 +02:00
Rando717
ce8900fa25
Update zluda.py (gpu_name_to_gfx)
- Changed the function into a list of rules

- Attached the correct gfx codes to each GPU name

- Addressed a potential incorrect designation for the RX 6000 S Series via sort priority
2025-09-09 18:51:41 +02:00
patientx
a531352603
Merge branch 'comfyanonymous:master' into master 2025-09-09 01:35:58 +03:00
comfyanonymous
103a12cb66
Support qwen inpaint controlnet. (#9772) 2025-09-08 17:30:26 -04:00
patientx
6f38e729cc
Merge branch 'comfyanonymous:master' into master 2025-09-08 22:15:28 +03:00
Rando717
e7d48450a3
Update zluda.py (removed previously added gfx90c)
The 'radeon graphics' check is not reliable enough,
considering 'radeon (tm) graphics' also exists on Vega.

Plus, gfx1036 Raphael (Ryzen 7000) is reported as 'radeon (tm) graphics', same with Granite Ridge (Ryzen 9000).
2025-09-08 21:10:20 +02:00
contentis
97652d26b8
Add explicit casting in apply_rope for Qwen VL (#9759) 2025-09-08 15:08:18 -04:00
Rando717
590f46ab41
Update zluda.py (typo) 2025-09-08 20:31:49 +02:00
Rando717
675d6d8f4c
Update zluda.py (gfx gpu names)
- Expanded GPU gfx names
- Added RDNA4, RDNA3.5, ...
- Added missing Polaris cards to prevent the 'gfx1010' and 'gfx1030' fallback
- Kept gfx designations mostly the same, based on the available custom libs for HIP 5.7/6.2

Might need some adjustments afterwards
2025-09-08 17:55:29 +02:00
Rando717
ddb1e3da47
Update zluda.py (typo) 2025-09-08 17:22:41 +02:00
Rando717
a7336ad630
Update zluda.py (MEM_BUS_WIDTH#2)
Added Vega 10/20 cards.
Can't test; no idea whether it has a real effect or is just a placebo.
2025-09-08 17:19:03 +02:00
Rando717
40199a5244
Update zluda.py (print Triton version)
Added check for Triton version string, if it exists.
Could be useful info for troubleshooting reports.
2025-09-08 17:00:40 +02:00
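A minimal sketch of the Triton version check (an assumed approach, not necessarily the exact zluda.py code): print `triton.__version__` when the module and attribute exist, and stay silent otherwise.

```python
def print_triton_version() -> None:
    try:
        import triton
    except ImportError:
        return  # Triton not installed; stay silent.
    version = getattr(triton, "__version__", None)
    if version:
        print(f"Triton version: {version}")

print_triton_version()
```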
patientx
b46622ffa5
Merge branch 'comfyanonymous:master' into master 2025-09-08 11:14:04 +03:00
comfyanonymous
fb763d4333
Fix amd_min_version crash when cpu device. (#9754) 2025-09-07 21:16:29 -04:00
patientx
9417753a6c
Merge branch 'comfyanonymous:master' into master 2025-09-07 13:16:57 +03:00
comfyanonymous
bcbd7884e3
Don't enable pytorch attention on AMD if triton isn't available. (#9747) 2025-09-07 00:29:38 -04:00
comfyanonymous
27a0fcccc3
Enable bf16 VAE on RDNA4. (#9746) 2025-09-06 23:25:22 -04:00
patientx
afbcd5d57e
Merge branch 'comfyanonymous:master' into master 2025-09-06 11:51:33 +03:00
comfyanonymous
ea6cdd2631
Print all fast options in --help (#9737) 2025-09-06 01:05:05 -04:00
patientx
3ca065a755
fix 2025-09-05 23:11:57 +03:00
patientx
0488fe3748
rmsnorm patch second try 2025-09-05 23:10:27 +03:00
patientx
8966009181
added rmsnorm patch for torch versions older than 2.4 2025-09-05 22:43:39 +03:00
patientx
f9d7fcb696
Merge branch 'comfyanonymous:master' into master 2025-09-05 22:09:30 +03:00
comfyanonymous
2ee7879a0b
Fix lowvram issues with hunyuan3d 2.1 (#9735) 2025-09-05 14:57:35 -04:00
patientx
c7c7269f48
Merge branch 'comfyanonymous:master' into master 2025-09-05 17:11:07 +03:00
comfyanonymous
c9ebe70072
Some changes to the previous hunyuan PR. (#9725) 2025-09-04 20:39:02 -04:00
Yousef R. Gamaleldin
261421e218
Add Hunyuan 3D 2.1 Support (#8714) 2025-09-04 20:36:20 -04:00
patientx
d79e93a0a9
Merge branch 'comfyanonymous:master' into master 2025-09-04 12:41:48 +03:00
comfyanonymous
72855db715
Fix potential rope issue. (#9710) 2025-09-03 22:20:13 -04:00
patientx
991209d11d
Merge branch 'comfyanonymous:master' into master 2025-09-03 00:05:33 +03:00
comfyanonymous
e3018c2a5a
uso -> uxo/uno as requested. (#9688) 2025-09-02 16:12:07 -04:00
patientx
b30a38dca0
Merge branch 'comfyanonymous:master' into master 2025-09-02 22:46:44 +03:00
comfyanonymous
3412d53b1d
USO style reference. (#9677)
Load the projector.safetensors file with the ModelPatchLoader node and use
the siglip_vision_patch14_384.safetensors "clip vision" model and the
USOStyleReferenceNode.
2025-09-02 15:36:22 -04:00
patientx
47c6fb34c9
Merge branch 'comfyanonymous:master' into master 2025-09-02 09:46:42 +03:00
contentis
e2d1e5dad9
Enable Convolution AutoTuning (#9301) 2025-09-01 20:33:50 -04:00
comfyanonymous
27e067ce50
Implement the USO subject identity lora. (#9674)
Use the lora with the FluxKontextMultiReferenceLatentMethod node set to "uso"
and a ReferenceLatent node with the reference image.
2025-09-01 18:54:02 -04:00
patientx
9cb469282e
Merge branch 'comfyanonymous:master' into master 2025-08-31 11:24:57 +03:00
chaObserv
32a627bf1f
SEEDS: update noise decomposition and refactor (#9633)
- Update the decomposition to reflect interval dependency
- Extract phi computations into functions
- Use torch.lerp for interpolation
2025-08-31 00:01:45 -04:00
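For reference, the `torch.lerp` substitution mentioned above is a drop-in replacement for the manual interpolation formula:

```python
import torch

a = torch.randn(8)
b = torch.randn(8)
t = 0.3

manual = a + t * (b - a)     # manual linear interpolation
fused = torch.lerp(a, b, t)  # equivalent, as a single fused op

assert torch.allclose(manual, fused)
```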
patientx
c6b0bf480f
Merge branch 'comfyanonymous:master' into master 2025-08-29 09:31:05 +03:00
comfyanonymous
e80a14ad50
Support wan2.2 5B fun control model. (#9611)
Use the Wan22FunControlToVideo node.
2025-08-28 22:13:07 -04:00
patientx
c8af694267
Merge pull request #279 from sfinktah/sfink-cudnn-benchmark
Added env_var for cudnn.benchmark
2025-08-28 23:17:05 +03:00
patientx
1db0a73a2a
Merge branch 'comfyanonymous:master' into master 2025-08-28 09:06:22 +03:00
comfyanonymous
4aa79dbf2c
Adjust flux mem usage factor a bit. (#9588) 2025-08-27 23:08:17 -04:00
patientx
fc93a6f534
Merge branch 'comfyanonymous:master' into master 2025-08-28 02:22:15 +03:00
Gangin Park
3aad339b63
Add DPM++ 2M SDE Heun (RES) sampler (#9542) 2025-08-27 19:07:31 -04:00
comfyanonymous
491755325c
Better s2v memory estimation. (#9584) 2025-08-27 19:02:42 -04:00
Christopher Anderson
cf22cbd8d5 Added env_var for cudnn.benchmark 2025-08-28 09:00:08 +10:00
comfyanonymous
496888fd68
Improve s2v performance when generating videos longer than 120 frames. (#9582) 2025-08-27 16:06:40 -04:00
comfyanonymous
b5ac6ed7ce
Fixes to make controlnet type models work on qwen edit and kontext. (#9581) 2025-08-27 15:26:28 -04:00
Kohaku-Blueleaf
b20ba1f27c
Fix #9537 (#9576) 2025-08-27 12:45:02 -04:00
patientx
eeab23fc0b
Merge branch 'comfyanonymous:master' into master 2025-08-27 10:07:57 +03:00
comfyanonymous
88aee596a3
WIP Wan 2.2 S2V model. (#9568) 2025-08-27 01:10:34 -04:00
patientx
c1aef0126d
Merge pull request #276 from sfinktah/sfink-cudnn-benchmark-env
Deleted torch.backends.cudnn.benchmark line, defaults are fine
2025-08-26 19:34:35 +03:00
patientx
1efeba7066
Merge branch 'comfyanonymous:master' into master 2025-08-26 10:41:38 +03:00
comfyanonymous
914c2a2973
Implement wav2vec2 as an audio encoder model. (#9549)
This is useless on its own but there are multiple models that use it.
2025-08-25 23:26:47 -04:00
Christopher Anderson
110cb0a9d9 Deleted torch.backends.cudnn.benchmark line, defaults are fine 2025-08-26 08:43:31 +10:00
Christopher Anderson
1b9a3b12c2 had to move cudnn disablement up much higher 2025-08-25 14:11:54 +10:00
Christopher Anderson
cd3d60254b argggh, white space hell 2025-08-25 09:52:58 +10:00
Christopher Anderson
184fa5921f worst PR ever, really. 2025-08-25 09:42:27 +10:00
Christopher Anderson
33c43b68c3 worst PR ever 2025-08-25 09:38:22 +10:00
Christopher Anderson
2a06dc8e87 Merge remote-tracking branch 'origin/sfink-cudnn-env' into sfink-cudnn-env
# Conflicts:
#	comfy/customzluda/zluda.py
2025-08-25 09:34:32 +10:00
Christopher Anderson
3504eeeb4a rebased onto upstream master (woops) 2025-08-25 09:32:34 +10:00
Christopher Anderson
7eda4587be Added env var TORCH_BACKENDS_CUDNN_ENABLED, defaults to 1. 2025-08-25 09:31:12 +10:00
Christopher Anderson
954644ef83 Added env var TORCH_BACKENDS_CUDNN_ENABLED, defaults to 1. 2025-08-25 08:56:48 +10:00
Rando717
053a6b95e5
Update zluda.py (MEM_BUS_WIDTH)
Added more cards, mostly RDNA(1) and Radeon Pro.

Reasoning: every time zluda.py gets updated I have to manually add 256 for my RX 5700, otherwise it defaults to 128. Also, manual local edits break on git pull.
2025-08-24 18:39:40 +02:00
patientx
c92a07594b
Update zluda.py 2025-08-24 12:01:20 +03:00
patientx
dba9d20791
Update zluda.py 2025-08-24 10:23:30 +03:00
patientx
cdc04b5a8a
Merge branch 'comfyanonymous:master' into master 2025-08-23 07:47:07 +03:00
comfyanonymous
41048c69b4
Fix Conditioning masks on 3d latents. (#9506) 2025-08-22 23:15:44 -04:00
Jedrzej Kosinski
fc247150fe
Implement EasyCache and Invent LazyCache (#9496)
* Attempting a universal implementation of EasyCache, starting with flux as test; I screwed up the math a bit, but when I set it just right it works.

* Fixed math to make threshold work as expected, refactored code to use EasyCacheHolder instead of a dict wrapped by object

* Use sigmas from transformer_options instead of timesteps to be compatible with a greater amount of models, make end_percent work

* Make log statement when not skipping useful, preparing for per-cond caching

* Added DIFFUSION_MODEL wrapper around forward function for wan model

* Add subsampling for heuristic inputs

* Add subsampling to output_prev (output_prev_subsampled now)

* Properly consider conds in EasyCache logic

* Created SuperEasyCache to test what happens if caching and reuse is moved outside the scope of conds, added PREDICT_NOISE wrapper to facilitate this test

* Change max reuse_threshold to 3.0

* Mark EasyCache/SuperEasyCache as experimental (beta)

* Make Lumina2 compatible with EasyCache

* Add EasyCache support for Qwen Image

* Fix missing comma, curse you Cursor

* Add EasyCache support to AceStep

* Add EasyCache support to Chroma

* Added EasyCache support to Cosmos Predict t2i

* Make EasyCache not crash with Cosmos Predict ImagToVideo latents, but does not work well at all

* Add EasyCache support to hidream

* Added EasyCache support to hunyuan video

* Added EasyCache support to hunyuan3d

* Added EasyCache support to LTXV (not very good, but does not crash)

* Implemented EasyCache for aura_flow

* Renamed SuperEasyCache to LazyCache, hardcoded subsample_factor to 8 on nodes

* Extra logging when verbose is true for EasyCache
2025-08-22 22:41:08 -04:00
contentis
fe31ad0276
Add elementwise fusions (#9495)
* Add elementwise fusions

* Add addcmul pattern to Qwen
2025-08-22 19:39:15 -04:00
patientx
7bc46389fa
Merge branch 'comfyanonymous:master' into master 2025-08-22 10:52:52 +03:00
comfyanonymous
ff57793659
Support InstantX Qwen controlnet. (#9488) 2025-08-22 00:53:11 -04:00
comfyanonymous
f7bd5e58dd
Make it easier to implement future qwen controlnets. (#9485) 2025-08-21 23:18:04 -04:00
patientx
7ff01ded58
Merge branch 'comfyanonymous:master' into master 2025-08-21 09:24:26 +03:00
comfyanonymous
0963493a9c
Support for Qwen Diffsynth Controlnets canny and depth. (#9465)
These are not real controlnets but actually a patch on the model so they
will be treated as such.

Put them in the models/model_patches/ folder.

Use the new ModelPatchLoader and QwenImageDiffsynthControlnet nodes.
2025-08-20 22:26:37 -04:00
patientx
6dca25e2a8
Merge branch 'comfyanonymous:master' into master 2025-08-20 10:14:34 +03:00
comfyanonymous
8d38ea3bbf
Fix bf16 precision issue with qwen image embeddings. (#9441) 2025-08-20 02:58:54 -04:00
comfyanonymous
5a8f502db5
Disable prompt weights for qwen. (#9438) 2025-08-20 01:08:11 -04:00
comfyanonymous
7cd2c4bd6a
Qwen rotary embeddings should now match reference code. (#9437) 2025-08-20 00:45:27 -04:00
comfyanonymous
dfa791eb4b
Rope fix for qwen vl. (#9435) 2025-08-19 20:47:42 -04:00
patientx
1cbb5fdc14
Merge branch 'comfyanonymous:master' into master 2025-08-19 10:21:12 +03:00
comfyanonymous
4977f203fa
P2 of qwen edit model. (#9412)
* P2 of qwen edit model.

* Typo.

* Fix normal qwen.

* Fix.

* Make the TextEncodeQwenImageEdit also set the ref latent.

If you don't want it to set the ref latent and want to use the
ReferenceLatent node with your custom latent instead, just disconnect
the VAE.
2025-08-18 22:38:34 -04:00
patientx
3f09b4dba5
Merge branch 'comfyanonymous:master' into master 2025-08-18 15:14:34 +03:00
Jedrzej Kosinski
7f3b9b16c6
Make step index detection much more robust (#9392) 2025-08-17 18:54:07 -04:00
comfyanonymous
ed43784b0d
WIP Qwen edit model: The diffusion model part. (#9383) 2025-08-17 16:45:39 -04:00
patientx
64d6cf045e
Merge branch 'comfyanonymous:master' into master 2025-08-17 11:29:13 +03:00
comfyanonymous
0f2b8525bc
Qwen image model refactor. (#9375) 2025-08-16 17:51:28 -04:00
patientx
5a21015adb
Merge branch 'comfyanonymous:master' into master 2025-08-16 09:54:01 +03:00
comfyanonymous
1702e6df16
Implement wan2.2 camera model. (#9357)
Use the old WanCameraImageToVideo node.
2025-08-15 17:29:58 -04:00
patientx
eb283b5fd7
Merge branch 'comfyanonymous:master' into master 2025-08-16 00:26:31 +03:00
comfyanonymous
c308a8840a
Add FluxKontextMultiReferenceLatentMethod node. (#9356)
This node is only useful if someone trains the kontext model to properly
use multiple reference images via the index method.

The default is the offset method, which feeds the multiple images as if
they were stitched together into one. This method works with the current
flux kontext model.
2025-08-15 15:50:39 -04:00
patientx
13f5f9d78f
Merge branch 'comfyanonymous:master' into master 2025-08-15 10:54:10 +03:00
comfyanonymous
e08ecfbd8a
Add warning when using old pytorch. (#9347) 2025-08-15 00:22:26 -04:00
comfyanonymous
4e5c230f6a
Fix last commit not working on older pytorch. (#9346) 2025-08-14 23:44:02 -04:00
Xiangxi Guo (Ryan)
f0d5d0111f
Avoid torch compile graphbreak for older pytorch versions (#9344)
Turns out torch.compile has some gaps in context manager decorator
syntax support. I've sent patches to fix that in PyTorch, but it won't
be available for all the folks running older versions of PyTorch, hence
this trivial patch.
2025-08-14 23:41:37 -04:00
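A sketch of the kind of workaround described, under the assumption of a context manager applied as a decorator (hypothetical function; the exact ComfyUI change is not shown here): on some older PyTorch versions the decorator form can break tracing under torch.compile, while the explicit `with` form inside the function body compiles cleanly.

```python
import contextlib
import torch

@contextlib.contextmanager
def sampling_state():
    # Stand-in for whatever state the real code toggles around the call.
    yield

# Decorator form: on some older PyTorch versions, a context manager applied
# as a decorator can cause a graph break under torch.compile.
@sampling_state()
def forward_decorated(x):
    return x * 2

# Workaround form: enter the context manager explicitly inside the body,
# which older torch.compile versions handle without breaking.
def forward_explicit(x):
    with sampling_state():
        return x * 2

compiled = torch.compile(forward_explicit, backend="eager")
print(compiled(torch.ones(2)))
```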
comfyanonymous
ad19a069f6
Make SLG nodes work on Qwen Image model. (#9345) 2025-08-14 23:16:01 -04:00
patientx
a927fbd99b
Merge branch 'comfyanonymous:master' into master 2025-08-14 12:16:50 +03:00
Jedrzej Kosinski
e4f7ea105f
Added context window support to core sampling code (#9238)
* Added initial support for basic context windows - in progress

* Add prepare_sampling wrapper for context window to more accurately estimate latent memory requirements, fixed merging wrappers/callbacks dicts in prepare_model_patcher

* Made context windows compatible with different dimensions; works for WAN, but results are bad

* Fix comfy.patcher_extension.merge_nested_dicts calls in prepare_model_patcher in sampler_helpers.py

* Considering adding some callbacks to context window code to allow extensions of behavior without the need to rewrite code

* Made dim slicing cleaner

* Add Wan Context Windows node for testing

* Made context schedule and fuse method functions be stored on the handler instead of needing to be registered in core code to be found

* Moved some code around between node_context_windows.py and context_windows.py

* Change manual context window nodes names/ids

* Added callbacks to IndexListContexHandler

* Adjusted default values for context_length and context_overlap, made schema.inputs definition for WAN Context Windows less annoying

* Make get_resized_cond more robust for various dim sizes

* Fix typo

* Another small fix
2025-08-13 21:33:05 -04:00
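The core idea behind the context-window sampling added here can be sketched as splitting the temporal dimension into overlapping index windows that are denoised separately and fused back together. A simplified window generator, hypothetical and not the actual IndexListContexHandler logic:

```python
def context_windows(length: int, context_length: int, overlap: int):
    """Yield overlapping index windows covering 0..length-1."""
    if length <= context_length:
        yield list(range(length))
        return
    stride = context_length - overlap
    start = 0
    while start + context_length < length:
        yield list(range(start, start + context_length))
        start += stride
    # Final window aligned to the end so every index is covered.
    yield list(range(length - context_length, length))

# e.g. 81 latent frames, windows of 16 with 4 frames of overlap
for window in context_windows(81, 16, 4):
    print(window[0], "...", window[-1])
```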
Simon Lui
c991a5da65
Fix XPU iGPU regressions (#9322)
* Change the bf16 check, switch non-blocking off by default (with an option to force it on) to regain speed on certain classes of iGPUs, and refactor the XPU check.

* Turn non_blocking off by default for xpu.

* Update README.md for Intel GPUs.
2025-08-13 19:13:35 -04:00
patientx
804c7097fa
Merge branch 'comfyanonymous:master' into master 2025-08-13 23:56:43 +03:00
comfyanonymous
9df8792d4b
Make last PR not crash comfy on old pytorch. (#9324) 2025-08-13 15:12:41 -04:00
contentis
3da5a07510
SDPA backend priority (#9299) 2025-08-13 14:53:27 -04:00
patientx
bcafc3f7a3
Merge branch 'comfyanonymous:master' into master 2025-08-13 10:36:20 +03:00
comfyanonymous
560d38f34c
Wan2.2 fun control support. (#9292) 2025-08-12 23:26:33 -04:00
patientx
f80a9bb674
Merge branch 'comfyanonymous:master' into master 2025-08-12 00:33:53 +03:00
PsychoLogicAu
2208aa616d
Support SimpleTuner lycoris lora for Qwen-Image (#9280) 2025-08-11 16:56:16 -04:00
patientx
c2686a3968
Merge branch 'comfyanonymous:master' into master 2025-08-10 12:09:19 +03:00
comfyanonymous
5828607ccf
Not sure if AMD actually supports fp16 acc but it doesn't crash. (#9258) 2025-08-09 12:49:25 -04:00
patientx
89499c6fae
Merge branch 'comfyanonymous:master' into master 2025-08-08 11:40:07 +03:00
comfyanonymous
735bb4bdb1
Users report gfx1201 is buggy on flux with pytorch attention. (#9244) 2025-08-08 04:21:00 -04:00
patientx
8795ae98aa
Merge branch 'comfyanonymous:master' into master 2025-08-06 20:24:47 +03:00
flybirdxx
4c3e57b0ae
Fixed an issue where qwenLora could not be loaded properly. (#9208) 2025-08-06 13:23:11 -04:00
patientx
2e39e0999f
Update zluda.py 2025-08-05 19:21:20 +03:00
patientx
28957a7bd6
Merge branch 'comfyanonymous:master' into master 2025-08-05 13:37:09 +03:00
comfyanonymous
d044a24398
Fix default shift and any latent size for qwen image model. (#9186) 2025-08-05 06:12:27 -04:00
patientx
e419bade03
Merge pull request #244 from sfinktah/sfink-zluda-is-nasty
Bad ideas from zluda update.
2025-08-05 09:48:53 +03:00
patientx
ea8122f065
Merge branch 'comfyanonymous:master' into master 2025-08-05 09:47:31 +03:00
comfyanonymous
c012400240
Initial support for qwen image model. (#9179) 2025-08-04 22:53:25 -04:00
Christopher Anderson
4f853403fe Bad ideas from zluda update. 2025-08-05 06:00:55 +10:00
patientx
88b7fe87ff
Merge branch 'comfyanonymous:master' into master 2025-08-04 12:38:56 +03:00
comfyanonymous
03895dea7c
Fix another issue with the PR. (#9170) 2025-08-04 04:33:04 -04:00
comfyanonymous
84f9759424
Add some warnings and prevent crash when cond devices don't match. (#9169) 2025-08-04 04:20:12 -04:00
comfyanonymous
7991341e89
Various fixes for broken things from earlier PR. (#9168) 2025-08-04 04:02:40 -04:00
patientx
37415c40c1
device identification and setting triton arch override 2025-08-04 10:44:18 +03:00
patientx
d823c0c615
Merge branch 'comfyanonymous:master' into master 2025-08-04 10:42:15 +03:00
comfyanonymous
140ffc7fdc
Fix broken controlnet from last PR. (#9167) 2025-08-04 03:28:12 -04:00
comfyanonymous
182f90b5ec
Lower cond vram use by casting at the same time as device transfer. (#9159) 2025-08-04 03:11:53 -04:00
patientx
7258461c23
Merge branch 'comfyanonymous:master' into master 2025-08-03 16:33:54 +03:00
comfyanonymous
aebac22193
Cleanup. (#9160) 2025-08-03 07:08:11 -04:00
patientx
da4fc8189a
Merge branch 'comfyanonymous:master' into master 2025-08-03 00:17:56 +03:00
comfyanonymous
13aaa66ec2
Make sure context is on the right device. (#9154) 2025-08-02 15:09:23 -04:00
comfyanonymous
5f582a9757
Make sure all the conds are on the right device. (#9151) 2025-08-02 15:00:13 -04:00
patientx
83dbd68651
Merge branch 'comfyanonymous:master' into master 2025-08-01 14:42:25 +03:00
comfyanonymous
1e638a140b
Tiny wan vae optimizations. (#9136) 2025-08-01 05:25:38 -04:00
patientx
321d683af0
Merge branch 'comfyanonymous:master' into master 2025-07-31 14:49:33 +03:00
chaObserv
61b08d4ba6
Replace manual x * sigmoid(x) with torch silu in VAE nonlinearity (#9057) 2025-07-30 19:25:56 -04:00
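For reference, this replacement is a pure equivalence: SiLU (a.k.a. swish) is defined as x * sigmoid(x), so the fused op produces the same values with fewer intermediates.

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8)

manual = x * torch.sigmoid(x)  # VAE nonlinearity written out by hand
fused = F.silu(x)              # torch's fused SiLU/swish

assert torch.allclose(manual, fused, atol=1e-6)
```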
comfyanonymous
da9dab7edd
Small wan camera memory optimization. (#9111) 2025-07-30 05:55:26 -04:00
patientx
1bd4b6489e
Merge branch 'comfyanonymous:master' into master 2025-07-30 11:11:46 +03:00
comfyanonymous
dca6bdd4fa
Make wan2.2 5B i2v take a lot less memory. (#9102) 2025-07-29 19:44:18 -04:00
patientx
d8ca8134c3
Merge branch 'comfyanonymous:master' into master 2025-07-29 11:56:59 +03:00
comfyanonymous
7d593baf91
Extra reserved vram on large cards on windows. (#9093) 2025-07-29 04:07:45 -04:00
patientx
fc4e82537c
Merge pull request #233 from sfinktah/sfink-flash-attn-gfx-startswith
This will allow much better support for gfx1032 and other things not …
2025-07-28 23:12:38 +03:00
patientx
7ba2a8d3b0
Merge branch 'comfyanonymous:master' into master 2025-07-28 22:15:10 +03:00
comfyanonymous
c60dc4177c
Remove unnecessary clones in the wan2.2 VAE. (#9083) 2025-07-28 14:48:19 -04:00
Christopher Anderson
b5ede18481 This will allow much better support for gfx1032 and other things not specifically named 2025-07-29 04:21:45 +10:00
patientx
769ab3bd25
Merge branch 'comfyanonymous:master' into master 2025-07-28 15:21:30 +03:00
comfyanonymous
a88788dce6
Wan 2.2 support. (#9080) 2025-07-28 08:00:23 -04:00
patientx
5a45e12b61
Merge branch 'comfyanonymous:master' into master 2025-07-26 14:09:19 +03:00
comfyanonymous
0621d73a9c
Remove useless code. (#9059) 2025-07-26 04:44:19 -04:00
comfyanonymous
e6e5d33b35
Remove useless code. (#9041)
This is only needed on old pytorch 2.0 and older.
2025-07-25 04:58:28 -04:00