Commit Graph

2338 Commits

Author SHA1 Message Date
patientx
043932cb67
Merge pull request #347 from nota-rudveld/patch-2
fix: check for version compatibility before calling PyTorch method
2025-10-23 02:28:35 +03:00
patientx
d4bcb93575
Merge branch 'comfyanonymous:master' into master 2025-10-22 11:34:33 +03:00
comfyanonymous
9cdc64998f
Only disable cudnn on newer AMD GPUs. (#10437) 2025-10-21 19:15:23 -04:00
patientx
5bf1c8be44
Merge branch 'comfyanonymous:master' into master 2025-10-21 03:49:14 +03:00
nota-rudveld
c1c044c8cb
fix: check for version compatibility before calling PyTorch method 2025-10-20 17:28:14 -04:00
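The title above does not say which PyTorch method or version threshold is involved, so the following is only a generic, hypothetical sketch of the version-gating pattern it describes (made-up threshold, made-up call sites, and it assumes the `packaging` package is installed):

```python
import torch
from packaging import version  # assumption: packaging is available in the environment


def _torch_at_least(minimum: str) -> bool:
    # Strip any local build suffix like "+cu121" before comparing versions.
    return version.parse(torch.__version__.split("+")[0]) >= version.parse(minimum)


# Hypothetical guard: only take the newer code path on torch builds that
# actually provide it; the real method and minimum version are not named
# in the commit title above.
if _torch_at_least("2.4.0"):
    pass  # call the newer PyTorch method here
else:
    pass  # fall back to the older behaviour
```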
comfyanonymous
2c2aa409b0
Log message for cudnn disable on AMD. (#10418) 2025-10-20 15:43:24 -04:00
patientx
657a7872ab
Merge branch 'comfyanonymous:master' into master 2025-10-19 15:20:17 +03:00
comfyanonymous
b4f30bd408
Pytorch is stupid. (#10398) 2025-10-19 01:25:35 -04:00
comfyanonymous
dad076aee6
Speed up chroma radiance. (#10395) 2025-10-18 23:19:52 -04:00
comfyanonymous
0cf33953a7
Fix batch size above 1 giving bad output in chroma radiance. (#10394) 2025-10-18 23:15:34 -04:00
comfyanonymous
5b80addafd
Turn off cuda malloc by default when --fast autotune is turned on. (#10393) 2025-10-18 22:35:46 -04:00
comfyanonymous
9da397ea2f
Disable torch compiler for cast_bias_weight function (#10384)
* Disable torch compiler for cast_bias_weight function

* Fix torch compile.
2025-10-17 20:03:28 -04:00
patientx
76dde47dbb
Merge branch 'comfyanonymous:master' into master 2025-10-18 00:05:02 +03:00
comfyanonymous
b1293d50ef
workaround also works on cudnn 91200 (#10375) 2025-10-16 19:59:56 -04:00
comfyanonymous
19b466160c
Workaround for nvidia issue where VAE uses 3x more memory on torch 2.9 (#10373) 2025-10-16 18:16:03 -04:00
patientx
7b0643ada1
Merge branch 'comfyanonymous:master' into master 2025-10-16 16:40:50 +03:00
Faych
afa8a24fe1
refactor: Replace manual patches merging with merge_nested_dicts (#10360) 2025-10-15 17:16:09 -07:00
Jedrzej Kosinski
493b81e48f
Fix order of inputs nested merge_nested_dicts (#10362) 2025-10-15 16:47:26 -07:00
patientx
26589a3a0b
Merge branch 'comfyanonymous:master' into master 2025-10-15 12:18:21 +03:00
comfyanonymous
1c10b33f9b
gfx942 doesn't support fp8 operations. (#10348) 2025-10-15 00:21:11 -04:00
comfyanonymous
3374e900d0
Faster workflow cancelling. (#10301) 2025-10-13 23:43:53 -04:00
comfyanonymous
dfff7e5332
Better memory estimation for the SD/Flux VAE on AMD. (#10334) 2025-10-13 22:37:19 -04:00
comfyanonymous
e4ea393666
Fix loading old stable diffusion ckpt files on newer numpy. (#10333) 2025-10-13 22:18:58 -04:00
comfyanonymous
c8674bc6e9
Enable RDNA4 pytorch attention on ROCm 7.0 and up. (#10332) 2025-10-13 21:19:03 -04:00
patientx
eae7a58e60
Merge branch 'comfyanonymous:master' into master 2025-10-14 02:07:30 +03:00
rattus128
95ca2e56c8
WAN2.2: Fix cache VRAM leak on error (#10308)
Same change pattern as 7e8dd275c2
applied to WAN2.2

If this suffers an exception (such as a VRAM OOM) it will leave the
encode() and decode() methods, which skips the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
to this object, which in turn keeps referencing large tensors from the
failed execution.

The feature cache is currently set up as a class variable on the
encoder/decoder; however, the encode and decode functions always clear
it on both entry and exit of normal execution.

The likely design intent is that this is usable as a streaming encoder
where the input comes in batches, but the functions as they are today
don't support that.

So simplify by bringing the cache back to a local variable, so that if
a VRAM OOM does happen the cache is properly garbage collected when the
encode()/decode() functions disappear from the stack.
2025-10-13 15:23:11 -04:00
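The fix described above, and its WAN counterpart (#10141) further down the log, boils down to moving the feature cache from a class attribute to a local variable so that an exception unwinding the frame releases it. A minimal sketch of that pattern, with hypothetical class and method names rather than the actual WAN VAE code:

```python
class Decoder:
    _feat_cache = None  # class-level cache, as in the pre-fix code (sketch)

    def decode_leaky(self, z):
        # Pre-fix pattern: the cache lives on the class, so if _run_blocks()
        # raises (e.g. a VRAM OOM) the clearing line below never runs and the
        # class keeps referencing the large cached tensors.
        Decoder._feat_cache = [z]
        out = self._run_blocks(z, Decoder._feat_cache)
        Decoder._feat_cache = None  # skipped when an exception is raised
        return out

    def decode_fixed(self, z):
        # Post-fix pattern: the cache is a local variable, so when an
        # exception unwinds this frame nothing keeps the cached tensors
        # alive and their memory can be reclaimed.
        feat_cache = [z]
        return self._run_blocks(z, feat_cache)

    def _run_blocks(self, z, feat_cache):
        # Stand-in for the real decode blocks; raise to simulate a VRAM OOM.
        raise RuntimeError("simulated VRAM OOM")
```

After decode_leaky() fails, Decoder._feat_cache still points at the data; after decode_fixed() fails, nothing does, so it can be garbage collected.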
comfyanonymous
e693e4db6a
Always set diffusion model to eval() mode. (#10331) 2025-10-13 14:57:27 -04:00
patientx
fa7942933b
Merge branch 'comfyanonymous:master' into master 2025-10-12 13:56:39 +03:00
comfyanonymous
a125cd84b0
Improve AMD performance. (#10302)
I honestly have no idea why this improves things but it does.
2025-10-12 00:28:01 -04:00
comfyanonymous
84e9ce32c6
Implement the mmaudio VAE. (#10300) 2025-10-11 22:57:23 -04:00
patientx
aa6afacc01
Merge branch 'comfyanonymous:master' into master 2025-10-10 02:25:35 +03:00
comfyanonymous
f1dd6e50f8
Fix bug with applying loras on fp8 scaled without fp8 ops. (#10279) 2025-10-09 19:02:40 -04:00
patientx
3553ce45e5
Merge branch 'comfyanonymous:master' into master 2025-10-09 23:40:21 +03:00
comfyanonymous
139addd53c
More surgical fix for #10267 (#10276) 2025-10-09 16:37:35 -04:00
patientx
77fc639ed2
Merge branch 'comfyanonymous:master' into master 2025-10-09 15:55:12 +03:00
comfyanonymous
6e59934089
Refactor model sampling sigmas code. (#10250) 2025-10-08 17:49:02 -04:00
patientx
2502069447
Merge branch 'comfyanonymous:master' into master 2025-10-07 14:02:12 +03:00
comfyanonymous
8aea746212
Implement gemma 3 as a text encoder. (#10241)
Not useful yet.
2025-10-06 22:08:08 -04:00
patientx
e5c08bb5c2
Merge branch 'comfyanonymous:master' into master 2025-10-06 00:08:39 +03:00
comfyanonymous
195e0b0639
Remove useless code. (#10223) 2025-10-05 15:41:19 -04:00
patientx
c3a59c8e40
Merge branch 'comfyanonymous:master' into master 2025-10-04 00:51:31 +03:00
Finn-Hecker
93d859cfaa
Fix type annotation syntax in MotionEncoder_tc __init__ (#10186)
## Summary
Fixed incorrect type hint syntax in `MotionEncoder_tc.__init__()` parameter list.

## Changes
- Line 647: Changed `num_heads=int` to `num_heads: int` 
- This corrects the parameter annotation from a default value assignment to proper type hint syntax

## Details
The parameter was using assignment syntax (`=`) instead of type annotation syntax (`:`), which would incorrectly set the default value to the `int` class itself rather than annotating the expected type.
2025-10-03 14:32:19 -07:00
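The before/after is easy to see outside the real MotionEncoder_tc code; a minimal, hypothetical illustration of the two syntaxes:

```python
class Before:
    # Assignment syntax: the type object `int` becomes the default value,
    # so Before() silently runs with num_heads == <class 'int'>.
    def __init__(self, num_heads=int):
        self.num_heads = num_heads


class After:
    # Annotation syntax: num_heads is a required argument annotated as int.
    def __init__(self, num_heads: int):
        self.num_heads = num_heads


print(Before().num_heads)   # <class 'int'> -- almost certainly not intended
print(After(8).num_heads)   # 8
```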
patientx
603dfa1a65
Merge branch 'comfyanonymous:master' into master 2025-10-02 14:06:08 +03:00
rattus128
4965c0e2ac
WAN: Fix cache VRAM leak on error (#10141)
If this suffers an exception (such as a VRAM OOM) it will leave the
encode() and decode() methods, which skips the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
to this object, which in turn keeps referencing large tensors from the
failed execution.

The feature cache is currently set up as a class variable on the
encoder/decoder; however, the encode and decode functions always clear
it on both entry and exit of normal execution.

The likely design intent is that this is usable as a streaming encoder
where the input comes in batches, but the functions as they are today
don't support that.

So simplify by bringing the cache back to a local variable, so that if
a VRAM OOM does happen the cache is properly garbage collected when the
encode()/decode() functions disappear from the stack.
2025-10-01 18:42:16 -04:00
rattus128
911331c06c
sd: fix VAE tiled fallback VRAM leak (#10139)
When the VAE catches this VRAM OOM, it launches the fallback logic
straight from the exception context.

Python, however, keeps references to the entire call stack that caused
the exception, including any local variables, for the sake of exception
reporting and debugging. In the case of tensors, this can hold on to
references to GBs of VRAM and prevent the allocator from freeing them.

So drop the except context completely before going back to the VAE
via the tiler, by getting out of the except block with nothing but
a flag.

This greatly increases the reliability of the tiler fallback,
especially on low-VRAM cards: with the bug, if the leak happened to
hold more than the headroom needed for a single tile, the tiler
fallback itself would OOM and fail the workflow.
2025-10-01 18:40:28 -04:00
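A minimal sketch of the flag pattern the commit describes, using hypothetical `decode_full`/`decode_tiled` names and a plain `torch.cuda.OutOfMemoryError` catch rather than the actual comfy.sd code:

```python
import torch


def decode_with_fallback(vae, samples):
    # Leaving the except block before retrying matters: while an except
    # block is active, the exception's traceback keeps every local of the
    # failed call (potentially GBs of tensors) alive.
    use_tiled = False
    try:
        out = vae.decode_full(samples)       # hypothetical full decode
    except torch.cuda.OutOfMemoryError:
        use_tiled = True                     # record the failure, keep nothing else
    if use_tiled:
        # By this point the traceback and its tensor references have been
        # dropped, so the tiled path starts with the freed VRAM available.
        out = vae.decode_tiled(samples)      # hypothetical tiled fallback
    return out
```

Calling `vae.decode_tiled(samples)` directly inside the `except` block would run the fallback while those references were still live, which is exactly the leak described above.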
comfyanonymous
a6f83a4a1a
Support the new hunyuan vae. (#10150) 2025-10-01 17:19:13 -04:00
patientx
21dc67a0b9
Merge branch 'comfyanonymous:master' into master 2025-09-28 04:38:19 +03:00
rattus128
653ceab414
Reduce Peak WAN inference VRAM usage - part II (#10062)
* flux: math: Use addcmul_ to avoid an expensive VRAM intermediate

The rope computation can be the VRAM peak, and the intermediate
allocated for the addition result (before the original is released)
can push it into an OOM. Use the in-place addcmul_ instead.

* wan: Delete the self attention output before the cross attention

This saves VRAM when the cross attention and FFN are the VRAM peak.
2025-09-27 18:14:16 -04:00
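The first bullet's trick is easy to demonstrate in isolation: `base + a * b` materializes a temporary for the product plus a new buffer for the sum, whereas `Tensor.addcmul_` accumulates the product directly into an existing buffer. A small self-contained sketch with illustrative shapes, not the actual flux rope code:

```python
import torch

a = torch.randn(4, 64, 64)
b = torch.randn(4, 64, 64)
base = torch.randn(4, 64, 64)

# Out-of-place: `a * b` allocates a full-size temporary, and the addition
# allocates another full-size result before `base` can be dropped, so the
# transient peak is roughly two extra tensors on top of the inputs.
out_of_place = base + a * b

# In-place: addcmul_ computes base += a * b element-wise inside the
# existing buffer, with no intermediate product tensor and no new result.
in_place = base.clone()
in_place.addcmul_(a, b)

assert torch.allclose(out_of_place, in_place)
```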
patientx
7e6b077cd7
Merge branch 'comfyanonymous:master' into master 2025-09-27 14:25:54 +03:00
Jedrzej Kosinski
196954ab8c
Add 'input_cond' and 'input_uncond' to the args dictionary passed into sampler_cfg_function (#10044) 2025-09-26 19:55:03 -07:00