Commit Graph

2343 Commits

Author SHA1 Message Date
doctorpangloss
b522208e09 Fix linting issue 2025-10-22 12:30:46 -07:00
doctorpangloss
674b69c291 Fix linting errors, use register_buffer 2025-10-22 12:16:09 -07:00
doctorpangloss
427c21309c Fix whitespace error in comfyui 2025-10-21 15:56:24 -07:00
doctorpangloss
170b8348a3 fix torch error and suggestion-mode 2025-10-21 14:52:11 -07:00
doctorpangloss
2261f18306 Fix wrong cache type 2025-10-21 14:42:20 -07:00
doctorpangloss
99be2c40fa add amd gfx1102 arch for pytorch attention 2025-10-21 14:33:54 -07:00
doctorpangloss
d9269785d3 Better memory trimming and group_offloading logic 2025-10-21 14:27:26 -07:00
doctorpangloss
7ed9292532 fix gguf logs to debug 2025-10-21 14:27:16 -07:00
doctorpangloss
5186d19441 Merge branch 'master' of github.com:comfyanonymous/ComfyUI into fix-merge 2025-10-21 10:56:30 -07:00
doctorpangloss
358cb834d6 fix tests, make fixture of core workflow test function to reclaim RAM better 2025-10-21 10:53:49 -07:00
doctorpangloss
f54af2c7ff Fix pylint errors 2025-10-21 10:53:49 -07:00
doctorpangloss
dc94081155 tune up mmaudio, nodes_eps 2025-10-21 10:53:49 -07:00
doctorpangloss
f1016ef1c1 fix syntax error 2025-10-21 10:53:48 -07:00
doctorpangloss
be56a14e65 Merge commit 'a4787ac83bf6c83eeb459ed80fc9b36f63d2a3a7' of github.com:comfyanonymous/ComfyUI into fix-merge 2025-10-21 10:53:43 -07:00
comfyanonymous
2c2aa409b0
Log message for cudnn disable on AMD. (#10418) 2025-10-20 15:43:24 -04:00
comfyanonymous
b4f30bd408
Pytorch is stupid. (#10398) 2025-10-19 01:25:35 -04:00
comfyanonymous
dad076aee6
Speed up chroma radiance. (#10395) 2025-10-18 23:19:52 -04:00
comfyanonymous
0cf33953a7
Fix batch size above 1 giving bad output in chroma radiance. (#10394) 2025-10-18 23:15:34 -04:00
comfyanonymous
5b80addafd
Turn off cuda malloc by default when --fast autotune is turned on. (#10393) 2025-10-18 22:35:46 -04:00
comfyanonymous
9da397ea2f
Disable torch compiler for cast_bias_weight function (#10384)
* Disable torch compiler for cast_bias_weight function

* Fix torch compile.
2025-10-17 20:03:28 -04:00
comfyanonymous
b1293d50ef
workaround also works on cudnn 91200 (#10375) 2025-10-16 19:59:56 -04:00
comfyanonymous
19b466160c
Workaround for nvidia issue where VAE uses 3x more memory on torch 2.9 (#10373) 2025-10-16 18:16:03 -04:00
Faych
afa8a24fe1
refactor: Replace manual patches merging with merge_nested_dicts (#10360) 2025-10-15 17:16:09 -07:00
Jedrzej Kosinski
493b81e48f
Fix order of inputs nested merge_nested_dicts (#10362) 2025-10-15 16:47:26 -07:00
comfyanonymous
1c10b33f9b
gfx942 doesn't support fp8 operations. (#10348) 2025-10-15 00:21:11 -04:00
comfyanonymous
3374e900d0
Faster workflow cancelling. (#10301) 2025-10-13 23:43:53 -04:00
comfyanonymous
dfff7e5332
Better memory estimation for the SD/Flux VAE on AMD. (#10334) 2025-10-13 22:37:19 -04:00
comfyanonymous
e4ea393666
Fix loading old stable diffusion ckpt files on newer numpy. (#10333) 2025-10-13 22:18:58 -04:00
comfyanonymous
c8674bc6e9
Enable RDNA4 pytorch attention on ROCm 7.0 and up. (#10332) 2025-10-13 21:19:03 -04:00
rattus128
95ca2e56c8
WAN2.2: Fix cache VRAM leak on error (#10308)
Same change pattern as 7e8dd275c2
applied to WAN2.2

If this suffers an exception (such as a VRAM oom) it will leave the
encode() and decode() methods which skips the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
this object which is in turn reffing large tensors from the failed
execution.

The feature cache is currently setup at a class variable on the
encoder/decoder however, the encode and decode functions always clear
it on both entry and exit of normal execution.

Its likely the design intent is this is usable as a streaming encoder
where the input comes in batches, however the functions as they are
today don't support that.

So simplify by bringing the cache back to local variable, so that if
it does VRAM OOM the cache itself is properly garbage when the
encode()/decode() functions dissappear from the stack.
2025-10-13 15:23:11 -04:00
comfyanonymous
e693e4db6a
Always set diffusion model to eval() mode. (#10331) 2025-10-13 14:57:27 -04:00
comfyanonymous
a125cd84b0
Improve AMD performance. (#10302)
I honestly have no idea why this improves things but it does.
2025-10-12 00:28:01 -04:00
comfyanonymous
84e9ce32c6
Implement the mmaudio VAE. (#10300) 2025-10-11 22:57:23 -04:00
comfyanonymous
f1dd6e50f8
Fix bug with applying loras on fp8 scaled without fp8 ops. (#10279) 2025-10-09 19:02:40 -04:00
comfyanonymous
139addd53c
More surgical fix for #10267 (#10276) 2025-10-09 16:37:35 -04:00
comfyanonymous
6e59934089
Refactor model sampling sigmas code. (#10250) 2025-10-08 17:49:02 -04:00
comfyanonymous
8aea746212
Implement gemma 3 as a text encoder. (#10241)
Not useful yet.
2025-10-06 22:08:08 -04:00
doctorpangloss
748924de12 Improve model downloading
- adding known controlnet models now works better
 - comfyui_controlnet_aux sometimes wants torchhub paths with files that are in the form directory/filename.safetensors. This is now supported
 - save_with_filename now correctly matches again
2025-10-06 15:21:10 -07:00
comfyanonymous
195e0b0639
Remove useless code. (#10223) 2025-10-05 15:41:19 -04:00
Finn-Hecker
93d859cfaa
Fix type annotation syntax in MotionEncoder_tc __init__ (#10186)
## Summary
Fixed incorrect type hint syntax in `MotionEncoder_tc.__init__()` parameter list.

## Changes
- Line 647: Changed `num_heads=int` to `num_heads: int` 
- This corrects the parameter annotation from a default value assignment to proper type hint syntax

## Details
The parameter was using assignment syntax (`=`) instead of type annotation syntax (`:`), which would incorrectly set the default value to the `int` class itself rather than annotating the expected type.
2025-10-03 14:32:19 -07:00
rattus128
4965c0e2ac
WAN: Fix cache VRAM leak on error (#10141)
If this suffers an exception (such as a VRAM oom) it will leave the
encode() and decode() methods which skips the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
this object which is in turn reffing large tensors from the failed
execution.

The feature cache is currently setup at a class variable on the
encoder/decoder however, the encode and decode functions always clear
it on both entry and exit of normal execution.

Its likely the design intent is this is usable as a streaming encoder
where the input comes in batches, however the functions as they are
today don't support that.

So simplify by bringing the cache back to local variable, so that if
it does VRAM OOM the cache itself is properly garbage when the
encode()/decode() functions dissappear from the stack.
2025-10-01 18:42:16 -04:00
rattus128
911331c06c
sd: fix VAE tiled fallback VRAM leak (#10139)
When the VAE catches this VRAM OOM, it launches the fallback logic
straight from the exception context.

Python however refs the entire call stack that caused the exception
including any local variables for the sake of exception report and
debugging. In the case of tensors, this can hold on the references
to GBs of VRAM and inhibit the VRAM allocated from freeing them.

So dump the except context completely before going back to the VAE
via the tiler by getting out of the except block with nothing but
a flag.

The greately increases the reliability of the tiler fallback,
especially on low VRAM cards, as with the bug, if the leak randomly
leaked more than the headroom needed for a single tile, the tiler
would fallback would OOM and fail the flow.
2025-10-01 18:40:28 -04:00
comfyanonymous
a6f83a4a1a
Support the new hunyuan vae. (#10150) 2025-10-01 17:19:13 -04:00
rattus128
653ceab414
Reduce Peak WAN inference VRAM usage - part II (#10062)
* flux: math: Use _addcmul to avoid expensive VRAM intermediate

The rope process can be the VRAM peak and this intermediate
for the addition result before releasing the original can OOM.
addcmul_ it.

* wan: Delete the self attention before cross attention

This saves VRAM when the cross attention and FFN are in play as the
VRAM peak.
2025-09-27 18:14:16 -04:00
Jedrzej Kosinski
196954ab8c
Add 'input_cond' and 'input_uncond' to the args dictionary passed into sampler_cfg_function (#10044) 2025-09-26 19:55:03 -07:00
comfyanonymous
1e098d6132
Don't add template to qwen2.5vl when template is in prompt. (#10043)
Make the hunyuan image refiner template_end 36.
2025-09-26 18:34:17 -04:00
doctorpangloss
8ab28996fb Merge branch 'master' of github.com:comfyanonymous/ComfyUI 2025-09-26 13:18:23 -07:00
Benjamin Berman
e62df3a881 Fix issue finding approx vae taesdxl when used in a workflow 2025-09-26 12:29:00 -07:00
doctorpangloss
6bba743d62 Fix exec info issues 2025-09-26 10:54:30 -07:00
Guy Niv
c8d2117f02
Fix memory leak by properly detaching model finalizer (#9979)
When unloading models in load_models_gpu(), the model finalizer was not
being explicitly detached, leading to a memory leak. This caused
linear memory consumption increase over time as models are repeatedly
loaded and unloaded.

This change prevents orphaned finalizer references from accumulating in
memory during model switching operations.
2025-09-24 22:35:12 -04:00