Commit Graph

1818 Commits

Author SHA1 Message Date
Dr.Lt.Data
4e7f2eeae2 Merge branch 'master' into dr-support-pip-cm 2025-10-10 08:15:03 +09:00
comfyanonymous
f1dd6e50f8
Fix bug with applying loras on fp8 scaled without fp8 ops. (#10279) 2025-10-09 19:02:40 -04:00
comfyanonymous
139addd53c
More surgical fix for #10267 (#10276) 2025-10-09 16:37:35 -04:00
Dr.Lt.Data
05cd5348b6 Merge branch 'master' into dr-support-pip-cm 2025-10-09 10:49:23 +09:00
comfyanonymous
6e59934089
Refactor model sampling sigmas code. (#10250) 2025-10-08 17:49:02 -04:00
Dr.Lt.Data
6b20418ad1 Merge branch 'master' into dr-support-pip-cm 2025-10-07 14:30:16 +09:00
comfyanonymous
8aea746212
Implement gemma 3 as a text encoder. (#10241)
Not useful yet.
2025-10-06 22:08:08 -04:00
comfyanonymous
195e0b0639
Remove useless code. (#10223) 2025-10-05 15:41:19 -04:00
Dr.Lt.Data
8634b19bc7 Merge branch 'master' into dr-support-pip-cm 2025-10-04 07:09:43 +09:00
Finn-Hecker
93d859cfaa
Fix type annotation syntax in MotionEncoder_tc __init__ (#10186)
## Summary
Fixed incorrect type hint syntax in `MotionEncoder_tc.__init__()` parameter list.

## Changes
- Line 647: Changed `num_heads=int` to `num_heads: int` 
- This corrects the parameter annotation from a default value assignment to proper type hint syntax

## Details
The parameter was using assignment syntax (`=`) instead of type annotation syntax (`:`), which would incorrectly set the default value to the `int` class itself rather than annotating the expected type.
2025-10-03 14:32:19 -07:00
Dr.Lt.Data
28092933c1 Merge branch 'master' into dr-support-pip-cm 2025-10-02 12:49:48 +09:00
rattus128
4965c0e2ac
WAN: Fix cache VRAM leak on error (#10141)
If this suffers an exception (such as a VRAM oom) it will leave the
encode() and decode() methods which skips the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
this object which is in turn reffing large tensors from the failed
execution.

The feature cache is currently setup at a class variable on the
encoder/decoder however, the encode and decode functions always clear
it on both entry and exit of normal execution.

Its likely the design intent is this is usable as a streaming encoder
where the input comes in batches, however the functions as they are
today don't support that.

So simplify by bringing the cache back to local variable, so that if
it does VRAM OOM the cache itself is properly garbage when the
encode()/decode() functions dissappear from the stack.
2025-10-01 18:42:16 -04:00
rattus128
911331c06c
sd: fix VAE tiled fallback VRAM leak (#10139)
When the VAE catches this VRAM OOM, it launches the fallback logic
straight from the exception context.

Python however refs the entire call stack that caused the exception
including any local variables for the sake of exception report and
debugging. In the case of tensors, this can hold on the references
to GBs of VRAM and inhibit the VRAM allocated from freeing them.

So dump the except context completely before going back to the VAE
via the tiler by getting out of the except block with nothing but
a flag.

The greately increases the reliability of the tiler fallback,
especially on low VRAM cards, as with the bug, if the leak randomly
leaked more than the headroom needed for a single tile, the tiler
would fallback would OOM and fail the flow.
2025-10-01 18:40:28 -04:00
Dr.Lt.Data
17064a993c Merge branch 'master' into dr-support-pip-cm 2025-10-02 07:31:37 +09:00
comfyanonymous
a6f83a4a1a
Support the new hunyuan vae. (#10150) 2025-10-01 17:19:13 -04:00
Dr.Lt.Data
20ac0052f8 Merge branch 'master' into dr-support-pip-cm 2025-09-29 06:58:35 +09:00
rattus128
653ceab414
Reduce Peak WAN inference VRAM usage - part II (#10062)
* flux: math: Use _addcmul to avoid expensive VRAM intermediate

The rope process can be the VRAM peak and this intermediate
for the addition result before releasing the original can OOM.
addcmul_ it.

* wan: Delete the self attention before cross attention

This saves VRAM when the cross attention and FFN are in play as the
VRAM peak.
2025-09-27 18:14:16 -04:00
Jedrzej Kosinski
196954ab8c
Add 'input_cond' and 'input_uncond' to the args dictionary passed into sampler_cfg_function (#10044) 2025-09-26 19:55:03 -07:00
comfyanonymous
1e098d6132
Don't add template to qwen2.5vl when template is in prompt. (#10043)
Make the hunyuan image refiner template_end 36.
2025-09-26 18:34:17 -04:00
Dr.Lt.Data
bc8418f55a Merge branch 'master' into dr-support-pip-cm 2025-09-26 07:00:43 +09:00
Guy Niv
c8d2117f02
Fix memory leak by properly detaching model finalizer (#9979)
When unloading models in load_models_gpu(), the model finalizer was not
being explicitly detached, leading to a memory leak. This caused
linear memory consumption increase over time as models are repeatedly
loaded and unloaded.

This change prevents orphaned finalizer references from accumulating in
memory during model switching operations.
2025-09-24 22:35:12 -04:00
comfyanonymous
fccab99ec0
Fix issue with .view() in HuMo. (#10014) 2025-09-24 20:09:42 -04:00
Dr.Lt.Data
74c1a58566 Merge branch 'master' into dr-support-pip-cm 2025-09-23 07:28:52 +09:00
comfyanonymous
1fee8827cb
Support for qwen edit plus model. Use the new TextEncodeQwenImageEditPlus. (#9986) 2025-09-22 16:49:48 -04:00
Dr.Lt.Data
7b1ed9b2b8 Merge branch 'master' into dr-support-pip-cm 2025-09-21 11:24:37 +09:00
comfyanonymous
d1d9eb94b1
Lower wan memory estimation value a bit. (#9964)
Previous pr reduced the peak memory requirement.
2025-09-20 22:09:35 -04:00
Dr.Lt.Data
4ea946778b Merge branch 'master' into dr-support-pip-cm 2025-09-21 10:45:28 +09:00
Kohaku-Blueleaf
7be2b49b6b
Fix LoRA Trainer bugs with FP8 models. (#9854)
* Fix adapter weight init

* Fix fp8 model training

* Avoid inference tensor
2025-09-20 21:24:48 -04:00
Dr.Lt.Data
309c92d6c9 Merge branch 'master' into dr-support-pip-cm 2025-09-21 09:33:38 +09:00
comfyanonymous
e8df53b764
Update WanAnimateToVideo to more easily extend videos. (#9959) 2025-09-19 18:48:56 -04:00
Dr.Lt.Data
ca7492c9d4 Merge branch 'master' into dr-support-pip-cm 2025-09-20 07:13:36 +09:00
comfyanonymous
dc95b6acc0
Basic WIP support for the wan animate model. (#9939) 2025-09-19 03:07:17 -04:00
Dr.Lt.Data
fa51f0c60a Merge branch 'master' into dr-support-pip-cm 2025-09-19 12:00:10 +09:00
comfyanonymous
24b0fce099
Do padding of audio embed in model for humo for more flexibility. (#9935) 2025-09-18 19:54:16 -04:00
DELUXA
8d6653fca6
Enable fp8 ops by default on gfx1200 (#9926) 2025-09-18 19:50:37 -04:00
Dr.Lt.Data
0a084a88a2 Merge branch 'master' into dr-support-pip-cm 2025-09-19 08:16:58 +09:00
comfyanonymous
e7ff647d02 --disable-manager -> --enable-manager 2025-09-17 20:58:42 -04:00
comfyanonymous
dd611a7700
Support the HuMo 17B model. (#9912) 2025-09-17 18:39:24 -04:00
Dr.Lt.Data
77e10752fe Merge branch 'master' into dr-support-pip-cm 2025-09-18 07:32:23 +09:00
comfyanonymous
9288c78fc5
Support the HuMo model. (#9903) 2025-09-17 00:12:48 -04:00
Dr.Lt.Data
2c30881d9c Merge branch 'master' into dr-support-pip-cm 2025-09-17 11:56:35 +09:00
rattus128
e42682b24e
Reduce Peak WAN inference VRAM usage (#9898)
* flux: Do the xq and xk ropes one at a time

This was doing independendent interleaved tensor math on the q and k
tensors, leading to the holding of more than the minimum intermediates
in VRAM. On a bad day, it would VRAM OOM on xk intermediates.

Do everything q and then everything k, so torch can garbage collect
all of qs intermediates before k allocates its intermediates.

This reduces peak VRAM usage for some WAN2.2 inferences (at least).

* wan: Optimize qkv intermediates on attention

As commented. The former logic computed independent pieces of QKV in
parallel which help more inference intermediates in VRAM spiking
VRAM usage. Fully roping Q and garbage collecting the intermediates
before touching K reduces the peak inference VRAM usage.
2025-09-16 19:21:14 -04:00
Dr.Lt.Data
7fa5990dbc Merge branch 'master' into dr-support-pip-cm 2025-09-17 06:09:40 +09:00
comfyanonymous
a39ac59c3e
Add encoder part of whisper large v3 as an audio encoder model. (#9894)
Not useful yet but some models use it.
2025-09-16 01:19:50 -04:00
Dr.Lt.Data
07212a2466 Merge branch 'master' into dr-support-pip-cm 2025-09-16 12:39:43 +09:00
blepping
1a85483da1
Fix depending on asserts to raise an exception in BatchedBrownianTree and Flash attn module (#9884)
Correctly handle the case where w0 is passed by kwargs in BatchedBrownianTree
2025-09-15 20:05:03 -04:00
comfyanonymous
47a9cde5d3
Support the omnigen2 umo lora. (#9886) 2025-09-15 18:10:55 -04:00
Dr.Lt.Data
f4d7a32cd8 Merge branch 'master' into dr-support-pip-cm 2025-09-15 12:16:00 +09:00
Jedrzej Kosinski
f228367c5e
Make ModuleNotFoundError ImportError instead (#9850) 2025-09-13 21:34:21 -04:00
comfyanonymous
80b7c9455b
Changes to the previous radiance commit. (#9851) 2025-09-13 18:03:34 -04:00