Commit Graph

1814 Commits

Author SHA1 Message Date
comfyanonymous
8691037bcc Lowvram fix. 2025-11-20 22:12:56 -05:00
comfyanonymous
726ca6c97e Better VAE encode mem estimation. 2025-11-20 22:12:56 -05:00
comfyanonymous
8ccd4ca886 Allow any number of input frames in VAE. 2025-11-20 22:12:56 -05:00
Rattus
023036ef9d vae_refiner: roll the convolution through temporal II
Roll the convolution through time using 2-latent-frame chunks and a
FIFO queue for the convolution seams.

Added support for the encoder, lowered to 1 latent frame to save more
VRAM, and made it work for Hunyuan Image 3.0 (as the code is shared).

Fixed names, cleaned up code.
2025-11-20 22:12:56 -05:00
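A minimal PyTorch toy of the technique described above (shapes and names are illustrative assumptions, not the vae_refiner code): the temporal convolution is applied chunk by chunk, and the last kernel_t - 1 frames are kept in a FIFO so each seam sees the same context as a single full pass.

```python
import torch

# Toy sketch of "roll the convolution through time"; kernel_t = 3, so the
# FIFO holds 2 frames. Not the actual vae_refiner implementation.
torch.manual_seed(0)
conv = torch.nn.Conv3d(4, 4, kernel_size=3, padding=(0, 1, 1))  # no temporal padding

def full_pass(x):
    # Reference: one causal pass over the whole clip (left-pad 2 zero frames).
    return conv(torch.nn.functional.pad(x, (0, 0, 0, 0, 2, 0)))

def rolled_pass(x, chunk=2):
    b, c, t, h, w = x.shape
    fifo = x.new_zeros(b, c, 2, h, w)       # seam cache: last kernel_t - 1 frames
    outs = []
    for i in range(0, t, chunk):
        piece = x[:, :, i:i + chunk]
        inp = torch.cat([fifo, piece], dim=2)
        fifo = inp[:, :, -2:]               # carry the seam into the next chunk
        outs.append(conv(inp))              # yields exactly piece.shape[2] frames
    return torch.cat(outs, dim=2)

x = torch.randn(1, 4, 8, 16, 16)
assert torch.allclose(full_pass(x), rolled_pass(x), atol=1e-5)
```

Only the seam cache and the current chunk need to be resident at once, which is why shrinking the chunk to 1 latent frame saves further VRAM.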
kijai
d8858cb58b Bugfix for the HunyuanVideo15 SR model 2025-11-20 22:12:56 -05:00
kijai
87256acf20 Fix TokenRefiner for fp16
Otherwise x.sum produces infs. To be safe, the cast is only applied when the input is fp16; it may not be strictly necessary.
2025-11-20 22:12:56 -05:00
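A small illustration of the overflow being guarded against (assumed shapes, not the TokenRefiner code): fp16 tops out around 65504, so a long reduction can become inf unless the sum is done in fp32 when the input is fp16.

```python
import torch

def pooled(x, mask):
    # x: (B, L, D) hidden states, mask: (B, L) of 0/1 token flags (illustrative)
    if x.dtype == torch.float16:
        x = x.float()                                   # upcast only for the reduction
    s = (x * mask.unsqueeze(-1)).sum(dim=1)
    return s / mask.sum(dim=1, keepdim=True).clamp(min=1)

x = torch.full((1, 4096, 64), 32.0, dtype=torch.float16)
mask = torch.ones(1, 4096)
print(pooled(x, mask))   # finite; 4096 * 32.0 = 131072 would overflow an fp16 sum
```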
kijai
11061d3ecc Some cleanup
Co-Authored-By: comfyanonymous <121283862+comfyanonymous@users.noreply.github.com>
2025-11-20 22:12:56 -05:00
Rattus
d423272754 fix 2025-11-20 22:12:56 -05:00
kijai
18ae40065a Support HunyuanVideo15 latent resampler 2025-11-20 22:12:56 -05:00
Rattus
dc2e308422 vae_refiner: roll the convolution through temporal
Work in progress.

Roll the convolution through time using 2-latent-frame chunks and a
FIFO queue for the convolution seams.
2025-11-20 22:12:56 -05:00
kijai
0aa6eb2edc SR model fixes
This still needs timestep scheduling based on the noise scale, but it can already be used with two samplers.
2025-11-20 22:12:56 -05:00
kijai
fcd3a00d91 whitespaces... 2025-11-20 22:12:56 -05:00
kijai
fb4739f2f5 Support HunyuanVideo1.5 SR model 2025-11-20 22:12:56 -05:00
kijai
5d640eb407 Use the correct sigclip output... 2025-11-20 22:12:56 -05:00
kijai
5ffcc184fe Better latent rgb factors 2025-11-20 22:12:56 -05:00
kijai
f8fd20a2f3 Update model_base.py 2025-11-20 22:12:56 -05:00
kijai
5fe386157a I2V 2025-11-20 22:12:56 -05:00
kijai
dc736b62a2 fp16 works 2025-11-20 22:12:56 -05:00
kijai
521cc1d38d Prevent empty negative prompt
Really doesn't work otherwise
2025-11-20 22:12:56 -05:00
kijai
eaef7b764e Fix text encoding 2025-11-20 22:12:56 -05:00
kijai
ed3d1942d0 remove print 2025-11-20 22:12:56 -05:00
kijai
7378bf6a27 Update model.py 2025-11-20 22:12:56 -05:00
kijai
24d1b6b88a Update model.py 2025-11-20 22:12:56 -05:00
kijai
4f242de56f update 2025-11-20 22:12:56 -05:00
kijai
cadd00226b init 2025-11-20 22:12:56 -05:00
comfyanonymous
cb96d4d18c Disable workaround on newer cudnn. (#10807) 2025-11-19 23:56:23 -05:00
comfyanonymous
17027f2a6a Add a way to disable the final norm in the llama based TE models. (#10794) 2025-11-18 22:36:03 -05:00
comfyanonymous
d526974576 Fix hunyuan 3d 2.0 (#10792) 2025-11-18 16:46:19 -05:00
comfyanonymous
bd01d9f7fd Add left padding support to tokenizers. (#10753) 2025-11-15 06:54:40 -05:00
comfyanonymous
443056c401 Fix custom nodes import error. (#10747)
This should fix the import errors but will break if the custom nodes actually try to use the class.
2025-11-14 03:26:05 -05:00
comfyanonymous
f60923590c Use same code for chroma and flux blocks so that optimizations are shared. (#10746) 2025-11-14 01:28:05 -05:00
rattus
94c298f962 flux: reduce VRAM usage (#10737)
Clean up a bunch of stacked tensors on Flux. This takes me from B=19 to
B=22 for 1600x1600 on an RTX 5090.
2025-11-13 16:02:03 -08:00
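A hedged illustration of the kind of change involved (not the actual Flux block code): the VRAM peak is set by how many intermediates are alive at once, so dropping or overwriting references as soon as an activation is consumed lets the allocator recycle that memory and fit a larger batch.

```python
import torch
import torch.nn.functional as F

# Illustrative only: the same MLP written so that fewer intermediates
# stay alive during the second matmul.
def mlp_lean(x, w1, w2):
    x = F.gelu(x @ w1)      # reuse the name: this frame no longer keeps the input alive
    x = x @ w2
    return x

def mlp_peaky(x, w1, w2):
    hidden = F.gelu(x @ w1)  # `x`, `hidden` ...
    out = hidden @ w2        # ... and `out` are all alive here: a higher peak
    return out
```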
contentis
3b3ef9a77a Quantized Ops fixes (#10715)
* offload support, bug fixes, remove mixins

* add readme
2025-11-12 18:26:52 -05:00
rattus
1c7eaeca10 qwen: reduce VRAM usage (#10725)
Clean up a bunch of stacked and no-longer-needed tensors at the QWEN
VRAM peak (currently the FFN).

With this I go from OOMing at B=37x1328x1328 to being able to
successfully run B=47 (RTX 5090).
2025-11-12 16:20:53 -05:00
rattus
18e7d6dba5 mm/mp: always unload re-used but modified models (#10724)
The partial unloader path in the model re-use flow skips straight to the
actual unload without checking the patching UUID. This means that if you
run an upscale flow with a model patch on an existing model, your patches
will not be applied.

Fix by delaying the partial_unload until after the UUID checks. This is
done by making partial_unload a mode of partial_load where extra_mem is
negative.
2025-11-12 16:19:53 -05:00
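A self-contained toy of the ordering fix described above; the class and attribute names are hypothetical, not ComfyUI's actual model_management API.

```python
# Hypothetical sketch: re-check the patch UUID (and re-apply patches if it
# changed) *before* shrinking the model, and express "partial unload" as a
# partial_load whose extra-memory budget is negative.
class ToyLoadedModel:
    def __init__(self):
        self.patches_uuid = 0      # bumped whenever new patches are attached
        self.applied_uuid = 0      # what is currently baked into the weights
        self.loaded_bytes = 8_000

    def partial_load(self, extra_mem):
        # extra_mem < 0 frees memory, extra_mem > 0 loads more onto the GPU.
        self.loaded_bytes = max(0, self.loaded_bytes + extra_mem)

    def reuse(self, extra_mem):
        if self.applied_uuid != self.patches_uuid:
            self.applied_uuid = self.patches_uuid   # re-apply patches first
        self.partial_load(extra_mem)                # unload == load with a negative budget

m = ToyLoadedModel()
m.patches_uuid = 1            # e.g. an upscale flow attached a model patch
m.reuse(extra_mem=-2_000)     # patches applied first, then memory is freed
```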
comfyanonymous
1199411747 Don't pin tensor if not a torch.nn.parameter.Parameter (#10718) 2025-11-11 19:33:30 -05:00
rattus
c350009236 ops: Put weight cast on the offload stream (#10697)
The weight cast needs to be on the offload stream. Without this, a black
screen was reproduced with low-resolution images on a slow bus when using
FP8.
2025-11-09 22:52:11 -05:00
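A hedged sketch of the stream discipline involved (not the actual comfy.ops code, and using an fp16-to-bf16 cast instead of FP8 for portability): the cast has to be issued on the same offload stream as the host-to-device copy, and the compute stream must wait on that stream before using the weight.

```python
import torch

# Illustrative only: if the cast were issued on the compute stream, it could
# read the weight before the async copy on the offload stream has finished.
if torch.cuda.is_available():
    compute = torch.cuda.current_stream()
    offload = torch.cuda.Stream()

    w_cpu = torch.randn(4096, 4096, dtype=torch.float16).pin_memory()
    with torch.cuda.stream(offload):
        w_gpu = w_cpu.to("cuda", non_blocking=True)   # async H2D copy
        w_cast = w_gpu.to(torch.bfloat16)             # cast on the same offload stream

    compute.wait_stream(offload)      # order the compute stream after the offload work
    w_cast.record_stream(compute)     # tell the caching allocator who uses this tensor
    x = torch.randn(16, 4096, dtype=torch.bfloat16, device="cuda")
    y = x @ w_cast
```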
comfyanonymous
dea899f221 Unload weights if vram usage goes up between runs. (#10690) 2025-11-09 18:51:33 -05:00
comfyanonymous
e632e5de28 Add logging for model unloading. (#10692) 2025-11-09 18:06:39 -05:00
comfyanonymous
2abd2b5c20 Make ScaleROPE node work on Flux. (#10686) 2025-11-08 15:52:02 -05:00
comfyanonymous
a1a70362ca Only unpin tensor if it was pinned by ComfyUI (#10677) 2025-11-07 11:15:05 -05:00
rattus
cf97b033ee mm: guard against double pin and unpin explicitly (#10672)
As commented in the code, if you let CUDA be the one to detect a double
pin/unpin it actually creates an async GPU error.
2025-11-06 21:20:48 -05:00
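A hypothetical guard in the same spirit (not the real model management code): remember which buffers we pinned ourselves and refuse a second pin or a foreign unpin in Python, rather than letting CUDA surface it later as an asynchronous error.

```python
import torch

# The actual pin/unpin backend calls are passed in as callables; only the
# bookkeeping is shown here.
_PINNED_PTRS = set()

def guarded_pin(t, do_pin):
    ptr = t.data_ptr()
    if ptr in _PINNED_PTRS:
        return False                 # already pinned by us: don't pin again
    do_pin(t)                        # whatever backend call actually pins it
    _PINNED_PTRS.add(ptr)
    return True

def guarded_unpin(t, do_unpin):
    ptr = t.data_ptr()
    if ptr not in _PINNED_PTRS:
        return False                 # not pinned by us: leave it alone
    do_unpin(t)
    _PINNED_PTRS.discard(ptr)
    return True

buf = torch.empty(1024)
guarded_pin(buf, do_pin=lambda t: None)   # stand-in for the real pinning call
guarded_pin(buf, do_pin=lambda t: None)   # second call is rejected, returns False
```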
comfyanonymous
09dc24c8a9 Pinned mem also seems to work on AMD. (#10658) 2025-11-05 19:11:15 -05:00
comfyanonymous
1d69245981 Enable pinned memory by default on Nvidia. (#10656)
Removed the --fast pinned_memory flag.

You can use --disable-pinned-memory to disable it. Please report if it
causes any issues.
2025-11-05 18:08:13 -05:00
comfyanonymous
97f198e421 Fix qwen controlnet regression. (#10657) 2025-11-05 18:07:35 -05:00
comfyanonymous
c4a6b389de Lower ltxv mem usage to what it was before previous pr. (#10643)
Brings qwen behavior back to what it was before the previous PR.
2025-11-04 22:47:35 -05:00
contentis
4cd881866b Use single apply_rope function across models (#10547) 2025-11-04 20:10:11 -05:00
comfyanonymous
7f3e4d486c Limit amount of pinned memory on windows to prevent issues. (#10638) 2025-11-04 17:37:50 -05:00
comfyanonymous
af4b7b5edb More fp8 torch.compile regressions fixed. (#10625) 2025-11-03 22:14:20 -05:00
comfyanonymous
0f4ef3afa0 This seems to slow things down slightly on Linux. (#10624) 2025-11-03 21:47:14 -05:00