comfyanonymous
8691037bcc
Lowvram fix.
2025-11-20 22:12:56 -05:00
comfyanonymous
726ca6c97e
Better VAE encode mem estimation.
2025-11-20 22:12:56 -05:00
comfyanonymous
8ccd4ca886
Allow any number of input frames in VAE.
2025-11-20 22:12:56 -05:00
Rattus
023036ef9d
vae_refiner: roll the convolution through the temporal dimension II
...
Roll the convolution through time using 2-latent-frame chunks and a
FIFO queue for the convolution seams.
Added support for the encoder, lowered the chunk size to 1 latent frame to save more
VRAM, and made it work for Hunyuan Image 3.0 (since the code is shared).
Fixed names and cleaned up the code.
2025-11-20 22:12:56 -05:00
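A minimal sketch of the technique described above, not the actual vae_refiner code; the class and parameter names are illustrative, and a temporal kernel size of 3 is assumed. Each chunk of latent frames is convolved with the previous chunk's trailing frames prepended from a FIFO seam cache, so the result matches a single pass over the full sequence while only a chunk's worth of activations is live at once:

```python
import torch
import torch.nn as nn

class TemporalRollingConv(nn.Module):
    """Roll a 3D convolution through time in small chunks, carrying the
    convolution seam (the last kernel_t - 1 input frames) in a FIFO cache."""

    def __init__(self, channels, chunk_frames=2):
        super().__init__()
        # Temporal kernel 3 with no temporal padding: each chunk needs the
        # previous chunk's last 2 input frames as context.
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=(0, 1, 1))
        self.chunk_frames = chunk_frames
        self.seam = None  # FIFO cache of the previous chunk's trailing frames

    def _forward_chunk(self, x):
        # x: (B, C, T_chunk, H, W)
        if self.seam is None:
            # First chunk: replicate the first frame instead of a cached seam.
            ctx = x[:, :, :1].expand(-1, -1, 2, -1, -1)
        else:
            ctx = self.seam
        x = torch.cat([ctx, x], dim=2)
        self.seam = x[:, :, -2:]  # push the newest 2 input frames into the FIFO
        return self.conv(x)

    def forward(self, x):
        self.seam = None
        chunks = x.split(self.chunk_frames, dim=2)
        return torch.cat([self._forward_chunk(c) for c in chunks], dim=2)
```

Peak activation memory then scales with the chunk size rather than the full temporal length, which is why lowering the chunk to 1 latent frame saves even more VRAM.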
kijai
d8858cb58b
Bugfix for the HunyuanVideo15 SR model
2025-11-20 22:12:56 -05:00
kijai
87256acf20
Fix TokenRefiner for fp16
...
Otherwise x.sum has infs. To be safe, the cast is only applied when the input is fp16; I don't know if that is strictly necessary.
2025-11-20 22:12:56 -05:00
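For context, a rough illustration of the kind of guard being described; the function name and shapes are assumptions, not the actual TokenRefiner code. A masked sum over many tokens can overflow to inf in fp16, so the reduction is done in fp32 and cast back only when the input was fp16:

```python
import torch

def masked_pool(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # x: (B, T, C) hidden states, mask: (B, T) attention mask (0/1).
    # Summing thousands of fp16 values can overflow to inf, so reduce in fp32.
    orig_dtype = x.dtype
    if orig_dtype == torch.float16:
        x = x.float()
    mask = mask.to(x.dtype).unsqueeze(-1)          # (B, T, 1)
    pooled = (x * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
    return pooled.to(orig_dtype)                   # cast back only if input was fp16
```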
kijai
11061d3ecc
Some cleanup
...
Co-Authored-By: comfyanonymous <121283862+comfyanonymous@users.noreply.github.com>
2025-11-20 22:12:56 -05:00
Rattus
d423272754
fix
2025-11-20 22:12:56 -05:00
kijai
18ae40065a
Support HunyuanVideo15 latent resampler
2025-11-20 22:12:56 -05:00
Rattus
dc2e308422
vae_refiner: roll the convolution through the temporal dimension
...
Work in progress.
Roll the convolution through time using 2-latent-frame chunks and a
FIFO queue for the convolution seams.
2025-11-20 22:12:56 -05:00
kijai
0aa6eb2edc
SR model fixes
...
This still needs timestep scheduling based on the noise scale; it can already be used with two samplers.
2025-11-20 22:12:56 -05:00
kijai
fcd3a00d91
whitespaces...
2025-11-20 22:12:56 -05:00
kijai
fb4739f2f5
Support HunyuanVideo1.5 SR model
2025-11-20 22:12:56 -05:00
kijai
5d640eb407
Use the correct sigclip output...
2025-11-20 22:12:56 -05:00
kijai
5ffcc184fe
Better latent rgb factors
2025-11-20 22:12:56 -05:00
kijai
f8fd20a2f3
Update model_base.py
2025-11-20 22:12:56 -05:00
kijai
5fe386157a
I2V
2025-11-20 22:12:56 -05:00
kijai
dc736b62a2
fp16 works
2025-11-20 22:12:56 -05:00
kijai
521cc1d38d
Prevent empty negative prompt
...
Really doesn't work otherwise
2025-11-20 22:12:56 -05:00
kijai
eaef7b764e
Fix text encoding
2025-11-20 22:12:56 -05:00
kijai
ed3d1942d0
remove print
2025-11-20 22:12:56 -05:00
kijai
7378bf6a27
Update model.py
2025-11-20 22:12:56 -05:00
kijai
24d1b6b88a
Update model.py
2025-11-20 22:12:56 -05:00
kijai
4f242de56f
update
2025-11-20 22:12:56 -05:00
kijai
cadd00226b
init
2025-11-20 22:12:56 -05:00
comfyanonymous
cb96d4d18c
Disable workaround on newer cudnn. ( #10807 )
2025-11-19 23:56:23 -05:00
comfyanonymous
17027f2a6a
Add a way to disable the final norm in the llama based TE models. ( #10794 )
2025-11-18 22:36:03 -05:00
comfyanonymous
d526974576
Fix hunyuan 3d 2.0 ( #10792 )
2025-11-18 16:46:19 -05:00
comfyanonymous
bd01d9f7fd
Add left padding support to tokenizers. ( #10753 )
2025-11-15 06:54:40 -05:00
comfyanonymous
443056c401
Fix custom nodes import error. ( #10747 )
...
This should fix the import errors but will break if the custom nodes actually try to use the class.
2025-11-14 03:26:05 -05:00
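A hedged sketch of the pattern that description implies; the class and attribute names here are hypothetical, not the real ComfyUI code. An import-compatible placeholder is kept under the old public name so custom nodes can still import it, while any actual use fails with a clear error:

```python
# Hypothetical placeholder pattern; the real module/class names differ.
class _RemovedClassStub:
    """Import-compatible stand-in for a class that no longer exists.

    Custom nodes can import it without raising ImportError, but any attempt
    to instantiate it raises immediately with an explanation.
    """
    def __init__(self, *args, **kwargs):
        raise RuntimeError(
            "This class has been removed from ComfyUI; the custom node "
            "importing it needs to be updated."
        )

# Keeping the old public name pointing at the stub preserves imports.
SomeRemovedClass = _RemovedClassStub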
comfyanonymous
f60923590c
Use same code for chroma and flux blocks so that optimizations are shared. ( #10746 )
2025-11-14 01:28:05 -05:00
rattus
94c298f962
flux: reduce VRAM usage ( #10737 )
...
Clean up a bunch of stacked tensors on Flux. This takes me from B=19 to B=22
for 1600x1600 on an RTX 5090.
2025-11-13 16:02:03 -08:00
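The general idea, as a simplified sketch rather than the actual Flux block code: drop references to large intermediate tensors as soon as they are consumed and prefer in-place updates where safe, so the live-tensor peak per block is smaller and larger batch sizes fit:

```python
import torch
import torch.nn.functional as F

def ffn(x, w_in, w_out):
    # Keeping x, the hidden tensor and the activation alive at the same time
    # sets the VRAM peak; releasing them early lowers it (most effective under
    # torch.no_grad()/inference, where freed buffers are reusable immediately).
    h = F.linear(x, w_in)
    h = F.silu(h)        # rebind the name so the pre-activation can be freed
    out = F.linear(h, w_out)
    del h                # drop the large hidden tensor before the residual add
    out += x             # in-place residual add avoids another full-size buffer
    return out
```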
contentis
3b3ef9a77a
Quantized Ops fixes ( #10715 )
...
* offload support, bug fixes, remove mixins
* add readme
2025-11-12 18:26:52 -05:00
rattus
1c7eaeca10
qwen: reduce VRAM usage ( #10725 )
...
Clean up a bunch of stacked and no-longer-needed tensors at the QWEN
VRAM peak (currently the FFN).
With this I go from OOMing at B=37x1328x1328 to being able to
successfully run B=47 (RTX 5090).
2025-11-12 16:20:53 -05:00
rattus
18e7d6dba5
mm/mp: always unload re-used but modified models ( #10724 )
...
The partial unloader path in the model re-use flow skips straight to the
actual unload without any check of the patching UUID. This means that
if you do an upscale flow with a model patch on an existing model, it
will not apply your patches.
Fix by delaying the partial_unload until after the UUID checks. This
is done by making partial_unload a mode of partial_load where extra_mem
is negative.
2025-11-12 16:19:53 -05:00
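A hedged, much-simplified sketch of the control flow being described; the names and structure are illustrative, not the real model_management code. The patch-UUID check runs first, and the shrink is expressed as a partial_load with negative extra_mem so nothing short-circuits ahead of the repatch:

```python
# Illustrative sketch only; the real ComfyUI model_management code differs.
class LoadedModel:
    def __init__(self, patch_uuid):
        self.patch_uuid = patch_uuid

    def repatch(self, patcher):
        print("re-applying patches for", patcher.patch_uuid)
        self.patch_uuid = patcher.patch_uuid

    def partial_load(self, extra_mem):
        # partial_unload is just partial_load with a negative extra_mem:
        if extra_mem < 0:
            print(f"offloading {-extra_mem} bytes of weights")  # old partial_unload path
        else:
            print(f"loading {extra_mem} more bytes of weights")

def reuse_loaded_model(loaded, new_patcher, extra_mem):
    # Check the patch UUID *before* any unload, so an upscale flow that
    # patches an already-loaded model actually gets its patches applied.
    if loaded.patch_uuid != new_patcher.patch_uuid:
        loaded.repatch(new_patcher)
    loaded.partial_load(extra_mem)  # may be negative -> shrink
```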
comfyanonymous
1199411747
Don't pin tensor if not a torch.nn.parameter.Parameter ( #10718 )
2025-11-11 19:33:30 -05:00
rattus
c350009236
ops: Put weight cast on the offload stream ( #10697 )
...
The weight cast needs to be on the offload stream. Without this, a black screen
was reproducible with low-resolution images on a slow bus when using FP8.
2025-11-09 22:52:11 -05:00
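Roughly what "on the offload stream" means here, as a simplified sketch rather than the actual comfy.ops code: the dtype cast is queued on the same side stream as the host-to-device copy, and the compute stream synchronizes with that stream before using the weight. Issuing the cast on the compute stream instead can race with the copy, which fits the black-screen symptom above:

```python
import torch

# Simplified sketch; the real comfy.ops weight handling is more involved.
offload_stream = torch.cuda.Stream()

def fetch_weight(weight_cpu, compute_dtype=torch.float16):
    with torch.cuda.stream(offload_stream):
        w = weight_cpu.to("cuda", non_blocking=True)
        # The cast must also be queued on the offload stream, so it is ordered
        # after the copy rather than racing with it on the compute stream.
        w = w.to(compute_dtype)
    compute_stream = torch.cuda.current_stream()
    compute_stream.wait_stream(offload_stream)  # compute waits for copy + cast
    w.record_stream(compute_stream)             # keep the allocation alive correctly
    return w
```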
comfyanonymous
dea899f221
Unload weights if vram usage goes up between runs. ( #10690 )
2025-11-09 18:51:33 -05:00
comfyanonymous
e632e5de28
Add logging for model unloading. ( #10692 )
2025-11-09 18:06:39 -05:00
comfyanonymous
2abd2b5c20
Make ScaleROPE node work on Flux. ( #10686 )
2025-11-08 15:52:02 -05:00
comfyanonymous
a1a70362ca
Only unpin tensor if it was pinned by ComfyUI ( #10677 )
2025-11-07 11:15:05 -05:00
rattus
cf97b033ee
mm: guard against double pin and unpin explicitly ( #10672 )
...
As commented in the code, if you let CUDA be the one to detect double pinning/unpinning,
it actually creates an async GPU error.
2025-11-06 21:20:48 -05:00
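A minimal sketch of what explicit guarding can look like; the helpers are hypothetical, not ComfyUI's exact bookkeeping. The idea is to track what was pinned by our own code so a double pin or a stray unpin is rejected in Python rather than surfacing later as an asynchronous CUDA error:

```python
import torch

# Data pointers of host tensors pinned by our own code. Checking here means a
# double pin/unpin is caught in Python instead of becoming an async CUDA error.
_pinned_by_us: set[int] = set()

def pin_tensor(t: torch.Tensor) -> torch.Tensor:
    if t.is_pinned() or t.data_ptr() in _pinned_by_us:
        return t  # already pinned (by us or by someone else): do nothing
    pinned = t.pin_memory()  # returns a pinned copy of the host tensor
    _pinned_by_us.add(pinned.data_ptr())
    return pinned

def release_pinned(t: torch.Tensor) -> None:
    # Only forget tensors we pinned ourselves; never touch foreign pins.
    _pinned_by_us.discard(t.data_ptr())
```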
comfyanonymous
09dc24c8a9
Pinned mem also seems to work on AMD. ( #10658 )
2025-11-05 19:11:15 -05:00
comfyanonymous
1d69245981
Enable pinned memory by default on Nvidia. ( #10656 )
...
Removed the --fast pinned_memory flag.
You can use --disable-pinned-memory to disable it. Please report if it
causes any issues.
2025-11-05 18:08:13 -05:00
comfyanonymous
97f198e421
Fix qwen controlnet regression. ( #10657 )
2025-11-05 18:07:35 -05:00
comfyanonymous
c4a6b389de
Lower ltxv mem usage to what it was before previous pr. ( #10643 )
...
Bring back the qwen behavior to what it was before the previous PR.
2025-11-04 22:47:35 -05:00
contentis
4cd881866b
Use single apply_rope function across models ( #10547 )
2025-11-04 20:10:11 -05:00
comfyanonymous
7f3e4d486c
Limit amount of pinned memory on windows to prevent issues. ( #10638 )
2025-11-04 17:37:50 -05:00
comfyanonymous
af4b7b5edb
More fp8 torch.compile regressions fixed. ( #10625 )
2025-11-03 22:14:20 -05:00
comfyanonymous
0f4ef3afa0
This seems to slow things down slightly on Linux. ( #10624 )
2025-11-03 21:47:14 -05:00