ComfyUI/comfy/ldm
rattus128 e42682b24e
Reduce Peak WAN inference VRAM usage (#9898)
* flux: Do the xq and xk ropes one at a time

This was doing independent, interleaved tensor math on the q and k
tensors, which held more than the minimum number of intermediates
in VRAM at once. On a bad day, it would OOM on the xk intermediates.

Do everything for q and then everything for k, so torch can garbage
collect all of q's intermediates before k allocates its own.

This reduces peak VRAM usage for some WAN2.2 inferences (at least).
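A minimal sketch of the idea (illustrative names, not the exact ComfyUI
source): rope each tensor in its own helper call, so the helper's
intermediates go out of scope before the next tensor is processed.

```python
import torch

def rope_one(x, freqs_cis):
    # Rope a single tensor. The float copy and the two partial
    # products become unreferenced when this returns, so torch's
    # caching allocator can reuse their memory before the next call.
    x_ = x.float().reshape(*x.shape[:-1], -1, 1, 2)
    out = freqs_cis[..., 0] * x_[..., 0] + freqs_cis[..., 1] * x_[..., 1]
    return out.reshape(*x.shape).type_as(x)

def apply_rope(xq, xk, freqs_cis):
    # Before: q and k math was interleaved, keeping both tensors'
    # intermediates alive simultaneously. After: q finishes fully,
    # then k starts.
    return rope_one(xq, freqs_cis), rope_one(xk, freqs_cis)

# toy shapes: batch 1, 4 heads, 16 tokens, head dim 8
xq = torch.randn(1, 4, 16, 8)
xk = torch.randn(1, 4, 16, 8)
freqs = torch.randn(1, 1, 16, 4, 2, 2)  # broadcasts against (..., 4, 1, 2)
q, k = apply_rope(xq, xk, freqs)
```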

* wan: Optimize qkv intermediates on attention

As commented in the code: the former logic computed independent pieces
of Q, K, and V in parallel, which held more inference intermediates in
VRAM and spiked peak usage. Fully roping Q and letting its
intermediates be garbage collected before touching K reduces peak
inference VRAM usage, as sketched below.
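A hedged sketch of the reordered attention path (reusing rope_one from
the sketch above; the plain-matmul projections are hypothetical
stand-ins for the real module's layers):

```python
import torch
import torch.nn.functional as F

def attention(x, wq, wk, wv, freqs_cis, num_heads):
    # Q is projected, split into heads, and roped to completion
    # first, so its un-roped intermediate is collectable before any
    # K tensor is allocated; likewise K finishes before V.
    b, n, d = x.shape
    split = lambda t: t.view(b, n, num_heads, -1).transpose(1, 2)
    q = rope_one(split(x @ wq), freqs_cis)
    k = rope_one(split(x @ wk), freqs_cis)
    v = split(x @ wv)
    out = F.scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2).reshape(b, n, d)
```

With this ordering the peak allocation tracks one tensor's rope working
set at a time instead of two or three at once.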
2025-09-16 19:21:14 -04:00
ace Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
audio Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
aura Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
cascade Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
chroma Add support for Chroma Radiance (#9682) 2025-09-13 17:58:43 -04:00
chroma_radiance Changes to the previous radiance commit. (#9851) 2025-09-13 18:03:34 -04:00
cosmos Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
flux Reduce Peak WAN inference VRAM usage (#9898) 2025-09-16 19:21:14 -04:00
genmo Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
hidream Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
hunyuan3d Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
hunyuan3dv2_1 Fix issue on old torch. (#9791) 2025-09-10 00:23:47 -04:00
hunyuan_video Hunyuan refiner vae now works with tiled. (#9836) 2025-09-12 19:46:46 -04:00
hydit Change cosmos and hydit models to use the native RMSNorm. (#7934) 2025-05-04 06:26:20 -04:00
lightricks Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
lumina Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
models Implement hunyuan image refiner model. (#9817) 2025-09-12 00:43:20 -04:00
modules Fix depending on asserts to raise an exception in BatchedBrownianTree and Flash attn module (#9884) 2025-09-15 20:05:03 -04:00
omnigen Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
pixart Remove windows line endings. (#8866) 2025-07-11 02:37:51 -04:00
qwen_image Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
wan Reduce Peak WAN inference VRAM usage (#9898) 2025-09-16 19:21:14 -04:00
common_dit.py add RMSNorm to comfy.ops 2025-04-14 18:00:33 -04:00
util.py Fix and enforce new lines at the end of files. 2024-12-30 04:14:59 -05:00