ComfyUI/comfy/ldm/flux
rattus128 653ceab414
Reduce Peak WAN inference VRAM usage - part II (#10062)
* flux: math: Use addcmul_ to avoid an expensive VRAM intermediate

The rope computation can be the VRAM peak, and allocating an
intermediate for the addition result before the original tensors are
released can OOM. Use in-place addcmul_ instead.
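A minimal sketch of the pattern (not the actual flux `math.py` code; shapes and names here are hypothetical): the out-of-place version holds three full-size buffers at once, while `addcmul_` accumulates the product into an existing buffer in place.

```python
import torch

def apply_rope_add(x, x_rot, cos, sin):
    # Baseline: x*cos allocates one tensor, x_rot*sin another, and "+"
    # allocates a third for the sum -- three full-size buffers are live
    # at once, which can be the inference VRAM peak.
    return x * cos + x_rot * sin

def apply_rope_addcmul(x, x_rot, cos, sin):
    # addcmul_ fuses "out += x_rot * sin" elementwise, in place into
    # out, so neither the product nor the sum gets its own buffer.
    out = x * cos
    return out.addcmul_(x_rot, sin)

# Hypothetical shapes for illustration only.
x, x_rot = torch.randn(2, 4, 8), torch.randn(2, 4, 8)
cos, sin = torch.randn(8), torch.randn(8)
```

Both variants compute the same values; only the allocation behavior differs.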

* wan: Delete the self attention output before cross attention

This saves VRAM when the cross attention and FFN are the VRAM peak.
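The idea can be sketched as follows (a hypothetical block; names and structure do not match the real wan code): dropping the last reference to the self-attention output lets its buffer be freed before the cross attention and FFN allocate, instead of holding it across that later peak.

```python
def wan_block_forward(x, context, self_attn, cross_attn, ffn):
    y = self_attn(x)
    x = x + y
    # Drop the last reference to the self-attention output now, so its
    # buffer can be freed before cross attention and FFN run -- those
    # later allocations are often the peak, and keeping y alive across
    # them would raise it.
    del y
    x = x + cross_attn(x, context)
    x = x + ffn(x)
    return x
```

With tensors, `del` releases the buffer as soon as the reference count drops; the reordering only changes *when* memory is freed, not the computed result.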
2025-09-27 18:14:16 -04:00
controlnet.py Make flux controlnet work with sd3 text enc. (#8599) 2025-06-19 18:50:05 -04:00
layers.py Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
math.py Reduce Peak WAN inference VRAM usage - part II (#10062) 2025-09-27 18:14:16 -04:00
model.py Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
redux.py Support new flux model variants. 2024-11-21 08:38:23 -05:00