ComfyUI/comfy/ldm/flux
Rattus 8922c21c9e flux: Do the xq and xk ropes one at a time
This was doing independent but interleaved tensor math on the q and k
tensors, holding more intermediates in VRAM than the minimum required.
On a bad day, it would VRAM OOM on the xk intermediates.

Do everything q and then everything k, so torch can garbage collect
all of q's intermediates before k allocates its own.

This reduces peak VRAM usage for some WAN2.2 inferences (at least).
2025-09-16 22:53:31 +10:00
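The pattern described by the commit can be sketched roughly as follows. This is a minimal illustration, not the exact ComfyUI code: the helper name `apply_rope1` and the `freqs_cis` layout (rotation matrices of shape `(..., d/2, 2, 2)`) are assumptions based on the common flux-style rope formulation. The point is that roping one tensor at a time lets q's large float intermediates go out of scope before k's are created, lowering peak VRAM.

```python
import torch

def apply_rope1(x, freqs_cis):
    # Illustrative helper (name assumed): rope a single tensor.
    # View the last dim as pairs: (..., d) -> (..., d/2, 1, 2).
    x_ = x.float().reshape(*x.shape[:-1], -1, 1, 2)
    # Rotate each pair by its 2x2 matrix in freqs_cis, then restore shape.
    x_out = freqs_cis[..., 0] * x_[..., 0] + freqs_cis[..., 1] * x_[..., 1]
    # x_ (and the products) become collectable as soon as this returns.
    return x_out.reshape(*x.shape).type_as(x)

def apply_rope(xq, xk, freqs_cis):
    # Sequential, not interleaved: finish everything q, then everything k,
    # so q's intermediates can be freed before k's are allocated.
    return apply_rope1(xq, freqs_cis), apply_rope1(xk, freqs_cis)
```

With the interleaved version, the q and k intermediates are all live at the same time; factoring the per-tensor work into one function halves the worst-case number of simultaneously live intermediates without changing the result.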
controlnet.py Make flux controlnet work with sd3 text enc. (#8599) 2025-06-19 18:50:05 -04:00
layers.py Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
math.py flux: Do the xq and xk ropes one at a time 2025-09-16 22:53:31 +10:00
model.py Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
redux.py Support new flux model variants. 2024-11-21 08:38:23 -05:00