* flux: Do the xq and xk ropes one at a time. This was doing independent, interleaved tensor math on the q and k tensors, holding more than the minimum number of intermediates in VRAM; on a bad day, it would OOM on the xk intermediates. Doing everything for q and then everything for k lets torch garbage collect all of q's intermediates before k allocates its own. This reduces peak VRAM usage for some WAN2.2 inferences (at least).
* wan: Optimize qkv intermediates on attention. As commented, the former logic computed independent pieces of QKV in parallel, which held more inference intermediates in VRAM and spiked VRAM usage. Fully roping Q and garbage collecting its intermediates before touching K reduces peak inference VRAM usage.
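A minimal sketch of the ordering change both commits describe, in plain PyTorch. The function names, tensor shapes, and the complex-pair RoPE formulation here are illustrative assumptions, not ComfyUI's actual flux/wan code:

```python
import torch

def rope_rotate(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # Illustrative RoPE apply (assumed helper, not ComfyUI's real signature):
    # view the head dim as complex pairs, rotate by freqs_cis, view back.
    x_ = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    out = torch.view_as_real(x_ * freqs_cis)
    return out.reshape(*x.shape).type_as(x)

def rope_interleaved(xq, xk, freqs_cis):
    # Before: q and k temporaries are alive at the same time, so peak VRAM
    # holds both sets of intermediates at once.
    xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))
    xq_out = torch.view_as_real(xq_ * freqs_cis).reshape(*xq.shape).type_as(xq)
    xk_out = torch.view_as_real(xk_ * freqs_cis).reshape(*xk.shape).type_as(xk)
    return xq_out, xk_out

def rope_sequential(xq, xk, freqs_cis):
    # After: finish q completely first. Its temporaries become unreachable
    # when rope_rotate returns, so the allocator can reuse that memory
    # before k allocates its own intermediates, lowering the peak.
    xq_out = rope_rotate(xq, freqs_cis)
    xk_out = rope_rotate(xk, freqs_cis)
    return xq_out, xk_out

# Example shapes (hypothetical): (batch, heads, tokens, head_dim).
q = torch.randn(2, 8, 64, 128)
k = torch.randn(2, 8, 64, 128)
# Unit-magnitude complex rotations, one per (token, channel-pair).
freqs_cis = torch.polar(torch.ones(64, 64), torch.randn(64, 64))
q_out, k_out = rope_sequential(q, k, freqs_cis)
```

Both orderings compute the same result; only the lifetimes of the temporaries differ. On CUDA, dropping the last reference to a tensor returns its blocks to PyTorch's caching allocator, so k's intermediates can recycle q's memory instead of raising the peak.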
| Name |
|---|
| ace |
| audio |
| aura |
| cascade |
| chroma |
| chroma_radiance |
| cosmos |
| flux |
| genmo |
| hidream |
| hunyuan3d |
| hunyuan3dv2_1 |
| hunyuan_video |
| hydit |
| lightricks |
| lumina |
| models |
| modules |
| omnigen |
| pixart |
| qwen_image |
| wan |
| common_dit.py |
| util.py |