Mirror of https://github.com/comfyanonymous/ComfyUI.git (synced 2026-03-10 11:47:34 +08:00)
This was doing independent, interleaved tensor math on the q and k tensors, holding more than the minimum number of intermediates in VRAM; on a bad day it would OOM on the xk intermediates. Do everything for q, then everything for k, so torch can garbage collect all of q's intermediates before k allocates its own. This reduces peak VRAM usage for at least some WAN2.2 inferences.
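The reordering can be sketched with a hypothetical rotary-style helper; the function names, shapes, and rotation math below are illustrative, not the actual ComfyUI code. The point is purely the allocation order: in the interleaved version the q temporaries are still referenced when the k temporaries are allocated, while the sequential version finishes (and releases) all q math first.

```python
import torch

def rotate_qk_interleaved(q, k, cos, sin):
    # Before: q and k math interleaved. The q temporaries (q1, q2 and
    # the products inside torch.cat) are still referenced while the k
    # temporaries are allocated, so peak memory holds both sets.
    q1, q2 = q.chunk(2, dim=-1)
    k1, k2 = k.chunk(2, dim=-1)
    q_out = torch.cat((q1 * cos - q2 * sin, q1 * sin + q2 * cos), dim=-1)
    k_out = torch.cat((k1 * cos - k2 * sin, k1 * sin + k2 * cos), dim=-1)
    return q_out, k_out

def rotate_qk_sequential(q, k, cos, sin):
    # After: finish all q math first. Once q_out is built, dropping the
    # q temporaries leaves them with no references, so torch can free
    # that memory before any k intermediate is allocated.
    q1, q2 = q.chunk(2, dim=-1)
    q_out = torch.cat((q1 * cos - q2 * sin, q1 * sin + q2 * cos), dim=-1)
    del q1, q2  # release q intermediates before touching k
    k1, k2 = k.chunk(2, dim=-1)
    k_out = torch.cat((k1 * cos - k2 * sin, k1 * sin + k2 * cos), dim=-1)
    return q_out, k_out
```

Both variants compute identical results; only the lifetime of the intermediates changes, which is what lowers the peak VRAM watermark.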
| File |
|---|
| controlnet.py |
| layers.py |
| math.py |
| model.py |
| redux.py |