ComfyUI/comfy/ldm/wan
rattus128 e42682b24e
Reduce Peak WAN inference VRAM usage (#9898)
* flux: Do the xq and xk ropes one at a time

This was doing independent, interleaved tensor math on the q and k
tensors, keeping more intermediates alive in VRAM than necessary.
On a bad day it would run out of VRAM on the xk intermediates.

Do all of the q work first and then all of the k work, so torch can
garbage collect q's intermediates before k allocates its own.

This reduces peak VRAM usage for some WAN2.2 inferences (at least).
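
A minimal sketch of that ordering, assuming a rope variant with complex-valued
frequencies; the shapes and the function name are illustrative, not the actual
flux/ComfyUI code:

```python
import torch

def rope_one_at_a_time(xq: torch.Tensor, xk: torch.Tensor, freqs_cis: torch.Tensor):
    # Assumed shapes: xq/xk are (batch, heads, seq, head_dim),
    # freqs_cis is complex with shape (seq, head_dim // 2).

    # All of q first: its float32 temporaries die here ...
    q_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    xq_out = torch.view_as_real(q_ * freqs_cis).flatten(-2).type_as(xq)
    del q_  # drop the last reference so the allocator can reuse this memory

    # ... before k allocates its own, so the two sets never coexist.
    k_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))
    xk_out = torch.view_as_real(k_ * freqs_cis).flatten(-2).type_as(xk)
    return xq_out, xk_out
```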

* wan: Optimize qkv intermediates on attention

As commented in the code. The former logic computed independent pieces of
QKV in parallel, which held more inference intermediates in VRAM and
spiked peak VRAM usage. Fully roping Q and garbage collecting its
intermediates before touching K reduces the peak inference VRAM usage.
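
A similar sketch for the attention side, with illustrative module and
parameter names (to_q, norm_q, rope, etc.) rather than the real WAN attention
code; the point is to finish each projection before starting the next:

```python
import torch
import torch.nn as nn

def qkv_one_at_a_time(x: torch.Tensor,
                      to_q: nn.Linear, to_k: nn.Linear, to_v: nn.Linear,
                      norm_q: nn.Module, norm_k: nn.Module,
                      rope, freqs: torch.Tensor):
    # `rope` is assumed to be a callable that applies rotary embeddings to a
    # single tensor (e.g. a q-only/k-only variant of the sketch above).

    # Project, normalize and rope q completely; its temporaries can be
    # reclaimed before any k tensor is allocated.
    q = rope(norm_q(to_q(x)), freqs)
    # Only now build k the same way, then v (which needs no norm or rope).
    k = rope(norm_k(to_k(x)), freqs)
    v = to_v(x)
    return q, k, v
```

Compared with computing all three projections up front and roping q and k
interleaved, only one set of intermediates is live at a time, which is where
the peak-VRAM saving comes from.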
2025-09-16 19:21:14 -04:00
model.py Reduce Peak WAN inference VRAM usage (#9898) 2025-09-16 19:21:14 -04:00
vae2_2.py Tiny wan vae optimizations. (#9136) 2025-08-01 05:25:38 -04:00
vae.py Tiny wan vae optimizations. (#9136) 2025-08-01 05:25:38 -04:00