If this suffers an exception (such as a VRAM OOM), it exits the
encode() and decode() methods without running the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
to this object, which in turn holds references to large tensors from
the failed execution.
The feature cache is currently set up as a class variable on the
encoder/decoder; however, the encode and decode functions always clear
it on both entry and exit during normal execution.
The likely design intent is that this be usable as a streaming encoder
where the input arrives in batches, but the functions as they stand
today don't support that.
So simplify by making the cache a local variable again, so that if a
VRAM OOM does occur, the cache itself is properly garbage collected
when the encode()/decode() frames disappear from the stack.
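Roughly, the shape of the fix looks like the following. This is a minimal sketch with hypothetical names and structure (`WanVAE`, `feat_cache`, the per-block caching), not the actual ComfyUI code:

```python
import torch
import torch.nn as nn

class WanVAE(nn.Module):
    """Sketch only: names and block structure are hypothetical."""

    def __init__(self, blocks: nn.ModuleList):
        super().__init__()
        self.blocks = blocks

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Local cache instead of a class variable: if any block raises
        # (e.g. a CUDA OOM), this stack frame is torn down, the cached
        # tensors become unreachable, and the VRAM can be reclaimed.
        feat_cache: list[torch.Tensor] = []
        for block in self.blocks:
            x = block(x)
            feat_cache.append(x)  # stand-in for WAN's feature caching
        return x
```

With the class-variable version, the same exception would leave `feat_cache` populated and reachable through the long-lived encoder object, which is exactly the leak described above.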
* feature: Restrict the Ascend NPU runtime to a single device
* Enable the `--cuda-device` parameter to work for both CUDA GPUs and Ascend NPUs.
* Make the code just set the ASCEND_RT_VISIBLE_DEVICES environment variable, with no other changes relative to the master branch.
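A minimal sketch of the idea, with a hypothetical function name; the key point is that the variable must be exported before torch / torch_npu initializes its runtime, and that the one CLI flag can drive both device types:

```python
import os

def set_single_device(index: int) -> None:
    # CUDA GPUs and Ascend NPUs read different environment variables,
    # but the same `--cuda-device` value can populate both.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    os.environ["ASCEND_RT_VISIBLE_DEVICES"] = str(index)

set_single_device(0)  # e.g. launched with `--cuda-device 0`
```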
---------
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
* flux: math: Use addcmul_ to avoid an expensive VRAM intermediate

The RoPE step can be the VRAM peak, and allocating an intermediate for
the addition result before the original is released can OOM. Use the
in-place addcmul_ instead.
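The general pattern, illustrated with stand-in tensors rather than the actual flux RoPE code:

```python
import torch

x = torch.randn(1, 24, 4096, 64)
cos = torch.randn_like(x)
sin = torch.randn_like(x)
rot = torch.randn_like(x)  # stand-in for the rotated-half tensor

# Peaky version: `rot * sin` and the addition result are both live
# alongside x before anything is freed:
#   out = x * cos + rot * sin

out = x * cos
out.addcmul_(rot, sin)  # out += rot * sin, fused and in place
```

`Tensor.addcmul_` multiplies and accumulates into the existing buffer, so no full-size intermediate is allocated for the sum.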
* wan: Delete the self attention output before cross attention

This saves VRAM when the cross attention and the FFN form the
VRAM peak.
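A minimal sketch of the pattern, with a hypothetical block layout: once the self attention output has been folded into the residual, dropping the last reference lets the allocator reuse that memory for the cross attention and FFN activations.

```python
import torch

def block_forward(x, context, self_attn, cross_attn, ffn):
    y = self_attn(x)
    x = x + y
    del y  # free the self attention output before the next allocation peak
    x = x + cross_attn(x, context)
    x = x + ffn(x)
    return x
```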
Adds installed and required workflow templates version information to the
/system_stats endpoint, allowing the frontend to detect and notify users
when their templates package is outdated.
- Add get_installed_templates_version() and get_required_templates_version()
methods to FrontendManager
- Include templates version info in system_stats response
- Add comprehensive unit tests for the new functionality
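A sketch of the shape of the change; the two method names come from the list above, but the distribution name, the placeholder return value, and how the required version is actually pinned are assumptions:

```python
from importlib.metadata import PackageNotFoundError, version

class FrontendManager:
    @classmethod
    def get_installed_templates_version(cls) -> str | None:
        try:
            # Assumed distribution name for the templates package.
            return version("comfyui-workflow-templates")
        except PackageNotFoundError:
            return None

    @classmethod
    def get_required_templates_version(cls) -> str | None:
        # The real code would read the pinned version from the project's
        # requirements; a placeholder stands in here.
        return "0.1.0"
```

The /system_stats handler can then attach both values to its JSON response, letting the frontend compare them and prompt the user when the installed templates package lags the required one.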