## Summary
Fixed incorrect type hint syntax in `MotionEncoder_tc.__init__()` parameter list.
## Changes
- Line 647: Changed `num_heads=int` to `num_heads: int`
- This corrects the parameter annotation from a default value assignment to proper type hint syntax
## Details
The parameter was using assignment syntax (`=`) instead of type annotation syntax (`:`), which would incorrectly set the default value to the `int` class itself rather than annotating the expected type.
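The difference is easy to demonstrate with a minimal sketch (the function names here are illustrative, not the actual `MotionEncoder_tc` signature):

```python
# Buggy form: '=' makes the `int` class itself the default value,
# and the parameter carries no type annotation at all.
def encoder_buggy(num_heads=int):
    return num_heads

# Fixed form: ':' annotates the type; a real default can still be
# supplied separately if one is wanted.
def encoder_fixed(num_heads: int = 8):
    return num_heads
```

With the bug, calling the function without an argument silently hands back the `int` class instead of a head count.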
- Add a fast GPU presence gate with `nvidia-smi -L` at container start; exit early if unavailable or zero GPUs. Perform one thorough torch-based probe only in the root phase, export probe results (GPU_COUNT, COMPAT_GE_75, TORCH_CUDA_ARCH_LIST, SAGE_STRATEGY, SAGE_BUILD_STRATEGY), and call `runuser -p` so the app-user pass skips all GPU checks/logs. Remove any post-switch probing and strategy recovery paths to prevent duplicate logs. Unify wording to “SageAttention” and avoid duplicate “build” messages by logging the compilation once. After a successful install, delete the cloned sources under `.sage_attention/SageAttention` and retain `.built`. No features removed; behavior on GPU hosts is unchanged with cleaner, more accurate logs.
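A minimal sketch of the presence gate, assuming the standard `nvidia-smi -L` output format (`GPU N: <name> (UUID: ...)` per device); the function names are illustrative:

```python
import re
import subprocess
import sys

def count_gpus(listing: str) -> int:
    """Count 'GPU N: ...' lines as printed by `nvidia-smi -L`."""
    return len(re.findall(r"^GPU \d+:", listing, flags=re.MULTILINE))

def gpu_gate() -> int:
    """Exit early when the driver is missing or reports zero GPUs."""
    try:
        out = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                             text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        sys.exit("nvidia-smi unavailable; no usable NVIDIA driver")
    count = count_gpus(out)
    if count == 0:
        sys.exit("driver present but no GPUs visible")
    return count
```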
Move a detailed GPU probe to the top, logging per-device name/CC/memory and exiting early if no compatible GPUs (CC >= 7.5) are found, while storing a temporary SAGE_BUILD_STRATEGY for SageAttention builds; unify “SageAttention” naming and remove duplicate “Building” logs; remove UV usage and runtime pip bootstrap since deps are baked; add configure_manager_config to create or update ComfyUI-Manager’s persistent config.ini from CM_* environment variables on first and subsequent boots; keep Triton baked at 3.4.0 but switch to 3.2.0 at runtime for Turing strategies only; preserve system-wide installs and non-root ownership model.
Move GPU probe to the top with a single comprehensive device report (index, name, CC, VRAM) and early exit if no compatible GPUs (CC >= 7.5 not met); avoid duplicate logs after user switch via an internal flag. Remove uv usage and ensurepip, keeping system-wide pip installs only. Add CM_* environment variable handling to seed and reconcile ComfyUI-Manager’s persistent config.ini under user/default/ComfyUI-Manager on first and subsequent boots. Standardize “SageAttention” naming and reduce duplicate “building” messages; keep runtime Triton adjustment only when needed for Turing.
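A sketch of the torch-based probe described above; the property names follow `torch.cuda.get_device_properties`, the torch import is deferred so the compatibility check stays importable everywhere, and the surrounding strategy export and logging are assumed:

```python
import sys

MIN_CC = (7, 5)  # minimum supported compute capability (Turing)

def compatible(cc: tuple) -> bool:
    """True when a device's (major, minor) compute capability suffices."""
    return cc >= MIN_CC

def probe_gpus() -> list:
    """Single torch-based probe: report every device, exit if none is usable."""
    import torch  # deferred so importing this module never requires torch
    usable = []
    for idx in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(idx)
        cc = (props.major, props.minor)
        vram_gib = props.total_memory / 2**30
        print(f"GPU {idx}: {props.name}, CC {cc[0]}.{cc[1]}, {vram_gib:.1f} GiB")
        if compatible(cc):
            usable.append(idx)
    if not usable:
        sys.exit("no GPU meets the minimum compute capability 7.5")
    return usable
```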
Add ensure_pip_available() that verifies python -s -m pip --version and python -s -m pip list; bootstrap with ensurepip and upgrade pip/setuptools/wheel if needed. Replace duplicate GPU probing with one torch-based probe persisted across user-switch and enumerate each GPU with name/CC/VRAM. Standardize SageAttention logs to a single “Compiling SageAttention…” headline.
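A sketch of what `ensure_pip_available()` might look like; the commands follow the description above, with error handling simplified:

```python
import subprocess
import sys

def pip_ok() -> bool:
    """True when `python -s -m pip --version` succeeds (user site excluded)."""
    return subprocess.run([sys.executable, "-s", "-m", "pip", "--version"],
                          capture_output=True).returncode == 0

def ensure_pip_available() -> None:
    """Bootstrap pip via ensurepip only when the verification probe fails."""
    if pip_ok():
        return
    subprocess.run([sys.executable, "-m", "ensurepip", "--upgrade"], check=True)
    subprocess.run([sys.executable, "-s", "-m", "pip", "install", "--upgrade",
                    "pip", "setuptools", "wheel"], check=True)
```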
Set PIP_USE_PEP517=1 so all builds use the standardized PEP 517 interface, suppressing legacy setup.py deprecation warnings during image build and runtime installs. Keep CUDA 12.9 toolchain and bake GitPython/toml to satisfy ComfyUI-Manager’s import checks without uv or venvs.
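In Dockerfile terms this amounts to a single environment setting (a sketch; the actual Dockerfile layout is not shown in this changelog):

```dockerfile
# Route every pip build through the standardized PEP 517 interface,
# silencing legacy setup.py deprecation warnings at build and runtime.
ENV PIP_USE_PEP517=1
```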
Remove the multi-stage COPY that brought uv (/uv and /uvx) into /usr/local/bin. This image targets system-wide package management with no virtual environments, and uv’s pip interface does not support the --user scheme, requiring either a venv or explicit --system usage. Eliminating uv avoids the “No virtual environment found” and “--user is unsupported” paths while keeping ComfyUI-Manager functional via standard pip. ComfyUI-Manager can be configured via config.ini (use_uv) and, with GitPython preinstalled system-wide, will skip any uv-based bootstrap during startup.
Add a pre-configured config.ini for ComfyUI-Manager with use_uv = false to prevent uv from attempting --user installs which are unsupported. Since GitPython and toml are pre-installed system-wide, Manager will find them via import without needing to install, but setting use_uv = false ensures any remaining dependency installs use regular pip instead of uv's unsupported --user path. This eliminates the "No virtual environment found; run uv venv or pass --system" error while maintaining the "no venvs" constraint.
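A minimal sketch of such a config.ini; the `[default]` section name follows ComfyUI-Manager's convention, and all other keys are left to their defaults:

```ini
[default]
; Force regular pip instead of uv, whose pip interface rejects --user
; installs outside a virtual environment.
use_uv = False
```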
Move a torch.cuda-based GPU probe to the top of the entrypoint, logging device count and compute capabilities and exiting immediately when no compatible GPU is found. Remove pip --user usage and PIP_USER so all runtime installs are system-wide (enabled by early chown of site-packages), avoiding uv’s lack of --user support while honoring the “no venvs” constraint. Keep Triton re-pin only when Turing strategy is detected; otherwise re-use baked Triton. Preserve SageAttention runtime build and Manager update behavior.
Bake more runtime dependencies into the image to reduce entrypoint work and avoid uv’s unsupported --user path without virtual environments. Pin Triton==3.4.0 alongside PyTorch 2.8/cu129, and install GitPython and toml system-wide so ComfyUI-Manager starts without attempting uv-based installs. Pre-clone ComfyUI-Manager into custom_nodes for faster startup; entrypoint will still update to origin/HEAD. No features removed; runtime paths and CUDA toolkit remain for SageAttention builds at startup.
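A sketch of the corresponding Dockerfile additions; the install prefix `/opt/ComfyUI` is an assumption, while the pins and the Manager pre-clone come from the description above:

```dockerfile
# Bake runtime deps so the entrypoint never needs uv or --user installs.
RUN pip install --no-cache-dir \
        triton==3.4.0 \
        GitPython \
        toml

# Pre-clone ComfyUI-Manager for faster startup; the entrypoint still
# updates it to origin/HEAD on boot.
RUN git clone https://github.com/ltdrdata/ComfyUI-Manager.git \
        /opt/ComfyUI/custom_nodes/ComfyUI-Manager
```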
If this suffers an exception (such as a VRAM OOM), it will leave the
encode() and decode() methods without cleaning up the WAN
feature cache. The comfy node cache then ultimately keeps a reference
to this object, which in turn holds references to large tensors from
the failed execution.
The feature cache is currently set up as a class variable on the
encoder/decoder; however, the encode and decode functions always clear
it on both entry and exit during normal execution.
The likely design intent is for this to be usable as a streaming
encoder where the input arrives in batches, but the functions as they
stand today don't support that.
So simplify by making the cache a local variable again, so that if a
VRAM OOM does occur, the cache itself becomes properly collectable
garbage when the encode()/decode() frames disappear from the stack.
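The effect of the fix can be sketched with plain Python objects standing in for tensors; `Cache`, `Encoder.encode`, and the probe helper are illustrative, not the actual WAN code:

```python
import gc
import weakref

class Cache(list):
    """list subclass so it can be the target of a weak reference."""

class Encoder:
    # After the fix, the feature cache is a plain local inside encode(),
    # so a mid-call VRAM OOM cannot strand it on the class object.
    def encode(self, frames):
        feat_cache = Cache()        # local: dies with this stack frame
        for f in frames:
            feat_cache.append(f)    # stands in for cached WAN feature tensors
            if f is None:
                raise RuntimeError("simulated VRAM OOM")
        return feat_cache

def cache_survives_oom() -> bool:
    """Probe whether the cache outlives a failed encode() call."""
    probe = None
    try:
        Encoder().encode([1, None])
    except RuntimeError as exc:
        # Weak-reference the local cache through the traceback, keeping
        # no strong reference of our own.
        probe = weakref.ref(
            exc.__traceback__.tb_next.tb_frame.f_locals["feat_cache"])
    gc.collect()                    # break any frame/traceback cycles
    return probe() is not None
```

With a class-level cache the probe would still find the object alive after the failed call; with a local it is gone as soon as the frame is released.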
When the VAE catches this VRAM OOM, it launches the fallback logic
straight from the exception context.
Python, however, keeps references to the entire call stack that raised
the exception, including any local variables, for the sake of exception
reporting and debugging. In the case of tensors, this can hold
references to GBs of VRAM and prevent the allocator from freeing them.
So drop the except context completely before handing control back to
the VAE via the tiler, by leaving the except block with nothing but
a flag.
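The flag pattern looks roughly like this (`decode`/`decode_tiled` are illustrative names, not ComfyUI's actual API, and `MemoryError` stands in for torch's OOM exception):

```python
def decode_with_fallback(vae, latents):
    """Retry via the tiler only after the except block has been exited,
    carrying nothing across but a flag."""
    oom = False
    try:
        return vae.decode(latents)
    except MemoryError:   # stands in for torch.cuda.OutOfMemoryError
        oom = True        # keep no exception object and no partial result
    # Here the exception, its traceback, and the failed call's frames are
    # all unreferenced, so their tensors can be freed before the retry.
    if oom:
        return vae.decode_tiled(latents)
```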
This greatly increases the reliability of the tiler fallback,
especially on low-VRAM cards: with the bug, if the leak happened to
retain more than the headroom needed for a single tile, the tiled
fallback would itself OOM and fail the whole flow.