- Drop runtime ONNX Runtime installer/check block that used a heredoc followed by a brace group, causing “unexpected end of file”.
- Keep Manager pip preflight and toml preinstall; retain unified torch-based GPU probe and SageAttention flow.
- Add “Free Disk Space (Ubuntu)” and Docker prune steps before/after the GitHub-hosted build to recover tens of GB and avoid “no space left on device” failures on ubuntu-latest.
- Remove continue-on-error and gate the self-hosted job with `always() && needs.build-gh.result != 'success'` so it runs only if the GH build fails, while publish proceeds if either path succeeds.
- Enable buildx GHA cache (cache-from/cache-to) to minimize runner disk pressure and rebuild times without loading images locally.
- Drop runtime installer for uv; uv is now baked into the image via Dockerfile.
- Keep pip preflight (ensurepip) and toml preinstall to satisfy ComfyUI-Manager’s prestartup requirements.
- Retain unified torch-based GPU probe, SageAttention setup, custom node install flow, and ONNX Runtime CUDA provider guard.
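
A minimal sketch of what the ONNX Runtime CUDA provider guard could look like, written in Python for illustration rather than taken from the entrypoint. The helper names (`needs_gpu_build`, `available_providers`, `ensure_cuda_provider`) are hypothetical; `onnxruntime.get_available_providers()` is the only library call assumed.

```python
import importlib
import subprocess
import sys


def needs_gpu_build(providers):
    """Return True when the CUDA execution provider is absent from the list."""
    return "CUDAExecutionProvider" not in providers


def available_providers():
    """Ask ONNX Runtime which providers it exposes; empty list if not installed."""
    try:
        ort = importlib.import_module("onnxruntime")
    except ImportError:
        return []
    return list(ort.get_available_providers())


def ensure_cuda_provider():
    """Swap the CPU-only wheel for onnxruntime-gpu, then re-check the providers."""
    if not needs_gpu_build(available_providers()):
        return True
    pip = [sys.executable, "-m", "pip"]
    # Uninstall may fail harmlessly when the CPU-only wheel is not present.
    subprocess.run(pip + ["uninstall", "-y", "onnxruntime"], check=False)
    subprocess.run(pip + ["install", "onnxruntime-gpu"], check=True)
    # Re-verify in a fresh interpreter so the newly installed package is imported.
    probe = subprocess.run(
        [sys.executable, "-c",
         "import onnxruntime as o; print(','.join(o.get_available_providers()))"],
        capture_output=True, text=True, check=False,
    )
    return not needs_gpu_build(probe.stdout.strip().split(","))
```

Keeping the decision in a pure function (`needs_gpu_build`) makes the guard easy to check without a GPU or a network connection.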
- Copy uv and uvx from ghcr.io/astral-sh/uv:latest into /usr/local/bin to provide a fast package manager at build time without curl, always fetching the newest release.
- Keeps image GPU-agnostic and improves cold-starts while entrypoint retains pip fallback for robustness in multiuser environments.
- Add runtime guard to verify ONNX Runtime has CUDAExecutionProvider; if missing, uninstall CPU-only onnxruntime and install onnxruntime-gpu, then re-verify providers.
- Replace early GPU checks with one torch-based probe that detects devices and compute capability, sets DET_* flags, TORCH_CUDA_ARCH_LIST, and SAGE_STRATEGY, and exits fast when CC < 7.5.
- Ensure python -m pip is available (bootstrap with ensurepip if necessary) so ComfyUI-Manager can run package operations during prestartup.
- Install uv system-wide to /usr/local/bin if missing (UV_UNMANAGED_INSTALL) for a fast package manager alternative without modifying shell profiles.
- Preinstall toml if its import fails to avoid Manager import errors before Manager runs its own install steps.
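
The pip preflight and toml preinstall could be sketched roughly as follows. This is an illustration in Python, not the entrypoint's actual shell code; `missing_modules` and `preflight` are hypothetical names, and the sketch assumes the import name and the PyPI distribution name are the same (true for toml).

```python
import importlib.util
import subprocess
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def preflight(modules=("toml",)):
    """Bootstrap pip via ensurepip if needed, then preinstall missing modules."""
    if missing_modules(["pip"]):
        # ensurepip ships with CPython and installs a bundled pip.
        subprocess.run([sys.executable, "-m", "ensurepip", "--upgrade"], check=True)
    todo = missing_modules(modules)
    if todo:
        # Assumption: each import name is also the distribution name on PyPI.
        subprocess.run([sys.executable, "-m", "pip", "install", *todo], check=True)
    return todo
```

Checking with `find_spec` avoids importing the module just to learn whether it exists.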
- Install pkg-config, libcairo2, and libcairo2-dev so pip can build/use pycairo required by svglib/rlPyCairo, preventing meson/pkg-config “Dependency cairo not found” errors on Debian/Ubuntu bases.
- Define COMFYUI_PATH=/app/ComfyUI and both COMFYUI_MODEL_PATH=/app/ComfyUI/models and COMFYUI_MODELS_PATH=/app/ComfyUI/models to satisfy common tool conventions and silence CLI warnings, while remaining compatible with extra_model_paths.yaml for canonical model routing.
- Replace wildcard/recursive requirements scanning with a per-node loop that installs only each node’s top-level requirements.txt and runs install.py when present, aligning behavior with ComfyUI-Manager and preventing unintended subfolder or variant requirements from being applied.
- Drop automatic pyproject.toml/setup.py installs to avoid packaging nodes unnecessarily; ComfyUI loads nodes from custom_nodes directly.
- Keep user-level pip and permissions hardening so ComfyUI-Manager can later manage deps without permission errors.
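
The per-node loop described above can be sketched like this. It only plans the commands; the function name `plan_node_installs` is hypothetical, the `--user` flag reflects the user-level pip mentioned in this change, and the real entrypoint may order or filter nodes differently.

```python
import sys
from pathlib import Path


def plan_node_installs(custom_nodes_dir):
    """Build per-node commands: top-level requirements.txt first, then install.py."""
    commands = []
    for node in sorted(Path(custom_nodes_dir).iterdir()):
        # Skip stray files and hidden directories such as .git.
        if not node.is_dir() or node.name.startswith("."):
            continue
        req = node / "requirements.txt"  # top level only, no recursion
        if req.is_file():
            commands.append(
                [sys.executable, "-m", "pip", "install", "--user", "-r", str(req)]
            )
        script = node / "install.py"
        if script.is_file():
            commands.append([sys.executable, str(script)])
    return commands
```

Each returned command can then be passed to `subprocess.run`. Because the loop never descends into subfolders, variant files such as `sub/requirements.txt` are ignored.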
- Add cupy-cuda12x to base image so CuPy installs from wheels during build without requiring a GPU, matching CUDA 12.x runtime and avoiding compilation on GitHub runners; this pairs with existing CUDA 12.9 libs and ensures CuPy is ready for GPU hosts at runtime.
- Keep PyTorch CUDA 12.9, Triton, and media libs; no features removed.
- This change follows CuPy’s guidance to install cupy-cuda12x via pip for CUDA 12.x. The wheel expects CUDA headers from cuda-cudart-dev-12-x (already in the image) or, if needed, from the nvidia-cuda-runtime-cu12 PyPI package, which is consistent with our Debian CUDA 12.9 setup.
- Add an early runtime check that exits cleanly when no compatible NVIDIA GPU is detected, preventing unnecessary installs and builds on hosts without GPUs, which matches the repo’s requirement to target recent-gen NVIDIA GPUs and avoids work on GitHub runners.
- Mirror ComfyUI-Manager’s dependency behavior for custom nodes by: installing requirements*.txt and requirements/*.txt, building nodes with pyproject.toml using pip, and invoking node-provided install.py scripts when present, aligning with documented custom-node install flows.
- Enforce user-level pip installs (PIP_USER=1) and ensure /usr/local site-packages trees are owned and writable by the runtime user; this resolves permission-denied errors seen when Manager updates or removes packages (e.g., numpy __pycache__), improving reliability of Manager-driven installs and uninstalls.
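
An early GPU check of this kind might look like the sketch below. It assumes the torch-based probe and the compute capability 7.5 threshold mentioned elsewhere in this log; `compatible_gpus`, `probe_capabilities`, and `MIN_CC` are hypothetical names.

```python
MIN_CC = (7, 5)  # Turing; threshold assumed from the "CC < 7.5" note in this log


def compatible_gpus(capabilities, min_cc=MIN_CC):
    """Keep only the (major, minor) capabilities at or above the minimum."""
    return [tuple(cc) for cc in capabilities if tuple(cc) >= min_cc]


def probe_capabilities():
    """Return [(major, minor), ...] via torch, or [] when torch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())
    ]


def gpu_gate():
    """Return True when setup should continue, False for a clean early exit."""
    if not compatible_gpus(probe_capabilities()):
        print("No compatible NVIDIA GPU detected; skipping installs and builds.")
        return False
    return True
```

On a GitHub-hosted runner the probe returns an empty list, so `gpu_gate()` reports False and no install or build work is started.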
* feature: Restrict the Ascend NPU to a single device
* Enable the `--cuda-device` parameter to support both CUDA and Ascend NPUs simultaneously.
* Make the code just set the ASCEND_RT_VISIBLE_DEVICES environment variable, with no other changes relative to the master branch
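
A rough sketch of the idea, assuming the device index from `--cuda-device` is simply exported to both runtimes before any framework import. `restrict_to_device` and `DEVICE_ENV_VARS` are hypothetical names; `ASCEND_RT_VISIBLE_DEVICES` is the variable read by the Ascend runtime.

```python
import os

# Variables consulted by the CUDA and Ascend runtimes respectively.
DEVICE_ENV_VARS = ("CUDA_VISIBLE_DEVICES", "ASCEND_RT_VISIBLE_DEVICES")


def restrict_to_device(device_id, env=None):
    """Expose only one accelerator by setting the visibility variables."""
    env = os.environ if env is None else env
    for name in DEVICE_ENV_VARS:
        env[name] = str(device_id)
    return env
```

The call has to happen before torch (or torch_npu) is imported, since the runtimes read these variables when they initialize.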
---------
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
* flux: math: Use addcmul_ to avoid expensive VRAM intermediate
The rope process can be the VRAM peak, and the intermediate that holds
the addition result before the original is released can OOM.
Use the in-place addcmul_ instead.
* wan: Delete the self attention before cross attention
This saves VRAM when the cross attention and FFN stages are the
VRAM peak.
- Auto-derive TORCH_CUDA_ARCH_LIST from torch device capabilities (unique, sorted, optional +PTX) to cover all charted GPUs:
Turing 7.5, Ampere 8.0/8.6/8.7, Ada 8.9, Hopper 9.0, and Blackwell 10.0 & 12.0/12.1; add name-based fallbacks for mixed or torch-less scenarios.
- Add user-tunable build parallelism with SAGE_MAX_JOBS (preferred) and MAX_JOBS (alias) that cap PyTorch cpp_extension/ninja -j; fall back to a RAM/CPU heuristic to prevent OOM “Killed” during CUDA/C++ builds.
- Correct Sage flags: SAGE_ATTENTION_AVAILABLE only signals “built/installed,” while FORCE_SAGE_ATTENTION=1 enables Sage at startup; fix logs to reference FORCE_SAGE_ATTENTION.
- Maintain Triton install strategy by GPU generation for compatibility and performance.
- Add first-run dependency installation with COMFY_FORCE_INSTALL override; keep permissions bootstrap and minor logging/URL cleanups.
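
The arch-list derivation and the job cap described above can be sketched as two small functions. This is an illustration only: `arch_list` and `build_jobs` are hypothetical names, and the 4 GiB-per-job figure is an assumed heuristic, not a value from the source.

```python
def arch_list(capabilities, with_ptx=False):
    """Format unique, sorted compute capabilities for TORCH_CUDA_ARCH_LIST."""
    archs = [f"{major}.{minor}" for major, minor in sorted(set(capabilities))]
    if archs and with_ptx:
        # Attach +PTX to the highest arch so newer GPUs can JIT from PTX.
        archs[-1] += "+PTX"
    return ";".join(archs)


def build_jobs(env, cpu_count, mem_gib, gib_per_job=4):
    """Pick a job cap: SAGE_MAX_JOBS, then MAX_JOBS, else a RAM/CPU heuristic."""
    for name in ("SAGE_MAX_JOBS", "MAX_JOBS"):
        value = env.get(name, "")
        if value.isdigit() and int(value) > 0:
            return int(value)
    # Assumed heuristic: one job per gib_per_job GiB of RAM, capped by CPU count.
    return max(1, min(cpu_count, int(mem_gib // gib_per_job)))
```

For a host with an RTX 2080 and an RTX 3090, `arch_list([(8, 6), (7, 5)])` gives `7.5;8.6`; the capabilities themselves would come from `torch.cuda.get_device_capability`.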
Adds installed and required workflow templates version information to the
/system_stats endpoint, allowing the frontend to detect and notify users
when their templates package is outdated.
- Add get_installed_templates_version() and get_required_templates_version()
methods to FrontendManager
- Include templates version info in system_stats response
- Add comprehensive unit tests for the new functionality
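
As a rough illustration of the version lookup, the sketch below parses a pinned version out of requirements text and compares it with an installed version. It is not the FrontendManager code: the helper names are hypothetical, the package name passed in is the caller's choice, and the comparison assumes purely numeric dotted versions.

```python
import re


def parse_required_version(requirements_text, package):
    """Return the version pinned with == for `package`, or None if absent."""
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*==\s*([^\s#;]+)", re.IGNORECASE
    )
    for line in requirements_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


def is_outdated(installed, required):
    """True when both versions are known and installed is older than required."""
    if not installed or not required:
        return False

    def key(version):
        # Assumption: versions look like "0.1.10" with numeric parts only.
        return tuple(int(part) for part in version.split("."))

    return key(installed) < key(required)
```

Comparing integer tuples rather than strings matters here: as strings, "0.1.9" would sort after "0.1.10".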
Introduce a first-run flag to install custom_nodes dependencies only on the
initial container start, with COMFY_FORCE_INSTALL=1 to override on demand;
correct Sage Attention flag semantics so SAGE_ATTENTION_AVAILABLE=1 only
indicates the build is present while FORCE_SAGE_ATTENTION=1 enables it at
startup; fix the misleading log to reference FORCE_SAGE_ATTENTION. Update
TORCH_CUDA_ARCH_LIST mapping to 7.5 (Turing), 8.6 (Ampere), 8.9 (Ada), and
10.0 (Blackwell/RTX 50); retain Triton strategy with a compatibility pin on
Turing and latest for Blackwell, including fallbacks. Clean up git clone URLs,
standardize on python -m pip, and tighten logs; preserve user remapping and
strategy-based rebuild detection via the .built flag.
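
The first-run gate with its override could be sketched as follows, assuming a marker file whose path is chosen by the caller. `should_install` and `mark_installed` are hypothetical names; only `COMFY_FORCE_INSTALL=1` comes from the text above.

```python
from pathlib import Path


def should_install(flag_path, env):
    """Install on the first run (no marker yet) or when COMFY_FORCE_INSTALL=1."""
    forced = env.get("COMFY_FORCE_INSTALL") == "1"
    return forced or not Path(flag_path).exists()


def mark_installed(flag_path):
    """Create the marker so later container starts skip the install step."""
    Path(flag_path).touch()
```

A typical start would call `should_install(path, os.environ)`, run the custom_nodes dependency install when it returns True, and then call `mark_installed(path)`.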