36 Commits

Author SHA1 Message Date
Claude
234caeed32
Restore SageAttention, fix gpus syntax, and clean up entrypoint
SageAttention research confirms it remains useful (2-5x speedup over FA,
active development through SA3, broad community adoption for video/HR).
Restore it as an opt-in startup-compiled feature (FORCE_SAGE_ATTENTION=1).

Entrypoint is cleaned up vs the original: simplified GPU probe (drops
per-arch flag exports, keeps what's needed for strategy selection),
cleaner build logic with merged clone/update paths, removed dead code.

- Dockerfile: restore SAGE_ATTENTION_AVAILABLE=0 default env var
- entrypoint.sh: simplified but functionally equivalent SageAttention
  support; removes ~150 lines of redundant per-arch flag tracking
- README: fix docker-compose gpus syntax to 'gpus: all'; restore
  SageAttention docs with accurate description of behavior

https://claude.ai/code/session_01WQc56fWdK329K11kRGnb5g
2026-03-27 12:40:33 +00:00
Claude
1bf3bfbdb3
Fix Docker build failures and workflow publish bug; remove SageAttention
- Dockerfile: fix glibc 2.41 patch path (cuda-12.9 -> cuda-12.8 to match
  installed packages); remove SAGE_ATTENTION_AVAILABLE env var
- sync-build-release.yml: add always() to publish job condition so it runs
  even when build-self is skipped (the primary GitHub runner path succeeds),
  fixing releases never being created on normal builds
- entrypoint.sh: remove SageAttention compilation and GPU detection logic;
  simplify to permissions setup, ComfyUI-Manager sync, custom node install,
  and launch
- README: update CUDA version references from 12.9/cu129 to 12.8/cu128;
  remove SageAttention documentation; fix docker-compose GPU syntax

https://claude.ai/code/session_01WQc56fWdK329K11kRGnb5g
2026-03-27 12:31:01 +00:00
clsferguson
5e33515edc
refactor(entrypoint): single-pass GPU checks, preserved env across user switch, streamlined SageAttention build/cleanup
- Add a fast GPU presence gate with `nvidia-smi -L` at container start; exit early if unavailable or zero GPUs.
- Perform one thorough torch-based probe only in the root phase, export probe results (GPU_COUNT, COMPAT_GE_75, TORCH_CUDA_ARCH_LIST, SAGE_STRATEGY, SAGE_BUILD_STRATEGY), and call `runuser -p` so the app-user pass skips all GPU checks/logs.
- Remove any post-switch probing and strategy recovery paths to prevent duplicate logs.
- Unify wording to “SageAttention” and avoid duplicate “build” messages by logging the compilation once.
- After a successful install, delete the cloned sources under `.sage_attention/SageAttention` and retain `.built`.
- No features removed; behavior on GPU hosts is unchanged with cleaner, more accurate logs.
2025-10-03 13:27:33 -06:00
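The presence gate described in 5e33515edc can be pictured roughly as follows. This is a minimal sketch, not the repository's code: `count_listed_gpus` and `gpu_gate` are hypothetical names, and the sketch assumes `nvidia-smi -L` prints one line per device starting with `GPU <index>:`.

```sh
# Hypothetical helper: count device lines in `nvidia-smi -L` style output read from stdin.
count_listed_gpus() {
  # grep -c prints 0 but exits non-zero when nothing matches, so mask the status.
  grep -c '^GPU [0-9][0-9]*:' || true
}

# Hypothetical gate: succeed only when nvidia-smi exists and lists at least one GPU.
gpu_gate() (
  command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found" >&2; exit 1; }
  count="$(nvidia-smi -L 2>/dev/null | count_listed_gpus)"
  [ "$count" -gt 0 ] || { echo "no NVIDIA GPUs listed" >&2; exit 1; }
  echo "GPU_COUNT=$count"
)
```

An entrypoint could call `gpu_gate || exit 1` before any slower torch-based probe. Keeping the parser separate from the `nvidia-smi` call makes it checkable on hosts without a GPU.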
clsferguson
39b0a0cca8
refactor(entrypoint): GPU-first probe, unify SageAttention logs, CM_* config management, remove UV/pip bootstrap
Move a detailed GPU probe to the top, logging per-device name/CC/memory and exiting early if no compatible GPUs (>=7.5), while storing a temporary SAGE_BUILD_STRATEGY for SageAttention builds; unify “SageAttention” naming and remove duplicate “Building” logs; remove UV usage and runtime pip bootstrap since deps are baked; add configure_manager_config to create or update ComfyUI-Manager’s persistent config.ini from CM_* environment variables on first and subsequent boots; keep Triton baked at 3.4.0 but switch to 3.2.0 at runtime for Turing strategies only; preserve system-wide installs and non-root ownership model.
2025-10-02 21:34:09 -06:00
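One way to picture the CM_* handling from 39b0a0cca8 is the sketch below. It is an illustration under stated assumptions rather than the actual `configure_manager_config`: the function name `apply_cm_env`, the `CM_SOME_KEY` to `some_key` mapping, and the `[default]` section used to seed a new file are all assumptions.

```sh
# Hypothetical sketch: copy CM_* environment variables into an INI-style file.
# Assumed mapping: CM_SOME_KEY=value becomes the line "some_key = value".
apply_cm_env() (
  cfg="$1"
  # Seed a minimal file on first boot (the [default] section name is an assumption).
  [ -f "$cfg" ] || printf '[default]\n' > "$cfg"
  env | grep '^CM_[A-Za-z0-9_][A-Za-z0-9_]*=' | while IFS= read -r pair; do
    name="${pair%%=*}"
    value="${pair#*=}"
    key="$(printf '%s' "${name#CM_}" | tr '[:upper:]' '[:lower:]')"
    tmp="$(mktemp)"
    # Replace an existing "key = ..." line, or append the key when it is absent.
    KEY="$key" VALUE="$value" awk '
      BEGIN { k = ENVIRON["KEY"]; v = ENVIRON["VALUE"] }
      {
        line_key = $0
        sub(/[ \t]*=.*/, "", line_key)
        gsub(/^[ \t]+/, "", line_key)
        if (line_key == k && index($0, "=") > 0) { print k " = " v; found = 1 }
        else print
      }
      END { if (!found) print k " = " v }
    ' "$cfg" > "$tmp"
    # Write back through the original file so its owner and mode are preserved.
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
  done
)
```

Because existing keys are rewritten in place and missing keys are appended, running the function on every boot is idempotent, which matches the "first and subsequent boots" behavior the commit describes.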
clsferguson
de6351e9bf
feat(entrypoint): GPU-first single log; UV-free; CM_* driven Manager config; standardized SageAttention logs
Move GPU probe to the top with a single comprehensive device report (index, name, CC, VRAM) and early exit if no compatible GPUs (CC >= 7.5 not met); avoid duplicate logs after user switch via an internal flag. Remove uv usage and ensurepip, keeping system-wide pip installs only. Add CM_* environment variable handling to seed and reconcile ComfyUI-Manager’s persistent config.ini under user/default/ComfyUI-Manager on first and subsequent boots. Standardize “SageAttention” naming and reduce duplicate “building” messages; keep runtime Triton adjustment only when needed for Turing.
2025-10-02 20:47:07 -06:00
clsferguson
d1ba18dac8
Rollback entrypoint.sh
2025-10-02 13:00:05 -06:00
clsferguson
e5990f874b
fix(entrypoint): make python -s -m pip available to ComfyUI-Manager; single GPU probe; log cleanup
Add ensure_pip_available() that verifies python -s -m pip --version and python -s -m pip list; bootstrap with ensurepip and upgrade pip/setuptools/wheel if needed. Replace duplicate GPU probing with one torch-based probe persisted across user-switch and enumerate each GPU with name/CC/VRAM. Standardize SageAttention logs to a single “Compiling SageAttention…” headline.
2025-10-02 11:27:25 -06:00
clsferguson
03908b9b04
perf(entrypoint): probe GPUs first, log count/CC, exit early; unify installs as system-wide
Move a torch.cuda-based GPU probe to the top of the entrypoint, logging device count and compute capabilities and exiting immediately when no compatible GPU is found. Remove pip --user usage and PIP_USER so all runtime installs are system-wide (enabled by early chown of site-packages), avoiding uv’s lack of --user support while honoring the “no venvs” constraint. Keep Triton re-pin only when Turing strategy is detected; otherwise re-use baked Triton. Preserve SageAttention runtime build and Manager update behavior.
2025-10-01 21:25:12 -06:00
clsferguson
207b64dc4c
Update entrypoint.sh
2025-10-01 14:33:02 -06:00
clsferguson
a0d4cc2faf
Roll back entrypoint.sh
2025-09-30 22:03:26 -06:00
clsferguson
db5ae38c11
fix(entrypoint): probe GPU once at startup before permissions; exit fast if not compatible
- Add early GPU probe that first tries nvidia-smi, then torch, with compute capability >= 7.5 gating; write a pass flag to avoid reprobe; exit 42 otherwise to prevent unnecessary work.
- Move GPU detection before any user/permission operations to stop repeated permission logs on restarts.
- Replace bracketed Markdown URLs with plain URLs in git commands.
2025-09-30 16:11:07 -06:00
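The compute-capability gate in db5ae38c11 (threshold 7.5, pass flag, exit code 42) might look something like this. It is a sketch only: `cc_at_least` and `gate_on_compute_capability` are hypothetical names, capabilities are assumed to arrive as `major.minor` strings, and passing when any one GPU qualifies is an assumption.

```sh
# Hypothetical helper: succeed when compute capability $1 (e.g. "8.6") is >= $2.
cc_at_least() (
  have_major="${1%%.*}"; have_minor="${1#*.}"
  need_major="${2%%.*}"; need_minor="${2#*.}"
  [ "$have_major" -gt "$need_major" ] ||
    { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
)

# Hypothetical gate: $1 = minimum capability, $2 = pass-flag path, rest = detected capabilities.
gate_on_compute_capability() (
  min="$1"; flag="$2"; shift 2
  [ -f "$flag" ] && exit 0          # an earlier start already passed the probe
  for cc in "$@"; do
    if cc_at_least "$cc" "$min"; then
      : > "$flag"                   # record the pass so restarts skip the probe
      exit 0
    fi
  done
  exit 42                           # distinct status for "no compatible GPU"
)
```

Comparing major and minor as integers avoids the string-comparison trap where `12.0` would sort below `7.5`.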
clsferguson
f4d9284f63
fix(entrypoint): remove ONNX install and resolve heredoc EOF by eliminating brace-group usage
- Drop runtime ONNX Runtime installer/check block that used a heredoc followed by a brace group, causing “unexpected end of file”.
- Keep Manager pip preflight and toml preinstall; retain unified torch-based GPU probe and SageAttention flow.
2025-09-30 14:27:17 -06:00
clsferguson
79c06245ff
chore(entrypoint): remove runtime uv installation; rely on Dockerfile-provided uv
- Drop runtime installer for uv; uv is now baked into the image via Dockerfile.
- Keep pip preflight (ensurepip) and toml preinstall to satisfy ComfyUI-Manager’s prestartup requirements.
- Retain unified torch-based GPU probe, SageAttention setup, custom node install flow, and ONNX Runtime CUDA provider guard.
2025-09-30 12:18:05 -06:00
clsferguson
893e76e908
feat(entrypoint): ensure ORT CUDA at runtime and unify GPU probe via torch; fix Manager package ops (pip/uv) and preinstall toml
- Add runtime guard to verify ONNX Runtime has CUDAExecutionProvider; if missing, uninstall CPU-only onnxruntime and install onnxruntime-gpu, then re-verify providers.
- Replace early gpu checks with one torch-based probe that detects devices and compute capability, sets DET_* flags, TORCH_CUDA_ARCH_LIST, and SAGE_STRATEGY, and exits fast when CC < 7.5.
- Ensure python -m pip is available (bootstrap with ensurepip if necessary) so ComfyUI-Manager can run package operations during prestartup.
- Install uv system-wide to /usr/local/bin if missing (UV_UNMANAGED_INSTALL) for a fast package manager alternative without modifying shell profiles.
- Preinstall toml if its import fails to avoid Manager import errors before Manager runs its own install steps.
2025-09-30 11:30:22 -06:00
clsferguson
a632e1c5be
fix(entrypoint): install only root requirements.txt and install.py per node; remove wildcards and recursion
- Replace wildcard/recursive requirements scanning with a per-node loop that installs only each node’s top-level requirements.txt and runs install.py when present, aligning behavior with ComfyUI-Manager and preventing unintended subfolder or variant requirements from being applied.
- Drop automatic pyproject.toml/setup.py installs to avoid packaging nodes unnecessarily; ComfyUI loads nodes from custom_nodes directly.
- Keep user-level pip and permissions hardening so ComfyUI-Manager can later manage deps without permission errors.
2025-09-30 10:23:29 -06:00
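The per-node loop from a632e1c5be can be sketched as below. The function name `install_node_deps` and the `DRY_RUN` switch are hypothetical additions for illustration; the only behavior taken from the commit message is that each node's top-level `requirements.txt` is installed and its `install.py` is run when present, with no recursion into subfolders.

```sh
# Hypothetical sketch: visit each node directory directly under "$1" and act only on
# its top-level requirements.txt and install.py. Set DRY_RUN=1 to print instead of run.
install_node_deps() (
  nodes_dir="$1"
  for node in "$nodes_dir"/*/; do
    [ -d "$node" ] || continue      # the glob stays literal when nothing matches
    node="${node%/}"
    if [ -f "$node/requirements.txt" ]; then
      if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "pip install -r $node/requirements.txt"
      else
        python -m pip install --no-cache-dir -r "$node/requirements.txt" || true
      fi
    fi
    if [ -f "$node/install.py" ]; then
      if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "python install.py (in $node)"
      else
        ( cd "$node" && python install.py ) || true
      fi
    fi
  done
)
```

Because the loop only tests two fixed file names per node, a nested `requirements.txt` in a subfolder is never picked up, which is the point of dropping the wildcard scan.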
clsferguson
b0b95e5cc5
feat(entrypoint): fail-fast when no compatible NVIDIA GPU, mirror Manager’s dependency install steps, and harden permissions for Manager operations
- Add an early runtime check that exits cleanly when no compatible NVIDIA GPU is detected, preventing unnecessary installs and builds on hosts without GPUs, which matches the repo’s requirement to target recent-gen NVIDIA GPUs and avoids work on GitHub runners. 
- Mirror ComfyUI-Manager’s dependency behavior for custom nodes by: installing requirements*.txt and requirements/*.txt, building nodes with pyproject.toml using pip, and invoking node-provided install.py scripts when present, aligning with documented custom-node install flows. 
- Enforce user-level pip installs (PIP_USER=1) and ensure /usr/local site-packages trees are owned and writable by the runtime user; this resolves permission-denied errors seen when Manager updates or removes packages (e.g., numpy __pycache__), improving reliability of Manager-driven installs and uninstalls.
2025-09-29 22:36:35 -06:00
clsferguson
f6d49f33b7
entrypoint: derive correct arch list; add user-tunable build parallelism; fix Sage flags; first-run installs
- Auto-derive TORCH_CUDA_ARCH_LIST from torch device capabilities (unique, sorted, optional +PTX) to cover all charted GPUs:
  Turing 7.5, Ampere 8.0/8.6/8.7, Ada 8.9, Hopper 9.0, and Blackwell 10.0 & 12.0/12.1; add name-based fallbacks for mixed or torch-less scenarios.
- Add user-tunable build parallelism with SAGE_MAX_JOBS (preferred) and MAX_JOBS (alias) that cap PyTorch cpp_extension/ninja -j; fall back to a RAM/CPU heuristic to prevent OOM “Killed” during CUDA/C++ builds.
- Correct Sage flags: SAGE_ATTENTION_AVAILABLE only signals “built/installed,” while FORCE_SAGE_ATTENTION=1 enables Sage at startup; fix logs to reference FORCE_SAGE_ATTENTION.
- Maintain Triton install strategy by GPU generation for compatibility and performance.
- Add first-run dependency installation with COMFY_FORCE_INSTALL override; keep permissions bootstrap and minor logging/URL cleanups.
2025-09-26 22:37:24 -06:00
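The job-count selection in f6d49f33b7 could be sketched like this. The precedence (SAGE_MAX_JOBS, then MAX_JOBS) follows the commit message; the function name `pick_build_jobs` and the fallback rule of one job per 4 GiB of RAM are assumptions, since the message does not spell out the heuristic.

```sh
# Hypothetical sketch: choose a build job count.
# Arguments: $1 = CPU count, $2 = total RAM in GiB (passed in so the logic is testable).
pick_build_jobs() (
  cpus="$1"; ram_gib="$2"
  # An explicit SAGE_MAX_JOBS wins, then MAX_JOBS, as the commit describes.
  if [ -n "${SAGE_MAX_JOBS:-}" ]; then echo "$SAGE_MAX_JOBS"; exit 0; fi
  if [ -n "${MAX_JOBS:-}" ]; then echo "$MAX_JOBS"; exit 0; fi
  # Assumed heuristic (not from the source): one job per 4 GiB of RAM, capped by CPUs.
  jobs=$(( ram_gib / 4 ))
  [ "$jobs" -gt "$cpus" ] && jobs="$cpus"
  [ "$jobs" -lt 1 ] && jobs=1
  echo "$jobs"
)
```

The result would typically be exported as `MAX_JOBS` before the build, since PyTorch's cpp_extension reads that variable to limit ninja workers.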
clsferguson
45b87c7c99
Refactor entrypoint: first-run installs, fix Sage flags, arch map, logs
Introduce a first-run flag to install custom_nodes dependencies only on the
initial container start, with COMFY_FORCE_INSTALL=1 to override on demand;
correct Sage Attention flag semantics so SAGE_ATTENTION_AVAILABLE=1 only
indicates the build is present while FORCE_SAGE_ATTENTION=1 enables it at
startup; fix the misleading log to reference FORCE_SAGE_ATTENTION. Update
TORCH_CUDA_ARCH_LIST mapping to 7.5 (Turing), 8.6 (Ampere), 8.9 (Ada), and
10.0 (Blackwell/RTX 50); retain Triton strategy with a compatibility pin on
Turing and latest for Blackwell, including fallbacks. Clean up git clone URLs,
standardize on python -m pip, and tighten logs; preserve user remapping and
strategy-based rebuild detection via the .built flag.
2025-09-26 20:04:35 -06:00
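The flag semantics in 45b87c7c99 can be illustrated with a small sketch. `launch_args` is a hypothetical name, and requiring both flags before adding `--use-sage-attention` is an assumption about how "built" and "enabled" combine.

```sh
# Hypothetical sketch: print the launch arguments, one per line.
# FORCE_SAGE_ATTENTION=1 is the opt-in; SAGE_ATTENTION_AVAILABLE=1 only reports a build.
launch_args() (
  if [ "${FORCE_SAGE_ATTENTION:-0}" = "1" ] && [ "${SAGE_ATTENTION_AVAILABLE:-0}" = "1" ]; then
    set -- "$@" --use-sage-attention   # append without disturbing user-provided flags
  fi
  for arg in "$@"; do printf '%s\n' "$arg"; done
)
```

With only `SAGE_ATTENTION_AVAILABLE=1` set, the user's arguments pass through unchanged; the extra flag appears only once `FORCE_SAGE_ATTENTION=1` is also present.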
clsferguson
7ee4f37971
fix(bootstrap): valid git URLs, dynamic CUDA archs, +PTX fallback
Replace Markdown-style links in git clone with standard HTTPS URLs so the
repository actually clones under bash.
Derive TORCH_CUDA_ARCH_LIST from PyTorch devices and add +PTX to the
highest architecture for forward-compat extension builds.
Warn explicitly on Blackwell (sm_120) when the active torch/CUDA build
lacks support, prompting an upgrade to torch with CUDA 12.8+.
Keep pip --no-cache-dir, preserve Triton pin for Turing, and retain
idempotent ComfyUI-Manager update logic.
2025-09-26 19:11:46 -06:00
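The arch-list derivation in 7ee4f37971 can be sketched as follows, assuming the compute capabilities have already been read from torch and are passed in as `major.minor` arguments. `derive_arch_list` is a hypothetical name; the semicolon-separated output with `+PTX` on the highest entry follows the format `TORCH_CUDA_ARCH_LIST` accepts.

```sh
# Hypothetical sketch: turn compute capabilities (one per argument, e.g. 8.6 7.5 8.6)
# into a unique, ascending, semicolon-separated list with +PTX on the highest entry.
derive_arch_list() (
  [ "$#" -gt 0 ] || exit 1
  # Sort numerically on major, then minor, dropping duplicates.
  printf '%s\n' "$@" | sort -t . -k 1,1n -k 2,2n -u | awk '
    { archs[NR] = $0 }
    END {
      for (i = 1; i <= NR; i++) {
        out = out (i > 1 ? ";" : "") archs[i]
      }
      print out "+PTX"
    }'
)
```

For a mixed host reporting 8.6, 7.5, 8.6 and 12.0, `derive_arch_list 8.6 7.5 8.6 12.0` yields `7.5;8.6;12.0+PTX`.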
clsferguson
231082e2a6
rollback entrypoint.sh
Issues with the script; roll back to an older modified version.
2025-09-26 18:52:38 -06:00
clsferguson
555b7d5606
feat(entrypoint): safer builds, dynamic CUDA archs, corrected git clone, first-run override, clarified Sage flags
Cap build parallelism via MAX_JOBS (override SAGEATTENTION_MAX_JOBS) and
CMAKE_BUILD_PARALLEL_LEVEL to prevent OOM kills during nvcc/cc1plus when
ninja fanout is high in constrained containers.

Compute TORCH_CUDA_ARCH_LIST from torch.cuda device properties to target
exact GPU SMs across mixed setups; keep human-readable nvidia-smi logs.

Move PATH/PYTHONPATH exports earlier and use `python -m pip` with
`--no-cache-dir` consistently to avoid stale caches and reduce image bloat.

Fix git clone/update commands to standard HTTPS and reset against
origin/HEAD; keep shallow operations for speed and reproducibility.

Clarify Sage Attention flags: set SAGE_ATTENTION_AVAILABLE only when
module import succeeds; require FORCE_SAGE_ATTENTION=1 to enable at boot.

Keep first-run dependency installation with COMFY_AUTO_INSTALL=1 override
to re-run installs on later boots without removing the first-run flag.
2025-09-26 18:19:23 -06:00
clsferguson
30ed9ae7cf
Fix entrypoint.sh
Removed escapes in python version.
2025-09-26 15:15:58 -06:00
clsferguson
13f3f11431
feat(entrypoint): dynamic CUDA arch detection, first-run override, fix git clone, clarify Sage Attention flags
Compute TORCH_CUDA_ARCH_LIST from torch.cuda device properties to build
for the exact GPUs present, improving correctness across mixed setups.

Add first-run dependency install gate with a COMFY_AUTO_INSTALL=1
override to re-run installs on later boots without removing the flag.

Use `python -m pip` consistently with `--no-cache-dir` to avoid stale
wheels and reduce container bloat during rebuilds.

Fix git clone commands to standard HTTPS (no Markdown link syntax) and
use shallow fetch/reset against origin/HEAD for speed and reliability.

Clarify Sage Attention flags: set SAGE_ATTENTION_AVAILABLE only when the
module is importable; require FORCE_SAGE_ATTENTION=1 to enable at boot.

Keep readable GPU logs via `nvidia-smi`, while relying on torch for
compile-time arch targeting. Improve logging throughout the flow.
2025-09-26 12:10:28 -06:00
clsferguson
7af5a79577
entrypoint: build SageAttention but don’t auto‑enable; honor SAGE_ATTENTION_AVAILABLE env
The entrypoint no longer exports SAGE_ATTENTION_AVAILABLE=1 on successful builds, preventing global attention patching from being forced; instead, it builds/tests SageAttention, sets SAGE_ATTENTION_BUILT=1 for visibility, and only appends --use-sage-attention when SAGE_ATTENTION_AVAILABLE=1 is supplied by the environment, preserving user control across docker run -e/compose env usage while keeping the feature available.
2025-09-23 10:28:12 -06:00
clsferguson
976eca9326
fix(entrypoint): resolve Triton installation permission errors blocking Sage Attention
Fix critical permission issue preventing Sage Attention from building by using 
--user flag for all pip installations in the entrypoint script.

Root Cause:
- Entrypoint runs as non-root user (appuser) after privilege drop
- Triton installation with --force-reinstall tried to upgrade system setuptools
- System packages require root permissions to uninstall/upgrade
- This caused "Permission denied" errors blocking Sage Attention build

Changes Made:
- Add --user flag to all pip install commands in install_triton_version()
- Add --user flag to Sage Attention pip installation in build_sage_attention_mixed()
- Use --no-build-isolation for Sage Attention to avoid setuptools conflicts
- Maintain all existing fallback logic and error handling

Result:
- Triton installs to user site-packages (~/.local/lib/python3.12/site-packages)
- Sage Attention builds and installs successfully
- No system package conflicts or permission issues
- ComfyUI can now detect and use Sage Attention with --use-sage-attention flag

This resolves the error:
"ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied"

GPU Detection worked perfectly:
- Detected 5x RTX 3060 GPUs correctly  
- PyTorch CUDA compatibility confirmed
- Strategy: rtx30_40_optimized selected appropriately
2025-09-22 11:58:15 -06:00
clsferguson
cdac5a8b32
feat(entrypoint): add comprehensive error handling and RTX 50 series support
Enhance entrypoint script with robust error handling, PyTorch validation, and RTX 50 support

PyTorch CUDA Validation:
- Add test_pytorch_cuda() function to verify CUDA availability and enumerate devices
- Display compute capabilities for all detected GPUs during startup
- Validate PyTorch installation before attempting Sage Attention builds

Enhanced GPU Detection:
- Update RTX 50 series architecture targeting to compute capability 12.0 (sm_120)
- Improve mixed-generation GPU handling with better compatibility logic
- Add comprehensive logging for GPU detection and strategy selection

Triton Version Management:
- Add intelligent fallback system for Triton installation failures
- RTX 50 series: Try latest → pre-release → stable fallback chain
- RTX 20 series: Enforce Triton 3.2.0 for compatibility
- Enhanced error recovery when specific versions fail

Build Error Handling:
- Add proper error propagation throughout Sage Attention build process
- Implement graceful degradation when builds fail (ComfyUI still starts)
- Comprehensive logging for troubleshooting build issues
- Better cleanup and recovery from partial build failures

Architecture-Specific Optimizations:
- Proper TORCH_CUDA_ARCH_LIST targeting for mixed GPU environments
- RTX 50 series: Use sm_120 for Blackwell architecture support
- Multi-GPU compilation targeting prevents architecture mismatches
- Intelligent version selection (v1.0 for RTX 20, v2.2 for modern GPUs)

Command Line Integration:
- Enhanced argument handling preserves user-provided flags
- Automatic --use-sage-attention injection when builds succeed
- Support for both default startup and custom user commands
- SAGE_ATTENTION_AVAILABLE environment variable for external integration

This transforms the entrypoint from a basic startup script into a comprehensive
GPU optimization and build management system with enterprise-grade error handling.
2025-09-22 09:28:12 -06:00
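The Triton fallback chain in cdac5a8b32 (latest, then pre-release, then stable) amounts to trying candidates in order until one installs. A generic sketch of that pattern follows; `install_first_working` is a hypothetical name, and the installer is passed in as a command so the logic does not depend on any particular pip invocation.

```sh
# Hypothetical sketch: try each candidate spec with the given installer command and
# stop at the first one that succeeds. Prints the spec that worked.
install_first_working() (
  installer="$1"; shift
  for spec in "$@"; do
    if "$installer" "$spec" >/dev/null 2>&1; then
      echo "$spec"
      exit 0
    fi
    echo "install failed for '$spec', trying next candidate" >&2
  done
  exit 1                              # every candidate failed; caller decides what to do
)
```

A caller that wants graceful degradation can treat a non-zero return as "skip the optional build" and still start the application.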
clsferguson
b6467bd90e
feat(entrypoint): add automatic Sage Attention detection and intelligent GPU-based build system
Implement comprehensive multi-GPU Sage Attention support with automatic detection and runtime flag management

This commit transforms the entrypoint script into an intelligent Sage Attention management system that automatically detects GPU configurations, builds the appropriate version, and seamlessly integrates with ComfyUI startup.

Key features added:
- Multi-GPU generation detection (RTX 20/30/40/50 series) with mixed-generation support
- Intelligent build strategy selection based on detected GPU hardware
- Automatic Triton version management (3.2.0 for RTX 20, latest for RTX 30+)
- Dynamic CUDA architecture targeting via TORCH_CUDA_ARCH_LIST environment variable
- Build caching with rebuild detection when GPU configuration changes
- Comprehensive error handling with graceful fallback when builds fail

Sage Attention version logic:
- RTX 20 series (mixed or standalone): Sage Attention v1.0 + Triton 3.2.0 for compatibility
- RTX 30/40 series: Sage Attention v2.2 + latest Triton for optimal performance  
- RTX 50 series: Sage Attention v2.2 + latest Triton with Blackwell architecture support
- Mixed generations: Prioritizes compatibility over peak performance

Runtime integration improvements:
- Sets SAGE_ATTENTION_AVAILABLE environment variable based on successful build/test
- Automatically adds --use-sage-attention flag to ComfyUI startup when available
- Preserves user command-line arguments while injecting Sage Attention support
- Handles both default startup and custom user commands gracefully

Build optimizations:
- Parallel compilation using all available CPU cores (MAX_JOBS=nproc)
- Architecture-specific CUDA kernel compilation for optimal GPU utilization  
- Intelligent caching prevents unnecessary rebuilds on container restart
- Comprehensive import testing ensures working installation before flag activation

Performance benefits:
- RTX 20 series: 10-15% speedup with v1.0 compatibility mode
- RTX 30/40 series: 20-40% speedup with full v2.2 optimizations
- RTX 50 series: 40-50% speedup with latest Blackwell features
- Mixed setups: Maintains compatibility while maximizing performance where possible

The system provides zero-configuration Sage Attention support while maintaining full backward compatibility and graceful degradation for unsupported hardware configurations.
2025-09-22 08:48:53 -06:00
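The build caching with rebuild detection mentioned in b6467bd90e could work along these lines. This is a sketch under assumptions: `needs_rebuild` and `mark_built` are hypothetical names, and storing the strategy name as the marker file's content is an assumed encoding of "GPU configuration".

```sh
# Hypothetical sketch: decide whether a rebuild is needed by comparing the strategy
# recorded in a marker file with the strategy detected on this start.
needs_rebuild() (
  marker="$1"; current="$2"
  [ -f "$marker" ] || exit 0                   # never built
  [ "$(cat "$marker")" != "$current" ]         # rebuild only when the strategy changed
)

# Record the strategy after a successful build ($1 = marker path, $2 = strategy).
mark_built() { printf '%s\n' "$2" > "$1"; }
```

On a restart with unchanged hardware the marker matches and the build is skipped; swapping in a GPU that selects a different strategy makes `needs_rebuild` succeed again.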
clsferguson
fb64caf236
chore(bootstrap): trace root-only setup via run()
Introduce a run() helper that shell-quotes and prints each command before execution, and use it for mkdir/chown/chmod in the /usr/local-only Python target loop. This makes permission and path fixes visible in logs for easier debugging, preserves existing error-tolerance with || true, and remains compatible with set -euo pipefail and the runuser re-exec (runs only in the root branch). No functional changes beyond added verbosity; non-/usr/local paths remain no-op.
2025-09-17 14:49:01 -06:00
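A tracing helper like the `run()` described in fb64caf236 might be written as below. This is a sketch, not the committed code: it uses a portable `sed`-based quoting helper (`quote_arg`, a hypothetical name) and writes the trace to stderr, both of which are assumptions.

```sh
# Hypothetical helper: wrap one argument in single quotes for display.
quote_arg() {
  # An embedded ' becomes '\'' (close quote, escaped quote, reopen quote).
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Print a shell-quoted copy of the command to stderr, then execute it unchanged.
run() {
  _run_line="+"
  for _run_arg in "$@"; do
    _run_line="$_run_line $(quote_arg "$_run_arg")"
  done
  printf '%s\n' "$_run_line" >&2
  "$@"
}
```

Since `run` returns the command's own exit status, a call such as `run chmod u+rwX "$dir" || true` keeps the error tolerance the commit mentions while still showing the command in the logs.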
clsferguson
c1451b099b
fix: escapes on quotation marks.
Removed escapes from quotation marks that caused a failure to start.
2025-09-17 13:03:09 -06:00
clsferguson
db506ae51c
fix: upgrade custom-node deps each start and shallow-update ComfyUI-Manager
This updates ComfyUI-Manager on container launch using a shallow fetch/reset pattern and cleans untracked files to ensure a fresh working tree, which is the recommended way to refresh depth‑1 clones without full history. It also installs all detected requirements.txt files with pip --upgrade and only-if-needed strategy so direct requirements are upgraded within constraints on each run, while still excluding Manager from wheel-builds to avoid setuptools flat‑layout errors.
2025-09-17 12:30:08 -06:00
clsferguson
327d7ea37f
Fix case pattern for directory ownership and permissions
2025-09-11 13:21:13 -06:00
clsferguson
18bca70c8f
Improve logging and ownership management in entrypoint.sh
2025-09-11 10:13:25 -06:00
clsferguson
d303280af5
Refactor entrypoint.sh for improved logging and ownership
2025-09-11 09:57:29 -06:00
clsferguson
c77021a965
Refactor entrypoint.sh for clarity and functionality
Updated comments for clarity and improved Python path handling.
2025-09-09 22:34:11 -06:00
clsferguson
832d31b987
Improve user mapping and permissions in entrypoint.sh
Updated entrypoint.sh to enhance user mapping and directory permissions for runtime user.
2025-09-09 21:10:20 -06:00
clsferguson
917d40a425
Add entrypoint script for ComfyUI setup
2025-09-06 21:41:32 -06:00