Adds a multi-stage Docker build that compiles SageAttention 2.2/2++ from the head of the upstream repository into a wheel with nvcc, then installs that wheel into the slim runtime stage so the final image stays small. The builder installs the same Torch CUDA 12.9 stack as the runtime so that the compiled extension's ABI matches at load time. The SageAttention repository is shallow-cloned during the build, so each new image build pulls the latest upstream version. The container launch command now passes --use-sage-attention so that ComfyUI enables SageAttention at startup when the package is present. Together, these changes keep the runtime minimal while delivering up-to-date, high-performance attention kernels for modern NVIDIA GPUs in ComfyUI.
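The launch behavior described above can be sketched as a small guard that adds the flag only when the package is importable. This is a minimal sketch under stated assumptions, not the entrypoint used by this change: `has_module`, `build_launch_args`, and the `main.py` script name are hypothetical, and only the `--use-sage-attention` flag and the `sageattention` module name come from the description.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


def build_launch_args(base_args, sage_available=None):
    """Append --use-sage-attention only when sageattention is installed.

    `sage_available` can be passed explicitly (useful for testing);
    when None, the interpreter is probed for the module.
    """
    if sage_available is None:
        sage_available = has_module("sageattention")
    args = list(base_args)
    if sage_available and "--use-sage-attention" not in args:
        args.append("--use-sage-attention")
    return args


if __name__ == "__main__":
    # Assumption: "main.py" is the ComfyUI entry script in the working directory.
    print(" ".join(build_launch_args([sys.executable, "main.py"])))
```

A guard like this would let the same launch script work in an image where the wheel build was skipped, at the cost of silently falling back to the default attention path.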
Adds av>=14.2 to satisfy Comfy’s API-node canary, so that video/audio nodes import without error, and uses the standard PyTorch CUDA 12.9 index URL syntax for reliability. Also installs nvidia-ml-py, in line with the ecosystem's move away from the deprecated pynvml package, which reduces future NVML warnings while preserving current functionality. The rest of the base image is unchanged, and existing ComfyUI requirements install as before.
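One way to verify these dependencies inside a built image is a short smoke check run at build or startup time. The sketch below is illustrative only: `release_tuple`, `meets_minimum`, and `check_runtime_deps` are hypothetical helpers, the version comparison is deliberately coarse (it ignores pre-release suffixes), and only the `av>=14.2` bound and the `nvidia-ml-py` distribution name come from the description.

```python
import re
from importlib import metadata


def release_tuple(version: str):
    """Parse the leading dotted-integer part of a version string.

    '14.2.0' -> (14, 2, 0); '2.8.0+cu129' -> (2, 8, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release tuples, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_runtime_deps():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    try:
        av_version = metadata.version("av")
    except metadata.PackageNotFoundError:
        problems.append("av is not installed")
    else:
        if not meets_minimum(av_version, "14.2"):
            problems.append(f"av {av_version} is older than 14.2")
    try:
        metadata.version("nvidia-ml-py")
    except metadata.PackageNotFoundError:
        problems.append("nvidia-ml-py is not installed")
    return problems
```

Calling `check_runtime_deps()` in the image and failing the build on a non-empty result would surface a missing or outdated package before ComfyUI starts.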
Updates the Dockerfile to use python:3.12.11-slim-trixie, which aligns with available cp312 wheels (notably MediaPipe) and avoids 3.13 ABI gaps. Adds cmake alongside build-essential to support native builds such as dlib. Keeps the CUDA-enabled PyTorch install via the vendor index, and leaves the user, workdir, entrypoint, and port settings unchanged to preserve runtime behavior.
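The interpreter and toolchain expectations above can be checked with a few lines of standard-library Python. This is a hedged sketch rather than part of the change: `cpython_tag`, `missing_tools`, and `check_build_env` are hypothetical names, and the default tool list assumes that build-essential provides `gcc` and `make` on the Debian base image.

```python
import shutil
import sys


def cpython_tag(version_info=sys.version_info) -> str:
    """Return the CPython wheel tag for an interpreter version, e.g. 'cp312'."""
    return f"cp{version_info[0]}{version_info[1]}"


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def check_build_env(expected_tag="cp312", tools=("cmake", "gcc", "make"),
                    version_info=sys.version_info, which=shutil.which):
    """Return a list of problems with the build environment; empty means OK."""
    problems = []
    tag = cpython_tag(version_info)
    if tag != expected_tag:
        problems.append(f"interpreter tag is {tag}, expected {expected_tag}")
    for tool in missing_tools(tools, which=which):
        problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_build_env():
        print(problem)
```

The `version_info` and `which` parameters exist so the check can be exercised without a matching interpreter or toolchain on the host.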