2 Commits

Author SHA1 Message Date
Houde
e912b910a2 fix: add no-mmap safetensors loader for >4GB files on Windows ROCm/UMA
Root cause: on Strix Halo (UMA), ROCm initialization reserves ~14 GB of
Windows virtual address space for the GPU. After that, safetensors can no
longer mmap files larger than ~4 GB (e.g. SDXL fp16 checkpoints, ~6.5 GB),
and loading them fails with access violations.
SD1.5 (3.97 GB) is below the threshold and unaffected.

Fix in comfy/utils.py:
- Add _LARGE_FILE_MMAP_THRESHOLD = 4_000_000_000
- Add _load_safetensors_no_mmap(): reads tensors via open()+seek()+read()
  instead of mmap, then clones each tensor so it owns its memory
  independently of the file
- In load_torch_file(): automatically route files >4 GB through
  _load_safetensors_no_mmap() when CUDA is active (ROCm builds of PyTorch
  expose the GPU through torch.cuda)
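The size-based routing and the no-mmap read described above can be sketched with the standard library alone. This is a minimal sketch, not the code in comfy/utils.py. The names load_safetensors_no_mmap and should_skip_mmap are illustrative. The gpu_active flag stands in for whatever CUDA check load_torch_file() actually performs. Tensors come back as (dtype, shape, bytes) tuples rather than torch tensors. The sketch relies on the safetensors on-disk layout: an 8-byte little-endian header length, then a JSON header whose data_offsets are relative to the end of that header, then the raw tensor data.

```python
import json
import os
import struct
from typing import Dict, Tuple

# Illustrative copy of the commit's _LARGE_FILE_MMAP_THRESHOLD (4 GB, decimal bytes).
LARGE_FILE_MMAP_THRESHOLD = 4_000_000_000


def should_skip_mmap(path: str, gpu_active: bool) -> bool:
    """Avoid mmap only for files above the threshold while a GPU backend is active."""
    return gpu_active and os.path.getsize(path) > LARGE_FILE_MMAP_THRESHOLD


def load_safetensors_no_mmap(path: str) -> Dict[str, Tuple[str, list, bytes]]:
    """Read every tensor with open()+seek()+read() instead of mmap.

    Returns {name: (dtype, shape, raw_bytes)}. Each bytes object is an
    independent copy, so nothing keeps the file mapped once this returns.
    """
    tensors = {}
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian u64 giving the JSON header size.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        data_start = 8 + header_len  # tensor data begins right after the header
        for name, info in header.items():
            if name == "__metadata__":
                continue  # free-form string metadata, not a tensor
            begin, end = info["data_offsets"]  # offsets relative to data_start
            f.seek(data_start + begin)
            raw = f.read(end - begin)
            if len(raw) != end - begin:
                raise ValueError(f"truncated data for tensor {name!r}")
            tensors[name] = (info["dtype"], info["shape"], raw)
    return tensors
```

A torch-based loader would then wrap each raw buffer, for example with torch.frombuffer over a bytearray, and clone it. That clone gives every tensor memory that does not depend on the file staying mapped, which is the ownership goal the commit describes.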

Tested: RealVisXL_V4.0.safetensors (6.46 GB) loads and generates
768x1024 portrait images at ~5 it/s on AMD Radeon 8050S (gfx1151).
SD1.5 baseline unaffected (still uses the original mmap path).
2026-06-22 18:52:42 +01:00
Houde
b6a730b24e chore: add ROCm stable baseline snapshot (gfx1151 / Strix Halo)
- torch 2.7.0a0 + ROCm 6.5 via scottt/rocm-TheRock gfx1151 wheels
- numpy pinned to 1.26.4 for wheel compatibility
- SD1.5 at 512x512, 20 steps: ~5 it/s, confirmed stable
- Saved workflow: sd15_test_rocm_workflow.json
- AMD Radeon 8050S, 14.37 GB UMA VRAM correctly detected
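Because the numpy pin above exists for wheel compatibility, it can help to check it before starting ComfyUI. This is a minimal sketch, assuming the pin is an exact version match. BASELINE_PINS and check_pins are hypothetical names. The sketch uses only the standard library's importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Hypothetical pin table mirroring this baseline snapshot.
BASELINE_PINS = {"numpy": "1.26.4"}


def check_pins(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed_version_or_None} for every pin that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    problems = check_pins(BASELINE_PINS)
    for package, installed in problems.items():
        print(f"{package}: expected {BASELINE_PINS[package]}, found {installed}")
    if not problems:
        print("all pins match")
```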
2026-06-22 18:52:42 +01:00