EasyAI Code Hosting Platform

wangbo/ComfyUI

mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-07-03 21:20:49 +08:00

Commit Graph

Author	SHA1	Message	Date

Houde

e912b910a2

fix: add no-mmap safetensors loader for >4GB files on Windows ROCm/UMA

Root cause: Strix Halo UMA ROCm init reserves ~14 GB of Windows virtual
address space for GPU. This prevents safetensors from mmap-ing files
larger than ~4 GB (SDXL fp16 ~6.5 GB), causing access violations.
SD1.5 (3.97 GB) is below the threshold and unaffected.

Fix in comfy/utils.py:
- Add _LARGE_FILE_MMAP_THRESHOLD = 4_000_000_000
- Add _load_safetensors_no_mmap(): reads tensors via open()+seek()+read()
  instead of mmap, then clones each tensor for independent ownership
- In load_torch_file(): route files >4 GB with CUDA active through
  _load_safetensors_no_mmap() automatically

Tested: RealVisXL_V4.0.safetensors (6.46 GB) loads and generates
768x1024 portrait images at ~5 it/s on AMD Radeon 8050S (gfx1151).
SD1.5 baseline unaffected (still uses original mmap path).

2026-06-22 18:52:42 +01:00
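
The open()+seek()+read() approach described in this commit can be sketched roughly as follows. This is not the code from comfy/utils.py: the names `read_safetensors_no_mmap` and `should_skip_mmap` are hypothetical, the sketch uses only the standard library, and it returns raw bytes rather than torch tensors (the commit itself clones each tensor for independent ownership). It relies on the safetensors file layout: an 8-byte little-endian header length, a JSON header, then the tensor byte buffer.

```python
import json
import os
import struct

# Value quoted in the commit message; the constant name here is illustrative.
LARGE_FILE_MMAP_THRESHOLD = 4_000_000_000


def read_safetensors_no_mmap(path):
    """Return {name: (dtype, shape, raw_bytes)} using plain file reads."""
    tensors = {}
    with open(path, "rb") as f:
        # First 8 bytes: little-endian u64 giving the JSON header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
        data_start = 8 + header_len  # data_offsets are relative to this point
        for name, info in header.items():
            if name == "__metadata__":  # optional free-form metadata entry
                continue
            begin, end = info["data_offsets"]
            f.seek(data_start + begin)
            raw = f.read(end - begin)  # a private copy, not a view of a mapping
            tensors[name] = (info["dtype"], info["shape"], raw)
    return tensors


def should_skip_mmap(path, gpu_active):
    """Routing rule from the commit: large file and GPU backend active."""
    return gpu_active and os.path.getsize(path) > LARGE_FILE_MMAP_THRESHOLD
```

In a torch-based loader, each `raw` buffer would presumably be turned into a tensor of the stated dtype and shape. Because read() already copies the bytes out of the file, no part of the result depends on a memory mapping staying valid.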

Houde

b6a730b24e

chore: add ROCm stable baseline snapshot (gfx1151 / Strix Halo)

- torch 2.7.0a0 + ROCm 6.5 via scottt/rocm-TheRock gfx1151 wheels
- numpy pinned to 1.26.4 for wheel compatibility
- SD1.5 512x512 20 steps ~5 it/s confirmed stable
- Saved workflow: sd15_test_rocm_workflow.json
- AMD Radeon 8050S, 14.37 GB UMA VRAM correctly detected

2026-06-22 18:52:42 +01:00
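
A pinned baseline like the one in this commit could be checked with a small helper along these lines. This is a hedged sketch, not part of the repository: `BASELINE` and `check_baseline` are hypothetical names, and prefix matching is an assumption made because locally built wheels often append a suffix to the version string.

```python
from importlib import metadata

# Versions taken from the commit message. Prefix matching is an assumption,
# since local wheel builds may report something like "2.7.0a0+<suffix>".
BASELINE = {"torch": "2.7.0a0", "numpy": "1.26.4"}


def check_baseline(baseline, lookup=metadata.version):
    """Return {package: (expected, found)} for every package that deviates."""
    mismatches = {}
    for package, expected in baseline.items():
        try:
            found = lookup(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found is None or not found.startswith(expected):
            mismatches[package] = (expected, found)
    return mismatches
```

Calling `check_baseline(BASELINE)` in the target environment returns an empty dict when the installed versions match the snapshot. The `lookup` parameter exists only so the check can be exercised without the packages installed.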

2 Commits