mirror of https://github.com/comfyanonymous/ComfyUI.git, synced 2026-03-14 21:57:33 +08:00
* Implement seek and read for pins
  Sourcing pins from an mmap is bad because it's a CPU->CPU copy that attempts to fully buffer the same data twice. Instead, use seek and read, which avoids the mmap buffering while usually being a faster read in the first place (avoiding mmap faulting, etc.).
* pinned_memory: Use Aimdo pinner
  The aimdo pinner bypasses PyTorch's CPU allocator, which can leak Windows commit charge.
* ops: bypass init() of weight for embedding layer
  This similarly consumes a large commit charge, especially for TEs. It can cause a permanently leaked commit charge, which can destabilize systems close to the commit ceiling and generally confuses the RAM stats.
* model_patcher: implement pinned memory counter
  Implement a pinned memory counter for better accounting of how much memory the pins hold.
* implement touch accounting
  Implement accounting of touching mmapped tensors.
* mm+mp: add residency mmap getter
* utils: use the aimdo mmap to load sft files
* model_management: Implement tighter RAM pressure semantics
  Implement a pressure release on entire mmaps, as Windows performs faster when mmaps are unloaded and model loads can ramp freely into fully unallocated RAM. Make freeing for pins a completely separate concept. Now that pins are loadable directly from the original file and don't touch the mmap, tighten the freeing budget to just the currently loaded model minus what you have left over. This still over-frees pins, but it's a lot better than before. After the pins are freed with that algorithm, bounce entire mmaps to free RAM based on what the model needs, deducting any known resident-in-mmap tensors from the free quota to keep it as tight as possible.
* comfy-aimdo 0.2.11
* mm: Implement file_slice path for QT
* ruff
* ops: put meta-tensors in place to allow custom nodes to check geo
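The first bullet replaces copying pin sources out of an mmap with an ordinary seek and read of the original file. A minimal sketch of that pattern in standard-library Python, under stated assumptions: `read_slice_into` is a hypothetical helper (not a ComfyUI or comfy_aimdo function), the tensor's byte offset in the file is assumed to be known already (for a safetensors file it would come from the header), and a `bytearray` stands in for the pinned host buffer.

```python
def read_slice_into(path, offset, dest):
    """Fill `dest` with bytes [offset, offset + len(dest)) of the file at `path`.

    Hypothetical helper: `dest` is any writable buffer (a bytearray here stands
    in for a pinned host buffer). The data is read straight from the file with
    seek + readinto, so no mmap view of the file is created.
    """
    view = memoryview(dest).cast("B")  # byte-level view of the destination
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        filled = 0
        # A raw readinto may return fewer bytes than requested, so loop until full.
        while filled < len(view):
            n = f.readinto(view[filled:])
            if not n:
                raise EOFError(
                    f"{path}: wanted {len(view)} bytes at offset {offset}, got {filled}"
                )
            filled += n
    return filled
```

For example, `read_slice_into(path, 10, bytearray(4))` returns 4 and leaves bytes 10 to 13 of the file in the buffer. Reading directly into the preallocated destination means each byte is copied once, instead of first being faulted into the page cache through the mapping and then copied again.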
44 lines
1.3 KiB
Python
```python
import comfy.model_management
import comfy.memory_management
import comfy_aimdo.host_buffer
import comfy_aimdo.torch

from comfy.cli_args import args

def get_pin(module):
    # The pinned host tensor attached to this module, or None if it is not pinned.
    return getattr(module, "_pin", None)

def pin_memory(module):
    # Skip modules that already failed to pin or are already pinned, and skip
    # everything when pinning is disabled via args.disable_pinned_memory.
    if module.pin_failed or args.disable_pinned_memory or get_pin(module) is not None:
        return
    #FIXME: This is a RAM cache trigger event
    size = comfy.memory_management.vram_aligned_size([ module.weight, module.bias ])

    # Give up, and mark the module as failed, when there is no pinned-memory
    # budget or this module would push the total past it.
    if comfy.model_management.MAX_PINNED_MEMORY <= 0 or (comfy.model_management.TOTAL_PINNED_MEMORY + size) > comfy.model_management.MAX_PINNED_MEMORY:
        module.pin_failed = True
        return False

    # Allocate through the aimdo pinner instead of PyTorch's CPU allocator.
    try:
        hostbuf = comfy_aimdo.host_buffer.HostBuffer(size)
    except RuntimeError:
        module.pin_failed = True
        return False

    # Expose the buffer as a tensor and keep the HostBuffer referenced alongside it.
    module._pin = comfy_aimdo.torch.hostbuf_to_tensor(hostbuf)
    module._pin_hostbuf = hostbuf
    comfy.model_management.TOTAL_PINNED_MEMORY += size
    return True

def unpin_memory(module):
    # Returns the number of bytes released (0 if the module was not pinned).
    if get_pin(module) is None:
        return 0
    size = module._pin.numel() * module._pin.element_size()

    # Clamp at zero so the pinned-memory counter never goes negative.
    comfy.model_management.TOTAL_PINNED_MEMORY -= size
    if comfy.model_management.TOTAL_PINNED_MEMORY < 0:
        comfy.model_management.TOTAL_PINNED_MEMORY = 0

    del module._pin
    del module._pin_hostbuf
    return size
```
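The budget bookkeeping in `pin_memory` and `unpin_memory` can be exercised without a GPU or comfy_aimdo. The sketch below is a hypothetical, self-contained model of it, not ComfyUI code: `PinAccounting` and its methods are illustrative names, two attributes replace the `MAX_PINNED_MEMORY`/`TOTAL_PINNED_MEMORY` globals, the size is passed in rather than computed with `vram_aligned_size`, the `args.disable_pinned_memory` check is omitted, and `allocate` stands in for `comfy_aimdo.host_buffer.HostBuffer`.

```python
from types import SimpleNamespace


class PinAccounting:
    """Hypothetical model of the pinned-memory budget used by pin_memory/unpin_memory."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes  # plays the role of MAX_PINNED_MEMORY
        self.total = 0              # plays the role of TOTAL_PINNED_MEMORY

    def pin(self, module, size, allocate):
        # Already failed or already pinned: do nothing (pin_memory returns None here).
        if module.pin_failed or getattr(module, "_pin", None) is not None:
            return None
        # No budget, or this pin would exceed it: remember the failure.
        if self.max_bytes <= 0 or self.total + size > self.max_bytes:
            module.pin_failed = True
            return False
        try:
            module._pin = allocate(size)  # stand-in for the HostBuffer allocation
        except RuntimeError:
            module.pin_failed = True
            return False
        self.total += size  # count only memory that was actually allocated
        return True

    def unpin(self, module):
        pin = getattr(module, "_pin", None)
        if pin is None:
            return 0
        size = len(pin)  # byte length of the stand-in buffer
        self.total = max(0, self.total - size)  # clamp at zero, as unpin_memory does
        del module._pin
        return size


acct = PinAccounting(max_bytes=100)
a = SimpleNamespace(pin_failed=False)
b = SimpleNamespace(pin_failed=False)
print(acct.pin(a, 60, bytearray))  # True: 60 of 100 bytes are now counted
print(acct.pin(b, 50, bytearray))  # False: 60 + 50 would exceed the budget
print(acct.unpin(a), acct.total)   # 60 0
```

As in the original functions, the counter grows only after the allocation succeeds, so a failed allocation leaves the total unchanged, and the clamp on release keeps the total from going negative.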