Add a --enable-asset-hashing CLI flag (action=store_true, default False)
and plumb it into the two asset-seeder call sites in main.py that
previously hardcoded compute_hashes=True (the startup scan and the
post-job output enqueue). Local runs now skip blake3 hashing unless the
user opts in, avoiding the startup/per-output cost on large models
directories while keeping hashing available for asset-portability
features.
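
A minimal sketch of the plumbing described above, assuming an argparse-based
CLI. seed_assets() and startup_scan() are hypothetical stand-ins, not the real
main.py call sites; only the flag name, action and default come from this
change.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--enable-asset-hashing",
    action="store_true",
    default=False,
    help="Compute blake3 hashes for assets (off by default).",
)


def seed_assets(paths, compute_hashes):
    # Hypothetical stand-in for the asset seeder: it only records whether a
    # hash would have been computed for each path.
    return [(p, "hash" if compute_hashes else None) for p in paths]


def startup_scan(args):
    # The call site forwards the flag instead of hardcoding compute_hashes=True.
    return seed_assets(
        ["models/a.safetensors"], compute_hashes=args.enable_asset_hashing
    )
```

With no flag the sketch skips hashing; passing --enable-asset-hashing turns it
back on for both call sites that receive the parsed args.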
Co-authored-by: Alexis Rolland <alexisrolland@hotmail.com>
The _amd_vram_gtt_totals() device match compared str(pci_bus_id) against the
sysfs leaf BDF, but torch reports pci_bus_id as a decimal integer while amdgpu
names its nodes as a hex "domain:bus:device.function" BDF, so the comparison
never matched. A single-GPU host was rescued by the len(candidates) == 1
fallback; a hybrid / multi-GPU host has no fallback and could fall through to
shared-heavy, demoting a dedicated GPU to SHARED (reported for a GPU sitting
behind a PCIe bridge).
Build the canonical hex BDF from torch's integer pci_domain_id / pci_bus_id /
pci_device_id and compare it against the candidate's realpath leaf BDF (PCI
function stripped). realpath already collapses any bridge chain to the leaf,
so this works for directly-attached, behind-a-bridge, and multi-GPU hosts
alike. The len(candidates) == 1 fallback is kept.
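
The comparison can be sketched roughly as follows. The helper names
canonical_bdf() and leaf_bdf() and the example sysfs path are illustrative
assumptions, not the code in _amd_vram_gtt_totals(); the integer inputs stand
in for torch's pci_domain_id / pci_bus_id / pci_device_id.

```python
import os


def canonical_bdf(domain: int, bus: int, device: int) -> str:
    # Format the integer PCI ids as the hex "domain:bus:device" string that
    # sysfs uses, without the trailing ".function".
    return f"{domain:04x}:{bus:02x}:{device:02x}"


def leaf_bdf(sysfs_path: str, resolve=os.path.realpath) -> str:
    # realpath collapses any bridge chain; the last path component is the
    # leaf BDF. Strip the PCI function so it compares against canonical_bdf().
    leaf = os.path.basename(resolve(sysfs_path))
    return leaf.rsplit(".", 1)[0].lower()


def matches(domain: int, bus: int, device: int, sysfs_path: str,
            resolve=os.path.realpath) -> bool:
    return canonical_bdf(domain, bus, device) == leaf_bdf(sysfs_path, resolve)
```

For a bus id of 195 the old comparison used the string "195", while the sysfs
leaf reads "c3", which is why it could never match.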
Signed-off-by: liminfei-amd <91481003+liminfei-amd@users.noreply.github.com>
#14274
* main: implement --vram-headroom
Implement --vram-headroom for dynamic VRAM as a hybrid debug/diagnostic
option for people who still report shared VRAM spills. They can tune the
setting by trial and error to keep a bit more headroom and avoid shared
VRAM spills.
* main: implement --reserve-vram
Implement --reserve-vram as extra headroom on the simple method which
is semantically as close as possible to the stated functionality and
former behaviour of non-dynamic VRAM.
Add this option for users who know they have so much RAM that they want
to pin everything, or who have a pagefile that outruns their disk speed.
This removes the RAM pressure caps completely and pins behind the
primary model load, forcing all models to be permanently committed
to RAM.
Some custom nodes call .to() on weights completely outside the load
context, which can wreak havoc if it's for a model that is not active.
Detect this condition and just let it fall through to the non-dynamic
loader.
Some custom nodes try to set this true globally. It messes with dynamic
VRAM with one-off spikes that can OOM, and it is also very high risk
on Windows, where such allocations might get serviced by shared memory
fallback.
Trump it.
cleanup_models_gc can be called once per load_models_gpu via
free_memory, which in turn can de-activate an active model via
this reset_cast_buffers.
cleanup_models_gc() could also come via obscure garbage collector
paths so limit reset_cast_buffers to the post-node callsite instead.
On AMD APUs (and other integrated GPUs) the "VRAM" reported by
torch.cuda.mem_get_info() is the GTT/shared aperture carved out of host
RAM, not a dedicated board. ComfyUI starts such devices in NORMAL_VRAM and
later sums device VRAM plus system RAM when sizing the model-load budget,
so on a UMA part the same physical RAM is counted twice and the inflated
budget triggers HIGH_VRAM / gpu-only placement that OOMs the shared pool.
Detecting integrated GPUs alone is not enough: integrated parts vary widely
in how memory is split. Some (large BIOS UMA carveout, e.g. Strix Halo)
report most memory as dedicated mem_info_vram_total, where HIGH_VRAM is
right; others report a small VRAM carveout with the bulk in GTT, where
SHARED is right. Demoting every integrated GPU to SHARED would regress the
dedicated-heavy configs.
Key the demotion on the amdgpu mem_info_vram_total vs mem_info_gtt_total
ratio: only when an integrated GPU's shared (GTT) pool is at least as large
as its dedicated VRAM do we switch it to VRAMState.SHARED. Dedicated-heavy
integrated parts and discrete GPUs keep NORMAL_VRAM. When the sysfs totals
cannot be read (e.g. NVIDIA Tegra, which has no dedicated VRAM) the device
is treated as shared-heavy, matching its true unified memory.
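
The decision rule above can be sketched as follows. The VRAMState enum here
is a local stand-in with only the two members mentioned, and pick_vram_state()
is a hypothetical name; the inputs represent the amdgpu mem_info_vram_total
and mem_info_gtt_total values, with None meaning the sysfs totals could not be
read.

```python
from enum import Enum
from typing import Optional


class VRAMState(Enum):
    # Local stand-in for illustration; the real enum has more members.
    NORMAL_VRAM = "normal_vram"
    SHARED = "shared"


def pick_vram_state(is_integrated: bool,
                    vram_total: Optional[int],
                    gtt_total: Optional[int]) -> VRAMState:
    if not is_integrated:
        # Discrete GPUs are never demoted.
        return VRAMState.NORMAL_VRAM
    if vram_total is None or gtt_total is None:
        # Unreadable totals: treat the device as shared-heavy.
        return VRAMState.SHARED
    # Demote only when the shared (GTT) pool is at least as large as
    # the dedicated VRAM carveout.
    if gtt_total >= vram_total:
        return VRAMState.SHARED
    return VRAMState.NORMAL_VRAM
```

The byte counts used to exercise this are made up; only the comparison
direction and the unreadable-totals fallback come from the description above.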
Fixes #14274
Signed-off-by: liminfei-amd <91481003+liminfei-amd@users.noreply.github.com>
* mm: split off registration helper to doer and headroom calc
* pinned_memory: implement registration comfy side
Move away from Aimdo buffer registrations which seem fraught with
danger and do it comfy side. Just start with the basic move.
* pinned_memory: do registrations as portable memory
* pinned_memory: discard async errors on registration fail
Like the good ol days.
* pinned_memory: implement abs shortfall retry
If pinned registration happens to fail despite the previous budget
ensures, consider the allocation shortfall, ensure it again, and
try again. This allows comfy pins to interoperate with other software
that might be doing substantive pinning.
* fix (MultiGPU): prevent freeze on manual abort when using MultiGPU CFG Split
Problem:
On manual abort the application hangs indefinitely.
`InterruptProcessingException` inherits from `BaseException` and bypasses MultiGPU's worker error-handling block, so the thread dies silently, leaving the main thread waiting forever on `result_q.get()`.
Fix:
Catch `comfy.model_management.InterruptProcessingException` instead of `Exception` so it's caught and passed back via `result_q` to unblock the main thread when manual abort signal fires.
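
A minimal, self-contained sketch of the failure mode and the fix. The
exception class below is a local stand-in for
`comfy.model_management.InterruptProcessingException`, and `worker()` /
`run()` are hypothetical names. The sketch catches the interrupt alongside
`Exception`, which is an assumption about how the handler is arranged.

```python
import queue
import threading


class InterruptProcessingException(BaseException):
    # Local stand-in: derives from BaseException, so a plain
    # `except Exception` would not catch it.
    pass


def worker(task, result_q):
    try:
        result_q.put(("ok", task()))
    except (Exception, InterruptProcessingException) as exc:
        # Hand the error back so the main thread is unblocked.
        result_q.put(("error", exc))


def run(task, timeout=5.0):
    result_q = queue.Queue()
    t = threading.Thread(target=worker, args=(task, result_q), daemon=True)
    t.start()
    # Without the explicit catch above, this get() would wait until the
    # timeout (or forever with no timeout) after an interrupt.
    return result_q.get(timeout=timeout)


def aborted_task():
    raise InterruptProcessingException()
```

Calling `run(aborted_task)` returns an `("error", exc)` tuple instead of
hanging, which the main thread can then re-raise or handle.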
* oops
* mm: reinstate smart memory for VRAM
* mm: restore non-dynamic smart memory
By popular demand. We aren't quite ready for the deprecation, as
non-dynamic-enabled GPUs and some high-VRAM custom model loader setups
prefer the old full hands-on approach.