True reset semantics for "default":
- On first selector application, cache the loader's original
load_device / offload_device on the underlying model object (which
is shared across patcher clones) and restore those base values when
the user picks "default". Previously "default" meant "passthrough"
so SelectXDevice(gpu:1) -> SelectXDevice(default) silently kept the
gpu:1 routing.
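A minimal sketch of the caching idea, using stand-in classes rather than the real patcher. The attribute name `_base_devices` and the helper `apply_selector` are hypothetical and only illustrate why the cache has to live on the shared model object:

```python
class Model:
    """Stand-in for the underlying model object shared by all patcher clones."""


class Patcher:
    """Stand-in patcher; clones share the same model object."""

    def __init__(self, model, load_device, offload_device):
        self.model = model
        self.load_device = load_device
        self.offload_device = offload_device

    def clone(self):
        return Patcher(self.model, self.load_device, self.offload_device)


def apply_selector(patcher, choice):
    clone = patcher.clone()
    model = clone.model
    # Cache the loader's original devices once, on the shared model object,
    # so every later clone can still find them.
    if not hasattr(model, "_base_devices"):
        model._base_devices = (clone.load_device, clone.offload_device)
    if choice == "default":
        # True reset: restore the cached base values instead of passing through.
        clone.load_device, clone.offload_device = model._base_devices
    else:
        clone.load_device = choice
    return clone
```

Chaining `apply_selector(p, "cuda:1")` and then `apply_selector(..., "default")` returns a clone whose load_device is the loader's original device again.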
CPU + dynamic VRAM:
- When SelectModelDevice / SelectCLIPDevice resolves to CPU on a
ModelPatcherDynamic, also call clone(disable_dynamic=True) so the
result is a plain ModelPatcher, matching ModelPatcherDynamic.__new__'s
intent that CPU loads never run through the dynamic path. Fall back to
the regular dynamic clone if disable_dynamic is unsupported on that
patcher.
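One way the fallback could be written, shown with stand-in classes. How the real code detects an unsupported `disable_dynamic` is not stated here; checking the signature of `clone` is an assumption made for this sketch, and `clone_for_cpu` is a hypothetical helper name:

```python
import inspect


class PlainPatcher:
    """Stand-in for a plain ModelPatcher."""

    def clone(self):
        return PlainPatcher()


class DynamicPatcher(PlainPatcher):
    """Stand-in for a dynamic patcher whose clone() accepts disable_dynamic."""

    def clone(self, disable_dynamic=False):
        return PlainPatcher() if disable_dynamic else DynamicPatcher()


class LegacyDynamicPatcher(PlainPatcher):
    """Stand-in for a patcher whose clone() lacks the disable_dynamic keyword."""

    def clone(self):
        return LegacyDynamicPatcher()


def clone_for_cpu(patcher):
    # Request a non-dynamic clone only when clone() advertises the keyword;
    # otherwise fall back to the regular clone.
    params = inspect.signature(patcher.clone).parameters
    if "disable_dynamic" in params:
        return patcher.clone(disable_dynamic=True)
    return patcher.clone()
```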
MultiGPU collision pruning:
- After SelectModelDevice retargets the primary patcher, drop any
multigpu clone (from a prior MultiGPU CFG Split) whose load_device
now matches the primary; otherwise two patchers would be bound to
the same device. Logs the prune at info level.
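The pruning step can be sketched as a filter over the attached clones. The mapping of device to clone and the name `prune_colliding_clones` are assumptions for illustration, not the actual data layout:

```python
import logging


class Patcher:
    """Stand-in patcher that only carries a load_device."""

    def __init__(self, load_device):
        self.load_device = load_device


def prune_colliding_clones(primary_device, multigpu_clones):
    """Return the clones whose load_device differs from the primary's."""
    kept = {}
    for key, clone in multigpu_clones.items():
        if clone.load_device == primary_device:
            # Two patchers must not be bound to the same device.
            logging.info("Pruning multigpu clone on %s: matches primary device",
                         clone.load_device)
            continue
        kept[key] = clone
    return kept
```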
SelectVAEDevice: reject CPU at runtime:
- The UI uses get_gpu_device_options_no_cpu(), but a workflow opened
from another machine could still pass "cpu" through validate_inputs.
Detect that case explicitly, log a "CPU is not a supported choice"
passthrough message, and leave the VAE unchanged.
Cosmetic:
- Update VAE node docstring to accurately reflect the runtime CPU
rejection rather than the older "intentionally not offered" claim.
- Remove the fallback warnings inside resolve_gpu_device_option
entirely; the Select*Device nodes now own a single context-rich
info-level message per failed lookup, so there is no double logging.
Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
V3 io.ComfyNode subclasses use the lowercase `validate_inputs` hook for opting out of strict combo validation (execution.py line 862); the uppercase `VALIDATE_INPUTS` is the V1 spelling and is ignored on V3 nodes. The strict combo check at execution.py line 1025 is gated on `if x not in validate_function_inputs`, so renaming to `validate_inputs(cls, device='default')` lets unknown `gpu:N` values pass validation and fall through to the runtime fallback.
Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
When --enable-dynamic-vram is on, every ModelPatcher is a
ModelPatcherDynamic whose underlying model has a per-device dynamic_pins
dict, initialized in __init__ for self.load_device only. If a cloned
patcher's load_device is later reassigned (as the Select{Model,CLIP,VAE}
Device nodes do), the new device key is missing and partially_unload_ram
raises KeyError: device(type='cuda', index=N).
Fix:
- Extract the per-device dynamic_pins init in ModelPatcherDynamic.__init__
into a new helper method register_load_device(device) which is now also
called from __init__.
- Each Select*Device node calls clone.patcher.register_load_device(resolved)
after retargeting load_device, guarded by hasattr so non-dynamic
patchers (plain ModelPatcher in non-dynamic-vram installs) skip it.
Caught by a happy-path test in which SelectCLIPDevice retargeted CLIP from
cuda:0 to cuda:1 and CLIPTextEncode then crashed in
partially_unload_ram -> dynamic_pins[cuda:1].
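A sketch of the failure and the fix with stand-in classes. The shape of each dynamic_pins entry (an empty dict here) and the helper `retarget` are assumptions; only the per-device keying and the hasattr guard come from the description above:

```python
class DynamicModel:
    """Stand-in for the underlying model holding per-device pin state."""

    def __init__(self):
        self.dynamic_pins = {}


class DynamicPatcher:
    def __init__(self, model, load_device):
        self.model = model
        self.load_device = load_device
        self.register_load_device(load_device)

    def register_load_device(self, device):
        # Idempotent: creates the per-device entry only if it is missing.
        self.model.dynamic_pins.setdefault(device, {})


class PlainPatcher:
    """Stand-in for a non-dynamic patcher without the helper."""

    def __init__(self, load_device):
        self.load_device = load_device


def retarget(patcher, device):
    patcher.load_device = device
    # Non-dynamic patchers have no per-device pins, so skip them.
    if hasattr(patcher, "register_load_device"):
        patcher.register_load_device(device)
    return patcher
```

Without the `register_load_device` call in `retarget`, a later lookup of `dynamic_pins["cuda:1"]` on the stand-in model raises KeyError, mirroring the crash described above.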
Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
Replace the per-loader device widgets removed in the previous commit
with three small passthrough selector nodes registered under
advanced/multigpu:
- Select Model Device (MODEL in/out) - options: default / cpu / gpu:N
- Select CLIP Device (CLIP in/out) - options: default / cpu / gpu:N
- Select VAE Device (VAE in/out) - options: default / gpu:N (no cpu)
Each node clones the inbound patcher (model.clone() / clip.clone() /
copy.copy(vae)+vae.patcher.clone()) and retargets load_device (and
offload_device for cpu / vae_offload_device for VAE).
Portability across machines with different GPU counts:
- VALIDATE_INPUTS returns True so an unknown gpu:N value (e.g. a
workflow saved on a 2-GPU machine opened on a 1-GPU machine) does
not error at validation time.
- At runtime, resolve_gpu_device_option(...) returns None for
unknown options (with a warning), and each selector then logs a
per-node info message and passes through unchanged, matching the
no-op style used by MultiGPU CFG Split's
"No extra torch devices need initialization..." log.
Also adds comfy.model_management.get_gpu_device_options_no_cpu() which
the VAE selector uses; on a single-GPU box this collapses to just
["default"], which is fine.
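The runtime passthrough can be sketched as below. `resolve_option` is a hypothetical stand-in for the real resolver, and the `available` mapping of option name to device is an assumption; devices are plain strings for simplicity:

```python
import logging


def resolve_option(option, available):
    """Return the device for a known option, or None when it is unknown."""
    return available.get(option)


def select_device(current_device, option, available, node_name="SelectModelDevice"):
    if option == "default":
        return current_device
    resolved = resolve_option(option, available)
    if resolved is None:
        # Unknown option (e.g. gpu:1 on a single-GPU machine): log and pass through.
        logging.info("%s: device option %r not available, passing through",
                     node_name, option)
        return current_device
    return resolved
```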
Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
* Revert "Add tiled VAE lane to MultiGPU Work Units"
This reverts commit 4d3d68e473.
The tiled VAE lane will land as part of a follow-up PR alongside the
UPSCALE_MODEL lane, separated from the threaded-loader fix PR (#14052)
to keep the upstream merge focused.
* Revert "Add UPSCALE_MODEL lane to MultiGPU CFG Split"
This reverts commit 74b0a826ea.
The UPSCALE_MODEL lane will land as part of a follow-up PR alongside the
tiled VAE lane, separated from the threaded-loader fix PR (#14052) to
keep the upstream merge focused.
---------
Co-authored-by: John Pollock <pollockjj@gmail.com>
Introduce tiled_scale_multidim_multigpu in comfy/utils.py: a tile scheduler
that dispatches per-device tile functions through the existing
MultiGPUThreadPool and merges per-device CPU output buffers in deterministic
key order. The worker catches BaseException only at the thread boundary, to
funnel errors to the main thread; the bare torch.cuda.set_device and
torch.cuda.synchronize calls inside the worker fail loudly if the device is
not CUDA, which is part of the primitive's contract.
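The error funnelling and deterministic merge can be illustrated with plain threads. This is not the MultiGPUThreadPool API; `run_per_device` is a hypothetical helper, and the per-device work is reduced to zero-argument callables:

```python
import threading


def run_per_device(tile_fns):
    """Run one callable per device key; return results in sorted key order."""
    results, errors = {}, {}

    def worker(key, fn):
        try:
            results[key] = fn()
        except BaseException as exc:  # caught only at the thread boundary
            errors[key] = exc

    threads = [threading.Thread(target=worker, args=(key, fn))
               for key, fn in tile_fns.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    if errors:
        # Re-raise on the main thread; pick the first key for determinism.
        raise errors[sorted(errors)[0]]
    # Merge in key order so the output does not depend on thread timing.
    return [results[key] for key in sorted(results)]
```

Sorting by key rather than by completion order is what keeps the merged result stable across runs.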
Add UPSCALE_MODEL input on the MultiGPU CFG Split node and an upscale-model
descriptor deepclone helper in comfy/multigpu.py. Clones stay CPU-resident
until execute time and are returned to CPU afterward.
ImageUpscaleWithModel dispatches through tiled_scale_multidim_multigpu when
a multigpu descriptor is attached; the single-device path runs unchanged
when no clones are present.
Two doc-only changes addressing minor CodeRabbit findings on PR #7063:
* cli_args.py: clarify --cuda-device help text to document the required comma-separated format ('0' or '0,1'), matching how the value is consumed by CUDA_VISIBLE_DEVICES in main.py.
* nodes_multigpu.py: add a docstring NOTE on the (currently unregistered) MultiGPUOptionsNode explaining that its relative_speed input is plumbed through to model_options['multigpu_options'] but is not yet consulted by the cond scheduler, which still uses uniform round-robin via next_available_device(). Wire relative_speed into the scheduler before re-enabling the node.
Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
GPUOptionsGroup.clone() returns a new instance, but the return value was discarded, causing the node to mutate the upstream caller's group in-place. When multiple MultiGPU Options nodes share an input group, each node's additions would leak into earlier siblings. Assign the clone result back to gpu_options so each node owns its own copy.
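The bug pattern, reduced to a stand-in class. `OptionsGroup`, `register`, and the two helper functions are hypothetical; only the discarded clone() result mirrors the description above:

```python
class OptionsGroup:
    """Stand-in for an options group whose clone() returns a new instance."""

    def __init__(self, options=None):
        self.options = dict(options or {})

    def clone(self):
        return OptionsGroup(self.options)

    def register(self, device, opts):
        self.options[device] = opts


def add_options_buggy(group, device, opts):
    group.clone()  # result discarded: the upstream group is mutated below
    group.register(device, opts)
    return group


def add_options_fixed(group, device, opts):
    group = group.clone()  # each node now owns its own copy
    group.register(device, opts)
    return group
```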
Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>