Replace the per-loader device widgets removed in the previous commit
with three small passthrough selector nodes registered under
advanced/multigpu:
- Select Model Device (MODEL in/out) - options: default / cpu / gpu:N
- Select CLIP Device (CLIP in/out) - options: default / cpu / gpu:N
- Select VAE Device (VAE in/out) - options: default / gpu:N (no cpu)
Each node clones the inbound patcher (model.clone() / clip.clone() /
copy.copy(vae)+vae.patcher.clone()) and retargets load_device (and
offload_device for cpu / vae_offload_device for VAE).
Portability across machines with different GPU counts:
- VALIDATE_INPUTS returns True so an unknown gpu:N value (e.g. a
workflow saved on a 2-GPU machine opened on a 1-GPU machine) does
not error at validation time.
- At runtime, resolve_gpu_device_option(...) returns None for
unknown options (with a warning), and each selector then logs a
per-node info message and passes through unchanged, matching the
no-op style used by MultiGPU CFG Split's
"No extra torch devices need initialization..." log.
Also adds comfy.model_management.get_gpu_device_options_no_cpu() which
the VAE selector uses; on a single-GPU box this collapses to just
["default"], which is fine.
Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
* Revert "Add tiled VAE lane to MultiGPU Work Units"
This reverts commit 4d3d68e473.
The tiled VAE lane will land as part of a follow-up PR alongside the
UPSCALE_MODEL lane, separated from the threaded-loader fix PR (#14052)
to keep the upstream merge focused.
* Revert "Add UPSCALE_MODEL lane to MultiGPU CFG Split"
This reverts commit 74b0a826ea.
The UPSCALE_MODEL lane will land as part of a follow-up PR alongside the
tiled VAE lane, separated from the threaded-loader fix PR (#14052) to
keep the upstream merge focused.
---------
Co-authored-by: John Pollock <pollockjj@gmail.com>
Introduce tiled_scale_multidim_multigpu in comfy/utils.py: a tile scheduler
that dispatches per-device tile functions through the existing
MultiGPUThreadPool and merges per-device CPU output buffers in deterministic
key order. The worker only catches BaseException at the thread boundary to
funnel errors to the main thread; bare torch.cuda.set_device and
torch.cuda.synchronize calls inside the worker fail loud if the device is
not CUDA, which is part of the primitive's contract.
Add UPSCALE_MODEL input on the MultiGPU CFG Split node and an upscale-model
descriptor deepclone helper in comfy/multigpu.py. Clones stay CPU-resident
until execute time and are returned to CPU afterward.
ImageUpscaleWithModel dispatches through tiled_scale_multidim_multigpu when
a multigpu descriptor is attached; the single-device path runs unchanged
when no clones are present.
Two doc-only changes addressing minor CodeRabbit findings on PR #7063:
* cli_args.py: clarify --cuda-device help text to document the required comma-separated format ('0' or '0,1'), matching how the value is consumed by CUDA_VISIBLE_DEVICES in main.py.
* nodes_multigpu.py: add a docstring NOTE on the (currently unregistered) MultiGPUOptionsNode explaining that its relative_speed input is plumbed through to model_options['multigpu_options'] but is not yet consulted by the cond scheduler, which still uses uniform round-robin via next_available_device(). Wire relative_speed into the scheduler before re-enabling the node.
Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
GPUOptionsGroup.clone() returns a new instance, but the return value was discarded, causing the node to mutate the upstream caller's group in-place. When multiple MultiGPU Options nodes share an input group, each node's additions would leak into earlier siblings. Assign the clone result back to gpu_options so each node owns its own copy.
Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>