Commit Graph

18 Commits

Author SHA1 Message Date
Jedrzej Kosinski
d7706091ae Add Select Model/CLIP/VAE Device passthrough nodes
Replace the per-loader device widgets removed in the previous commit
with three small passthrough selector nodes registered under
advanced/multigpu:

- Select Model Device  (MODEL  in/out)  - options: default / cpu / gpu:N
- Select CLIP Device   (CLIP   in/out)  - options: default / cpu / gpu:N
- Select VAE Device    (VAE    in/out)  - options: default / gpu:N (no cpu)

Each node clones the inbound patcher (model.clone() / clip.clone() /
copy.copy(vae)+vae.patcher.clone()) and retargets load_device (and
offload_device for cpu / vae_offload_device for VAE).

Portability across machines with different GPU counts:
- VALIDATE_INPUTS returns True so an unknown gpu:N value (e.g. a
  workflow saved on a 2-GPU machine opened on a 1-GPU machine) does
  not error at validation time.
- At runtime, resolve_gpu_device_option(...) returns None for
  unknown options (with a warning), and each selector then logs a
  per-node info message and passes through unchanged, matching the
  no-op style used by MultiGPU CFG Split's
  "No extra torch devices need initialization..." log.

Also adds comfy.model_management.get_gpu_device_options_no_cpu() which
the VAE selector uses; on a single-GPU box this collapses to just
["default"], which is fine.

Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
2026-05-22 21:39:18 -07:00
Jedrzej Kosinski
5dc4e38b89
Defer @pollockjj's tiled-VAE and UPSCALE_MODEL MultiGPU lanes (#14066)
* Revert "Add tiled VAE lane to MultiGPU Work Units"

This reverts commit 4d3d68e473.

The tiled VAE lane will land as part of a follow-up PR alongside the
UPSCALE_MODEL lane, separated from the threaded-loader fix PR (#14052)
to keep the upstream merge focused.

* Revert "Add UPSCALE_MODEL lane to MultiGPU CFG Split"

This reverts commit 74b0a826ea.

The UPSCALE_MODEL lane will land as part of a follow-up PR alongside the
tiled VAE lane, separated from the threaded-loader fix PR (#14052) to
keep the upstream merge focused.

---------

Co-authored-by: John Pollock <pollockjj@gmail.com>
2026-05-22 16:44:29 -07:00
John Pollock
4d3d68e473 Add tiled VAE lane to MultiGPU Work Units
Some checks are pending
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
2026-05-22 13:42:21 -05:00
John Pollock
74b0a826ea Add UPSCALE_MODEL lane to MultiGPU CFG Split
Introduce tiled_scale_multidim_multigpu in comfy/utils.py: a tile scheduler
that dispatches per-device tile functions through the existing
MultiGPUThreadPool and merges per-device CPU output buffers in deterministic
key order. The worker only catches BaseException at the thread boundary to
funnel errors to the main thread; bare torch.cuda.set_device and
torch.cuda.synchronize calls inside the worker fail loud if the device is
not CUDA, which is part of the primitive's contract.

Add UPSCALE_MODEL input on the MultiGPU CFG Split node and an upscale-model
descriptor deepclone helper in comfy/multigpu.py. Clones stay CPU-resident
until execute time and are returned to CPU afterward.

ImageUpscaleWithModel dispatches through tiled_scale_multidim_multigpu when
a multigpu descriptor is attached; the single-device path runs unchanged
when no clones are present.
2026-05-22 13:41:48 -05:00
Jedrzej Kosinski
4d9106dced Document --cuda-device comma format and MultiGPU Options relative_speed gap
Two doc-only changes addressing minor CodeRabbit findings on PR #7063:

* cli_args.py: clarify --cuda-device help text to document the required comma-separated format ('0' or '0,1'), matching how the value is consumed by CUDA_VISIBLE_DEVICES in main.py.

* nodes_multigpu.py: add a docstring NOTE on the (currently unregistered) MultiGPUOptionsNode explaining that its relative_speed input is plumbed through to model_options['multigpu_options'] but is not yet consulted by the cond scheduler, which still uses uniform round-robin via next_available_device(). Wire relative_speed into the scheduler before re-enabling the node.

Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
2026-05-20 20:48:59 -07:00
Jedrzej Kosinski
50d1dd6273 Fix MultiGPU Options node discarding cloned GPUOptionsGroup
GPUOptionsGroup.clone() returns a new instance, but the return value was discarded, causing the node to mutate the upstream caller's group in-place. When multiple MultiGPU Options nodes share an input group, each node's additions would leak into earlier siblings. Assign the clone result back to gpu_options so each node owns its own copy.

Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
2026-05-20 16:46:23 -07:00
Jedrzej Kosinski
1d8e379f41 Rename MultiGPU Work Units to MultiGPU CFG Split
Amp-Thread-ID: https://ampcode.com/threads/T-019d3ee9-19d5-767a-9d7a-e50cbbef815b
Co-authored-by: Amp <amp@ampcode.com>
2026-03-30 08:00:20 -07:00
Jedrzej Kosinski
5f4fcd19e7 Simplify multigpu nodes: default max_gpus=2, remove gpu_options input, disable Options node
Amp-Thread-ID: https://ampcode.com/threads/T-019d3ee9-19d5-767a-9d7a-e50cbbef815b
Co-authored-by: Amp <amp@ampcode.com>
2026-03-30 07:30:32 -07:00
Jedrzej Kosinski
d52dcbc88f Rewrite multigpu nodes to V3 format
Amp-Thread-ID: https://ampcode.com/threads/T-019d3ee9-19d5-767a-9d7a-e50cbbef815b
Co-authored-by: Amp <amp@ampcode.com>
2026-03-30 07:23:13 -07:00
Jedrzej Kosinski
6dca17bd2d Satisfy ruff linting 2025-03-03 23:08:29 -06:00
Jedrzej Kosinski
093914a247 Made MultiGPU Work Units node more robust by forcing ModelPatcher clones to match at sample time, reuse loaded MultiGPU clones, finalize MultiGPU Work Units node ID and name, small refactors/cleanup of logging and multigpu-related code 2025-03-03 22:56:13 -06:00
Jedrzej Kosinski
eda866bf51 Extracted multigpu core code into multigpu.py, added load_balance_devices to get subdivision of work based on available devices and splittable work item count, added MultiGPU Options nodes to set relative_speed of specific devices; does not change behavior yet 2025-01-27 06:25:48 -06:00
Jedrzej Kosinski
e3298b84de Create proper MultiGPU Initialize node, create gpu_options to create scaffolding for asymmetrical GPU support 2025-01-26 09:34:20 -06:00
Jedrzej Kosinski
02a4d0ad7d Added unload_model_and_clones to model_management.py to allow unloading only relevant models 2025-01-23 01:20:00 -06:00
Jedrzej Kosinski
328d4f16a9 Make WeightHooks compatible with MultiGPU, clean up some code 2025-01-20 04:34:26 -06:00
Jedrzej Kosinski
bfce723311 Initial work on multigpu_clone function, which will account for additional_models getting cloned 2025-01-17 03:31:28 -06:00
Jedrzej Kosinski
25818dc848 Added a 'max_gpus' input 2025-01-14 13:45:14 -06:00
Jedrzej Kosinski
d5088072fb Make test node for multigpu instead of storing it in just a local __init__.py 2025-01-13 20:20:25 -06:00