Commit Graph

21 Commits

Author SHA1 Message Date
Jedrzej Kosinski
b319c8088b SelectXDevice: address code-review follow-ups
True reset semantics for "default":
- On first selector application, cache the loader's original
  load_device / offload_device on the underlying model object (which
  is shared across patcher clones) and restore those base values when
  the user picks "default". Previously "default" meant "passthrough"
  so SelectXDevice(gpu:1) -> SelectXDevice(default) silently kept the
  gpu:1 routing.

CPU + dynamic VRAM:
- When SelectModelDevice / SelectCLIPDevice resolves to CPU on a
  ModelPatcherDynamic, also call clone(disable_dynamic=True) so the
  result is a plain ModelPatcher, matching ModelPatcherDynamic.__new__'s
  intent that CPU loads never run through the dynamic path. Fallback to
  the regular dynamic clone if disable_dynamic is unsupported on that
  patcher.

MultiGPU collision pruning:
- After SelectModelDevice retargets the primary patcher, drop any
  multigpu clone (from a prior MultiGPU CFG Split) whose load_device
  now matches the primary; otherwise two patchers would be bound to
  the same device. Logs the prune at info level.

SelectVAEDevice: reject CPU at runtime:
- The UI uses get_gpu_device_options_no_cpu(), but a workflow opened
  from another machine could still pass "cpu" through validate_inputs.
  Detect that case explicitly, log a "CPU is not a supported choice"
  passthrough message, and leave the VAE unchanged.

Cosmetic:
- Update VAE node docstring to accurately reflect the runtime CPU
  rejection rather than the older "intentionally not offered" claim.
- Demote the fallback warnings inside resolve_gpu_device_option to no
  log at all; the Select*Device nodes now own a single context-rich
  info-level message per failed lookup, so there is no double logging.

Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
2026-05-22 22:29:45 -07:00
Jedrzej Kosinski
9ee1540882 SelectXDevice: use lowercase validate_inputs for V3 combo bypass
V3 io.ComfyNode subclasses use the lowercase `validate_inputs` hook for opting out of strict combo validation (execution.py line 862); the uppercase `VALIDATE_INPUTS` is the V1 spelling and is ignored on V3 nodes. The strict combo check at execution.py line 1025 is gated on `if x not in validate_function_inputs`, so renaming to `validate_inputs(cls, device='default')` lets unknown `gpu:N` values pass validation and fall through to the runtime fallback.

Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
2026-05-22 21:50:29 -07:00
Jedrzej Kosinski
4e650055d0 SelectXDevice nodes: register new load_device with ModelPatcherDynamic
When --enable-dynamic-vram is on, every ModelPatcher is a
ModelPatcherDynamic whose underlying model has a per-device dynamic_pins
dict, initialized in __init__ for self.load_device only. If a cloned
patcher's load_device is later reassigned (as the Select{Model,CLIP,VAE}
Device nodes do), the new device key is missing and partially_unload_ram
raises KeyError: device(type='cuda', index=N).

Fix:
- Extract the per-device dynamic_pins init in ModelPatcherDynamic.__init__
  into a new helper method register_load_device(device) which is now also
  called from __init__.
- Each Select*Device node calls clone.patcher.register_load_device(resolved)
  after retargeting load_device, guarded by hasattr so non-dynamic
  patchers (plain ModelPatcher in non-dynamic-vram installs) skip it.

Caught by happy-path test where SelectCLIPDevice retargeted CLIP from
cuda:0 to cuda:1 and CLIPTextEncode then crashed in
partially_unload_ram -> dynamic_pins[cuda:1].

Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
2026-05-22 21:46:07 -07:00
Jedrzej Kosinski
d7706091ae Add Select Model/CLIP/VAE Device passthrough nodes
Replace the per-loader device widgets removed in the previous commit
with three small passthrough selector nodes registered under
advanced/multigpu:

- Select Model Device  (MODEL  in/out)  - options: default / cpu / gpu:N
- Select CLIP Device   (CLIP   in/out)  - options: default / cpu / gpu:N
- Select VAE Device    (VAE    in/out)  - options: default / gpu:N (no cpu)

Each node clones the inbound patcher (model.clone() / clip.clone() /
copy.copy(vae)+vae.patcher.clone()) and retargets load_device (and
offload_device for cpu / vae_offload_device for VAE).

Portability across machines with different GPU counts:
- VALIDATE_INPUTS returns True so an unknown gpu:N value (e.g. a
  workflow saved on a 2-GPU machine opened on a 1-GPU machine) does
  not error at validation time.
- At runtime, resolve_gpu_device_option(...) returns None for
  unknown options (with a warning), and each selector then logs a
  per-node info message and passes through unchanged, matching the
  no-op style used by MultiGPU CFG Split's
  "No extra torch devices need initialization..." log.

Also adds comfy.model_management.get_gpu_device_options_no_cpu() which
the VAE selector uses; on a single-GPU box this collapses to just
["default"], which is fine.

Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
2026-05-22 21:39:18 -07:00
Jedrzej Kosinski
5dc4e38b89
Defer @pollockjj's tiled-VAE and UPSCALE_MODEL MultiGPU lanes (#14066)
* Revert "Add tiled VAE lane to MultiGPU Work Units"

This reverts commit 4d3d68e473.

The tiled VAE lane will land as part of a follow-up PR alongside the
UPSCALE_MODEL lane, separated from the threaded-loader fix PR (#14052)
to keep the upstream merge focused.

* Revert "Add UPSCALE_MODEL lane to MultiGPU CFG Split"

This reverts commit 74b0a826ea.

The UPSCALE_MODEL lane will land as part of a follow-up PR alongside the
tiled VAE lane, separated from the threaded-loader fix PR (#14052) to
keep the upstream merge focused.

---------

Co-authored-by: John Pollock <pollockjj@gmail.com>
2026-05-22 16:44:29 -07:00
John Pollock
4d3d68e473 Add tiled VAE lane to MultiGPU Work Units
Some checks are pending
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
2026-05-22 13:42:21 -05:00
John Pollock
74b0a826ea Add UPSCALE_MODEL lane to MultiGPU CFG Split
Introduce tiled_scale_multidim_multigpu in comfy/utils.py: a tile scheduler
that dispatches per-device tile functions through the existing
MultiGPUThreadPool and merges per-device CPU output buffers in deterministic
key order. The worker only catches BaseException at the thread boundary to
funnel errors to the main thread; bare torch.cuda.set_device and
torch.cuda.synchronize calls inside the worker fail loud if the device is
not CUDA, which is part of the primitive's contract.

Add UPSCALE_MODEL input on the MultiGPU CFG Split node and an upscale-model
descriptor deepclone helper in comfy/multigpu.py. Clones stay CPU-resident
until execute time and are returned to CPU afterward.

ImageUpscaleWithModel dispatches through tiled_scale_multidim_multigpu when
a multigpu descriptor is attached; the single-device path runs unchanged
when no clones are present.
2026-05-22 13:41:48 -05:00
Jedrzej Kosinski
4d9106dced Document --cuda-device comma format and MultiGPU Options relative_speed gap
Two doc-only changes addressing minor CodeRabbit findings on PR #7063:

* cli_args.py: clarify --cuda-device help text to document the required comma-separated format ('0' or '0,1'), matching how the value is consumed by CUDA_VISIBLE_DEVICES in main.py.

* nodes_multigpu.py: add a docstring NOTE on the (currently unregistered) MultiGPUOptionsNode explaining that its relative_speed input is plumbed through to model_options['multigpu_options'] but is not yet consulted by the cond scheduler, which still uses uniform round-robin via next_available_device(). Wire relative_speed into the scheduler before re-enabling the node.

Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
2026-05-20 20:48:59 -07:00
Jedrzej Kosinski
50d1dd6273 Fix MultiGPU Options node discarding cloned GPUOptionsGroup
GPUOptionsGroup.clone() returns a new instance, but the return value was discarded, causing the node to mutate the upstream caller's group in-place. When multiple MultiGPU Options nodes share an input group, each node's additions would leak into earlier siblings. Assign the clone result back to gpu_options so each node owns its own copy.

Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
2026-05-20 16:46:23 -07:00
Jedrzej Kosinski
1d8e379f41 Rename MultiGPU Work Units to MultiGPU CFG Split
Amp-Thread-ID: https://ampcode.com/threads/T-019d3ee9-19d5-767a-9d7a-e50cbbef815b
Co-authored-by: Amp <amp@ampcode.com>
2026-03-30 08:00:20 -07:00
Jedrzej Kosinski
5f4fcd19e7 Simplify multigpu nodes: default max_gpus=2, remove gpu_options input, disable Options node
Amp-Thread-ID: https://ampcode.com/threads/T-019d3ee9-19d5-767a-9d7a-e50cbbef815b
Co-authored-by: Amp <amp@ampcode.com>
2026-03-30 07:30:32 -07:00
Jedrzej Kosinski
d52dcbc88f Rewrite multigpu nodes to V3 format
Amp-Thread-ID: https://ampcode.com/threads/T-019d3ee9-19d5-767a-9d7a-e50cbbef815b
Co-authored-by: Amp <amp@ampcode.com>
2026-03-30 07:23:13 -07:00
Jedrzej Kosinski
6dca17bd2d Satisfy ruff linting 2025-03-03 23:08:29 -06:00
Jedrzej Kosinski
093914a247 Made MultiGPU Work Units node more robust by forcing ModelPatcher clones to match at sample time, reuse loaded MultiGPU clones, finalize MultiGPU Work Units node ID and name, small refactors/cleanup of logging and multigpu-related code 2025-03-03 22:56:13 -06:00
Jedrzej Kosinski
eda866bf51 Extracted multigpu core code into multigpu.py, added load_balance_devices to get subdivision of work based on available devices and splittable work item count, added MultiGPU Options nodes to set relative_speed of specific devices; does not change behavior yet 2025-01-27 06:25:48 -06:00
Jedrzej Kosinski
e3298b84de Create proper MultiGPU Initialize node, create gpu_options to create scaffolding for asymmetrical GPU support 2025-01-26 09:34:20 -06:00
Jedrzej Kosinski
02a4d0ad7d Added unload_model_and_clones to model_management.py to allow unloading only relevant models 2025-01-23 01:20:00 -06:00
Jedrzej Kosinski
328d4f16a9 Make WeightHooks compatible with MultiGPU, clean up some code 2025-01-20 04:34:26 -06:00
Jedrzej Kosinski
bfce723311 Initial work on multigpu_clone function, which will account for additional_models getting cloned 2025-01-17 03:31:28 -06:00
Jedrzej Kosinski
25818dc848 Added a 'max_gpus' input 2025-01-14 13:45:14 -06:00
Jedrzej Kosinski
d5088072fb Make test node for multigpu instead of storing it in just a local __init__.py 2025-01-13 20:20:25 -06:00