V3 io.ComfyNode subclasses use the lowercase `validate_inputs` hook for opting out of strict combo validation (execution.py line 862); the uppercase `VALIDATE_INPUTS` is the V1 spelling and is ignored on V3 nodes. The strict combo check at execution.py line 1025 is gated on `if x not in validate_function_inputs`, so renaming to `validate_inputs(cls, device='default')` lets unknown `gpu:N` values pass validation and fall through to the runtime fallback.
Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
When --enable-dynamic-vram is on, every ModelPatcher is a
ModelPatcherDynamic whose underlying model has a per-device dynamic_pins
dict, initialized in __init__ for self.load_device only. If a cloned
patcher's load_device is later reassigned (as the Select{Model,CLIP,VAE}
Device nodes do), the new device key is missing and partially_unload_ram
raises KeyError: device(type='cuda', index=N).
Fix:
- Extract the per-device dynamic_pins init in ModelPatcherDynamic.__init__
into a new helper method register_load_device(device) which is now also
called from __init__.
- Each Select*Device node calls clone.patcher.register_load_device(resolved)
after retargeting load_device, guarded by hasattr so non-dynamic
patchers (plain ModelPatcher in non-dynamic-vram installs) skip it.
Caught by happy-path test where SelectCLIPDevice retargeted CLIP from
cuda:0 to cuda:1 and CLIPTextEncode then crashed in
partially_unload_ram -> dynamic_pins[cuda:1].
Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
Replace the per-loader device widgets removed in the previous commit
with three small passthrough selector nodes registered under
advanced/multigpu:
- Select Model Device (MODEL in/out) - options: default / cpu / gpu:N
- Select CLIP Device (CLIP in/out) - options: default / cpu / gpu:N
- Select VAE Device (VAE in/out) - options: default / gpu:N (no cpu)
Each node clones the inbound patcher (model.clone() / clip.clone() /
copy.copy(vae)+vae.patcher.clone()) and retargets load_device (and
offload_device for cpu / vae_offload_device for VAE).
Portability across machines with different GPU counts:
- VALIDATE_INPUTS returns True so an unknown gpu:N value (e.g. a
workflow saved on a 2-GPU machine opened on a 1-GPU machine) does
not error at validation time.
- At runtime, resolve_gpu_device_option(...) returns None for
unknown options (with a warning), and each selector then logs a
per-node info message and passes through unchanged, matching the
no-op style used by MultiGPU CFG Split's
"No extra torch devices need initialization..." log.
Also adds comfy.model_management.get_gpu_device_options_no_cpu() which
the VAE selector uses; on a single-GPU box this collapses to just
["default"], which is fine.
Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
Remove the device-selection widgets that were added directly to existing
loader nodes (and the new CheckpointLoaderDevice / ImageOnlyCheckpointLoaderDevice
variants):
- nodes.py:
- delete CheckpointLoaderDevice class and its NODE_CLASS_MAPPINGS /
NODE_DISPLAY_NAME_MAPPINGS entries
- remove the optional `device` input + VALIDATE_INPUTS + resolve logic
from UNETLoader, VAELoader, CLIPLoader, DualCLIPLoader
- restore CLIPLoader/DualCLIPLoader `device` options to ["default", "cpu"]
- comfy_extras/nodes_video_model.py:
- delete ImageOnlyCheckpointLoaderDevice class + its mapping entries
- comfy_extras/nodes_lt_audio.py:
- restore LTXAVTextEncoderLoader `device` options to ["default", "cpu"]
and revert the resolve logic back to the simple `if device == "cpu"`
branch
The replacement approach is a small set of passthrough Select*Device
nodes (added in the next commit) that retarget MODEL/CLIP/VAE devices
without bloating every loader's UI or duplicating loaders.
The cuda_device_context helper and the model_management helpers
(get_gpu_device_options / resolve_gpu_device_option) from #13483 are
kept; they are still used by the new selector nodes.
Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
* Revert "Add tiled VAE lane to MultiGPU Work Units"
This reverts commit 4d3d68e473.
The tiled VAE lane will land as part of a follow-up PR alongside the
UPSCALE_MODEL lane, separated from the threaded-loader fix PR (#14052)
to keep the upstream merge focused.
* Revert "Add UPSCALE_MODEL lane to MultiGPU CFG Split"
This reverts commit 74b0a826ea.
The UPSCALE_MODEL lane will land as part of a follow-up PR alongside the
tiled VAE lane, separated from the threaded-loader fix PR (#14052) to
keep the upstream merge focused.
---------
Co-authored-by: John Pollock <pollockjj@gmail.com>
Introduce tiled_scale_multidim_multigpu in comfy/utils.py: a tile scheduler
that dispatches per-device tile functions through the existing
MultiGPUThreadPool and merges per-device CPU output buffers in deterministic
key order. The worker only catches BaseException at the thread boundary to
funnel errors to the main thread; bare torch.cuda.set_device and
torch.cuda.synchronize calls inside the worker fail loud if the device is
not CUDA, which is part of the primitive's contract.
Add UPSCALE_MODEL input on the MultiGPU CFG Split node and an upscale-model
descriptor deepclone helper in comfy/multigpu.py. Clones stay CPU-resident
until execute time and are returned to CPU afterward.
ImageUpscaleWithModel dispatches through tiled_scale_multidim_multigpu when
a multigpu descriptor is attached; the single-device path runs unchanged
when no clones are present.
Brings in 18 commits from master so worksplit-multigpu does not regress
fixes that landed on main since the last sync:
- #13699 Hunyuan 3D 2.1 batch-size fixes (overlap with our own backport;
conflict resolved in favor of the shape>=2 gate that binds
swap_cfg_halves once and reuses it for the output swap-back)
- #14031 ModelPatcherDynamic lora reshape / backup restore fix
- #13802 Multi-threaded model load (memory_management / pinned_memory /
model_management / aimdo plumbing)
- #12679 lanczos single-channel tensor fix
- #14010 Stable Audio 3 support
- assorted partner-node, openapi, workflow-template, and tooling updates
Amp-Thread-ID: https://ampcode.com/threads/T-019e4a00-fe3d-76bd-a2f2-a8c8c4040082
Co-authored-by: Amp <amp@ampcode.com>
Two doc-only changes addressing minor CodeRabbit findings on PR #7063:
* cli_args.py: clarify --cuda-device help text to document the required comma-separated format ('0' or '0,1'), matching how the value is consumed by CUDA_VISIBLE_DEVICES in main.py.
* nodes_multigpu.py: add a docstring NOTE on the (currently unregistered) MultiGPUOptionsNode explaining that its relative_speed input is plumbed through to model_options['multigpu_options'] but is not yet consulted by the cond scheduler, which still uses uniform round-robin via next_available_device(). Wire relative_speed into the scheduler before re-enabling the node.
Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
GPUOptionsGroup.clone() returns a new instance, but the return value was discarded, causing the node to mutate the upstream caller's group in-place. When multiple MultiGPU Options nodes share an input group, each node's additions would leak into earlier siblings. Assign the clone result back to gpu_options so each node owns its own copy.
Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
* Move detection category under image category
* Add missing categories
* Move detection nodes to detection category
* Move save nodes to image root catefory
* Rename postprocessors
* Move mask category under image
* Move guiders category to parent level at root of sampling category
* Move custom_sampling category to parent level at the root of sampling category
* Modify description of LoRA loaders
* Fix node id SolidMask
* Move VOID Quadmask under image/mask
* Group compositing nodes under image/compositing
* Move load image as mask to image category for consistency with other load image nodes
* Align display name with Load Checkpoint
* Move dataset category under training category
* Rename Number Convert to Conver Number (verb first)
* Rename Canny node
* Revert wanBlockSwap + description
* Add description to RemoveBackground node
* Revert category update of dataset
Split GLB save logic out of nodes_hunyuan3d.py into a new nodes_save_3d.py, and extend the writer to support UVs, per-vertex colors, and embedded baseColor textures.
Extend the MESH type with optional uvs, vertex_colors, and texture fields so meshes can carry texture data through the graph.
Add pack_variable_mesh_batch / get_mesh_batch_item helpers and switch VoxelToMesh / VoxelToMeshBasic to use them so batches with differing vertex/face counts no longer fail at torch.stack.
* Initial HiDream01-image support
* Cleanup nodes
* Cleaner handling of empty placeholder models
* Remove snap_to_predefined, prefer tooltip for the trained resolutions
* Add model and block wrappers
* Fix shift tooltip
* Add node to work around the patch tile issue
Experimental, runs multiple passes with the patch grid offset and blends with various different methods.
* Qwen35 vision rotary_pos_emb cast fix
* Fix embedding layout type
* Some small optimizations
* Cleanup, don't need this fallback
* Prefix KV cache, cleanup
Bit of speed, reduce redundant code
* Get rid of redundant custom sampler, refactor noise scaling
Our existing lcm sampler is mathematically same, just added the missing options to it instead and a node to control them. Refactored the noise scaling and fix it for the stochastic samplers, add a generic node to control the initial noise scale.
* Update nodes_hidream_o1.py
* Fix some cache validation cases
* Keep existing sampling params
* Remove redundant video vision path
* Replace some numpy ops with torch
* Fx RoPE index for batch size > 1
* Prefer torch preprocessing
* Rename block_type to be compatible with existing patch nodes
* Fixes and tweaks
* Add Boolean support to math expressions
* Change boolean result test to assert values
---------
Co-authored-by: Alexis Rolland <alexisrolland@hotmail.com>
* initial WanDancer support
* nodes_wandancer: Add list form of chunker.
Create an alternate list form of the node so the chunk gens can be
trivially looped by the comfy executor.
* Closer match to original soxr resampling
* Remove librosa node
* Cleanup
---------
Co-authored-by: Rattus <rattus128@gmail.com>