ComfyUI/comfy_extras
Jedrzej Kosinski aa464b36b3
Multi-GPU device selection for loader nodes + CUDA context fixes (#13483)
* Fix Hunyuan 3D 2.1 multi-GPU worksplit: use cond_or_uncond instead of hardcoded chunk(2)

Amp-Thread-ID: https://ampcode.com/threads/T-019da964-2cc8-77f9-9aae-23f65da233db
Co-authored-by: Amp <amp@ampcode.com>

* Add GPU device selection to all loader nodes

- Add get_gpu_device_options() and resolve_gpu_device_option() helpers
  in model_management.py for vendor-agnostic GPU device selection
- Add device widget to CheckpointLoaderSimple, UNETLoader, VAELoader
- Expand device options in CLIPLoader, DualCLIPLoader, LTXAVTextEncoderLoader
  from [default, cpu] to include gpu:0, gpu:1, etc. on multi-GPU systems
- Wire load_diffusion_model_state_dict and load_state_dict_guess_config
  to respect model_options['load_device']
- Graceful fallback: unrecognized devices (e.g. gpu:1 on a single-GPU
  system) silently fall back to default
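
The option handling described in the list above can be sketched roughly as follows. This is an illustrative approximation, not the code in model_management.py: the gpu_count parameter, the "cuda:N" return strings, and the use of None to mean "use the default device" are assumptions made so the example runs without torch or a GPU. The real helpers are vendor-agnostic and presumably query the available devices themselves.

```python
import re
from typing import List, Optional


def get_gpu_device_options(gpu_count: int) -> List[str]:
    """Build the widget choices: 'default', 'cpu', then one 'gpu:N' per device.

    gpu_count is a hypothetical parameter used here in place of a device query.
    """
    return ["default", "cpu"] + [f"gpu:{i}" for i in range(gpu_count)]


def resolve_gpu_device_option(option: str, gpu_count: int) -> Optional[str]:
    """Map a widget value to a device string, or None for 'use the default'."""
    if option == "cpu":
        return "cpu"
    match = re.fullmatch(r"gpu:(\d+)", option)
    if match is None:
        # 'default', a negative index such as 'gpu:-1', or any other
        # unrecognized value: fall back silently.
        return None
    index = int(match.group(1))
    if index >= gpu_count:
        # e.g. 'gpu:1' on a single-GPU machine.
        return None
    # Assumption: a CUDA-style device string; a real helper may differ.
    return f"cuda:{index}"
```

With two GPUs, resolve_gpu_device_option("gpu:1", 2) yields "cuda:1", while the same value on a one-GPU machine yields None, so the caller keeps its default device.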

Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
Co-authored-by: Amp <amp@ampcode.com>

* Add VALIDATE_INPUTS to skip device combo validation for workflow portability

When a workflow saved on a 2-GPU machine (with device=gpu:1) is loaded
on a 1-GPU machine, the combo validation would reject the unknown value.
Declaring device as a parameter of VALIDATE_INPUTS bypasses combo
validation for that input only, so resolve_gpu_device_option can handle
the graceful fallback at runtime.
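
A minimal sketch of the pattern described above, using a hypothetical node class (ExampleDeviceLoader is not one of the loaders changed here). The runtime fallback inside load() is a simplified stand-in for resolve_gpu_device_option.

```python
class ExampleDeviceLoader:
    """Hypothetical loader node used only to illustrate the pattern."""

    @classmethod
    def INPUT_TYPES(cls):
        # On a 1-GPU machine the combo only offers devices that exist locally.
        return {"required": {"device": (["default", "cpu", "gpu:0"],)}}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "load"
    CATEGORY = "example"

    @classmethod
    def VALIDATE_INPUTS(cls, device):
        # Naming `device` as a parameter takes over validation of that input,
        # so a saved value such as "gpu:1" is not rejected by the combo check.
        # Returning True marks the input as valid.
        return True

    def load(self, device):
        # Simplified runtime fallback: unknown values become "default".
        choices = self.INPUT_TYPES()["required"]["device"][0]
        resolved = device if device in choices else "default"
        return (resolved,)
```

Here a workflow carrying device=gpu:1 passes validation and load() then falls back to "default" instead of failing before execution.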

Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
Co-authored-by: Amp <amp@ampcode.com>

* Set CUDA device context in outer_sample to match model load_device

Custom CUDA kernels (comfy_kitchen fp8 quantization) use
torch.cuda.current_device() for DLPack tensor export. When a model is
loaded on a non-default GPU (e.g. cuda:1), the CUDA context must match
or the kernel fails with 'Can't export tensors on a different CUDA
device index'. Save and restore the previous device around sampling.

Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
Co-authored-by: Amp <amp@ampcode.com>

* Fix code review bugs: negative index guard, CPU offload_device, checkpoint te_model_options

- resolve_gpu_device_option: reject negative indices (gpu:-1)
- UNETLoader: set offload_device when cpu is selected
- CheckpointLoaderSimple: pass te_model_options for CLIP device,
  set offload_device for cpu, pass load_device to VAE
- load_diffusion_model_state_dict: respect offload_device from model_options
- load_state_dict_guess_config: respect offload_device, pass load_device to VAE

Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
Co-authored-by: Amp <amp@ampcode.com>

* Fix CUDA device context for CLIP encoding and VAE encode/decode

Add torch.cuda.set_device() calls to match model's load device in:
- CLIP.encode_from_tokens: fixes 'Can't export tensors on a different
  CUDA device index' when CLIP is loaded on a non-default GPU
- CLIP.encode_from_tokens_scheduled: same fix for the hooks code path
- CLIP.generate: same fix for text generation
- VAE.decode: fixes VAE decoding on non-default GPU
- VAE.encode: fixes VAE encoding on non-default GPU

Same pattern as the existing outer_sample fix in samplers.py - saves
and restores previous CUDA device in a try/finally block.

Amp-Thread-ID: https://ampcode.com/threads/T-019dabdc-8feb-766f-b4dc-f46ef4d8ff57
Co-authored-by: Amp <amp@ampcode.com>

* Extract cuda_device_context manager, fix tiled VAE methods

Add model_management.cuda_device_context(), a context manager that
saves and restores torch.cuda.current_device() when operating on a
non-default GPU. Replaces 6 copies of the manual save/set/restore
boilerplate.
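
The save/set/restore pattern that such a context manager wraps might look like the sketch below. It is not the actual model_management implementation: the cuda argument and the FakeCuda class are hypothetical stand-ins for torch.cuda.current_device() and torch.cuda.set_device(), used so the example runs without a GPU.

```python
from contextlib import contextmanager


class FakeCuda:
    """Stand-in for the torch.cuda device state (an assumption for this sketch)."""

    def __init__(self):
        self._current = 0

    def current_device(self):
        return self._current

    def set_device(self, index):
        self._current = index


@contextmanager
def cuda_device_context(cuda, device_index):
    """Make device_index current for the duration of the block, then restore."""
    if device_index is None:
        # Not a CUDA device (e.g. CPU): nothing to switch.
        yield
        return
    previous = cuda.current_device()
    if previous == device_index:
        # Already on the right device; nothing to switch or restore.
        yield
        return
    cuda.set_device(device_index)
    try:
        yield
    finally:
        # Restore even if the wrapped code raises.
        cuda.set_device(previous)
```

A call site would then wrap its work in `with cuda_device_context(cuda, 1): ...`, and the previous device is restored on exit whether or not the body raised.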

Refactored call sites:
- CLIP.encode_from_tokens
- CLIP.encode_from_tokens_scheduled (hooks path)
- CLIP.generate
- VAE.decode
- VAE.encode
- samplers.outer_sample

Bug fixes (newly wrapped):
- VAE.decode_tiled: was missing device context entirely, would fail
  on non-default GPU when called from 'VAE Decode (Tiled)' node
- VAE.encode_tiled: same issue for 'VAE Encode (Tiled)' node

Amp-Thread-ID: https://ampcode.com/threads/T-019dabdc-8feb-766f-b4dc-f46ef4d8ff57
Co-authored-by: Amp <amp@ampcode.com>

* Restore CheckpointLoaderSimple, add CheckpointLoaderDevice

Revert CheckpointLoaderSimple to its original form (no device input)
so it remains the simple default loader.

Add new CheckpointLoaderDevice node (advanced/loaders) with separate
model_device, clip_device, and vae_device inputs for per-component
GPU placement in multi-GPU setups.

Amp-Thread-ID: https://ampcode.com/threads/T-019dabdc-8feb-766f-b4dc-f46ef4d8ff57
Co-authored-by: Amp <amp@ampcode.com>

---------

Co-authored-by: Amp <amp@ampcode.com>
2026-04-23 19:10:33 -07:00
..
chainner_models Replace print with logging (#6138) 2024-12-20 16:24:55 -05:00
nodes_ace.py Ace step empty latent nodes follow intermediate dtype. (#13313) 2026-04-06 18:12:16 -07:00
nodes_advanced_samplers.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_align_your_steps.py add search aliases to all nodes (#12035) 2026-01-22 18:36:58 -08:00
nodes_apg.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_attention_multiply.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_audio_encoder.py convert nodes_audio_encoder.py to V3 schema (#10123) 2025-09-30 23:00:22 -07:00
nodes_audio.py feat: add essentials_category to nodes and blueprints for Essentials tab (#12573) 2026-03-15 16:18:04 -07:00
nodes_camera_trajectory.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_canny.py Fix canny node not working with fp16. (#13085) 2026-03-20 23:15:50 -04:00
nodes_cfg.py convert CFG nodes to V3 schema (#9717) 2025-09-12 17:39:55 -04:00
nodes_chroma_radiance.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_clip_sdxl.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_color.py Add color type and Color to RGB Int node (#12145) 2026-01-30 15:01:33 -08:00
nodes_compositing.py add search aliases to all nodes (#12035) 2026-01-22 18:36:58 -08:00
nodes_cond.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_context_windows.py Add slice_cond and per-model context window cond resizing (#12645) 2026-03-19 20:42:42 -07:00
nodes_controlnet.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_cosmos.py convert Cosmos nodes to V3 schema (#9721) 2025-09-12 17:38:46 -04:00
nodes_curve.py image histogram node (#13153) 2026-04-06 14:54:02 -07:00
nodes_custom_sampler.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_dataset.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_differential_diffusion.py add search aliases to all nodes (#12035) 2026-01-22 18:36:58 -08:00
nodes_easycache.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_edit_model.py convert nodes_edit_model.py to V3 schema (#10147) 2025-10-03 13:24:42 -07:00
nodes_eps.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_flux.py Lower kv cache memory usage. (#12909) 2026-03-12 16:54:38 -04:00
nodes_freelunch.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_fresca.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_gits.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_glsl.py Add has_intermediate_output flag for nodes with interactive UI (#13048) 2026-03-27 21:06:38 -04:00
nodes_hidream.py add search aliases to all nodes (#12035) 2026-01-22 18:36:58 -08:00
nodes_hooks.py Disable dynamic_vram when weight hooks applied (#12653) 2026-02-28 16:50:18 -05:00
nodes_hunyuan3d.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_hunyuan.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_hypernetwork.py convert nodes_hypernetwork.py to V3 schema (#10583) 2025-11-03 00:21:47 -08:00
nodes_hypertile.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_image_compare.py feat: add essentials_category to nodes and blueprints for Essentials tab (#12573) 2026-03-15 16:18:04 -07:00
nodes_images.py Add has_intermediate_output flag for nodes with interactive UI (#13048) 2026-03-27 21:06:38 -04:00
nodes_ip2p.py convert nodes_ip2p.pt to V3 schema (#10097) 2025-10-01 12:20:30 -07:00
nodes_kandinsky5.py add search aliases to all nodes (#12035) 2026-01-22 18:36:58 -08:00
nodes_latent.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_load_3d.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_logic.py add support for kwargs inputs to allow arbitrary inputs from frontend (#12063) 2026-01-24 17:30:40 -08:00
nodes_lora_debug.py Move nodes from previous PR into their own file. (#12066) 2026-01-24 23:02:32 -05:00
nodes_lora_extract.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_lotus.py convert nodes_lotus.py to V3 schema (#10057) 2025-09-27 19:11:36 -07:00
nodes_lt_audio.py Multi-GPU device selection for loader nodes + CUDA context fixes (#13483) 2026-04-23 19:10:33 -07:00
nodes_lt_upsampler.py Support the LTXV 2 model. (#11632) 2026-01-05 01:58:59 -05:00
nodes_lt.py feat: LTX2: Support reference audio (ID-LoRA) (#13111) 2026-03-23 18:22:24 -04:00
nodes_lumina2.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_mahiro.py refactor: rename Mahiro CFG to Similarity-Adaptive Guidance (#12172) 2026-02-28 20:59:24 -08:00
nodes_mask.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_math.py feat: add Math Expression node with simpleeval evaluation (#12687) 2026-03-05 18:51:28 -08:00
nodes_mochi.py convert nodes_mochi.py to V3 schema (#10069) 2025-09-29 12:03:35 -07:00
nodes_model_advanced.py initial FlowRVS support (#12637) 2026-02-25 23:38:46 -05:00
nodes_model_downscale.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_model_merging_model_specific.py Qwen Image model merging node. (#9202) 2025-08-06 04:07:04 -04:00
nodes_model_merging.py add search aliases to all nodes (#12035) 2026-01-22 18:36:58 -08:00
nodes_model_patch.py feat: SUPIR model support (CORE-17) (#13250) 2026-04-18 23:02:01 -04:00
nodes_morphology.py add search aliases to all nodes (#12035) 2026-01-22 18:36:58 -08:00
nodes_multigpu.py Rename MultiGPU Work Units to MultiGPU CFG Split 2026-03-30 08:00:20 -07:00
nodes_nag.py Add category to Normalized Attention Guidance node (#12565) 2026-02-21 19:51:21 -05:00
nodes_nop.py Native block swap custom nodes considered harmful. (#10783) 2025-11-18 00:26:44 -05:00
nodes_number_convert.py fix(number-convert): preserve int precision for large numbers (#13147) 2026-03-25 18:06:34 -04:00
nodes_optimalsteps.py convert nodes_optimalsteps.py to V3 schema (#10074) 2025-10-01 12:18:04 -07:00
nodes_pag.py convert nodes_pag.py to V3 schema (#10080) 2025-10-01 12:18:49 -07:00
nodes_painter.py Add has_intermediate_output flag for nodes with interactive UI (#13048) 2026-03-27 21:06:38 -04:00
nodes_perpneg.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_photomaker.py convert nodes_photomaker.py to V3 schema (#10017) 2025-09-27 02:36:43 -07:00
nodes_pixart.py add search aliases to all nodes (#12035) 2026-01-22 18:36:58 -08:00
nodes_post_processing.py feat: SUPIR model support (CORE-17) (#13250) 2026-04-18 23:02:01 -04:00
nodes_preview_any.py Add string output to preview text node. (#13406) 2026-04-14 14:42:23 -04:00
nodes_primitive.py fix: swap essentials_category from CLIPTextEncode to PrimitiveStringMultiline (#12553) 2026-02-20 23:46:46 -08:00
nodes_qwen.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_rebatch.py convert nodes_rebatch.py to V3 schema (#9945) 2025-09-26 14:10:49 -07:00
nodes_replacements.py Node Replacement API (#12014) 2026-02-15 02:12:30 -08:00
nodes_resolution.py refactor: use AspectRatio enum members as ASPECT_RATIOS dict keys (#12689) 2026-02-27 20:53:46 -08:00
nodes_rope.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_rtdetr.py SDPose: resize input always (#13349) 2026-04-10 11:26:55 -10:00
nodes_sag.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_sd3.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_sdpose.py SDPose: resize input always (#13349) 2026-04-10 11:26:55 -10:00
nodes_sdupscale.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_slg.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_stable3d.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_stable_cascade.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_string.py Add JsonExtractString node. (#13435) 2026-04-17 00:20:16 -04:00
nodes_tcfg.py convert nodes_tcfg.py to V3 schema (#9942) 2025-09-26 14:13:05 -07:00
nodes_textgen.py nodes_textgen: Implement use_default_template for LTX (#13451) 2026-04-17 12:20:09 -04:00
nodes_tomesd.py convert nodes_tomesd.py to V3 schema (#10180) 2025-10-03 11:50:38 -07:00
nodes_toolkit.py Add a Create List node (#12173) 2026-02-05 01:18:21 -05:00
nodes_torch_compile.py Disable dynamic_vram when using torch compiler (#12612) 2026-02-24 19:13:46 -05:00
nodes_train.py Fix Train LoRA crash when training_dtype is "none" with bfloat16 LoRA weights (#13145) 2026-03-24 23:53:44 -04:00
nodes_upscale_model.py Make ImageUpscaleWithModel node work with intermediate device and dtype. (#13357) 2026-04-10 21:48:26 -04:00
nodes_video_model.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_video.py fix: move essentials_category to correct replacement nodes (#12568) 2026-02-26 01:00:32 -08:00
nodes_wan.py feat: Support SCAIL WanVideo model (#12614) 2026-02-28 16:49:12 -05:00
nodes_wanmove.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00
nodes_webcam.py add search aliases to all nodes (#12035) 2026-01-22 18:36:58 -08:00
nodes_zimage.py feat: mark 429 widgets as advanced for collapsible UI (#12197) 2026-02-19 19:20:02 -08:00