ComfyUI/comfy
Jedrzej Kosinski 7d958e18ad multigpu: fix CPU SelectModelDevice slowness + MGCS reuse stripping is_multigpu_base_clone
Two issues surfaced while testing the worksplit-multigpu PR:

1. Select Model Device -> CPU sampled at roughly 0.01 it/s, which
   looked like an indefinite hang. PyTorch's CPU conv2d kernels do not
   have native fp16/bf16 paths; they fall back to software emulation
   that runs ~500-600x slower than fp32. Force fp32 compute via
   set_model_compute_dtype when the target is CPU; this keeps weights
   in fp16 in memory and casts at use, so peak memory does not double.

2. After running SelectModelDevice(gpu:N) and then activating
   MultiGPU CFG Split, only one GPU did real work even though both
   were loaded. create_multigpu_deepclones' reuse_loaded path matched
   the prior SelectModelDevice patcher (same clone_base_uuid, same
   device) but never set is_multigpu_base_clone, so the cond
   scheduler later filtered it out. Restrict reuse to clones that
   already carry the flag and always set it on the chosen patcher.

   Also fix a related sharp edge: extra-device selection used
   get_all_torch_devices(exclude_current=True), which assumes the
   primary lives on the process's current CUDA device. After
   SelectModelDevice(gpu:N) that is not true. Exclude the primary
   model's actual load_device instead.
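
   The reuse and device-selection rules from item 2 can be sketched in
   plain Python. This is an illustration only, not the code in
   multigpu.py: Patcher, pick_extra_devices, find_reusable and
   clones_for are hypothetical names, and devices are plain strings
   rather than torch.device objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patcher:
    """Hypothetical stand-in for a model patcher (not ComfyUI's class)."""
    load_device: str
    clone_base_uuid: str
    is_multigpu_base_clone: bool = False


def pick_extra_devices(all_devices: List[str], primary: Patcher) -> List[str]:
    # Exclude the device the primary is actually loaded on, instead of
    # assuming it lives on the process's current device.
    return [d for d in all_devices if d != primary.load_device]


def find_reusable(loaded: List[Patcher], primary: Patcher,
                  device: str) -> Optional[Patcher]:
    # Reuse only clones that already carry the multigpu flag; a patcher
    # that merely shares the base uuid and device is not enough.
    for p in loaded:
        if (p.clone_base_uuid == primary.clone_base_uuid
                and p.load_device == device
                and p.is_multigpu_base_clone):
            return p
    return None


def clones_for(primary: Patcher, all_devices: List[str],
               loaded: List[Patcher]) -> List[Patcher]:
    clones = []
    for device in pick_extra_devices(all_devices, primary):
        chosen = find_reusable(loaded, primary, device)
        if chosen is None:
            # No flagged clone to reuse: make a fresh one for this device.
            chosen = Patcher(device, primary.clone_base_uuid)
        # Always set the flag on whichever patcher was chosen.
        chosen.is_multigpu_base_clone = True
        clones.append(chosen)
    return clones
```

   With the primary on cuda:1 and an unflagged patcher already loaded
   on cuda:0, clones_for skips the unflagged patcher and returns a
   fresh, flagged clone for cuda:0.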

Amp-Thread-ID: https://ampcode.com/threads/T-019e6131-7175-719e-ad94-df5d65507375
Co-authored-by: Amp <amp@ampcode.com>
2026-05-25 18:13:20 -07:00
audio_encoders Fix fp16 audio encoder models (#12811) 2026-03-06 18:20:07 -05:00
background_removal Add support for BiRefNet background remove model (CORE-46) (#12747) 2026-05-08 17:59:24 +08:00
cldm Add better error message for common error. (#10846) 2025-11-23 04:55:22 -05:00
comfy_types fix: use frontend-compatible format for Float gradient_stops (#12789) 2026-03-12 10:14:28 -07:00
extra_samplers Uni pc sampler now works with audio and video models. 2025-01-18 05:27:58 -05:00
image_encoders feat: Support MoGe (CORE-168) (#13878) 2026-05-15 10:34:56 +08:00
k_diffusion feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
ldm Merge branch 'master' into worksplit-multigpu 2026-05-24 17:46:43 -07:00
sd1_tokenizer Silence clip tokenizer warning. (#8934) 2025-07-16 14:42:07 -04:00
t2i_adapter Controlnet refactor. 2024-06-27 18:43:11 -04:00
taesd Add high quality preview support for Flux2 latents (#13496) 2026-04-29 19:37:30 -04:00
text_encoders Support Stable Audio 3 model. (#14010) 2026-05-20 11:34:22 -04:00
weight_adapter MPDynamic: force load flux img_in weight (Fixes flux1 canny+depth lora crash) (#12446) 2026-02-15 20:30:09 -05:00
bg_removal_model.py Fix BiRefNet issue (#13966) 2026-05-19 05:03:22 +08:00
cli_args.py Merge branch 'master' into worksplit-multigpu 2026-05-24 17:46:43 -07:00
clip_config_bigg.json Fix potential issue with non clip text embeddings. 2024-07-30 14:41:13 -04:00
clip_model.py Support the siglip 2 naflex model as a clip vision model. (#11831) 2026-01-12 17:05:54 -05:00
clip_vision_config_g.json Add support for clip g vision model to CLIPVisionLoader. 2023-08-18 11:13:29 -04:00
clip_vision_config_h.json Add support for unCLIP SD2.x models. 2023-04-01 23:19:15 -04:00
clip_vision_config_vitl_336_llava.json Support llava clip vision model. 2025-03-06 00:24:43 -05:00
clip_vision_config_vitl_336.json support clip-vit-large-patch14-336 (#4042) 2024-07-17 13:12:50 -04:00
clip_vision_config_vitl.json Add support for unCLIP SD2.x models. 2023-04-01 23:19:15 -04:00
clip_vision_siglip2_base_naflex.json Support the siglip 2 naflex model as a clip vision model. (#11831) 2026-01-12 17:05:54 -05:00
clip_vision_siglip_384.json Support new flux model variants. 2024-11-21 08:38:23 -05:00
clip_vision_siglip_512.json Support 512 siglip model. 2025-04-05 07:01:01 -04:00
clip_vision.py Reduce RAM usage, fix VRAM OOMs, and fix Windows shared memory spilling with adaptive model loading (#11845) 2026-02-01 01:01:11 -05:00
conds.py Cleanups to the last PR. (#12646) 2026-02-26 01:30:31 -05:00
context_windows.py feat: Context windows - add causal_window_fix to improve blending of context windows (CORE-100) (#13563) 2026-05-05 16:40:53 -07:00
controlnet.py Free QwenFunControlNet base_model reference in cleanup 2026-05-21 11:35:54 -07:00
deploy_environment.py Add deploy environment header (Comfy-Env) to partner node API calls (#13425) 2026-05-04 20:17:56 -07:00
diffusers_convert.py Remove useless code. 2025-01-24 06:15:54 -05:00
diffusers_load.py load_unet -> load_diffusion_model with a model_options argument. 2024-08-12 23:20:57 -04:00
float.py feat: Support mxfp8 (#12907) 2026-03-14 18:36:29 -04:00
gligen.py Remove some useless code. (#8812) 2025-07-06 07:07:39 -04:00
hooks.py Fix typos (#10986) 2026-05-08 17:14:45 +08:00
latent_formats.py Support Stable Audio 3 model. (#14010) 2026-05-20 11:34:22 -04:00
lora_convert.py Use torch RMSNorm for flux models and refactor hunyuan video code. (#12432) 2026-02-13 15:35:13 -05:00
lora.py Multi-threaded load of models from disk (big load time speedups & Offload to disk) (CORE-43,CORE-152,CORE-164,CORE-165,CORE-117) (#13802) 2026-05-20 17:03:58 -07:00
memory_management.py memory_management: replace thread refusal with mutex 2026-05-23 01:00:30 +10:00
model_base.py Support Stable Audio 3 model. (#14010) 2026-05-20 11:34:22 -04:00
model_detection.py Support Stable Audio 3 model. (#14010) 2026-05-20 11:34:22 -04:00
model_management.py Merge branch 'master' into worksplit-multigpu 2026-05-24 17:46:43 -07:00
model_patcher.py multigpu: drop unused copy import; sync requirements.txt with master 2026-05-24 17:17:08 -07:00
model_prefetch.py prefetch: guard against no offload (#13703) 2026-05-04 12:56:05 -07:00
model_sampling.py feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
multigpu.py multigpu: fix CPU SelectModelDevice slowness + MGCS reuse stripping is_multigpu_base_clone 2026-05-25 18:13:20 -07:00
nested_tensor.py WIP way to support multi multi dimensional latents. (#10456) 2025-10-23 21:21:14 -04:00
ops.py Multi-threaded load of models from disk (big load time speedups & Offload to disk) (CORE-43,CORE-152,CORE-164,CORE-165,CORE-117) (#13802) 2026-05-20 17:03:58 -07:00
options.py Only parse command line args when main.py is called. 2023-09-13 11:38:20 -04:00
patcher_extension.py Merge branch 'master' into worksplit-multigpu 2025-10-15 17:33:02 -07:00
pinned_memory.py Multi-threaded load of models from disk (big load time speedups & Offload to disk) (CORE-43,CORE-152,CORE-164,CORE-165,CORE-117) (#13802) 2026-05-20 17:03:58 -07:00
pixel_space_convert.py Changes to the previous radiance commit. (#9851) 2025-09-13 18:03:34 -04:00
quant_ops.py Enable triton comfy kitchen via cli-arg (#12730) 2026-05-03 14:07:21 -04:00
rmsnorm.py feat: Gemma4 text generation support (CORE-30) (#13376) 2026-05-02 22:46:15 -04:00
sample.py Initial work to make downscale_ratio_temporal work. (#13972) 2026-05-18 23:01:43 -04:00
sampler_helpers.py Merge remote-tracking branch 'origin/master' into merge-master-into-worksplit-multigpu 2026-05-19 21:43:51 -07:00
samplers.py Merge branch 'master' into worksplit-multigpu 2026-05-22 23:05:58 -07:00
sd1_clip_config.json Fix potential issue with non clip text embeddings. 2024-07-30 14:41:13 -04:00
sd1_clip.py feat: Support Qwen3.5 text generation models (#12771) 2026-03-25 22:48:28 -04:00
sd.py multigpu: refactor deepclone_multigpu + register cached_patcher_init for CLIP/VAE; Select*Device retargets via deepclone 2026-05-23 19:11:48 -07:00
sdxl_clip.py Add a T5TokenizerOptions node to set options for the T5 tokenizer. (#7803) 2025-04-25 19:36:00 -04:00
supported_models_base.py Fix some custom nodes. (#11134) 2025-12-05 18:25:31 -05:00
supported_models.py Support Stable Audio 3 model. (#14010) 2026-05-20 11:34:22 -04:00
utils.py Defer @pollockjj's tiled-VAE and UPSCALE_MODEL MultiGPU lanes (#14066) 2026-05-22 16:44:29 -07:00