mirror of
https://github.com/comfyanonymous/ComfyUI.git
synced 2026-04-25 18:02:37 +08:00
* Fix Hunyuan 3D 2.1 multi-GPU worksplit: use cond_or_uncond instead of hardcoded chunk(2)

  Amp-Thread-ID: https://ampcode.com/threads/T-019da964-2cc8-77f9-9aae-23f65da233db
  Co-authored-by: Amp <amp@ampcode.com>

* Add GPU device selection to all loader nodes
  - Add get_gpu_device_options() and resolve_gpu_device_option() helpers in model_management.py for vendor-agnostic GPU device selection
  - Add device widget to CheckpointLoaderSimple, UNETLoader, VAELoader
  - Expand device options in CLIPLoader, DualCLIPLoader, LTXAVTextEncoderLoader from [default, cpu] to include gpu:0, gpu:1, etc. on multi-GPU systems
  - Wire load_diffusion_model_state_dict and load_state_dict_guess_config to respect model_options['load_device']
  - Graceful fallback: unrecognized devices (e.g. gpu:1 on single-GPU) silently fall back to default

  Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
  Co-authored-by: Amp <amp@ampcode.com>

* Add VALIDATE_INPUTS to skip device combo validation for workflow portability

  When a workflow saved on a 2-GPU machine (with device=gpu:1) is loaded on a 1-GPU machine, the combo validation would reject the unknown value. VALIDATE_INPUTS with the device parameter bypasses combo validation for that input only, allowing resolve_gpu_device_option to handle the graceful fallback at runtime.

  Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
  Co-authored-by: Amp <amp@ampcode.com>

* Set CUDA device context in outer_sample to match model load_device

  Custom CUDA kernels (comfy_kitchen fp8 quantization) use torch.cuda.current_device() for DLPack tensor export. When a model is loaded on a non-default GPU (e.g. cuda:1), the CUDA context must match or the kernel fails with 'Can't export tensors on a different CUDA device index'. Save and restore the previous device around sampling.

  Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
  Co-authored-by: Amp <amp@ampcode.com>

* Fix code review bugs: negative index guard, CPU offload_device, checkpoint te_model_options
  - resolve_gpu_device_option: reject negative indices (gpu:-1)
  - UNETLoader: set offload_device when cpu is selected
  - CheckpointLoaderSimple: pass te_model_options for CLIP device, set offload_device for cpu, pass load_device to VAE
  - load_diffusion_model_state_dict: respect offload_device from model_options
  - load_state_dict_guess_config: respect offload_device, pass load_device to VAE

  Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
  Co-authored-by: Amp <amp@ampcode.com>

* Fix CUDA device context for CLIP encoding and VAE encode/decode

  Add torch.cuda.set_device() calls to match model's load device in:
  - CLIP.encode_from_tokens: fixes 'Can't export tensors on a different CUDA device index' when CLIP is loaded on a non-default GPU
  - CLIP.encode_from_tokens_scheduled: same fix for the hooks code path
  - CLIP.generate: same fix for text generation
  - VAE.decode: fixes VAE decoding on non-default GPU
  - VAE.encode: fixes VAE encoding on non-default GPU

  Same pattern as the existing outer_sample fix in samplers.py: saves and restores previous CUDA device in a try/finally block.

  Amp-Thread-ID: https://ampcode.com/threads/T-019dabdc-8feb-766f-b4dc-f46ef4d8ff57
  Co-authored-by: Amp <amp@ampcode.com>

* Extract cuda_device_context manager, fix tiled VAE methods

  Add model_management.cuda_device_context(), a context manager that saves/restores torch.cuda.current_device when operating on a non-default GPU. Replaces 6 copies of the manual save/set/restore boilerplate.

  Refactored call sites:
  - CLIP.encode_from_tokens
  - CLIP.encode_from_tokens_scheduled (hooks path)
  - CLIP.generate
  - VAE.decode
  - VAE.encode
  - samplers.outer_sample

  Bug fixes (newly wrapped):
  - VAE.decode_tiled: was missing device context entirely, would fail on non-default GPU when called from 'VAE Decode (Tiled)' node
  - VAE.encode_tiled: same issue for 'VAE Encode (Tiled)' node

  Amp-Thread-ID: https://ampcode.com/threads/T-019dabdc-8feb-766f-b4dc-f46ef4d8ff57
  Co-authored-by: Amp <amp@ampcode.com>

* Restore CheckpointLoaderSimple, add CheckpointLoaderDevice

  Revert CheckpointLoaderSimple to its original form (no device input) so it remains the simple default loader. Add new CheckpointLoaderDevice node (advanced/loaders) with separate model_device, clip_device, and vae_device inputs for per-component GPU placement in multi-GPU setups.

  Amp-Thread-ID: https://ampcode.com/threads/T-019dabdc-8feb-766f-b4dc-f46ef4d8ff57
  Co-authored-by: Amp <amp@ampcode.com>

---------

Co-authored-by: Amp <amp@ampcode.com>

| | | |
|---|---|---|
| chainner_models | ||
| nodes_ace.py | ||
| nodes_advanced_samplers.py | ||
| nodes_align_your_steps.py | ||
| nodes_apg.py | ||
| nodes_attention_multiply.py | ||
| nodes_audio_encoder.py | ||
| nodes_audio.py | ||
| nodes_camera_trajectory.py | ||
| nodes_canny.py | ||
| nodes_cfg.py | ||
| nodes_chroma_radiance.py | ||
| nodes_clip_sdxl.py | ||
| nodes_color.py | ||
| nodes_compositing.py | ||
| nodes_cond.py | ||
| nodes_context_windows.py | ||
| nodes_controlnet.py | ||
| nodes_cosmos.py | ||
| nodes_curve.py | ||
| nodes_custom_sampler.py | ||
| nodes_dataset.py | ||
| nodes_differential_diffusion.py | ||
| nodes_easycache.py | ||
| nodes_edit_model.py | ||
| nodes_eps.py | ||
| nodes_flux.py | ||
| nodes_freelunch.py | ||
| nodes_fresca.py | ||
| nodes_gits.py | ||
| nodes_glsl.py | ||
| nodes_hidream.py | ||
| nodes_hooks.py | ||
| nodes_hunyuan3d.py | ||
| nodes_hunyuan.py | ||
| nodes_hypernetwork.py | ||
| nodes_hypertile.py | ||
| nodes_image_compare.py | ||
| nodes_images.py | ||
| nodes_ip2p.py | ||
| nodes_kandinsky5.py | ||
| nodes_latent.py | ||
| nodes_load_3d.py | ||
| nodes_logic.py | ||
| nodes_lora_debug.py | ||
| nodes_lora_extract.py | ||
| nodes_lotus.py | ||
| nodes_lt_audio.py | ||
| nodes_lt_upsampler.py | ||
| nodes_lt.py | ||
| nodes_lumina2.py | ||
| nodes_mahiro.py | ||
| nodes_mask.py | ||
| nodes_math.py | ||
| nodes_mochi.py | ||
| nodes_model_advanced.py | ||
| nodes_model_downscale.py | ||
| nodes_model_merging_model_specific.py | ||
| nodes_model_merging.py | ||
| nodes_model_patch.py | ||
| nodes_morphology.py | ||
| nodes_multigpu.py | ||
| nodes_nag.py | ||
| nodes_nop.py | ||
| nodes_number_convert.py | ||
| nodes_optimalsteps.py | ||
| nodes_pag.py | ||
| nodes_painter.py | ||
| nodes_perpneg.py | ||
| nodes_photomaker.py | ||
| nodes_pixart.py | ||
| nodes_post_processing.py | ||
| nodes_preview_any.py | ||
| nodes_primitive.py | ||
| nodes_qwen.py | ||
| nodes_rebatch.py | ||
| nodes_replacements.py | ||
| nodes_resolution.py | ||
| nodes_rope.py | ||
| nodes_rtdetr.py | ||
| nodes_sag.py | ||
| nodes_sd3.py | ||
| nodes_sdpose.py | ||
| nodes_sdupscale.py | ||
| nodes_slg.py | ||
| nodes_stable3d.py | ||
| nodes_stable_cascade.py | ||
| nodes_string.py | ||
| nodes_tcfg.py | ||
| nodes_textgen.py | ||
| nodes_tomesd.py | ||
| nodes_toolkit.py | ||
| nodes_torch_compile.py | ||
| nodes_train.py | ||
| nodes_upscale_model.py | ||
| nodes_video_model.py | ||
| nodes_video.py | ||
| nodes_wan.py | ||
| nodes_wanmove.py | ||
| nodes_webcam.py | ||
| nodes_zimage.py | ||
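
The device-selection behaviour described in the commit message above (a `default`/`cpu` combo extended with `gpu:N` entries on multi-GPU systems, silent fallback for unrecognized devices, and rejection of negative indices) can be sketched in plain Python. The names `gpu_device_options` and `resolve_device_option`, the `gpu_count` parameter, and the string return values are hypothetical and chosen for illustration only; this is not the code of `get_gpu_device_options()` or `resolve_gpu_device_option()` in model_management.py, which presumably work with real torch devices.

```python
from typing import List


def gpu_device_options(gpu_count: int) -> List[str]:
    """Hypothetical helper: build the combo values a loader node could offer."""
    options = ["default", "cpu"]
    # Assumption: explicit gpu:N entries are only listed on multi-GPU systems.
    if gpu_count > 1:
        options += [f"gpu:{i}" for i in range(gpu_count)]
    return options


def resolve_device_option(option: str, gpu_count: int) -> str:
    """Hypothetical helper: map a widget value to a usable device label.

    Anything that cannot be honoured on this machine falls back to "default"
    instead of raising, so a workflow saved elsewhere still loads.
    """
    if option in ("default", "cpu"):
        return option
    prefix, sep, index_text = option.partition(":")
    if prefix != "gpu" or not sep:
        return "default"
    # ASCII digits only: this rejects "gpu:-1", "gpu:" and non-numeric text.
    if not (index_text.isascii() and index_text.isdigit()):
        return "default"
    index = int(index_text)
    if index >= gpu_count:
        # e.g. "gpu:1" saved on a 2-GPU machine, loaded on a 1-GPU machine
        return "default"
    return f"gpu:{index}"
```

Resolving at runtime rather than failing at validation time is what keeps a workflow portable: the saved value is kept as-is, and only its interpretation depends on the machine it runs on.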