EasyAI代码托管平台

mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-05-23 23:47:25 +08:00

Author	SHA1	Message	Date
John Pollock	4d3d68e473	Add tiled VAE lane to MultiGPU Work Units Some checks are pending Python Linting / Run Ruff (push) Waiting to run Details Python Linting / Run Pylint (push) Waiting to run Details	2026-05-22 13:42:21 -05:00
John Pollock	74b0a826ea	Add UPSCALE_MODEL lane to MultiGPU CFG Split Introduce tiled_scale_multidim_multigpu in comfy/utils.py: a tile scheduler that dispatches per-device tile functions through the existing MultiGPUThreadPool and merges per-device CPU output buffers in deterministic key order. The worker only catches BaseException at the thread boundary to funnel errors to the main thread; bare torch.cuda.set_device and torch.cuda.synchronize calls inside the worker fail loud if the device is not CUDA, which is part of the primitive's contract. Add UPSCALE_MODEL input on the MultiGPU CFG Split node and an upscale-model descriptor deepclone helper in comfy/multigpu.py. Clones stay CPU-resident until execute time and are returned to CPU afterward. ImageUpscaleWithModel dispatches through tiled_scale_multidim_multigpu when a multigpu descriptor is attached; the single-device path runs unchanged when no clones are present.	2026-05-22 13:41:48 -05:00
Jedrzej Kosinski	dd85851efe	Prune inherited multigpu clones when max_gpus is lowered create_multigpu_deepclones cloned the existing 'multigpu' additional_models list verbatim and never pruned entries beyond limit_extra_devices. If a workflow was previously prepared for more GPUs, reducing max_gpus would leave stale clones attached and eligible for later scheduling. Replace the TODO block with a real prune that keeps only clones whose load_device is either the model's load_device or in limit_extra_devices, and re-match clones if anything was removed. Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5 Co-authored-by: Amp <amp@ampcode.com>	2026-05-20 16:46:45 -07:00
Jedrzej Kosinski	4b93c4360f	Implement persistent thread pool for multi-GPU CFG splitting (#13329 ) Some checks failed Python Linting / Run Ruff (push) Waiting to run Details Python Linting / Run Pylint (push) Waiting to run Details Build package / Build Test (3.10) (push) Has been cancelled Details Build package / Build Test (3.11) (push) Has been cancelled Details Build package / Build Test (3.12) (push) Has been cancelled Details Build package / Build Test (3.13) (push) Has been cancelled Details Build package / Build Test (3.14) (push) Has been cancelled Details Replace per-step thread create/destroy in _calc_cond_batch_multigpu with a persistent MultiGPUThreadPool. Each worker thread calls torch.cuda.set_device() once at startup, preserving compiled kernel caches across diffusion steps. - Add MultiGPUThreadPool class in comfy/multigpu.py - Create pool in CFGGuider.outer_sample(), shut down in finally block - Main thread handles its own device batch directly for zero overhead - Falls back to sequential execution if no pool is available	2026-04-08 05:39:07 -07:00
Jedrzej Kosinski	407a5a656f	Rollback core of last commit due to weird behavior	2025-03-28 02:48:11 -05:00
kosinkadink1@gmail.com	9ce9ff8ef8	Allow chained MultiGPU Work Unit nodes to affect max_gpus present on ModelPatcher clone	2025-03-28 15:29:44 +08:00
Jedrzej Kosinski	6dca17bd2d	Satisfy ruff linting	2025-03-03 23:08:29 -06:00
Jedrzej Kosinski	093914a247	Made MultiGPU Work Units node more robust by forcing ModelPatcher clones to match at sample time, reuse loaded MultiGPU clones, finalize MultiGPU Work Units node ID and name, small refactors/cleanup of logging and multigpu-related code	2025-03-03 22:56:13 -06:00
Jedrzej Kosinski	eda866bf51	Extracted multigpu core code into multigpu.py, added load_balance_devices to get subdivision of work based on available devices and splittable work item count, added MultiGPU Options nodes to set relative_speed of specific devices; does not change behavior yet	2025-01-27 06:25:48 -06:00

9 Commits