Commit Graph

5430 Commits

Author SHA1 Message Date
John Pollock
4d3d68e473 Add tiled VAE lane to MultiGPU Work Units
Some checks are pending
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
2026-05-22 13:42:21 -05:00
John Pollock
74b0a826ea Add UPSCALE_MODEL lane to MultiGPU CFG Split
Introduce tiled_scale_multidim_multigpu in comfy/utils.py: a tile scheduler
that dispatches per-device tile functions through the existing
MultiGPUThreadPool and merges per-device CPU output buffers in deterministic
key order. The worker only catches BaseException at the thread boundary to
funnel errors to the main thread; bare torch.cuda.set_device and
torch.cuda.synchronize calls inside the worker fail loud if the device is
not CUDA, which is part of the primitive's contract.

Add UPSCALE_MODEL input on the MultiGPU CFG Split node and an upscale-model
descriptor deepclone helper in comfy/multigpu.py. Clones stay CPU-resident
until execute time and are returned to CPU afterward.

ImageUpscaleWithModel dispatches through tiled_scale_multidim_multigpu when
a multigpu descriptor is attached; the single-device path runs unchanged
when no clones are present.
2026-05-22 13:41:48 -05:00
Jedrzej Kosinski
b649502c9c Report all torch devices from /system_stats
Some checks failed
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
Build package / Build Test (3.10) (push) Has been cancelled
Build package / Build Test (3.11) (push) Has been cancelled
Build package / Build Test (3.12) (push) Has been cancelled
Build package / Build Test (3.13) (push) Has been cancelled
Build package / Build Test (3.14) (push) Has been cancelled
The /system_stats endpoint was returning a hardcoded single-element
devices list built from get_torch_device(), which only reflects the
primary CUDA device. On multi-GPU systems this hides the additional
devices from frontends / tooling (the API surface that enables multigpu
support discovery). Switch to iterating get_all_torch_devices(), with
the primary device kept first so existing clients reading devices[0]
keep working.

(Worksplit-multigpu-only: get_all_torch_devices is the multigpu helper
introduced on this branch; master's /system_stats remains unchanged.)

Amp-Thread-ID: https://ampcode.com/threads/T-019e4a00-fe3d-76bd-a2f2-a8c8c4040082
Co-authored-by: Amp <amp@ampcode.com>
2026-05-21 13:04:54 -07:00
Jedrzej Kosinski
2ed396c769 Mark non-NVIDIA multigpu gaps with TODOs in _handle_batch
Two CodeRabbit findings from #7063 (#13 and #14) are deferred because
worksplit-multigpu's initial release scope is NVIDIA-only QA. Leave a
TODO at the unconditional torch.cuda.set_device call and at the
post-aggregation point so the required guards/synchronize are easy to
find when multigpu support is extended to XPU/NPU/MPS/CPU/DirectML.

Amp-Thread-ID: https://ampcode.com/threads/T-019e4a00-fe3d-76bd-a2f2-a8c8c4040082
Co-authored-by: Amp <amp@ampcode.com>
2026-05-21 12:47:43 -07:00
Kosinkadink
d0b9dbb5a6 Merge remote-tracking branch 'origin/master' into worksplit-multigpu
Brings in 18 commits from master so worksplit-multigpu does not regress
fixes that landed on main since the last sync:

- #13699 Hunyuan 3D 2.1 batch-size fixes (overlap with our own backport;
  conflict resolved in favor of the shape>=2 gate that binds
  swap_cfg_halves once and reuses it for the output swap-back)
- #14031 ModelPatcherDynamic lora reshape / backup restore fix
- #13802 Multi-threaded model load (memory_management / pinned_memory /
  model_management / aimdo plumbing)
- #12679 lanczos single-channel tensor fix
- #14010 Stable Audio 3 support
- assorted partner-node, openapi, workflow-template, and tooling updates

Amp-Thread-ID: https://ampcode.com/threads/T-019e4a00-fe3d-76bd-a2f2-a8c8c4040082
Co-authored-by: Amp <amp@ampcode.com>
2026-05-21 12:17:59 -07:00
Kosinkadink
fd79f22bdf Backport Hunyuan 3D 2.1 attention batch-size fixes from #13699
CrossAttention.kv.view and Attention.qkv_combined.view both hardcoded
batch=1 in the reshape, crashing or silently mis-shaping whenever the
actual batch dimension was greater than 1. These were fixed on master
in #13699 as part of the same patch that gated the chunk(2) swap, but
worksplit-multigpu only picked up the chunk(2) gate. Bring the two
view() fixes over so we have parity with master.

Amp-Thread-ID: https://ampcode.com/threads/T-019e4a00-fe3d-76bd-a2f2-a8c8c4040082
Co-authored-by: Amp <amp@ampcode.com>
2026-05-21 12:17:24 -07:00
Kosinkadink
019261ed96 Simplify Hunyuan 3D 2.1 swap_cfg_halves gate to a shape check
The previous gate (len(cond_or_uncond) == 2 and set == {0, 1}) was
intended to skip the cond/uncond swap when only one half was present
under MultiGPU CFG Split, but it was too restrictive: it also skipped
batch_size > 1 + CFG (cond_or_uncond like [0, 0, 1, 1] or [0,0,0,0,
1,1,1,1]), where chunk(2) still splits the batch cleanly into a cond
half and an uncond half and the swap is still required.

Switch to context.shape[0] >= 2, matching the parallel fix landed on
master in #13699. The swap is a permutation-invariant no-op when the
two halves don't form a CFG pair (since the output swap_cfg_halves
block immediately undoes the permutation), so the only thing the gate
actually needs to do is guard against chunk(2) on a batch of one.

Amp-Thread-ID: https://ampcode.com/threads/T-019e4a00-fe3d-76bd-a2f2-a8c8c4040082
Co-authored-by: Amp <amp@ampcode.com>
2026-05-21 12:14:02 -07:00
Alexander Piskun
b293f8cefd
[Partner Nodes] add widget for automatic upscaling for the ByteDance2Reference node (#14032)
Signed-off-by: bigcat88 <bigcat88@icloud.com>
2026-05-21 11:58:03 -07:00
Daxiong (Lin)
2ca1480f91
chore: update workflow templates to v0.9.82 (#14034) 2026-05-21 11:48:20 -07:00
Kosinkadink
822a3ecf73 Note _calc_cond_batch and _calc_cond_batch_multigpu must stay in sync
Per review feedback on #7063. The two functions share the conds-by-hooks
accumulation, memory-fit batching, and per-chunk output aggregation; the
multigpu variant adds per-device scheduling, .to(device) placement,
per-device patcher/control lookup, and thread-pool dispatch around the
inner loop. Documenting the relationship without extracting helpers --
extraction can land after the initial worksplit-multigpu release once
both paths have settled.

Amp-Thread-ID: https://ampcode.com/threads/T-019e4a00-fe3d-76bd-a2f2-a8c8c4040082
Co-authored-by: Amp <amp@ampcode.com>
2026-05-21 11:47:53 -07:00
Jedrzej Kosinski
1417b711ce
Fix CodeRabbit findings in worksplit-multigpu (#14017)
Fix CodeRabbit findings in worksplit-multigpu
2026-05-21 11:42:08 -07:00
Kosinkadink
a18dd219d5 Pass per-device model to multigpu control clones in pre_run_control
QwenFunControlNet.pre_run stashes model.diffusion_model into extra_args,
which the control_model then uses for forward passes (img_in, txt_in,
pe_embedder, time_text_embed). With multigpu, every per-device control
clone was being pre_run with the base model on GPU0, so secondary
devices would invoke those modules with parameters on GPU0 and inputs
on their own device, raising 'Expected all tensors to be on the same
device'. Build a device -> per-device BaseModel lookup from the
patcher's additional multigpu models and pass each clone the model on
its own device. Falls back to the base model when no per-device match
is found (single-GPU path and the case where cnet.multigpu_clones lags
the patcher's clone set).

Amp-Thread-ID: https://ampcode.com/threads/T-019e4a00-fe3d-76bd-a2f2-a8c8c4040082
Co-authored-by: Amp <amp@ampcode.com>
2026-05-21 11:40:49 -07:00
Alexander Piskun
6ecf5eca7a
[Partner Nodes] add OpenRouter LLM node (#14007)
* [Partner Nodes] add reasoning widget to Anthropic node

Signed-off-by: bigcat88 <bigcat88@icloud.com>

* [Partner Nodes] add new OpenRouterLLM node

Signed-off-by: bigcat88 <bigcat88@icloud.com>

* [Partner Nodes] fix passing images to Grok LLM

Signed-off-by: bigcat88 <bigcat88@icloud.com>

---------

Signed-off-by: bigcat88 <bigcat88@icloud.com>
2026-05-21 11:36:11 -07:00
Kosinkadink
963621603c Free QwenFunControlNet base_model reference in cleanup
QwenFunControlNet.pre_run stashes the model's diffusion_model into
self.extra_args['base_model'], but ControlBase.cleanup never clears
extra_args. The diffusion_model reference therefore lingered between
sampling runs, blocking ComfyUI's model offload/eviction logic from
freeing the UNet and -- for multigpu -- holding one such reference per
per-device control clone (defeating the max_gpus pruning added in this
PR). Override cleanup to drop the entry; super().cleanup() already
recurses into multigpu_clones so each per-device clone pops its own.

Amp-Thread-ID: https://ampcode.com/threads/T-019e4a00-fe3d-76bd-a2f2-a8c8c4040082
Co-authored-by: Amp <amp@ampcode.com>
2026-05-21 11:35:54 -07:00
Kosinkadink
adde1239b1 Restore prepare_state backward-compatible signature
Drop the new ignore_multigpu positional argument from prepare_state and
from the ON_PREPARE_STATE callbacks; pass the flag via model_options
instead. This restores the original 3-arg callback signature so existing
custom-node ON_PREPARE_STATE handlers keep working unchanged, while
still letting prepare_state's recursive call into multigpu_clones
short-circuit.

Amp-Thread-ID: https://ampcode.com/threads/T-019e4a00-fe3d-76bd-a2f2-a8c8c4040082
Co-authored-by: Amp <amp@ampcode.com>
2026-05-21 11:35:39 -07:00
rattus
03e511862e
Fix reshaping lora application (#14031)
* ModelPatcherDyanmic: purge stale vbar allocs on force cast

* ModelPatcherDynamic: restore backups before load

If doing a clean reload, mutative changes (lora application) could be
applied on-top of the already loaded weight. Restore from backup
unconditionally so that the new load is clean.
2026-05-21 09:47:16 -07:00
Edoardo Carmignani
aab41a9ddb
fix(lanczos): correct dimension transposition for single-channel tensors (#12679) 2026-05-21 23:47:20 +08:00
Alexis Rolland
4259a0c7c3
Update MoGe nodes display names, search aliases and descriptions (#14030)
Some checks are pending
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.10, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.11, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.12, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-unix-nightly (12.1, , linux, 3.11, [self-hosted Linux], nightly) (push) Waiting to run
Execution Tests / test (macos-latest) (push) Waiting to run
Execution Tests / test (ubuntu-latest) (push) Waiting to run
Execution Tests / test (windows-latest) (push) Waiting to run
Test server launches without errors / test (push) Waiting to run
Unit Tests / test (macos-latest) (push) Waiting to run
Unit Tests / test (ubuntu-latest) (push) Waiting to run
Unit Tests / test (windows-2022) (push) Waiting to run
2026-05-21 16:50:09 +08:00
Alexis Rolland
af3d9b60af
chore: Dataset nodes clean-up (CORE-237) (#14002) 2026-05-21 15:14:16 +08:00
Alexis Rolland
7b7c5fed7c
Update MediaPipe nodes to standardize with existing code base (CORE-242) (#14025) 2026-05-21 14:39:30 +08:00
Matt Miller
1668aaf037
openapi: remove cloud-only job_ids query param from GET /api/assets (#14016)
Some checks are pending
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
Build package / Build Test (3.10) (push) Waiting to run
Build package / Build Test (3.11) (push) Waiting to run
Build package / Build Test (3.12) (push) Waiting to run
Build package / Build Test (3.13) (push) Waiting to run
Build package / Build Test (3.14) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.10, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.11, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.12, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-unix-nightly (12.1, , linux, 3.11, [self-hosted Linux], nightly) (push) Waiting to run
Execution Tests / test (windows-latest) (push) Waiting to run
Execution Tests / test (macos-latest) (push) Waiting to run
Execution Tests / test (ubuntu-latest) (push) Waiting to run
Test server launches without errors / test (push) Waiting to run
Unit Tests / test (macos-latest) (push) Waiting to run
Unit Tests / test (ubuntu-latest) (push) Waiting to run
Unit Tests / test (windows-2022) (push) Waiting to run
The job_ids query parameter on GET /api/assets is tagged x-runtime:
[cloud] and only exists for cloud's variant of this endpoint. Cloud
removed all consumers and the cloud-side handler/codegen/tests in
Comfy-Org/cloud#3778. With cloud no longer accepting this parameter,
the [cloud-only] documentation here is wrong — drop it so the daily
sync to cloud/services/ingest/vendor/openapi.yaml propagates the
removal.
2026-05-20 21:32:08 -07:00
Matt Miller
ea174d3f12
fix(openapi): correct POST /api/assets/import to importPublishedAssets (#14027)
The operation at POST /api/assets/import was defined as `importAssets`
with a URL-list body shape, but no runtime actually serves that
operation at this path. The cloud runtime serves a different operation
here — `importPublishedAssets` — which imports published-workflow
assets into the caller's library by ID, not by URL.

Cloud's URL-based asset ingestion lives at separate paths
(POST /assets/download + GET /assets/remote-metadata) tracked
elsewhere; nothing in this PR affects that work.

Changes:

- Replace the operation at POST /api/assets/import with
  `importPublishedAssets`, taking ImportPublishedAssetsRequest
  (published_asset_ids + optional share_id) and returning
  ImportPublishedAssetsResponse (list of AssetInfo).
- Remove the unused AssetImportRequest component schema (no other
  references in the spec).
- Operation and schemas tagged x-runtime: [cloud] with [cloud-only]
  description prefix, matching the existing convention for
  cloud-runtime-only operations elsewhere in the spec.

Spectral lint passes (0 errors); the two hint-level findings on
the spec are pre-existing and unrelated.

No FE consumer references AssetImportRequest today; this is a pure
spec correction to match what the cloud runtime actually serves.
2026-05-20 21:28:16 -07:00
Matt Miller
9f9b32ed97
feat: add OAuth 2.1 + RFC 7591 DCR endpoints to openapi.yaml (#14026)
Add the OAuth 2.1 authorization flow and RFC 7591 Dynamic Client
Registration endpoints to the shared spec, alongside the existing
auth-tagged operations (/api/auth/session, /api/auth/token,
/.well-known/jwks.json). All tagged x-runtime: [cloud] with a
[cloud-only] description prefix, following the established
convention for cloud-runtime-only operations.

Endpoints:

- GET  /.well-known/oauth-authorization-server  (RFC 8414 metadata)
- GET  /.well-known/oauth-protected-resource    (RFC 9728 metadata)
- GET  /oauth/authorize                         (consent challenge)
- POST /oauth/authorize                         (consent submission)
- POST /oauth/token                             (RFC 6749 §3.2)
- POST /oauth/register                          (RFC 7591 §3.1 DCR)

Component schemas added:

- OAuthAuthorizationServerMetadata
- OAuthProtectedResourceMetadata
- OAuthConsentChallenge, OAuthConsentChallengeWorkspace
- OAuthAuthorizeRedirectResponse
- OAuthTokenResponse, OAuthTokenError
- OAuthRegisterRequest, OAuthRegisterResponse, OAuthRegisterError

These endpoints are implemented in the cloud runtime today and
are called by browser frontends rendering the consent UI and by
MCP-spec-compliant clients (Claude Desktop, Cursor, etc.) doing
auto-discovery + self-registration. Documenting them in the
shared spec lets the cloud frontend generate types directly from
this spec instead of maintaining a parallel definition.

Spectral lints clean (0 errors). The hint-level findings on
OAuthTokenError / OAuthRegisterError ("standard error schema")
match the same hint on CloudError — these are protocol-specific
RFC-shaped errors, not generic application errors.
2026-05-20 21:22:12 -07:00
Jedrzej Kosinski
4d9106dced Document --cuda-device comma format and MultiGPU Options relative_speed gap
Two doc-only changes addressing minor CodeRabbit findings on PR #7063:

* cli_args.py: clarify --cuda-device help text to document the required comma-separated format ('0' or '0,1'), matching how the value is consumed by CUDA_VISIBLE_DEVICES in main.py.

* nodes_multigpu.py: add a docstring NOTE on the (currently unregistered) MultiGPUOptionsNode explaining that its relative_speed input is plumbed through to model_options['multigpu_options'] but is not yet consulted by the cond scheduler, which still uses uniform round-robin via next_available_device(). Wire relative_speed into the scheduler before re-enabling the node.

Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
2026-05-20 20:48:59 -07:00
Jedrzej Kosinski
ac0a90c323 Use cond_shapes in multigpu memory-fit check (parity with single-GPU path)
The multigpu cond-batching loop called model.memory_required(input_shape) without conditioning shapes, while the single-GPU path at line 279 passes cond_shapes. Large conditioning tensors (e.g. video prompts, control inputs) were therefore under-counted, risking OOM at runtime when the chosen batch size was too large. Match the single-GPU pattern by building cond_shapes from each batched cond's conditioning dict and passing it to memory_required.

Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
2026-05-20 19:52:03 -07:00
comfyanonymous
95fdc6cf91
Repo security stuff. (#14019) 2026-05-20 17:17:55 -07:00
rattus
5aa5ccc9e0
Multi-threaded load of models from disk (big load time speedups & Offload to disk) (CORE-43,CORE-152,CORE-164,CORE-165,CORE-117) (#13802)
* model_management: disable non-dynamic smart memory

Disable smart memory outright for non dynamic models.

This is a minor step towards deprecation of --disable-dynamic-vram
and the legacy ModelPatcher.

This is needed for estimate-free model development, where new models
can opt-out of supplying a memory estimate and not have to worry
about hard VRAM allocations due to legacy non-dynamic model patchers

This is also a general stability increase for a lot of stray use cases
where estimates may still be off and going forward we are not going
to accurately maintain such estimates.

* pinned_memory: implement with aimdo growable buffer

Use a single growable buffer so we can do threaded pre-warming on
pinned memory.

* mm: use aimdo to do transfer from disk to pin

Aimdo implements a faster threaded loader.

* Add stream host pin buffer for AIMDO casts

Introduce per-offload-stream HostBuffer reuse for pinned staging,
include it in cast buffer reset synchronization.

Defer actual casts that go via this pin path to a separate pass
such that the buffer can be allocated monolithically (to avoid
cudaHostRegister thrash).

* remove old pin path

* Implement JIT pinned memory pressure

Replace the predictive pin pressure mechanism with JIT PIN memory
pressure.

* LowVRAMPatch: change to two-phase visit

* lora: re-implement as inplace swiss-army-knife operation

* prepare for multiple pin sets

* implement pinned loras

* requirements: comfy-aimdo 0.4.0

* ops: remove unused arg

This was defeatured in aimdo iteration

* ops: sync the CPU with only the offload stream activity

This was syncing with the offload stream which itself is synced with the
compute stream, so this was syncing CPU with compute transitively. Define
the event to sync it more gently.

* pins: implement freeing intermediate for pinned memory

Pinning is more important than inactive intermediates and the stream
pin buffer is more important than even active intermediates.

* execution: implement pin eviction on RAM presure

Add back proper pin freeing on RAM pressure

* implement pin registration swaps

Uncap the windows pins from 50% by extending the pool and have a pressure
mechanism to move the pin reservations om demand.

This unfortunately implies a GPU sync to do the freeing so significant
hysterisis needs to be added to consolidate these pressure events.

* cli_args/execution: Implement lower background cache-ram threshold

Limit the amount of RAM background intermediates can use, so that
switching workflows doesn't degrade performance too much.

* make default

* bump aimdo

* model-patcher: force-cast tiny weights

Flux 2 gets crazy stalls due to a mix of tiny and giant weights
creating lopsided steam buffer rotations which creates stalls.

* ops: refactor in prep for chunking

* mm: delegate pin-on-the-way to aimdo

Aimdo is able to chunk and slice this on the way for better CPU->GPU
overlap. The main advantage is the ability to shorten the bus contention
window between previous weight transfer and the next weights vbar
fault.

* bump aimdo

* pinning updates

* specify hostbuf max allocation size

There a signs of virtual memory exhaustion on some linux systems when
throwing 128GB for every little piece. Pass the actual to save aimdo
from over-estimates

* tests: update execution tests for caching

The default caching changed to ram-cache so update these tests
accordingly.

Remove the LRU 0 test as this also falls through to RAM cache.
2026-05-20 17:03:58 -07:00
Jedrzej Kosinski
dd85851efe Prune inherited multigpu clones when max_gpus is lowered
create_multigpu_deepclones cloned the existing 'multigpu' additional_models list verbatim and never pruned entries beyond limit_extra_devices. If a workflow was previously prepared for more GPUs, reducing max_gpus would leave stale clones attached and eligible for later scheduling. Replace the TODO block with a real prune that keeps only clones whose load_device is either the model's load_device or in limit_extra_devices, and re-match clones if anything was removed.

Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
2026-05-20 16:46:45 -07:00
Jedrzej Kosinski
ba417750a7 Fix get_all_torch_devices for XPU/NPU and guard remove()
torch.device(i) defaults to CUDA, so XPU/NPU branches were producing 'cuda:N' devices that don't match get_torch_device() output ('xpu:N'/'npu:N'). This caused devices.remove(get_torch_device()) to raise ValueError when exclude_current=True on non-NVIDIA hardware. Use explicit device strings, and guard the remove() with a membership check for safety.

Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
2026-05-20 16:46:38 -07:00
Jedrzej Kosinski
9a681ccfc9 Guard cached_patcher_init when output_model is False
load_checkpoint_guess_config_clip_only() calls load_checkpoint_guess_config() with output_model=False, leaving out[0] as None. The subsequent unconditional assignment of cached_patcher_init crashed with AttributeError, breaking CLIP-only checkpoint loading entirely. Guard the assignment with a None check.

Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
2026-05-20 16:46:31 -07:00
Jedrzej Kosinski
50d1dd6273 Fix MultiGPU Options node discarding cloned GPUOptionsGroup
GPUOptionsGroup.clone() returns a new instance, but the return value was discarded, causing the node to mutate the upstream caller's group in-place. When multiple MultiGPU Options nodes share an input group, each node's additions would leak into earlier siblings. Assign the clone result back to gpu_options so each node owns its own copy.

Amp-Thread-ID: https://ampcode.com/threads/T-019e43b8-8258-70fd-ab3a-53e4c97f85d5
Co-authored-by: Amp <amp@ampcode.com>
2026-05-20 16:46:23 -07:00
Jukka Seppänen
4d6a058bf1
feat: MediaPipe face detection (CORE-235) (#14009)
* Initial mediapipe face detection support

* Update face_geometry.py

* Account for diff sized batch input

* Model folder placeholder
2026-05-20 16:07:48 -07:00
comfyanonymous
a8d2519058 ComfyUI v0.22.0
Some checks are pending
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
Build package / Build Test (3.10) (push) Waiting to run
Build package / Build Test (3.11) (push) Waiting to run
Build package / Build Test (3.12) (push) Waiting to run
Build package / Build Test (3.13) (push) Waiting to run
Build package / Build Test (3.14) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.10, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.11, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.12, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-unix-nightly (12.1, , linux, 3.11, [self-hosted Linux], nightly) (push) Waiting to run
Execution Tests / test (macos-latest) (push) Waiting to run
Execution Tests / test (ubuntu-latest) (push) Waiting to run
Execution Tests / test (windows-latest) (push) Waiting to run
Test server launches without errors / test (push) Waiting to run
Unit Tests / test (macos-latest) (push) Waiting to run
Unit Tests / test (ubuntu-latest) (push) Waiting to run
Unit Tests / test (windows-2022) (push) Waiting to run
2026-05-20 13:49:36 -04:00
Daxiong (Lin)
4efe1ddb5c
chore: update workflow templates to v0.9.79 (#14011) 2026-05-20 23:46:20 +08:00
comfyanonymous
f9c84c94b4
Support Stable Audio 3 model. (#14010) 2026-05-20 11:34:22 -04:00
Cezarijus Kivylius
78b5dec6b6
fix: Hunyuan3D 2.1 batch size crashes in attention and forward pass (#13699)
Some checks are pending
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.10, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.11, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.12, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-unix-nightly (12.1, , linux, 3.11, [self-hosted Linux], nightly) (push) Waiting to run
Execution Tests / test (windows-latest) (push) Waiting to run
Execution Tests / test (macos-latest) (push) Waiting to run
Execution Tests / test (ubuntu-latest) (push) Waiting to run
Test server launches without errors / test (push) Waiting to run
Unit Tests / test (macos-latest) (push) Waiting to run
Unit Tests / test (ubuntu-latest) (push) Waiting to run
Unit Tests / test (windows-2022) (push) Waiting to run
2026-05-20 19:58:49 +08:00
Jedrzej Kosinski
ff766e5cfa Merge remote-tracking branch 'origin/master' into merge-master-into-worksplit-multigpu
Some checks failed
Python Linting / Run Ruff (push) Has been cancelled
Python Linting / Run Pylint (push) Has been cancelled
Build package / Build Test (3.10) (push) Has been cancelled
Build package / Build Test (3.11) (push) Has been cancelled
Build package / Build Test (3.12) (push) Has been cancelled
Build package / Build Test (3.13) (push) Has been cancelled
Build package / Build Test (3.14) (push) Has been cancelled
Amp-Thread-ID: https://ampcode.com/threads/T-019e4352-d45e-75bc-8ed7-ed3a7f6d129a
Co-authored-by: Amp <amp@ampcode.com>

# Conflicts:
#	comfy/ldm/sam3/detector.py
#	comfy/ldm/sam3/tracker.py
#	comfy/model_base.py
#	comfy/quant_ops.py
#	comfy/supported_models.py
#	comfy_api_nodes/apis/bytedance.py
#	comfy_api_nodes/nodes_bytedance.py
#	comfy_api_nodes/nodes_openai.py
#	comfy_extras/frame_interpolation_models/film_net.py
#	comfy_extras/frame_interpolation_models/ifnet.py
#	comfy_extras/nodes_ace.py
#	comfy_extras/nodes_frame_interpolation.py
#	comfy_extras/nodes_lt_audio.py
#	comfy_extras/nodes_sam3.py
#	comfy_extras/nodes_video_model.py
#	folder_paths.py
#	nodes.py
#	requirements.txt
2026-05-19 21:43:51 -07:00
Jedrzej Kosinski
819c7c0702
Refactor MultiGPU scheduler for readability and termination safety (#14001)
Behaviour-equivalent cleanup of _calc_cond_batch_multigpu device
scheduling. No change to batching decisions or memory checks for any
valid input.

Changes:

* Replace re-summed batched_to_run_length with a per-device load
  dict (device_load), so capacity checks are O(1) and use a single
  source of truth.
* Extract device selection into next_available_device(), which scans
  at most len(devices) positions and raises if no device has
  remaining capacity. This makes the 'skip a full device' rule live
  in one place instead of two and guarantees the outer while loop
  cannot spin forever on a scheduling bug.
* Drop the unused current_device assignment before the outer loop
  and the index_device % len(devices) modulo dance (now handled
  inside next_available_device).
* Minor cleanups: list comprehensions for total_conds, conds_to_batch,
  and the devices list.
2026-05-19 21:23:56 -07:00
comfyanonymous
72e3f6081c
Add downscale ratio to empty ltxv latent. (#13999)
Some checks are pending
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.10, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.11, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.12, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-unix-nightly (12.1, , linux, 3.11, [self-hosted Linux], nightly) (push) Waiting to run
Execution Tests / test (macos-latest) (push) Waiting to run
Execution Tests / test (ubuntu-latest) (push) Waiting to run
Execution Tests / test (windows-latest) (push) Waiting to run
Test server launches without errors / test (push) Waiting to run
Unit Tests / test (macos-latest) (push) Waiting to run
Unit Tests / test (ubuntu-latest) (push) Waiting to run
Unit Tests / test (windows-2022) (push) Waiting to run
2026-05-19 20:28:06 -07:00
Jedrzej Kosinski
9e3ede1406
Fix MultiGPU scheduler capacity accounting (#14000)
Fixes _calc_cond_batch_multigpu so that:

1. conds_per_device uses real division before math.ceil. The previous
   expression math.ceil(total_conds // len(devices)) applied integer
   floor division first, making ceil a no-op. For 3 conds across 2
   devices this produced conds_per_device=1 instead of 2.

2. The scheduling loop skips devices that have already reached
   capacity instead of appending empty batch groups. Without this
   guard, the loop could repeatedly emit zero-length groups for a
   full device, leaving sampling stuck at 0/N until timeout.

Reproduces with an Omnigen2 image workflow that produces three
condition entries scheduled across two CUDA devices. With the fix
the scheduler assigns conds_per_device=2 and splits the batches as
2 + 1 across the two devices, allowing sampling to complete.

Original fix authored and validated by @pollockjj in
pollockjj/ComfyUI#64.

Co-authored-by: John Pollock <pollockjj@gmail.com>
2026-05-19 20:11:53 -07:00
Pauan
7ec7b6ffe9
Adding new StringFormat node (#13997) 2026-05-20 10:25:49 +08:00
Matt Miller
6887165a9d
docs(openapi): tighten workspace API key description field (BE-1004) (#13996)
Aligns the OSS spec with the cloud-side BE-1004 contract:

- createWorkspaceApiKey request body: add maxLength: 5000 to the
  description property (matches cloud's hub_profile.description
  MaxLen(5000) convention; enforced cloud-side via handler check).
- WorkspaceApiKey + WorkspaceApiKeyCreated response schemas:
  mark description as required (cloud's handler always populates
  the field, defaulting to empty string when not supplied on create),
  drop nullable: true, add maxLength: 5000 for symmetry, and clarify
  the doc string ("Always present in responses; empty string when no
  description was supplied on create").

Both schemas are tagged x-runtime: [cloud] at the schema level so the
tightening is correctly scoped — OSS-only implementations are not
required to honor the workspace API keys endpoints at all.

Related cloud PR: Comfy-Org/cloud#3747
2026-05-19 16:55:04 -07:00
Matt Miller
cc4d711eb1
feat(openapi): add optional description field to workspace API key schemas (#13993)
Some checks are pending
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.10, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.11, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.12, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-unix-nightly (12.1, , linux, 3.11, [self-hosted Linux], nightly) (push) Waiting to run
Execution Tests / test (macos-latest) (push) Waiting to run
Execution Tests / test (ubuntu-latest) (push) Waiting to run
Execution Tests / test (windows-latest) (push) Waiting to run
Test server launches without errors / test (push) Waiting to run
Unit Tests / test (macos-latest) (push) Waiting to run
Unit Tests / test (ubuntu-latest) (push) Waiting to run
Unit Tests / test (windows-2022) (push) Waiting to run
* feat(openapi): add optional description field to workspace API key schemas

Add an optional `description` property (type: string) to three
workspace API key schemas in openapi.yaml:

- Inline request body of createWorkspaceApiKey (POST /api/workspace/api-keys)
- WorkspaceApiKey (list/info schema)
- WorkspaceApiKeyCreated (creation response schema)

The field is not added to any `required` array, making it fully
backward-compatible with existing clients.

Refs: BE-1005, BE-1004

Co-authored-by: Matt Miller <mattmillerai@users.noreply.github.com>

* fix(openapi): mark description nullable in workspace API key response schemas

Per CodeRabbit review on PR #13993: the underlying DB column is nullable
varchar (default ''), so the response schemas should permit null to match
stored data reality. Without nullable: true the OpenAPI contract would
require coercion on the handler side or risk a contract violation.

Request schema unchanged — clients shouldn't be sending null on create.
2026-05-19 14:48:47 -07:00
yy
626b082838
Fix typo in ops.py (#11925) 2026-05-20 05:45:04 +08:00
Matt Miller
d0328b442d
docs(openapi): remove top-level width/height fields on Asset schema (#13973)
These two fields were added recently to the Asset schema as nullable
integers, with the intent of exposing original image dimensions for FE
consumers (cloud-side thumbnailing makes naturalWidth/Height return
the wrong size for an image card's dimension label).

The implementation effort that consumes them subsequently converged on
a different shape — dimensions nested under the existing free-form
`metadata` JSON field as `{kind: "image", width, height}` — to avoid
introducing type-specific flat fields on the canonical Asset shape,
and to leave room for forward-compatible additions (video duration,
fps, etc.) without further schema churn.

This removes the now-unused top-level fields so the spec reflects the
agreed direction. No other schema definitions reference these fields
directly: AssetCreated, AssetUpdated, etc. inherit Asset via allOf and
do not redefine them.

The runtime ingest implementation that would have populated these
fields was not yet shipped, so no clients are relying on the
top-level shape.

Co-authored-by: Alexis Rolland <alexisrolland@hotmail.com>
2026-05-19 10:00:26 -07:00
Matt Miller
6b61918a16
docs(openapi): deprecate /api/upload/mask in favor of /api/upload/image (#13968)
Some checks are pending
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.10, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.11, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.12, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-unix-nightly (12.1, , linux, 3.11, [self-hosted Linux], nightly) (push) Waiting to run
Execution Tests / test (macos-latest) (push) Waiting to run
Execution Tests / test (ubuntu-latest) (push) Waiting to run
Execution Tests / test (windows-latest) (push) Waiting to run
Test server launches without errors / test (push) Waiting to run
Unit Tests / test (macos-latest) (push) Waiting to run
Unit Tests / test (ubuntu-latest) (push) Waiting to run
Unit Tests / test (windows-2022) (push) Waiting to run
Mark the uploadMask operation as deprecated and point clients at
/api/upload/image. The mask-compositing behavior the endpoint provides
(alpha-compositing the supplied mask onto an original_ref image) is now
expected to happen client-side, with the composited result uploaded
through the unified /api/upload/image path.

The endpoint continues to function for older clients; no runtime
behavior changes ship with this commit. Only the OpenAPI annotation
and the human-facing description are updated.
2026-05-19 12:19:51 +08:00
comfyanonymous
a4382e056e
Use temporal downscale to make empty audio latent nodes more reusable. (#13975) 2026-05-19 00:14:30 -04:00
Alexis Rolland
d71cc1c8f2
chore: Various QoL updates of nodes display names, descriptions and categories (CORE-190, CORE-191) (#13830)
* Move detection category under image category

* Add missing categories

* Move detection nodes to detection category

* Move save nodes to image root catefory

* Rename postprocessors

* Move mask category under image

* Move guiders category to parent level at root of sampling category

* Move custom_sampling category to parent level at the root of sampling category

* Modify description of LoRA loaders

* Fix node id SolidMask

* Move VOID Quadmask under image/mask

* Group compositing nodes under image/compositing

* Move load image as mask to image category for consistency with other load image nodes

* Align display name with Load Checkpoint

* Move dataset category under training category

* Rename Number Convert to Conver Number (verb first)

* Rename Canny node

* Revert wanBlockSwap + description

* Add description to RemoveBackground node

* Revert category update of dataset
2026-05-19 00:13:48 -04:00
comfyanonymous
990a7ae7f2
Initial work to make downscale_ratio_temporal work. (#13972) 2026-05-18 23:01:43 -04:00
Jedrzej Kosinski
df2454b47e
Reduce min for Batch Image/Mask/Latent nodes from 2 to 1 (#13721) 2026-05-19 09:50:14 +08:00