Commit Graph

4644 Commits

Author SHA1 Message Date
rattus
8646bd96ef
Merge 12e1560dcc into 26c5bbb875 2026-01-25 05:05:38 +01:00
comfyanonymous
26c5bbb875
Move nodes from previous PR into their own file. (#12066)
2026-01-24 23:02:32 -05:00
Kohaku-Blueleaf
a97c98068f
[Weight-adapter/Trainer] Bypass forward mode in Weight adapter system (#11958)
* Add API of bypass forward module

* bypass implementation

* add bypass fwd into nodes list/trainer
2026-01-24 22:56:22 -05:00
comfyanonymous
635406e283
Only enable fp16 on z image models that actually support it. (#12065) 2026-01-24 22:32:28 -05:00
pythongosssss
ed6002cb60
add support for kwargs inputs to allow arbitrary inputs from frontend (#12063)
used to output selected combo index

Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
2026-01-24 17:30:40 -08:00
Alexander Piskun
bc72d7f8d1
[API Nodes] add TencentHunyuan3D nodes (#12026)
* feat(api-nodes): add TencentHunyuan3D nodes

* add "(Pro)" to display name

---------

Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
2026-01-24 17:10:09 -08:00
comfyanonymous
aef4e13588
Make empty latent node work with other models. (#12062) 2026-01-24 19:23:20 -05:00
Rattus
12e1560dcc remove bad pyt2.4 versions gate 2026-01-25 09:14:52 +10:00
rattus
4e6a1b66a9
speed up and reduce VRAM of QWEN VAE and WAN (less so) (#12036)
* ops: introduce autopad for conv3d

This works around PyTorch's missing ability to apply causal padding as part
of the kernel and avoids massive weight duplication for padding.

* wan-vae: rework causal padding

This currently uses F.pad which takes a full deep copy and is liable to
be the VRAM peak. Instead, kick spatial padding back to the op and
consolidate the temporal padding with the cat for the cache.

* wan-vae: implement zero pad fast path

The WAN VAE is also used by QWEN, where it processes single images. These
convolutions are, however, zero-padded 3D convolutions, which means the
VAE is effectively 2D, using only the last element of the conv weight in
the temporal dimension. Fast-path this to avoid adding zeros that
simply vanish in the convolution math but still cost computation.
2026-01-23 19:56:14 -05:00
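The zero-pad fast path described in 4e6a1b66a9 rests on a simple identity: when a causal convolution sees a single frame, every kernel tap except the last one multiplies padding zeros. Below is a minimal pure-Python sketch of the 1D analogue. The names `causal_conv1d` and `single_frame_fast_path` are hypothetical and are not taken from the commit, which operates on 3D convolutions in PyTorch.

```python
def causal_conv1d(frames, kernel):
    """Causal 1D cross-correlation: left-pad with zeros so that
    output[t] depends only on frames[0..t]."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(frames)
    return [
        sum(kernel[i] * padded[t + i] for i in range(k))
        for t in range(len(frames))
    ]


def single_frame_fast_path(frame, kernel):
    """With exactly one frame, only the last kernel tap touches real data,
    so the padded zeros can be skipped entirely."""
    return [kernel[-1] * frame]


kernel = [0.25, 0.5, 2.0]
slow = causal_conv1d([3.0], kernel)           # pads two zeros, then convolves
fast = single_frame_fast_path(3.0, kernel)    # uses kernel[-1] only
assert slow == fast
```

The same reasoning applies along the temporal axis of a 3D kernel: for a single image, slicing out the last temporal element of the weight leaves an ordinary 2D convolution.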
comfyanonymous
9cf299a9f9
Make regular empty latent node work properly on flux 2 variants. (#12050) 2026-01-23 19:50:48 -05:00
ComfyUI Wiki
e89b22993a
Support ModelScope-Trainer/DiffSynth LoRA format for Flux.2 Klein models (#12042)
2026-01-23 15:27:49 -05:00
Jukka Seppänen
55bd606e92
LTX2: Refactor forward function for better VRAM efficiency and fix spatial inpainting (#12046)
* Disable timestep embed compression when inpainting

Spatial inpainting not compatible with the compression

* Reduce crossattn peak VRAM

* LTX2: Refactor forward function for better VRAM efficiency
2026-01-23 15:26:38 -05:00
Rattus
a9bc2d884c MPDynamic: Add support for model defined dtype
If the model defines a dtype that is different to what is in the state
dict, respect that at load time. This is done as part of the casting
process.
2026-01-23 16:54:12 +10:00
Rattus
18748d4641 ops: fix __init__ return 2026-01-23 16:54:12 +10:00
Rattus
b9f6ec4ca5 archive the model defined dtypes
Scan created models and save off the dtypes as defined by the model
creation process. This is needed for assign=True, which will override
the dtypes.
2026-01-23 16:54:12 +10:00
Rattus
8371708e09 mp: big bump on the VBAR sizes
Now that the model defined dtype is decoupled from the state_dict
dtypes we need to be able to handle worst case scenario casts between
the SD and VBAR.
2026-01-23 16:54:12 +10:00
Rattus
19c9219fe4 ruff 2026-01-23 16:54:12 +10:00
Rattus
e36ffd2cee nodes_model_patch: fix copy-paste coding error 2026-01-23 16:54:12 +10:00
Rattus
b915d13e57 mp: handle blank __new__ call
This is needed for deepcopy construction. We shouldn't really have deep
copies of MP or MPDynamic; however, there is a stray one in some controlnet
flows.
2026-01-23 16:54:12 +10:00
Rattus
7f706a01d6 mm: remove left over hooks draft code
This is phase 2
2026-01-23 16:54:12 +10:00
Rattus
ec4837c88a execution: remove per node gc.collect()
This isn't worth it, and the likelihood of inference leaving a complex
data structure with cyclic references behind is low. Remove it.

We could replace it with a condition on nodes that actually touch the
GPU, which might be a win.
2026-01-23 16:54:12 +10:00
Rattus
5bd8ec8544 implement lightweight safetensors with READ mmap
The mmap used by safetensors is hardcoded to copy-on-write (CoW), which
forcibly consumes Windows commit charge even on a zero-copy load. RIP.
Implement safetensors loading in PyTorch itself with a READ mmap so that we
are not commit-charged for all our open models.
2026-01-23 16:54:12 +10:00
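To illustrate the idea behind 5bd8ec8544, here is a minimal standard-library sketch of reading a safetensors file through a read-only mapping. It relies on the published safetensors layout (an 8-byte little-endian header length, a JSON header, then the raw tensor bytes). The helper names `write_safetensors` and `read_f32_tensor` are hypothetical, and the commit itself does this inside PyTorch rather than with `struct`.

```python
import json
import mmap
import os
import struct
import tempfile


def write_safetensors(path, name, values):
    """Write a single little-endian F32 tensor in the safetensors layout."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = json.dumps(
        {name: {"dtype": "F32", "shape": [len(values)],
                "data_offsets": [0, len(data)]}}
    ).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # 8-byte header length
        f.write(header)
        f.write(data)


def read_f32_tensor(path, name):
    """Read one F32 tensor through a read-only mapping (ACCESS_READ,
    as opposed to the copy-on-write ACCESS_COPY)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (header_len,) = struct.unpack_from("<Q", mm, 0)
            header = json.loads(mm[8:8 + header_len])
            begin, end = header[name]["data_offsets"]
            base = 8 + header_len  # offsets are relative to the data section
            count = (end - begin) // 4
            return list(struct.unpack_from(f"<{count}f", mm, base + begin))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.safetensors")
    write_safetensors(path, "w", [1.0, 2.5, -3.0])
    values = read_f32_tensor(path, "w")
```

A read-only mapping cannot be written through, so the operating system has no reason to reserve private pages for it, which is the property the commit message is after.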
Rattus
1c5fc82077 ops: defer creation of the parameters until state dict load
If running on Windows, defer creation of the layer parameters until the state
dict is loaded. This avoids a massive spike in Windows commit charge
when a model is created and not loaded.

This problem doesn't exist on Linux, as Linux allows RAM overcommit;
Windows does not. Before the dynamic memory work this was also a non-issue,
as every non-quant model would just immediately load into RAM and need the
memory anyway.

Make the workaround Windows-specific, as there may be someone out there with
some training-from-scratch workflow (which this might break), and assume said
someone is on Linux.
2026-01-23 16:54:12 +10:00
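The deferral described in 1c5fc82077 can be pictured with a toy layer that allocates nothing until a state dict arrives. This is only a sketch in plain Python: `DeferredLinear` and its `defer` argument are hypothetical, lists stand in for tensors, and the Windows check mirrors the commit message rather than the actual code.

```python
import sys


class DeferredLinear:
    """Toy layer that can postpone allocating its weight until load time."""

    def __init__(self, in_features, out_features, defer=None):
        self.in_features = in_features
        self.out_features = out_features
        # Assumption for this sketch: defer only on Windows by default.
        if defer is None:
            defer = sys.platform == "win32"
        self.weight = None if defer else self._allocate()

    def _allocate(self):
        # Eager path: memory is committed as soon as the layer is created.
        return [[0.0] * self.in_features for _ in range(self.out_features)]

    def load_state_dict(self, state):
        # The loaded tensor becomes the parameter; in the deferred case
        # no placeholder was ever allocated before this point.
        self.weight = state["weight"]


layer = DeferredLinear(2, 1, defer=True)
created_empty = layer.weight is None
layer.load_state_dict({"weight": [[1.0, 2.0]]})
```

A training-from-scratch workflow would find `weight` unset on the deferred path, which is the breakage the commit message warns about.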
Rattus
441dcd2b17 remove junk arg 2026-01-23 16:54:12 +10:00
Rattus
76f94ecf9f aimdo version bump 2026-01-23 16:54:12 +10:00
Rattus
7f980124b0 main: Rework aimdo into process
Be more tolerant of unsupported platforms and fall back properly.
Fixes crash when CUDA is not installed at all.
2026-01-23 16:54:12 +10:00
Rattus
a7023384ca sampling: improve progress meter accuracy for dynamic loading 2026-01-23 16:54:12 +10:00
Rattus
a310ca93d3 clip: support assign load when taking clip from a ckpt 2026-01-23 16:54:10 +10:00
Rattus
6ecbba2232 sd: empty cache on tiler fallback
This is needed for aimdo, where the cache can't self-recover from
fragmentation. It is, however, a good thing to do anyway after an OOM,
so make it unconditional.
2026-01-23 16:52:31 +10:00
Rattus
61dda30171 ruff 2026-01-23 16:52:31 +10:00
Rattus
79b3fe334b misc cleanup 2026-01-23 16:52:31 +10:00
Rattus
15ae09fb19 add missing del on unpin 2026-01-23 16:52:31 +10:00
Rattus
6e852baa9a write better tx commentary 2026-01-23 16:52:31 +10:00
Rattus
a2c8f45c93 mm: fix sync
Sync before deleting anything.
2026-01-23 16:52:31 +10:00
Rattus
4d914099fb main: Go live with --fast dynamic_vram
Add the optional command line switch --fast dynamic_vram.

This is mutually exclusive with --high-vram and --gpu-only, which contradict
aimdo's underlying feature.

Add an appropriate installation warning and a startup message, and match the
comfy debug level in configuring aimdo.

Add comfy-aimdo pip requirement. This will safely stub to a nop for
unsupported platforms.
2026-01-23 16:52:31 +10:00
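The mutual exclusion described in 4d914099fb could be enforced along these lines. This is a hedged `argparse` sketch, not the project's parser: the flag spellings are copied from the commit message, and the `parse_args` helper, the `nargs` choice, and the error text are assumptions.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Flag spellings follow the commit message; the real parser may differ.
    parser.add_argument("--fast", nargs="*", default=[], metavar="FEATURE")
    parser.add_argument("--high-vram", action="store_true")
    parser.add_argument("--gpu-only", action="store_true")
    args = parser.parse_args(argv)

    # Reject combinations that contradict dynamic VRAM management.
    if "dynamic_vram" in args.fast and (args.high_vram or args.gpu_only):
        parser.error(
            "--fast dynamic_vram cannot be combined with "
            "--high-vram or --gpu-only"
        )
    return args


args = parse_args(["--fast", "dynamic_vram"])
```

`parser.error` prints the message and exits with status 2, so a conflicting command line fails at startup rather than partway through a workflow.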
Rattus
81845a9ab2 execution: add aimdo primary pytorch cache integration
We need to do general PyTorch cache defragmentation at an appropriate level
for aimdo. Do it here on a per-node basis, which has a reasonable chance of
purging stale shapes out of the PyTorch caching allocator and saving VRAM
without costing too much garbage collector thrash.

This looks like a lot of GC, but because aimdo allocations never fail from
PyTorch's side, the PyTorch allocator never needs to defrag on demand. It
still needs an oil change every now and then, so we gotta do it. Doing it
here also means the PyTorch temps are cleared from task manager VRAM usage,
so user anxiety can go down a little when they see their VRAM drop back at
the end of workflows in line with inference usage (rather than assuming full
VRAM leaks).
2026-01-23 16:52:31 +10:00
Rattus
6b8f4949c4 models: Use CoreModelPatcher
Use CoreModelPatcher for all internal ModelPatcher implementations. This drives
conditional use of the aimdo feature, while making sure custom node packs get
to keep ModelPatcher unchanged for the moment.
2026-01-23 16:52:31 +10:00
Rattus
56d526c133 ops/mp: implement aimdo
Implement a model patcher and caster for aimdo.

A new ModelPatcher implementation which backs onto comfy-aimdo to implement varying model load levels that can be adjusted during model use. The patcher defers all load processes to lazily load the model during use (e.g. the first step of a ksampler) and automatically negotiates a load level during the inference to maximize VRAM usage without OOMing. If inference requires more VRAM than is available weights are offloaded to make space before the OOM happens.

As for loading the weight onto the GPU, that happens via comfy_cast_weights, which is now used in all cases. cast_bias_weight checks whether the VBAR assigned to the model has space for the weight (based on the same load priority semantics as the original ModelPatcher). If it does, the VRAM returned by the aimdo allocator is used as the GPU-side parameter. The caster is responsible for populating the weight data. This is done using the usual offload_stream (which means we now have asynchronous load overlapping first-use compute).

Pinning works a little differently. When a weight is detected during load as unable to fit, a pin is allocated at the time of casting and the weight as used by the layer is DMA'd back to the pin using the GPU DMA TX engine, also using the asynchronous offload streams. This means you get to pin the Lora-modified and requantized weights, which can be a major speedup for offload+quantize+lora use cases. This works around the JIT Lora + FP8 exclusion and brings FP8MM to heavy offloading users (who probably really need it with more modest GPUs). There is a performance risk in that a CPU+RAM patch has been replaced with a GPU+RAM patch, but my initial performance results look good. Most users are likely to have a GPU that outruns their CPU in these woods.

Some common code is written to consolidate a layer's tensors for aimdo mapping, pinning, and DMA transfers. interpret_gathered_like() allows unpacking a raw buffer as a set of tensors. This is used consistently to bundle and pack weights, quantization metadata (QuantizedTensor bits) and biases into one payload for DMA in the load process, reducing CUDA overhead a little. Some quantization metadata was missing async offload in some cases, which is now added. This also pins quantization metadata and consolidates the number of cuda_host_register calls (which can be expensive).
2026-01-23 16:52:31 +10:00
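The bundling idea behind interpret_gathered_like() in 56d526c133 (one raw buffer, several tensors viewed over it) can be sketched with the standard library. The real function's signature is not shown in the commit message, so `gather` and `interpret_like` below are hypothetical stand-ins, and `memoryview` slices stand in for tensors.

```python
from array import array


def gather(arrays):
    """Pack several float lists back to back into one contiguous buffer."""
    buf = bytearray()
    for values in arrays:
        buf += array("f", values).tobytes()
    return buf


def interpret_like(buf, lengths):
    """Return zero-copy float views over `buf`, one per entry in `lengths`."""
    view = memoryview(buf)
    out, offset = [], 0
    for n in lengths:
        nbytes = n * 4  # 4 bytes per 32-bit float
        out.append(view[offset:offset + nbytes].cast("f"))
        offset += nbytes
    return out


# One payload holding a "weight" and a "bias", ready for a single transfer.
payload = gather([[1.0, 2.0, 3.0], [0.5, -0.5]])
weight, bias = interpret_like(payload, [3, 2])
```

Because the views share storage with the payload, a single bulk copy of `payload` moves every tensor at once, which is the overhead reduction the commit message describes.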
Rattus
1aa3386c9f mp: add mode for non comfy weight prioritization
Non-comfy weights don't get async offload and have a few other performance
limitations. Load them at top priority accordingly.
2026-01-23 16:52:31 +10:00
Rattus
a2e15d1117 mp/mm: API expansions for dynamic loading
Add two API expansions: a flag for whether a model patcher is dynamic,
and a very basic RAM freeing system.

Implement the semantics of the dynamic model patcher which never frees
VRAM ahead of time for the sake of another dynamic model patcher.

At the same time, add an API for clearing out pins based on a heuristic
reservation of 2x the model size, as pins consume RAM in their own right in
the dynamic patcher.

This is actually less about OOMing RAM and more about performance, as
with assign=True load semantics there needs to be plenty of headroom for
the OS to load models to disk cache on demand, so err on the side of
kicking old pins out.
2026-01-23 16:52:31 +10:00
Rattus
168dd7d6c2 mp: wrap get_free_memory
Dynamic load needs to adjust these numbers based on future movements,
so wrap this in a MP API.
2026-01-23 16:52:31 +10:00
Rattus
2bf2463ca8 pinned_memory: add python
Add a Python module for managing pinned memory at the weight/bias module level.
This allocates, pins and attaches a tensor to a module as the pin for that
module. It does not set the weight; it just allocates a single RAM buffer
for population and bulk DMA transfer.
2026-01-23 16:52:31 +10:00
Rattus
92a8183c13 move string_to_seed to utils.py
This needs to be visible to ops, which may want to do stochastic rounding on
the fly.
2026-01-23 16:52:31 +10:00
Rattus
c5e0e80cb3 mm: Implement cast buffer allocations 2026-01-23 16:52:31 +10:00
Rattus
4622c0825e ops: Do bias dtype conversion on compute stream
For consistency with weights.
2026-01-23 16:52:31 +10:00
Rattus
d795a23c12 Reduce RAM and compute time in model saving with Loras
Get the model saving logic away from force_patch_weights and instead do
the patching JIT during safetensors saving.

Firstly switch off force_patch_weights in the load for save which avoids
creating CPU side tensors with loras calculated.

Then at save time, wrap the tensor to catch the safetensors call to .to() and
patch it live.

This avoids ever having a lora-calculated copy of offloaded
weights on the CPU.

Also take advantage of the presence of the GPU when doing this Lora
calculation. The former force_patch_weights would just do everything on
the CPU. It's generally faster to go to the GPU and back even if it's just
a Lora application.
2026-01-23 16:52:31 +10:00
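The save-time trick in d795a23c12 (wrap the tensor so the saver's `.to()` call triggers the patch) can be shown with a toy stand-in. `LazyPatchedWeight` and `save` below are hypothetical names, lists replace tensors, and the patch is a simple scaled addition rather than a real Lora calculation.

```python
class LazyPatchedWeight:
    """Stand-in for a tensor whose delta is applied only when the saver
    asks for it, so no patched copy is kept around beforehand."""

    def __init__(self, base, delta, scale=1.0):
        self.base = base
        self.delta = delta
        self.scale = scale
        self.patch_calls = 0

    def to(self, device):
        # The saver's .to() call is the hook: patch now and hand back
        # a temporary result, leaving the base weight untouched.
        self.patch_calls += 1
        return [b + self.scale * d for b, d in zip(self.base, self.delta)]


def save(state_dict):
    """Toy saver that moves each entry to the CPU before writing it."""
    return {name: w.to("cpu") for name, w in state_dict.items()}


w = LazyPatchedWeight([1.0, 2.0], [0.5, 0.5], scale=2.0)
saved = save({"w": w})
```

Only one patched tensor exists at a time, and only for as long as the saver holds it, which is where the RAM saving comes from.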
Christian Byrne
79cdbc81cb
feat: Improve ResizeImageMaskNode UX with tooltips and search aliases (#12040)
- Add search_aliases for discoverability: resize, scale, dimensions, etc.
- Add node description for hover tooltip
- Add tooltips to all inputs explaining their behavior
- Reorder options: most common (scale dimensions) first, most technical (scale to multiple) last

Addresses user feedback that 'resize' search returned nothing useful and
options like 'match size' and 'scale to multiple' were not self-explanatory.
2026-01-22 22:04:27 -08:00
comfyanonymous
f443b9f2ca
Revert "feat: Improve ResizeImageMaskNode UX with tooltips and search aliases…" (#12038)
This reverts commit 4e3038114a.
2026-01-22 23:02:37 -05:00
Christian Byrne
4e3038114a
feat: Improve ResizeImageMaskNode UX with tooltips and search aliases (#12013)
- Add search_aliases for discoverability: resize, scale, dimensions, etc.
- Add node description for hover tooltip
- Add tooltips to all inputs explaining their behavior
- Reorder options: most common (scale dimensions) first, most technical (scale to multiple) last

Addresses user feedback that 'resize' search returned nothing useful and
options like 'match size' and 'scale to multiple' were not self-explanatory.
2026-01-22 18:46:55 -08:00
Christian Byrne
bbb8864778
add search aliases to all nodes (#12035)
* feat: Add search_aliases field to node schema

Adds `search_aliases` field to improve node discoverability. Users can define alternative search terms for nodes (e.g., "text concat" → StringConcatenate).

Changes:
- Add `search_aliases: list[str]` to V3 Schema
- Add `SEARCH_ALIASES` support for V1 nodes
- Include field in `/object_info` response
- Add aliases to high-priority core nodes

V1 usage:
```python
class MyNode:
    SEARCH_ALIASES = ["alt name", "synonym"]
```

V3 usage:
```python
io.Schema(
    node_id="MyNode",
    search_aliases=["alt name", "synonym"],
    ...
)
```

## Related PRs
- Frontend: Comfy-Org/ComfyUI_frontend#XXXX (draft - merge after this)
- Docs: Comfy-Org/docs#XXXX (draft - merge after stable)

* Propagate search_aliases through V3 Schema.get_v1_info to NodeInfoV1

* feat: add SEARCH_ALIASES for core nodes (#12016)

Add search aliases to 22 core nodes in nodes.py to improve node discoverability:
- Checkpoint/model loaders: CheckpointLoader, DiffusersLoader
- Conditioning nodes: ConditioningAverage, ConditioningSetArea, ConditioningSetMask, ConditioningZeroOut
- Style nodes: StyleModelApply
- Image nodes: LoadImageMask, LoadImageOutput, ImageBatch, ImageInvert, ImagePadForOutpaint
- Latent nodes: LoadLatent, SaveLatent, LatentBlend, LatentComposite, LatentCrop, LatentFlip, LatentFromBatch, LatentUpscale, LatentUpscaleBy, RepeatLatentBatch

* feat: add SEARCH_ALIASES for image, mask, and string nodes (#12017)

Add search aliases to nodes in comfy_extras for better discoverability:
- nodes_mask.py: mask manipulation nodes
- nodes_images.py: image processing nodes
- nodes_post_processing.py: post-processing effect nodes
- nodes_string.py: string manipulation nodes
- nodes_compositing.py: compositing nodes
- nodes_morphology.py: morphological operation nodes
- nodes_latent.py: latent space nodes

Uses search_aliases parameter in io.Schema() for v3 nodes.

* feat: add SEARCH_ALIASES for audio and video nodes (#12018)

Add search aliases to audio and video nodes for better discoverability:
- nodes_audio.py: audio loading, saving, and processing nodes
- nodes_video.py: video loading and processing nodes
- nodes_wan.py: WAN model nodes

Uses search_aliases parameter in io.Schema() for v3 nodes.

* feat: add SEARCH_ALIASES for model and misc nodes (#12019)

Add search aliases to model-related and miscellaneous nodes:
- Model nodes: nodes_model_merging.py, nodes_model_advanced.py, nodes_lora_extract.py
- Sampler nodes: nodes_custom_sampler.py, nodes_align_your_steps.py
- Control nodes: nodes_controlnet.py, nodes_attention_multiply.py, nodes_hooks.py
- Training nodes: nodes_train.py, nodes_dataset.py
- Utility nodes: nodes_logic.py, nodes_canny.py, nodes_differential_diffusion.py
- Architecture-specific: nodes_sd3.py, nodes_pixart.py, nodes_lumina2.py, nodes_kandinsky5.py, nodes_hidream.py, nodes_fresca.py, nodes_hunyuan3d.py
- Media nodes: nodes_load_3d.py, nodes_webcam.py, nodes_preview_any.py, nodes_wanmove.py

Uses search_aliases parameter in io.Schema() for v3 nodes, SEARCH_ALIASES class attribute for legacy nodes.
2026-01-22 18:36:58 -08:00
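As a rough illustration of how a client might consume the `search_aliases` field added in bbb8864778, the sketch below filters a node listing by id or alias. The dictionary shape is an assumption modelled loosely on an `/object_info`-style response, and `search_nodes` is a hypothetical helper, not part of ComfyUI or its frontend.

```python
def search_nodes(object_info, query):
    """Return node ids whose id or any search alias contains the query
    (case-insensitive substring match)."""
    q = query.lower()
    hits = []
    for node_id, info in object_info.items():
        names = [node_id, *info.get("search_aliases", [])]
        if any(q in name.lower() for name in names):
            hits.append(node_id)
    return hits


# Assumed response shape: node id -> info dict with an optional alias list.
object_info = {
    "StringConcatenate": {"search_aliases": ["text concat"]},
    "ImageInvert": {"search_aliases": []},
}

by_alias = search_nodes(object_info, "text concat")
by_name = search_nodes(object_info, "invert")
```

Nodes without the field still match on their id, so older node packs keep working in this scheme.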