* fix: pin SQLAlchemy>=2.0 in requirements.txt (fixes #13036) (#13316)
* Refactor io to IO in nodes_ace.py (#13485)
* Bump comfyui-frontend-package to 1.42.12 (#13489)
* Make the ltx audio vae more native. (#13486)
* feat(api-nodes): add automatic downscaling of videos for ByteDance 2 nodes (#13465)
* Support standalone LTXV audio VAEs (#13499)
* [Partner Nodes] added 4K resolution for Veo models; added Veo 3 Lite model (#13330)
* feat(api nodes): added 4K resolution for Veo models; added Veo 3 Lite model
Signed-off-by: bigcat88 <bigcat88@icloud.com>
* increase poll_interval from 5 to 9
---------
Signed-off-by: bigcat88 <bigcat88@icloud.com>
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
* Bump comfyui-frontend-package to 1.42.14 (#13493)
* Add gpt-image-2 as version option (#13501)
* Allow logging in comfy app files. (#13505)
* chore: update workflow templates to v0.9.59 (#13507)
* fix(veo): reject 4K resolution for veo-3.0 models in Veo3VideoGenerationNode (#13504)
The tooltip on the resolution input states that 4K is not available for
veo-3.1-lite or veo-3.0 models, but the execute guard only rejected the
lite combination. Selecting 4K with veo-3.0-generate-001 or
veo-3.0-fast-generate-001 would fall through and hit the upstream API
with an invalid request.
Broaden the guard to match the documented behavior and update the error
message accordingly.
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
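For illustration, a minimal sketch of the broadened guard; the model-ID prefixes and function name are hypothetical, not the node's actual code:
    # Models the tooltip documents as not supporting 4K (prefixes illustrative).
    NO_4K_MODEL_PREFIXES = ("veo-3.0", "veo-3.1-lite")

    def check_resolution(model: str, resolution: str) -> None:
        # Reject 4K for every documented-unsupported family, not just lite.
        if resolution == "4K" and model.startswith(NO_4K_MODEL_PREFIXES):
            raise ValueError(f"4K resolution is not available for {model}.")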
* feat: RIFE and FILM frame interpolation model support (CORE-29) (#13258)
* initial RIFE support
* Also support FILM
* Better RAM usage, reduce FILM VRAM peak
* Add model folder placeholder
* Fix oom fallback frame loss
* Remove torch.compile for now
* Rename model input
* Shorter input type name
---------
* fix: use Parameter assignment for Stable_Zero123 cc_projection weights (fixes #13492) (#13518)
On Windows with aimdo enabled, disable_weight_init.Linear uses lazy
initialization that sets weight and bias to None to avoid unnecessary
memory allocation. This caused a crash when copy_() was called on the
None weight attribute in Stable_Zero123.__init__.
Replace copy_() with direct torch.nn.Parameter assignment, which works
correctly on both Windows (aimdo enabled) and other platforms.
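A minimal sketch of the pattern, using a plain Linear as a stand-in for cc_projection: copy_() requires an existing tensor, while direct Parameter assignment also works when lazy init left the attribute as None.
    import torch

    def set_cc_projection_weights(linear: torch.nn.Linear, w: torch.Tensor, b: torch.Tensor):
        # Before: linear.weight.copy_(w) crashed when lazy init set weight to None.
        # Parameter assignment works whether or not the tensor already exists.
        linear.weight = torch.nn.Parameter(w)
        linear.bias = torch.nn.Parameter(b)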
* Derive InterruptProcessingException from BaseException (#13523)
* bump manager version to 4.2.1 (#13516)
* ModelPatcherDynamic: force cast stray weights on comfy layers (#13487)
The mixed_precision ops can have input_scale parameters that are used
in tensor math but aren't a weight or bias, so they don't get proper VRAM
management. Treat these as force-castable parameters like the non-comfy
weights; stray params that are buffers are handled already.
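Roughly the idea, as a hypothetical helper (not the actual ModelPatcherDynamic code): any parameter on a comfy layer that is neither weight nor bias gets flagged for forced casting.
    def stray_parameters(layer):
        # Parameters that are neither weight nor bias (e.g. input_scale on
        # mixed-precision ops) need forced casting / VRAM management too.
        for name, param in layer.named_parameters(recurse=False):
            if name not in ("weight", "bias"):
                yield name, param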
* Update logging level for invalid version format (#13526)
* [Partner Nodes] add SD2 real human support (#13509)
* feat(api-nodes): add SD2 real human support
Signed-off-by: bigcat88 <bigcat88@icloud.com>
* fix: add validation before uploading Assets
Signed-off-by: bigcat88 <bigcat88@icloud.com>
* Add asset_id and group_id displaying on the node
Signed-off-by: bigcat88 <bigcat88@icloud.com>
* extend poll_op to use instead of custom async cycle
Signed-off-by: bigcat88 <bigcat88@icloud.com>
* added the polling for the "Active" status after asset creation
Signed-off-by: bigcat88 <bigcat88@icloud.com>
* updated tooltip for group_id
* allow usage of real human in the ByteDance2FirstLastFrame node
* add reference count limits
* corrected price in status when input assets contain video
Signed-off-by: bigcat88 <bigcat88@icloud.com>
---------
Signed-off-by: bigcat88 <bigcat88@icloud.com>
* feat: SAM (segment anything) 3.1 support (CORE-34) (#13408)
* [Partner Nodes] GPTImage: fix price badges, add new resolutions (#13519)
* fix(api-nodes): fixed price badges, add new resolutions
Signed-off-by: bigcat88 <bigcat88@icloud.com>
* properly calculate the total run cost when "n > 1"
Signed-off-by: bigcat88 <bigcat88@icloud.com>
---------
Signed-off-by: bigcat88 <bigcat88@icloud.com>
* chore: update workflow templates to v0.9.61 (#13533)
* chore: update embedded docs to v0.4.4 (#13535)
* add 4K resolution to Kling nodes (#13536)
Signed-off-by: bigcat88 <bigcat88@icloud.com>
* Fix LTXV Reference Audio node (#13531)
* comfy-aimdo 0.2.14: Hotfix async allocator estimations (#13534)
This was over-estimating the VRAM used by the async allocator when lots
of small tensors were in play.
Also change the versioning scheme to == so we can roll forward aimdo without
worrying about stable regressions downstream in comfyUI core.
* Disable sageattention for SAM3 (#13529)
Causes NaNs
* execution: Add anti-cycle validation (#13169)
Currently, if the graph contains a cycle, validation just recurses
infinitely, hits a catch-all, then throws a generic error against the
output node that seeded the validation. Instead, fail the offending
cycling node chain and handle it as an error in its own right.
Co-authored-by: guill <jacob.e.segal@gmail.com>
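A sketch of the general technique, assuming a simplified graph mapping node ids to their input node ids (ComfyUI's validation operates on richer structures): a depth-first walk with an "in progress" mark reports the cycling chain instead of recursing forever.
    def find_cycle_nodes(graph: dict[str, list[str]]) -> set[str]:
        WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current path / done
        color = {n: WHITE for n in graph}
        cyclic: set[str] = set()

        def visit(node: str) -> bool:
            color[node] = GRAY
            for dep in graph.get(node, []):
                state = color.get(dep, WHITE)
                if state == GRAY or (state == WHITE and visit(dep)):
                    # dep is on the current path: this whole chain is cycling
                    cyclic.add(node)
                    return True
            color[node] = BLACK
            return False

        for n in list(graph):
            if color[n] == WHITE:
                visit(n)
        return cyclic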
* chore: update workflow templates to v0.9.62 (#13539)
---------
Signed-off-by: bigcat88 <bigcat88@icloud.com>
Co-authored-by: Octopus <liyuan851277048@icloud.com>
Co-authored-by: comfyanonymous <121283862+comfyanonymous@users.noreply.github.com>
Co-authored-by: Comfy Org PR Bot <snomiao+comfy-pr@gmail.com>
Co-authored-by: Alexander Piskun <13381981+bigcat88@users.noreply.github.com>
Co-authored-by: Jukka Seppänen <40791699+kijai@users.noreply.github.com>
Co-authored-by: AustinMroz <austin@comfy.org>
Co-authored-by: Daxiong (Lin) <contact@comfyui-wiki.com>
Co-authored-by: Matt Miller <matt@miller-media.com>
Co-authored-by: blepping <157360029+blepping@users.noreply.github.com>
Co-authored-by: Dr.Lt.Data <128333288+ltdrdata@users.noreply.github.com>
Co-authored-by: rattus <46076784+rattus128@users.noreply.github.com>
Co-authored-by: guill <jacob.e.segal@gmail.com>
* Fix Hunyuan 3D 2.1 multi-GPU worksplit: use cond_or_uncond instead of hardcoded chunk(2)
Amp-Thread-ID: https://ampcode.com/threads/T-019da964-2cc8-77f9-9aae-23f65da233db
Co-authored-by: Amp <amp@ampcode.com>
* Add GPU device selection to all loader nodes
- Add get_gpu_device_options() and resolve_gpu_device_option() helpers
in model_management.py for vendor-agnostic GPU device selection
- Add device widget to CheckpointLoaderSimple, UNETLoader, VAELoader
- Expand device options in CLIPLoader, DualCLIPLoader, LTXAVTextEncoderLoader
from [default, cpu] to include gpu:0, gpu:1, etc. on multi-GPU systems
- Wire load_diffusion_model_state_dict and load_state_dict_guess_config
to respect model_options['load_device']
- Graceful fallback: unrecognized devices (e.g. gpu:1 on single-GPU)
silently fall back to default
Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
Co-authored-by: Amp <amp@ampcode.com>
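A hedged sketch of what the two helpers could look like; the real ones live in model_management.py and their signatures may differ.
    import torch

    def get_gpu_device_options():
        # Vendor-agnostic option list for the device widget.
        opts = ["default", "cpu"]
        if torch.cuda.is_available():
            opts += [f"gpu:{i}" for i in range(torch.cuda.device_count())]
        return opts

    def resolve_gpu_device_option(option: str):
        # Unknown or out-of-range devices (e.g. gpu:1 on a single-GPU box)
        # return None so the caller silently falls back to the default.
        if option.startswith("gpu:"):
            idx = int(option.split(":", 1)[1])
            if 0 <= idx < torch.cuda.device_count():
                return torch.device("cuda", idx)
            return None
        if option == "cpu":
            return torch.device("cpu")
        return None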
* Add VALIDATE_INPUTS to skip device combo validation for workflow portability
When a workflow saved on a 2-GPU machine (with device=gpu:1) is loaded
on a 1-GPU machine, the combo validation would reject the unknown value.
VALIDATE_INPUTS with the device parameter bypasses combo validation for
that input only, allowing resolve_gpu_device_option to handle the
graceful fallback at runtime.
Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
Co-authored-by: Amp <amp@ampcode.com>
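Sketch of the hook, following ComfyUI's VALIDATE_INPUTS convention: naming an input as a parameter skips the built-in (combo) validation for that input only.
    class UNETLoader:
        @classmethod
        def VALIDATE_INPUTS(cls, device):
            # Accept any saved value; resolve_gpu_device_option() falls back
            # to the default device at runtime when the option is unknown
            # on this machine.
            return True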
* Set CUDA device context in outer_sample to match model load_device
Custom CUDA kernels (comfy_kitchen fp8 quantization) use
torch.cuda.current_device() for DLPack tensor export. When a model is
loaded on a non-default GPU (e.g. cuda:1), the CUDA context must match
or the kernel fails with 'Can't export tensors on a different CUDA
device index'. Save and restore the previous device around sampling.
Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
Co-authored-by: Amp <amp@ampcode.com>
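The save/restore pattern described above, as a standalone sketch with a hypothetical wrapper name:
    import torch

    def run_on_device(load_device, fn, *args, **kwargs):
        # Make the current CUDA context match the model's load_device, then
        # restore the previous one so models on other GPUs are unaffected.
        prev = torch.cuda.current_device()
        torch.cuda.set_device(load_device)
        try:
            return fn(*args, **kwargs)
        finally:
            torch.cuda.set_device(prev)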
* Fix code review bugs: negative index guard, CPU offload_device, checkpoint te_model_options
- resolve_gpu_device_option: reject negative indices (gpu:-1)
- UNETLoader: set offload_device when cpu is selected
- CheckpointLoaderSimple: pass te_model_options for CLIP device,
set offload_device for cpu, pass load_device to VAE
- load_diffusion_model_state_dict: respect offload_device from model_options
- load_state_dict_guess_config: respect offload_device, pass load_device to VAE
Amp-Thread-ID: https://ampcode.com/threads/T-019daa41-f394-731a-8955-4cff4f16283a
Co-authored-by: Amp <amp@ampcode.com>
* Fix CUDA device context for CLIP encoding and VAE encode/decode
Add torch.cuda.set_device() calls to match model's load device in:
- CLIP.encode_from_tokens: fixes 'Can't export tensors on a different
CUDA device index' when CLIP is loaded on a non-default GPU
- CLIP.encode_from_tokens_scheduled: same fix for the hooks code path
- CLIP.generate: same fix for text generation
- VAE.decode: fixes VAE decoding on non-default GPU
- VAE.encode: fixes VAE encoding on non-default GPU
Same pattern as the existing outer_sample fix in samplers.py - saves
and restores previous CUDA device in a try/finally block.
Amp-Thread-ID: https://ampcode.com/threads/T-019dabdc-8feb-766f-b4dc-f46ef4d8ff57
Co-authored-by: Amp <amp@ampcode.com>
* Extract cuda_device_context manager, fix tiled VAE methods
Add model_management.cuda_device_context() — a context manager that
saves/restores torch.cuda.current_device when operating on a non-default
GPU. Replaces 6 copies of the manual save/set/restore boilerplate.
Refactored call sites:
- CLIP.encode_from_tokens
- CLIP.encode_from_tokens_scheduled (hooks path)
- CLIP.generate
- VAE.decode
- VAE.encode
- samplers.outer_sample
Bug fixes (newly wrapped):
- VAE.decode_tiled: was missing device context entirely, would fail
on non-default GPU when called from 'VAE Decode (Tiled)' node
- VAE.encode_tiled: same issue for 'VAE Encode (Tiled)' node
Amp-Thread-ID: https://ampcode.com/threads/T-019dabdc-8feb-766f-b4dc-f46ef4d8ff57
Co-authored-by: Amp <amp@ampcode.com>
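A sketch of the extracted helper under the same assumptions (the real one lives in model_management.py): it no-ops unless a CUDA device is given.
    import contextlib
    import torch

    @contextlib.contextmanager
    def cuda_device_context(device):
        # Only switch contexts for CUDA devices; CPU and None pass through.
        if device is None or getattr(device, "type", None) != "cuda":
            yield
            return
        prev = torch.cuda.current_device()
        torch.cuda.set_device(device)
        try:
            yield
        finally:
            torch.cuda.set_device(prev)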
* Restore CheckpointLoaderSimple, add CheckpointLoaderDevice
Revert CheckpointLoaderSimple to its original form (no device input)
so it remains the simple default loader.
Add new CheckpointLoaderDevice node (advanced/loaders) with separate
model_device, clip_device, and vae_device inputs for per-component
GPU placement in multi-GPU setups.
Amp-Thread-ID: https://ampcode.com/threads/T-019dabdc-8feb-766f-b4dc-f46ef4d8ff57
Co-authored-by: Amp <amp@ampcode.com>
---------
Co-authored-by: Amp <amp@ampcode.com>
* sd: soft_empty_cache on tiler fallback
This doesn't cost a lot and creates the expected VRAM reduction in
resource monitors when you fall back to the tiler.
* wan: vae: Don't recurse in local fns (move run_up)
Moved Decoder3d's recursive run_up out of forward into a class
method to avoid nested closure self-reference cycles. This avoids
cyclic garbage that delays garbage collection of tensors, which in turn
delays VRAM release before the tiled fallback.
* ltx: vae: Don't recurse in local fns (move run_up)
Move the recursive run_up out of forward into a class
method to avoid nested closure self-reference cycles. This avoids
cyclic garbage that delays garbage collection of tensors, which in turn
delays VRAM release before the tiled fallback.
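Illustration of the refactor pattern with a stripped-down stand-in class (self.up is assumed to be a list of upsampling callables): the local function's closure cell captures the function itself, creating a reference cycle per call, while a method recursing via self does not.
    class Decoder3dSketch:
        def __init__(self, up_blocks):
            self.up = up_blocks  # list of callables, e.g. upsample stages

        # Before: run_up's closure refers to run_up itself, so each forward()
        # call creates a cycle that keeps captured tensors alive until the
        # cyclic GC runs, delaying VRAM release.
        def forward_closure(self, x):
            def run_up(x, i):
                if i == 0:
                    return x
                return run_up(self.up[i - 1](x), i - 1)
            return run_up(x, len(self.up))

        # After: recursion via a method; no self-referencing cell, so tensors
        # are freed promptly by refcounting.
        def run_up(self, x, i):
            if i == 0:
                return x
            return self.run_up(self.up[i - 1](x), i - 1)

        def forward(self, x):
            return self.run_up(x, len(self.up))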
* ltx: vae: add cache state to downsample block
* ltx: vae: Add time stride awareness to causal_conv_3d
* ltx: vae: Automate truncation for encoder
Other VAEs just truncate without error. Do the same.
* sd/ltx: Make chunked_io a flag in its own right
Taking this bi-directional, so make it a for-purpose named flag.
* ltx: vae: implement chunked encoder + CPU IO chunking
People are doing things with big frame counts in LTX, including V2V
flows. Implement the time-chunked encoder to keep VRAM down, with
the converse of the new CPU pre-allocation technique, where the chunks
are brought in from the CPU just in time.
* ltx: vae-encode: round chunk sizes more strictly
Only powers of 2 and multiples of 8 are valid due to cache slicing.
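A minimal sketch of such rounding, with a hypothetical helper name: snapping down through powers of two starting at 8 yields exactly the sizes that are both powers of 2 and multiples of 8.
    def round_chunk_size(requested: int) -> int:
        # Valid sizes are 8, 16, 32, ...; snap the requested size down to
        # the largest valid one, with a minimum of 8.
        size = 8
        while size * 2 <= requested:
            size *= 2
        return size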
This is an experimental WIP option that might not work in your workflow but
should lower memory usage if it does.
Currently only the VAE and the load image node will output in fp16 when
this option is turned on.
PyTorch only filters for OOMs in its own allocators; however, there are
paths that can OOM on allocators created outside the PyTorch allocators.
These manifest as an AllocatorError, as PyTorch does not have universal
error translation to its OOM type on exception. Handle it. A log I have
for this also shows a duplicate async report of the error, so call the
async discarder to clean up and make these OOMs look like OOMs.
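Hedged sketch of the translation; the AllocatorError type is matched by name here because the concrete class is allocator-specific.
    import torch

    def is_oom(exc: BaseException) -> bool:
        # torch.cuda.OutOfMemoryError covers PyTorch's own allocators;
        # external allocators surface as an AllocatorError instead.
        if isinstance(exc, torch.cuda.OutOfMemoryError):
            return True
        return type(exc).__name__ == "AllocatorError"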
* sd: add support for clip model reconstruction
* nodes: SetClipHooks: Demote the dynamic model patcher
* mp: Make dynamic_disable more robust
The backup needs to not be cloned. In addition, add a delegate object
to ModelPatcherDynamic so that non-cloning code can do
ModelPatcherDynamic demotion.
* sampler_helpers: Demote to non-dynamic model patcher when hooking
* address CodeRabbit review comments
* mp: attach re-construction arguments to model patcher
When making a model-patcher from a unet or ckpt, attach a callable
function that can be called to replay the model construction. This
can be used to deep clone model patcher WRT the actual model.
Originally written by Kosinkadink
f4b99bc623
* mp: Add disable_dynamic clone argument
Add a clone argument that lets a caller clone a ModelPatcher but disable
dynamic to demote the clone to regular MP. This is useful for legacy
features where dynamic_vram support is missing or TBD.
* torch_compile: disable dynamic_vram
This is a bigger feature. Disable for the interim to preserve
functionality.
* make setattr safe for non-existent attributes
Handle the case where the attribute doesn't exist by returning a static
sentinel (distinct from None). If the sentinel is passed in as the set
value, delete the attribute.
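The sentinel pattern in miniature (hypothetical helper names):
    _ABSENT = object()  # static sentinel, deliberately distinct from None

    def get_attr_or_absent(obj, name):
        return getattr(obj, name, _ABSENT)

    def set_attr_or_delete(obj, name, value):
        if value is _ABSENT:
            # Setting the sentinel back means "the attribute did not exist":
            # remove it again instead of storing the sentinel.
            if hasattr(obj, name):
                delattr(obj, name)
        else:
            setattr(obj, name, value)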
* Account for dequantization and type-casts in offload costs
When measuring the cost of offload, identify weights that need a type
change or dequantization and add the size of the conversion result
to the offload cost.
This is mutually exclusive with lowvram patches, which already have
a large conservative estimate and won't overlap the dequant cost, so
don't double count.
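Back-of-envelope sketch of the accounting (not the actual model-management code): when the offloaded weight needs a dtype conversion, the converted copy is counted too.
    import torch

    def offload_cost_bytes(weight: torch.Tensor, compute_dtype: torch.dtype) -> int:
        cost = weight.numel() * weight.element_size()
        if weight.dtype != compute_dtype:
            # The dequantized / type-cast copy exists alongside the original
            # while offloading, so count its size as well.
            cost += weight.numel() * torch.empty(0, dtype=compute_dtype).element_size()
        return cost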
* Set the compute type on CLIP MPs
So that the loader can know the size of weights for dequant accounting.
* Add Kandinsky5 model support
lite and pro T2V tested to work
* Update kandinsky5.py
* Fix fp8
* Fix fp8_scaled text encoder
* Add transformer_options for attention
* Code cleanup, optimizations, use fp32 for all layers originally at fp32
* ImageToVideo -node
* Fix I2V, add necessary latent post process nodes
* Support text to image model
* Support block replace patches (SLG mostly)
* Support official LoRAs
* Don't scale RoPE for lite model as that just doesn't work...
* Update supported_models.py
* Revert RoPE scaling to the simpler one
* Fix typo
* Handle latent dim difference for image model in the VAE instead
* Add node to use different prompts for clip_l and qwen25_7b
* Reduce peak VRAM usage a bit
* Further reduce peak VRAM consumption by chunking ffn
* Update chunking
* Update memory_usage_factor
* Code cleanup, don't force the fp32 layers as it has minimal effect
* Allow for stronger changes with first frames normalization
Default values are too weak for any meaningful changes; these should probably be exposed as advanced node options when that's available.
* Add image model's own chat template, remove unused image2video template
* Remove hard error in ReplaceVideoLatentFrames -node
* Update kandinsky5.py
* Update supported_models.py
* Fix typos in prompt template
They have now been fixed in the original repository as well
* Update ReplaceVideoLatentFrames
Add tooltips
Make source optional
Better handle negative index
* Rename NormalizeVideoLatentFrames -node
For a bit better clarity about what it does
* Fix NormalizeVideoLatentStart node output on no-op