Move register_output_files() out of the periodic GC branch so it runs
right after each prompt completes, using the local e.history_result and
prompt_id. This prevents stale/overwritten values when multiple prompts
finish before GC triggers.
Keep enqueue_enrich() in the GC path since it's heavier and benefits
from batching via the 10-second interval.
Amp-Thread-ID: https://ampcode.com/threads/T-019cfdf4-c52c-771c-a920-57bac15c68be
Pass user_metadata through spec['metadata'] to batch_insert_seed_assets.
Update batch_insert_seed_assets to accept raw dicts (UserMetadata) in
addition to ExtractedMetadata, passing them through as-is without
calling .to_user_metadata().
Amp-Thread-ID: https://ampcode.com/threads/T-019cfbe3-6440-724b-a17b-66ce09ecd1ed
Add ingest_existing_file() to services/ingest.py as a public wrapper for
registering on-disk files (stat, BLAKE3 hash, MIME detection, path-based
tag derivation).
After each prompt execution in the main loop, iterate
history_result['outputs'] and register files with type 'output' as
assets. Runs while the asset seeder is paused, gated behind
asset_seeder.is_disabled(). Populates job_id on the asset_references
table for provenance tracking.
Ingest uses a two-phase approach: insert a stub record (hash=NULL) first
for instant visibility, then defer hashing to the background seeder
enrich phase to avoid blocking the prompt worker thread.
When multiple enrich scans are enqueued while the seeder is busy, roots
are now unioned and compute_hashes uses sticky-true (OR) logic so no
queued work is silently dropped.
Extract _reset_to_idle helper in the asset seeder to deduplicate the
state reset pattern shared by _run_scan and mark_missing_outside_prefixes.
Separate history parsing from output file registration: move generic
file registration logic into register_output_files() in
app/assets/services/ingest.py, keeping only the ComfyUI history format
parsing (_collect_output_absolute_paths) in main.py.
Amp-Thread-ID: https://ampcode.com/threads/T-019cf842-5199-71f0-941d-b420b5cf4d57
* Add Number Convert node for unified numeric type conversion
Consolidates fragmented IntToFloat/FloatToInt nodes (previously only
available via third-party packs like ComfyMath, FillNodes, etc.) into
a single core node.
- Single input accepting INT, FLOAT, STRING, and BOOL types
- Two outputs: FLOAT and INT
- Conversion: bool→0/1, string→parsed number, float↔int standard cast
- Follows Math Expression node patterns (comfy_api, io.Schema, etc.)
Refs: COM-16925
* Register nodes_number_convert.py in extras_files list
Without this entry in nodes.py, the Number Convert node file
would not be discovered and loaded at startup.
* Add isfinite guard, exception chaining, and unit tests for Number Convert node
- Add math.isfinite() check to prevent int() crash on inf/nan string inputs
- Use 'from None' for cleaner exception chaining on string parse failure
- Add 21 unit tests covering all input types and error paths
* CURVE node
* remove curve to sigmas node
* feat: add CurveInput ABC with MonotoneCubicCurve implementation (#12986)
CurveInput is an abstract base class so future curve representations
(bezier, LUT-based, analytical functions) can be added without breaking
downstream nodes that type-check against CurveInput.
MonotoneCubicCurve is the concrete implementation that:
- Mirrors frontend createMonotoneInterpolator (curveUtils.ts) exactly
- Pre-computes slopes as numpy arrays at construction time
- Provides vectorised interp_array() using numpy for batch evaluation
- interp() for single-value evaluation
- to_lut() for generating lookup tables
CurveEditor node wraps raw widget points in MonotoneCubicCurve.
* linear curve
* refactor: move CurveEditor to comfy_extras/nodes_curve.py with V3 schema
* feat: add HISTOGRAM type and histogram support to CurveEditor
* code improve
---------
Co-authored-by: Christian Byrne <cbyrne@comfy.org>
There was an issue where the resample split was too early and dropped one
of the rolling convolutions a frame early. This is most noticable as a
lighting/color change between pixel frames 5->6 (latent 2->3), or as a
lighting change between the first and last frame in an FLF wan flow.
The recent PR that added resize_cond_for_context_window methods to
model classes used inline 'import comfy.context_windows' in each
method body. This moves that import to the top-level import section,
replacing 4 duplicate inline imports with a single top-level one.
* Add slice_cond and per-model context window cond resizing
* Fix cond_value.size() call in context window cond resizing
* Expose additional advanced inputs for ContextWindowsManualNode
Necessary for WanAnimate context windows workflow, which needs cond_retain_index_list = 0 to work properly with its reference input.
---------
* chore(api-nodes): mark seedream-3-0-t2i and seedance-1-0-lite models as deprecated
* fix(api-nodes): fixed old regression in the ByteDanceImageReference node
---------
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
* sd: soft_empty_cache on tiler fallback
This doesnt cost a lot and creates the expected VRAM reduction in
resource monitors when you fallback to tiler.
* wan: vae: Don't recursion in local fns (move run_up)
Moved Decoder3d’s recursive run_up out of forward into a class
method to avoid nested closure self-reference cycles. This avoids
cyclic garbage that delays garbage of tensors which in turn delays
VRAM release before tiled fallback.
* ltx: vae: Don't recursion in local fns (move run_up)
Mov the recursive run_up out of forward into a class
method to avoid nested closure self-reference cycles. This avoids
cyclic garbage that delays garbage of tensors which in turn delays
VRAM release before tiled fallback.
* ltx: vae: add cache state to downsample block
* ltx: vae: Add time stride awareness to causal_conv_3d
* ltx: vae: Automate truncation for encoder
Other VAEs just truncate without error. Do the same.
* sd/ltx: Make chunked_io a flag in its own right
Taking this bi-direcitonal, so make it a for-purpose named flag.
* ltx: vae: implement chunked encoder + CPU IO chunking
People are doing things with big frame counts in LTX including V2V
flows. Implement the time-chunked encoder to keep the VRAM down, with
the converse of the new CPU pre-allocation technique, where the chunks
are brought from the CPU JIT.
* ltx: vae-encode: round chunk sizes more strictly
Only powers of 2 and multiple of 8 are valid due to cache slicing.
* ltx: vae: add cache state to downsample block
* ltx: vae: Add time stride awareness to causal_conv_3d
* ltx: vae: Automate truncation for encoder
Other VAEs just truncate without error. Do the same.
* sd/ltx: Make chunked_io a flag in its own right
Taking this bi-direcitonal, so make it a for-purpose named flag.
* ltx: vae: implement chunked encoder + CPU IO chunking
People are doing things with big frame counts in LTX including V2V
flows. Implement the time-chunked encoder to keep the VRAM down, with
the converse of the new CPU pre-allocation technique, where the chunks
are brought from the CPU JIT.
* ltx: vae-encode: round chunk sizes more strictly
Only powers of 2 and multiple of 8 are valid due to cache slicing.
On Apple Silicon, `vram_state` is set to `VRAMState.SHARED` because
CPU and GPU share unified memory. However, `text_encoder_device()`
only checked for `HIGH_VRAM` and `NORMAL_VRAM`, causing all text
encoders to fall back to CPU on MPS devices.
Adding `VRAMState.SHARED` to the condition allows non-quantized text
encoders (e.g. bf16 Gemma 3 12B) to run on the MPS GPU, providing
significant speedup for text encoding and prompt generation.
Note: quantized models (fp4/fp8) that use float8_e4m3fn internally
will still fall back to CPU via the `supports_cast()` check in
`CLIP.__init__()`, since MPS does not support fp8 dtypes.
* wan: vae: encoder: Add feature cache layer that corks singles
If a downsample only gives you a single frame, save it to the feature
cache and return nothing to the top level. This increases the
efficiency of cacheability, but also prepares support for going two
by two rather than four by four on the frames.
* wan: remove all concatentation with the feature cache
The loopers are now responsible for ensuring that non-final frames are
processes at least two-by-two, elimiating the need for this cat case.
* wan: vae: recurse and chunk for 2+2 frames on decode
Avoid having to clone off slices of 4 frame chunks and reduce the size
of the big 6 frame convolutions down to 4. Save the VRAMs.
* wan: encode frames 2x2.
Reduce VRAM usage greatly by encoding frames 2 at a time rather than
4.
* wan: vae: remove cloning
The loopers now control the chunking such there is noever more than 2
frames, so just cache these slices directly and avoid the clone
allocations completely.
* wan: vae: free consumer caller tensors on recursion
* wan: vae: restyle a little to match LTX
* ltx: vae: scale the chunk size with the users VRAM
Scale this linearly down for users with low VRAM.
* ltx: vae: free non-chunking recursive intermediates
* ltx: vae: cleanup some intermediates
The conv layer can be the VRAM peak and it does a torch.cat. So cleanup
the pieces of the cat. Also clear our the cache ASAP as each layer detect
its end as this VAE surges in VRAM at the end due to the ended padding
increasing the size of the final frame convolutions off-the-books to
the chunker. So if all the earlier layers free up their cache it can
offset that surge.
Its a fragmentation nightmare, and the chance of it having to recache the
pyt allocator is very high, but you wont OOM.
Mark the weight_dtype parameter in UNETLoader (Load Diffusion Model) as
an advanced input to reduce UI complexity for new users. The parameter
is now hidden behind an expandable Advanced section, matching the
pattern used for other advanced inputs like device, tile_size, and
overlap.
Amp-Thread-ID: https://ampcode.com/threads/T-019cbaf1-d3c0-718e-a325-318baba86dec
Write to a temp file in the same directory then os.replace() onto the
target path. If the process crashes mid-write, the original file is
left intact instead of being truncated to zero bytes.
Fixes#11298