* model_management: disable non-dynamic smart memory
Disable smart memory outright for non-dynamic models.
This is a minor step towards deprecation of --disable-dynamic-vram
and the legacy ModelPatcher.
This is needed for estimate-free model development, where new models
can opt out of supplying a memory estimate and not have to worry
about hard VRAM allocations due to legacy non-dynamic model patchers.
This is also a general stability increase for a lot of stray use cases
where estimates may still be off, since going forward we are not going
to accurately maintain such estimates.
* pinned_memory: implement with aimdo growable buffer
Use a single growable buffer so we can do threaded pre-warming on
pinned memory.
* mm: use aimdo to do transfer from disk to pin
Aimdo implements a faster threaded loader.
* Add stream host pin buffer for AIMDO casts
Introduce per-offload-stream HostBuffer reuse for pinned staging,
include it in cast buffer reset synchronization.
Defer actual casts that go via this pin path to a separate pass
such that the buffer can be allocated monolithically (to avoid
cudaHostRegister thrash).
* remove old pin path
* Implement JIT pinned memory pressure
Replace the predictive pin pressure mechanism with JIT PIN memory
pressure.
* LowVRAMPatch: change to two-phase visit
* lora: re-implement as inplace swiss-army-knife operation
* prepare for multiple pin sets
* implement pinned loras
* requirements: comfy-aimdo 0.4.0
* ops: remove unused arg
This was defeatured in an aimdo iteration.
* ops: sync the CPU with only the offload stream activity
This was syncing with the offload stream which itself is synced with the
compute stream, so this was syncing CPU with compute transitively. Define
the event to sync it more gently.
* pins: implement freeing intermediate for pinned memory
Pinning is more important than inactive intermediates and the stream
pin buffer is more important than even active intermediates.
* execution: implement pin eviction on RAM pressure
Add back proper pin freeing on RAM pressure
* implement pin registration swaps
Uncap the Windows pins from 50% by extending the pool and have a pressure
mechanism to move the pin reservations on demand.
This unfortunately implies a GPU sync to do the freeing, so significant
hysteresis needs to be added to consolidate these pressure events.
* cli_args/execution: Implement lower background cache-ram threshold
Limit the amount of RAM background intermediates can use, so that
switching workflows doesn't degrade performance too much.
* make default
* bump aimdo
* model-patcher: force-cast tiny weights
Flux 2 gets crazy stalls due to a mix of tiny and giant weights
creating lopsided stream buffer rotations.
* ops: refactor in prep for chunking
* mm: delegate pin-on-the-way to aimdo
Aimdo is able to chunk and slice this on the way for better CPU->GPU
overlap. The main advantage is the ability to shorten the bus contention
window between previous weight transfer and the next weights vbar
fault.
* bump aimdo
* pinning updates
* specify hostbuf max allocation size
There are signs of virtual memory exhaustion on some Linux systems when
throwing 128GB for every little piece. Pass the actual size to save aimdo
from over-estimates.
* tests: update execution tests for caching
The default caching changed to ram-cache so update these tests
accordingly.
Remove the LRU 0 test as this also falls through to RAM cache.
Smoke test through the real HTTP upload + tag-add path exposed two
ordering bugs the unit-layer tests missed:
1. add_tags_to_reference did `to_add = sorted(want - current)` — an
alphabetical pre-sort defeating the microsecond-stagger fix from the
previous commit. The stagger was encoding alphabetical positions,
not the caller's insertion order. Fix: build to_add by walking the
already-normalized caller list and filtering against the current
set, so the staggered added_at timestamps reflect what the caller
actually requested.
2. get_reference_tags used .order_by(tag_name.asc()) — alphabetical.
It's called by the upload response path; meanwhile
list_references_page and fetch_reference_asset_and_tags were already
updated to order by added_at. The mismatch meant POST /api/assets
returned tags in alphabetical order but a subsequent GET returned
them in insertion order. Fix: order get_reference_tags by added_at
too, so all three response-path helpers agree.
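The first ordering bug above can be illustrated with a minimal sketch. The helper name `tags_to_add` and the sample tag names are hypothetical and not taken from the codebase; the sketch only contrasts the old `sorted(want - current)` shape with an order-preserving filter over the caller's list.

```python
def tags_to_add(requested: list[str], current: set[str]) -> list[str]:
    """Return requested tags that are not already present, in caller order.

    Hypothetical helper: walks the (already normalized) caller list and
    filters against the current set instead of sorting a set difference.
    """
    seen = set(current)
    out = []
    for tag in requested:
        if tag not in seen:
            seen.add(tag)  # also drops duplicates within the request
            out.append(tag)
    return out


requested = ["zeta", "models", "aaa-user-tag"]
current = {"models"}

# Old shape: the set difference is sorted, so caller order is lost.
old = sorted(set(requested) - current)

# New shape: caller order survives, so staggered timestamps can encode it.
new = tags_to_add(requested, current)

print(old)
print(new)
```

With the old shape, any per-index timestamp stagger applied afterwards encodes alphabetical positions; with the filter, it encodes the order the caller asked for.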
New tests-unit/assets_test/test_user_tag_http_smoke.py exercises the
full HTTP layer: POST /api/assets to upload, POST /api/assets/{id}/tags
to add a user tag (using tag names like "aaa-user-tag" that would jump
to position 0 under alphabetical), GET /api/assets/{id} to verify
ordering. Catches the bugs above in CI going forward.
Full assets suite: 340 passed, 10 pre-existing skipped.
Cursor-reviews follow-up on PR #13994:
1. set_reference_tags / add_tags_to_reference now apply the same
microsecond stagger as batch_insert_seed_assets. Per-tag get_utc_now()
calls can collide at microsecond resolution on fast machines, dropping
retrieval to the tag_name alphabetical tiebreaker. Using a single
base_ts + timedelta(microseconds=i) preserves insertion order for any
batch.
2. Docstring on get_name_and_tags_from_asset_path corrected: only the
subpath is lowercased in code; the root category is lowercase by
construction in get_asset_category_and_relative_path.
3. resolve_destination_from_tags docstring now states explicitly that
hybrid shapes (mix of legacy multi-tag + new slash-joined within a
single call) are accepted and resolve to the same destination.
4. New TestTagRetrievalOrder class in test_asset_info.py exercises the
public write paths (set_reference_tags, add_tags_to_reference,
remove_tags_from_reference) and asserts the public read paths
(list_references_page, fetch_reference_asset_and_tags) return tags
in insertion order rather than alphabetical. Tag names are chosen
to fail loudly under alphabetical regression — "checkpoints" sorts
before "models", "aaa-user-tag" sorts before every path tag, etc.
Full assets suite: 338 passed, 10 pre-existing skipped.
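The stagger described in item 1 can be sketched as follows. This is an illustration under assumptions, not the code in the PR: `staggered_timestamps` is a hypothetical name, and the sort at the end only mimics the `added_at, tag_name` retrieval order.

```python
from datetime import datetime, timedelta, timezone


def staggered_timestamps(tags, base_ts=None):
    """Pair each tag with base_ts + i microseconds (hypothetical helper).

    A single base timestamp is taken once per batch, so two tags can never
    collide the way independent per-tag clock reads can.
    """
    if base_ts is None:
        base_ts = datetime.now(timezone.utc)
    return [(tag, base_ts + timedelta(microseconds=i)) for i, tag in enumerate(tags)]


base = datetime(2024, 1, 1, tzinfo=timezone.utc)
rows = staggered_timestamps(["models", "checkpoints", "aaa-user-tag"], base)

# Mimic the retrieval order: added_at first, tag_name as tiebreaker.
by_added_at = [tag for tag, _ in sorted(rows, key=lambda r: (r[1], r[0]))]
print(by_added_at)
```

Because every timestamp in the batch is distinct, the tag_name tiebreaker never decides the order within a batch.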
Three bugs surfaced by an end-to-end smoke test of the read+write
round-trip; all in this PR's scope.
1. FK violation on uppercase paths
get_name_and_tags_from_asset_path was preserving case on the
subpath (e.g. "diffusers/Kolors/text_encoder"). ensure_tags_exist
lowercases via normalize_tags before inserting into the tags
table, so the asset_reference_tags.tag_name FK to tags.name
failed for any path containing uppercase letters — including
the diffusers case the PR was designed to support.
Fix: lowercase the slash-joined subpath in
get_name_and_tags_from_asset_path to match the canonicalization
ensure_tags_exist applies. Providers keyed on original-case
subpaths need to normalize their lookup key to lowercase.
2. resolve_destination_from_tags rejected the new tag shape
The inverse function only accepted the legacy one-tag-per-dir
shape (["models", "diffusers", "Kolors", "text_encoder"]).
An upload using the slash-joined shape returned by /api/assets
raised "unknown model category" or "invalid path component".
Fix: pre-split every entry after tags[0] on "/" so both shapes
resolve identically. For models, the first expanded segment is
the category and the rest are subdirs; for input/output the
full expansion becomes the subdirs.
3. Within-batch tag order was lost
bulk_ingest wrote every tag in a single batch with the same
added_at = current_time. The retrieval ORDER BY added_at, tag_name
then fell back to the tag_name tiebreaker, sorting the path-derived
pair alphabetically — putting "checkpoints/..." ahead of "models"
since "c" < "m". The tags[0] = root contract was lost on bulk-
ingested rows.
Fix: stagger added_at by microseconds per tag index within a
reference so the retrieval order matches the input list order.
Path-derived tags now consistently land in position-0 = root,
position-1 = subpath.
Tests
- TestGetNameAndTagsFromAssetPath updated: subpath is now lowercase.
- New TestResolveDestinationFromTags covers both tag shapes, the
unknown-category case for slash-joined input, traversal rejection,
and input/output paths.
- Full suite: 333 passed, 10 pre-existing skipped.
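The pre-split in fix 2 can be sketched as below. `expand_tags` is a hypothetical stand-in for the relevant part of resolve_destination_from_tags, and the traversal check is a simplified assumption about what "invalid path component" covers.

```python
def expand_tags(tags: list[str]) -> tuple[str, list[str]]:
    """Split every entry after tags[0] on "/" (hypothetical sketch).

    Returns the root tag and the flat list of path segments, so the legacy
    one-tag-per-dir shape and the slash-joined shape expand identically.
    """
    root, rest = tags[0], tags[1:]
    segments = [seg for entry in rest for seg in entry.split("/")]
    for seg in segments:
        # Simplified assumption: reject empty and traversal components.
        if seg in ("", ".", ".."):
            raise ValueError(f"invalid path component: {seg!r}")
    return root, segments


legacy = expand_tags(["models", "diffusers", "Kolors", "text_encoder"])
joined = expand_tags(["models", "diffusers/Kolors/text_encoder"])
hybrid = expand_tags(["models", "diffusers", "Kolors/text_encoder"])
print(legacy == joined == hybrid)
```

For a "models" root, the first expanded segment would then be treated as the category and the remainder as subdirectories; for input/output, the whole expansion becomes the subdirectories.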
The /api/assets response previously sorted tags alphabetically via
.order_by(Tag.name.asc()). That breaks the structurally meaningful
"root category first, then subpath" invariant the path-collapsing
change relies on: alphabetical sort puts a custom user tag (or even
the bare "models" root) at unpredictable positions, so positional
access like tags[1] is not reliable on local.
Cloud already preserves insertion order — its Ent WithTags() eager-
load has no explicit ORDER BY, so Postgres returns rows in physical
insertion order. Local's composite primary key on
(asset_reference_id, tag_name) means SQLite walks the index in
tag_name order even without an explicit ORDER BY, so just dropping
the clause isn't enough.
Switching to ORDER BY added_at ASC, tag_name ASC keeps the path
tags inserted via set_reference_tags in their original order
(microsecond-resolution timestamps disambiguate same-batch inserts;
tag_name is a deterministic tiebreaker for the rare collision case).
Custom tags added later via add_tags_to_reference land after the
path tags in their own added_at bucket.
Applies to both response-shaping queries:
- list_references_page (GET /api/assets, tag_map join)
- fetch_reference_asset_and_tags (GET /api/assets/{id})
Catalog/histogram queries in app/assets/database/queries/tags.py
keep their alphabetical sort — those endpoints are listing all tags,
not per-asset tags, and alphabetical is the right shape there.
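The effect of the ORDER BY change can be reproduced with a small in-memory SQLite sketch. The table below is a simplified assumption that keeps only the columns named in this message (asset_reference_id, tag_name, added_at); it is not the real schema, and the timestamps are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the tag link table (assumed columns only).
conn.execute(
    """
    CREATE TABLE asset_reference_tags (
        asset_reference_id TEXT NOT NULL,
        tag_name TEXT NOT NULL,
        added_at TEXT NOT NULL,
        PRIMARY KEY (asset_reference_id, tag_name)
    )
    """
)
rows = [
    # Path tags written in one batch, one microsecond apart.
    ("ref-1", "models", "2024-01-01T00:00:00.000000"),
    ("ref-1", "checkpoints", "2024-01-01T00:00:00.000001"),
    # Custom tag added later lands in its own added_at bucket.
    ("ref-1", "aaa-user-tag", "2024-01-02T00:00:00.000000"),
]
conn.executemany("INSERT INTO asset_reference_tags VALUES (?, ?, ?)", rows)


def fetch(order_by: str) -> list[str]:
    # order_by is a fixed literal in this sketch, never user input.
    sql = (
        "SELECT tag_name FROM asset_reference_tags "
        f"WHERE asset_reference_id = ? ORDER BY {order_by}"
    )
    return [r[0] for r in conn.execute(sql, ("ref-1",))]


alphabetical = fetch("tag_name ASC")
insertion = fetch("added_at ASC, tag_name ASC")
print(alphabetical)
print(insertion)
```

Under the alphabetical ordering the custom tag jumps to position 0; under the added_at ordering the root tag stays first and the custom tag lands last.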
Aligns the OSS spec with the cloud-side BE-1004 contract:
- createWorkspaceApiKey request body: add maxLength: 5000 to the
description property (matches cloud's hub_profile.description
MaxLen(5000) convention; enforced cloud-side via handler check).
- WorkspaceApiKey + WorkspaceApiKeyCreated response schemas:
mark description as required (cloud's handler always populates
the field, defaulting to empty string when not supplied on create),
drop nullable: true, add maxLength: 5000 for symmetry, and clarify
the doc string ("Always present in responses; empty string when no
description was supplied on create").
Both schemas are tagged x-runtime: [cloud] at the schema level so the
tightening is correctly scoped — OSS-only implementations are not
required to honor the workspace API keys endpoints at all.
Related cloud PR: Comfy-Org/cloud#3747
normalize_tags lowercased every tag, which would have stripped case from
the slash-joined subpath (e.g. "diffusers/Kolors/text_encoder" ->
"diffusers/kolors/text_encoder") and broken consumer lookups keyed on
the original-case path. The refactored implementation inlines a strip +
dedup so the import is no longer needed.
The /api/assets response previously emitted one tag per parent directory
between the root category and the filename. For nested categories like
diffusers, this produced ["models", "diffusers", "Kolors", "text_encoder"]
where consumers that look up a category via tags[1] would only see the
top-level bucket name and miss the model-specific sub-path that uniquely
identifies the component.
This collapses the parent subpath into a single slash-joined tag so the
result is ["models", "diffusers/Kolors/text_encoder"]. Consumers can now
read tags[1] as a stable category identifier regardless of how deep the
file lives in the bucket. Case is preserved on the subpath so providers
keyed on the original-case path (e.g. "diffusers/Kolors/text_encoder")
resolve correctly.
Same shape applies uniformly:
- input/foo.png -> ["input"]
- output/00001.png -> ["output"]
- models/checkpoints/flux.safetensors -> ["models", "checkpoints"]
- models/diffusers/Kolors/text_encoder/m.sft -> ["models", "diffusers/Kolors/text_encoder"]
- models/loras/my/custom/path/v1.safetensors -> ["models", "loras/my/custom/path"]
Integration tests that filtered by individual subdirectory tags
(`include_tags=unit-tests,scope`) updated to use the new slash-joined
shape (`include_tags=unit-tests/scope`). Unit tests cover flat input,
flat output, flat models, diffusers-style nested, and deep user-subpath
cases.
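The mapping listed above can be sketched in a few lines. `tags_from_relative_path` is a hypothetical simplification that takes a POSIX-style path already relative to the root category; the real helper works from filesystem paths and category resolution, which this sketch does not attempt. Case is preserved here, as this message describes.

```python
from pathlib import PurePosixPath


def tags_from_relative_path(rel_path: str) -> list[str]:
    """Collapse parent directories into one slash-joined tag (sketch).

    The first path component is the root tag; everything between it and
    the filename becomes a single second tag, if present.
    """
    parts = PurePosixPath(rel_path).parts
    root, parents = parts[0], parts[1:-1]
    tags = [root]
    if parents:
        tags.append("/".join(parents))
    return tags


examples = [
    "input/foo.png",
    "models/checkpoints/flux.safetensors",
    "models/diffusers/Kolors/text_encoder/m.sft",
    "models/loras/my/custom/path/v1.safetensors",
]
for path in examples:
    print(path, "->", tags_from_relative_path(path))
```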
* feat(openapi): add optional description field to workspace API key schemas
Add an optional `description` property (type: string) to three
workspace API key schemas in openapi.yaml:
- Inline request body of createWorkspaceApiKey (POST /api/workspace/api-keys)
- WorkspaceApiKey (list/info schema)
- WorkspaceApiKeyCreated (creation response schema)
The field is not added to any `required` array, making it fully
backward-compatible with existing clients.
Refs: BE-1005, BE-1004
Co-authored-by: Matt Miller <mattmillerai@users.noreply.github.com>
* fix(openapi): mark description nullable in workspace API key response schemas
Per CodeRabbit review on PR #13993: the underlying DB column is nullable
varchar (default ''), so the response schemas should permit null to match
stored data reality. Without nullable: true the OpenAPI contract would
require coercion on the handler side or risk a contract violation.
Request schema unchanged — clients shouldn't be sending null on create.
These two fields were added recently to the Asset schema as nullable
integers, with the intent of exposing original image dimensions for FE
consumers (cloud-side thumbnailing makes naturalWidth/Height return
the wrong size for an image card's dimension label).
The implementation effort that consumes them subsequently converged on
a different shape — dimensions nested under the existing free-form
`metadata` JSON field as `{kind: "image", width, height}` — to avoid
introducing type-specific flat fields on the canonical Asset shape,
and to leave room for forward-compatible additions (video duration,
fps, etc.) without further schema churn.
This removes the now-unused top-level fields so the spec reflects the
agreed direction. No other schema definitions reference these fields
directly: AssetCreated, AssetUpdated, etc. inherit Asset via allOf and
do not redefine them.
The runtime ingest implementation that would have populated these
fields was not yet shipped, so no clients are relying on the
top-level shape.
Co-authored-by: Alexis Rolland <alexisrolland@hotmail.com>
Mark the uploadMask operation as deprecated and point clients at
/api/upload/image. The mask-compositing behavior the endpoint provides
(alpha-compositing the supplied mask onto an original_ref image) is now
expected to happen client-side, with the composited result uploaded
through the unified /api/upload/image path.
The endpoint continues to function for older clients; no runtime
behavior changes ship with this commit. Only the OpenAPI annotation
and the human-facing description are updated.
* Move detection category under image category
* Add missing categories
* Move detection nodes to detection category
* Move save nodes to image root category
* Rename postprocessors
* Move mask category under image
* Move guiders category to parent level at root of sampling category
* Move custom_sampling category to parent level at the root of sampling category
* Modify description of LoRA loaders
* Fix node id SolidMask
* Move VOID Quadmask under image/mask
* Group compositing nodes under image/compositing
* Move load image as mask to image category for consistency with other load image nodes
* Align display name with Load Checkpoint
* Move dataset category under training category
* Rename Number Convert to Convert Number (verb first)
* Rename Canny node
* Revert wanBlockSwap + description
* Add description to RemoveBackground node
* Revert category update of dataset