Cursor-reviews follow-up on PR #13994:
1. set_reference_tags / add_tags_to_reference now apply the same
microsecond stagger as batch_insert_seed_assets. Per-tag get_utc_now()
calls can collide at microsecond resolution on fast machines, dropping
retrieval to the tag_name alphabetical tiebreaker. Using a single
base_ts + timedelta(microseconds=i) preserves insertion order for any
batch.
2. Docstring on get_name_and_tags_from_asset_path corrected: only the
subpath is lowercased in code; the root category is lowercase by
construction in get_asset_category_and_relative_path.
3. resolve_destination_from_tags docstring now states explicitly that
hybrid shapes (mix of legacy multi-tag + new slash-joined within a
single call) are accepted and resolve to the same destination.
4. New TestTagRetrievalOrder class in test_asset_info.py exercises the
public write paths (set_reference_tags, add_tags_to_reference,
remove_tags_from_reference) and asserts the public read paths
(list_references_page, fetch_reference_asset_and_tags) return tags
in insertion order rather than alphabetical. Tag names are chosen
to fail loudly under alphabetical regression — "checkpoints" sorts
before "models", "aaa-user-tag" sorts before every path tag, etc.
Full assets suite: 338 passed, 10 pre-existing skipped.
Three bugs surfaced by an end-to-end smoke test of the read+write
round-trip; all in this PR's scope.
1. FK violation on uppercase paths
get_name_and_tags_from_asset_path was preserving case on the
subpath (e.g. "diffusers/Kolors/text_encoder"). ensure_tags_exist
lowercases via normalize_tags before inserting into the tags
table, so the asset_reference_tags.tag_name FK to tags.name
failed for any path containing uppercase letters — including
the diffusers case the PR was designed to support.
Fix: lowercase the slash-joined subpath in
get_name_and_tags_from_asset_path to match the canonicalization
ensure_tags_exist applies. Providers keyed on original-case
subpaths need to normalize their lookup key to lowercase.
2. resolve_destination_from_tags rejected the new tag shape
The inverse function only accepted the legacy one-tag-per-dir
shape (["models", "diffusers", "Kolors", "text_encoder"]).
An upload using the slash-joined shape returned by /api/assets
raised "unknown model category" or "invalid path component".
Fix: pre-split every entry after tags[0] on "/" so both shapes
resolve identically. For models, the first expanded segment is
the category and the rest are subdirs; for input/output the
full expansion becomes the subdirs.
3. Within-batch tag order was lost
bulk_ingest wrote every tag in a single batch with the same
added_at = current_time. The retrieval ORDER BY added_at, tag_name
then fell back to the tag_name tiebreaker, sorting the path-derived
pair alphabetically — putting "checkpoints/..." ahead of "models"
since "c" < "m". The tags[0] = root contract was lost on bulk-
ingested rows.
Fix: stagger added_at by microseconds per tag index within a
reference so the retrieval order matches the input list order.
Path-derived tags now consistently land in position-0 = root,
position-1 = subpath.
Tests
- TestGetNameAndTagsFromAssetPath updated: subpath is now lowercase.
- New TestResolveDestinationFromTags covers both tag shapes, the
unknown-category case for slash-joined input, traversal rejection,
and input/output paths.
- Full suite: 333 passed, 10 pre-existing skipped.
normalize_tags lowercased every tag, which would have stripped case from
the slash-joined subpath (e.g. "diffusers/Kolors/text_encoder" ->
"diffusers/kolors/text_encoder") and broken consumer lookups keyed on
the original-case path. The refactored implementation inlines a strip +
dedup so the import is no longer needed.
The /api/assets response previously emitted one tag per parent directory
between the root category and the filename. For nested categories like
diffusers, this produced ["models", "diffusers", "Kolors", "text_encoder"]
where consumers that look up a category via tags[1] would only see the
top-level bucket name and miss the model-specific sub-path that uniquely
identifies the component.
This collapses the parent subpath into a single slash-joined tag so the
result is ["models", "diffusers/Kolors/text_encoder"]. Consumers can now
read tags[1] as a stable category identifier regardless of how deep the
file lives in the bucket. Case is preserved on the subpath so providers
keyed on the original-case path (e.g. "diffusers/Kolors/text_encoder")
resolve correctly.
Same shape applies uniformly:
- input/foo.png -> ["input"]
- output/00001.png -> ["output"]
- models/checkpoints/flux.safetensors -> ["models", "checkpoints"]
- models/diffusers/Kolors/text_encoder/m.sft -> ["models", "diffusers/Kolors/text_encoder"]
- models/loras/my/custom/path/v1.safetensors -> ["models", "loras/my/custom/path"]
Integration tests that filtered by individual subdirectory tags
(`include_tags=unit-tests,scope`) updated to use the new slash-joined
shape (`include_tags=unit-tests/scope`). Unit tests cover flat input,
flat output, flat models, diffusers-style nested, and deep user-subpath
cases.