This change solves the basename collision bug by using UNIQUE(file_path) on the
unified asset_references table. Key changes:
Database:
- Migration 0005 merges asset_cache_states and asset_infos into asset_references
- AssetReference now contains: cache state fields (file_path, mtime_ns, needs_verify,
  is_missing, enrichment_level) plus info fields (name, owner_id, preview_id, etc.)
- AssetReferenceMeta replaces AssetInfoMeta
- AssetReferenceTag replaces AssetInfoTag
- UNIQUE constraint on file_path prevents duplicate entries for same file
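
The effect of the constraint can be illustrated with a minimal SQLite sketch. The table below is a cut-down, hypothetical stand-in for asset_references (the real schema comes from migration 0005 and has more columns), and insert_reference is an illustrative helper, not a function from this change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down stand-in for the asset_references table.
conn.execute(
    """
    CREATE TABLE asset_references (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        file_path TEXT UNIQUE
    )
    """
)


def insert_reference(name, file_path):
    """Insert a row; return True if it was created, False on a path conflict."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO asset_references (name, file_path) VALUES (?, ?)",
        (name, file_path),
    )
    return cur.rowcount == 1


# Same basename in different directories: both rows are kept.
first = insert_reference("model.safetensors", "/models/a/model.safetensors")
second = insert_reference("model.safetensors", "/models/b/model.safetensors")
# Same full path again: ignored because of UNIQUE(file_path).
duplicate = insert_reference("model.safetensors", "/models/a/model.safetensors")
row_count = conn.execute("SELECT COUNT(*) FROM asset_references").fetchone()[0]
```

Keying on the full path rather than the name is what lets two files with the same basename coexist while a re-scan of the same file stays idempotent.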
Code:
- New unified query module: asset_reference.py (replaces asset_info.py, cache_state.py)
- Updated scanner, seeder, and services to use AssetReference
- Updated API routes to use reference_id instead of asset_info_id
Tests:
- All 175 unit tests updated and passing
- Integration tests require server environment (not run here)
Amp-Thread-ID: https://ampcode.com/threads/T-019c4fe8-9dcb-75ce-bea8-ea786343a581
Co-authored-by: Amp <amp@ampcode.com>
- Added disable(), enable(), and is_disabled() methods to AssetSeeder
- start() now checks is_disabled() and returns early if disabled
- Updated main.py to call asset_seeder.disable() when CLI flag is set
- Fixes bypass where /object_info would trigger scans regardless of flag
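
A minimal sketch of the disable gate, assuming a simple boolean flag guarded by a lock. SeederSketch is a hypothetical stand-in for AssetSeeder; only the three new methods and the early return in start() are modelled.

```python
import threading


class SeederSketch:
    """Hypothetical stand-in for AssetSeeder; only the disable gate is modelled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._disabled = False
        self.scans_started = 0

    def disable(self):
        with self._lock:
            self._disabled = True

    def enable(self):
        with self._lock:
            self._disabled = False

    def is_disabled(self):
        with self._lock:
            return self._disabled

    def start(self):
        """Return True if a scan was started, False if the seeder is disabled."""
        if self.is_disabled():
            return False
        self.scans_started += 1  # the real seeder would launch a scan here
        return True
```

Because the check lives inside start(), every caller is covered, including code paths that never look at the CLI flag themselves.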
Amp-Thread-ID: https://ampcode.com/threads/T-019c4f66-6773-72d2-bdfe-b55f5aa76021
Co-authored-by: Amp <amp@ampcode.com>
- Add log when scanner start is requested and when skipped due to already running
- Remove noisy 'no mime_type' info log (expected during fast stub phase)
Amp-Thread-ID: https://ampcode.com/threads/T-019c4f56-3fe1-72cb-888a-3ac4ac99b3d7
Co-authored-by: Amp <amp@ampcode.com>
- Add custom MIME type registrations for model files (.safetensors, .pt, .ckpt, .gguf, .yaml)
- Pass mime_type through SeedAssetSpec to bulk_ingest
- Re-register types before use since server.py mimetypes.init() resets them
- Add tests for bulk ingest mime_type handling
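
The re-registration step can be sketched with the standard mimetypes module. The MIME type strings and helper names below are assumptions made for illustration; the values actually registered by this change are not shown here.

```python
import mimetypes

# Assumed extension-to-type mapping, for illustration only.
MODEL_MIME_TYPES = {
    ".safetensors": "application/safetensors",
    ".gguf": "application/gguf",
}


def register_model_mime_types():
    """(Re-)register the custom types; safe to call repeatedly."""
    for ext, mime in MODEL_MIME_TYPES.items():
        mimetypes.add_type(mime, ext)


def guess_model_mime_type(path):
    # Re-register first, in case mimetypes.init() ran since the last call.
    register_model_mime_types()
    return mimetypes.guess_type(path)[0]


mimetypes.init()  # simulates another module rebuilding the registry
guessed = guess_model_mime_type("checkpoints/model.safetensors")
```

Registering on every lookup costs a few dictionary writes and removes any dependence on import order.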
Amp-Thread-ID: https://ampcode.com/threads/T-019c3626-c6ad-7139-a570-62da4e656a1a
Co-authored-by: Amp <amp@ampcode.com>
- Make blake3 import lazy in hashing.py (only imported when needed)
- Add compute_hashes parameter to AssetSeeder.start(), build_asset_specs(), and seed_assets()
- Fix clearing of the missing tag: include is_missing states in sync when update_missing_tags=True
- Clear is_missing flag on cache states when files are restored with matching mtime/size
- Fix validation error serialization in routes.py (use json.loads(ve.json()))
Amp-Thread-ID: https://ampcode.com/threads/T-019c3614-56d4-74a8-a717-19922d6dbbee
Co-authored-by: Amp <amp@ampcode.com>
Since size_bytes is declared as non-nullable (nullable=False, default=0) in
the Asset model, simplify the conditional checks:
- Use 'if item.asset else None' when the asset relationship might be None
- Access size_bytes directly when asset is guaranteed to exist (create endpoints)
Amp-Thread-ID: https://ampcode.com/threads/T-019c354e-cbfb-77d8-acdd-0d066c16006e
Co-authored-by: Amp <amp@ampcode.com>
- Fix race in mark_missing_outside_prefixes: set state to RUNNING inside
  lock before operations, restore to IDLE in finally block to prevent
  concurrent start() calls
- Fix timing consistency: capture perf_counter before _update_progress
  for consistent event timing
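
The race fix follows a common pattern: claim the state under the lock, do the work, and release the state in a finally block. The sketch below assumes a two-state machine; GuardedSeeder and its method names are hypothetical, not the actual AssetSeeder code.

```python
import threading
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RUNNING = "running"


class GuardedSeeder:
    """Hypothetical sketch: one lock-protected state shared by all entry points."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = State.IDLE

    def _try_begin(self):
        # Check and claim the state in one critical section.
        with self._lock:
            if self._state is not State.IDLE:
                return False
            self._state = State.RUNNING
            return True

    def _end(self):
        with self._lock:
            self._state = State.IDLE

    def start(self):
        """Return True if the caller acquired the RUNNING state."""
        return self._try_begin()

    def mark_missing(self, operation):
        """Run operation exclusively; return None if another run is active."""
        if not self._try_begin():
            return None
        try:
            return operation()
        finally:
            self._end()  # always restore IDLE, even if operation raises
```

While mark_missing() is executing, start() is refused; once it returns or raises, the state is IDLE again.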
Amp-Thread-ID: https://ampcode.com/threads/T-019c354b-e7d7-7309-aa0e-79e5e7dff2b7
Co-authored-by: Amp <amp@ampcode.com>
- Add bulk_update_is_missing() to efficiently update is_missing flag
- Update sync_cache_states_with_filesystem() to mark non-existent files as is_missing=True
- Call restore_cache_states_by_paths() in batch_insert_seed_assets() to restore
  previously-missing states when files reappear during scanning
Amp-Thread-ID: https://ampcode.com/threads/T-019c3177-e591-7666-ac6b-7e05c71c8ebf
Co-authored-by: Amp <amp@ampcode.com>
- Add is_missing column to AssetCacheState for soft-delete
- Replace hard-delete pruning with mark_cache_states_missing_outside_prefixes
- Auto-restore missing cache states when files are re-scanned
- Filter out missing cache states from queries by default
- Rename functions for clarity:
  - mark_cache_states_missing_outside_prefixes (was delete_cache_states_outside_prefixes)
  - get_unreferenced_unhashed_asset_ids (was get_orphaned_seed_asset_ids)
  - mark_assets_missing_outside_prefixes (was prune_orphaned_assets)
  - mark_missing_outside_prefixes_safely (was prune_orphans_safely)
- Add restore_cache_states_by_paths for explicit restoration
- Add cleanup_unreferenced_assets for explicit hard-delete when needed
- Update API endpoint /api/assets/prune to use new soft-delete behavior
This preserves user metadata (tags, etc.) when base directories change,
allowing assets to be restored when the original paths become available again.
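
The soft-delete and restore cycle can be sketched with SQLite. The single-table schema and the shortened function names are assumptions for illustration; the real queries operate on the ORM models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal table; the real cache state has more columns.
conn.execute(
    "CREATE TABLE cache_states ("
    " file_path TEXT PRIMARY KEY,"
    " is_missing INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO cache_states (file_path) VALUES (?)",
    [("/models/a.safetensors",), ("/old/b.safetensors",)],
)


def mark_missing_outside_prefixes(prefixes):
    """Soft-delete: flag rows whose path is under none of the prefixes.

    Prefixes are expected to end with a path separator.
    """
    rows = conn.execute("SELECT file_path FROM cache_states WHERE is_missing = 0")
    outside = [(p,) for (p,) in rows if not any(p.startswith(x) for x in prefixes)]
    conn.executemany(
        "UPDATE cache_states SET is_missing = 1 WHERE file_path = ?", outside
    )
    return len(outside)


def restore_by_paths(paths):
    """Clear the flag for paths that were seen again during a scan."""
    conn.executemany(
        "UPDATE cache_states SET is_missing = 0 WHERE file_path = ?",
        [(p,) for p in paths],
    )


def visible_paths():
    """Default query view: rows flagged as missing are filtered out."""
    rows = conn.execute(
        "SELECT file_path FROM cache_states WHERE is_missing = 0 ORDER BY file_path"
    )
    return [p for (p,) in rows]
```

Since the row is never deleted, anything attached to it (tags, user metadata) is still there when the flag is cleared.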
Amp-Thread-ID: https://ampcode.com/threads/T-019c3114-bf28-73a9-a4d2-85b208fd5462
Co-authored-by: Amp <amp@ampcode.com>
Rename _sync_root_safely, _prune_orphans_safely, _collect_paths_for_roots,
_build_asset_specs, and _insert_asset_specs to remove underscore prefix
since they are used by seeder.py as part of the public API.
Amp-Thread-ID: https://ampcode.com/threads/T-019c3037-df32-7138-99d8-b4b824d896b3
Co-authored-by: Amp <amp@ampcode.com>
- Remove automatic pruning from scan loop to prevent partial scans from
  deleting assets belonging to other roots
- Add get_all_known_prefixes() helper to get prefixes for all root types
- Add prune_orphans() method to AssetSeeder for explicit pruning
- Add prune_first parameter to start() for optional pre-scan pruning
- Add POST /api/assets/prune endpoint for explicit pruning via API
- Update main.py startup to use prune_first=True for full startup scans
- Add tests for new prune_orphans functionality
Fixes issue where a models-only scan would delete all input/output assets.
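
Why the prefix set matters can be shown with a small, self-contained sketch. The paths and the helpers is_under and orphans are hypothetical; they only illustrate the difference between pruning against the scanned root and pruning against all known roots.

```python
import os


def is_under(path, prefix):
    """True if path is the prefix itself or lies inside it (component-wise)."""
    path = os.path.normpath(path)
    prefix = os.path.normpath(prefix)
    return path == prefix or path.startswith(prefix + os.sep)


def orphans(known_paths, valid_prefixes):
    """Paths that fall under none of the valid prefixes."""
    return [p for p in known_paths if not any(is_under(p, x) for x in valid_prefixes)]


known = ["/data/models/a.safetensors", "/data/input/cat.png", "/data/stale/x.bin"]

# Pruning against only the scanned root wrongly flags the input asset.
scanned_only = orphans(known, ["/data/models"])
# Pruning against every known root flags only the truly stale path.
all_roots = orphans(known, ["/data/models", "/data/input", "/data/output"])
```

With only the models root as the valid set, the input image counts as an orphan; with all roots, only the stale file does.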
Amp-Thread-ID: https://ampcode.com/threads/T-019c2ba0-e004-7229-81bf-452b2f7f57a1
Co-authored-by: Amp <amp@ampcode.com>
- Add AssetSeeder singleton class with thread management and cancellation
- Support IDLE/RUNNING/CANCELLING state machine with thread-safe access
- Emit WebSocket events for scan progress (started, progress, completed, cancelled, error)
- Update main.py to use non-blocking asset_seeder.start() at startup
- Add shutdown() call in finally block for graceful cleanup
- Update POST /api/assets/seed to return 202 Accepted, support ?wait=true
- Add GET /api/assets/seed/status and POST /api/assets/seed/cancel endpoints
- Update test helper to use ?wait=true for synchronous behavior
- Add 17 unit tests covering state transitions, cancellation, and thread safety
- Log scan configuration (models directory, input/output paths) at scan start
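
The thread management and cancellation flow can be sketched as follows. BackgroundScanner is a hypothetical, much reduced stand-in for AssetSeeder: the emit callback stands in for the WebSocket event sender, and the event names are taken from the list above.

```python
import threading
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    RUNNING = "RUNNING"
    CANCELLING = "CANCELLING"


class BackgroundScanner:
    """Hypothetical sketch of a non-blocking, cancellable scan runner."""

    def __init__(self, emit):
        self._lock = threading.Lock()
        self._state = State.IDLE
        self._cancel = threading.Event()
        self._thread = None
        self._emit = emit  # stand-in for the WebSocket event sender

    def start(self, items):
        """Start a scan in the background; return False if one is active."""
        with self._lock:
            if self._state is not State.IDLE:
                return False
            self._state = State.RUNNING
            self._cancel.clear()
            self._thread = threading.Thread(
                target=self._run, args=(list(items),), daemon=True
            )
            self._thread.start()
        return True

    def cancel(self):
        """Request cancellation; return False if nothing is running."""
        with self._lock:
            if self._state is not State.RUNNING:
                return False
            self._state = State.CANCELLING
            self._cancel.set()
        return True

    def wait(self, timeout=None):
        thread = self._thread
        if thread is not None:
            thread.join(timeout)

    def _run(self, items):
        self._emit("started")
        try:
            for _item in items:
                if self._cancel.is_set():
                    self._emit("cancelled")
                    return
                self._emit("progress")
            self._emit("completed")
        finally:
            with self._lock:
                self._state = State.IDLE
```

A caller that needs synchronous behaviour, such as the ?wait=true path, can call start() followed by wait().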
Amp-Thread-ID: https://ampcode.com/threads/T-019c2b45-e6e8-740a-b38b-b11daea8d094
Co-authored-by: Amp <amp@ampcode.com>
Catch ValueError from resolve_destination_from_tags in the upload
endpoint so that invalid path components like '..' return a 400
BAD_REQUEST error instead of falling through to the 500 handler.
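
A framework-free sketch of the error mapping is shown below. resolve_destination is a hypothetical stand-in for resolve_destination_from_tags (whose real signature and validation rules are not shown here), and the response body shape and the 201 success status are assumptions.

```python
def resolve_destination(tags):
    """Hypothetical stand-in for resolve_destination_from_tags."""
    for part in tags:
        if part in ("", ".", "..") or "/" in part or "\\" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return "/".join(tags)


def handle_upload(tags):
    """Return (status, body) the way a route handler might."""
    try:
        destination = resolve_destination(tags)
    except ValueError as exc:
        # Client error: report it as 400 instead of letting it reach the 500 handler.
        return 400, {"error": {"code": "BAD_REQUEST", "message": str(exc)}}
    return 201, {"destination": destination}
```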
Amp-Thread-ID: https://ampcode.com/threads/T-019c2af2-7c87-7263-88b0-9feca1c31b3c
Co-authored-by: Amp <amp@ampcode.com>
- Create file_utils.py with shared file utilities:
  - get_mtime_ns() - extract mtime in nanoseconds from stat
  - get_size_and_mtime_ns() - get both size and mtime
  - verify_file_unchanged() - check file matches DB mtime/size
  - list_files_recursively() - recursive directory listing
- Create bulk_ingest.py for bulk operations:
  - BulkInsertResult dataclass
  - batch_insert_seed_assets() - batch insert with conflict handling
  - prune_orphaned_assets() - clean up orphaned assets
- Update scanner.py to use new service modules instead of
  calling database queries directly
- Update ingest.py to use shared get_size_and_mtime_ns()
- Export new functions from services/__init__.py
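
Two of the file utilities can be sketched with os.stat. The function names match the list above, but the signatures and the treatment of missing files are assumptions, not the actual file_utils.py code.

```python
import os
import tempfile


def get_size_and_mtime_ns(path):
    """Return (size in bytes, mtime in nanoseconds) from a single stat call."""
    st = os.stat(path)
    return st.st_size, st.st_mtime_ns


def verify_file_unchanged(path, db_size, db_mtime_ns):
    """True if the file exists and its size and mtime match the stored values."""
    try:
        size, mtime_ns = get_size_and_mtime_ns(path)
    except OSError:
        return False  # assumed behaviour: a missing file counts as changed
    return size == db_size and mtime_ns == db_mtime_ns


# Usage: record a file's state, then detect a later modification.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "asset.bin")
    with open(path, "wb") as fh:
        fh.write(b"abc")
    stored_size, stored_mtime_ns = get_size_and_mtime_ns(path)
    unchanged_before = verify_file_unchanged(path, stored_size, stored_mtime_ns)
    with open(path, "ab") as fh:
        fh.write(b"d")  # size changes from 3 to 4 bytes
    unchanged_after = verify_file_unchanged(path, stored_size, stored_mtime_ns)
    missing_ok = verify_file_unchanged(os.path.join(tmp, "missing.bin"), 0, 0)
```

Comparing size and mtime is a cheap check that avoids re-hashing; it does not detect an edit that preserves both values.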
Amp-Thread-ID: https://ampcode.com/threads/T-019c2ae7-f701-716a-a0dd-1feb988732fb
Co-authored-by: Amp <amp@ampcode.com>
- Delete app/assets/manager.py
- Move upload logic (upload_from_temp_path, create_from_hash) to ingest service
- Add HashMismatchError and DependencyMissingError to ingest service
- Add UploadResult schema for upload responses
- Update routes.py to import services directly and do schema conversion inline
- Add asset lookup/listing service functions to asset_management.py
Routes now call the service layer directly, removing an unnecessary
layer of indirection. The manager was only converting between service
dataclasses and Pydantic response schemas.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add typed result dataclasses: IngestResult, AddTagsResult,
  RemoveTagsResult, SetTagsResult, TagUsage
- Add UserMetadata type alias for user_metadata parameters
- Type helper functions with Session parameters
- Use TypedDicts at query layer to avoid circular imports
- Update manager.py and tests to use attribute access
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Scanner is used externally by main.py and server.py for startup/maintenance,
not as part of the regular service layer. Moving it to app/assets/scanner.py
makes the public API clearer.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add update_asset_info_name and update_asset_info_updated_at query functions
and update asset_management.py to use them instead of modifying ORM objects
directly. This ensures the service layer only uses explicit operations from
the queries package.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace dict/ORM object returns with explicit dataclasses to fix
DetachedInstanceError when accessing ORM attributes after session closes.
- Add app/assets/services/schemas.py with AssetData, AssetInfoData,
  AssetDetailResult, and RegisterAssetResult dataclasses
- Update asset_management.py and ingest.py to return dataclasses
- Update manager.py to use attribute access on dataclasses
- Fix created_new to be False in create_asset_from_hash (content exists)
- Add DependencyMissingError for better blake3 missing error handling
- Update tests to use attribute access instead of dict subscripting
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Make blake3 an optional import that fails gracefully at import time,
with a clear error message when hashing functions are actually called.
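
One way to defer the import until a hash is requested is sketched below. require_module and blake3_hex are hypothetical helpers, and the base class of DependencyMissingError is an assumption; only the error class name comes from this change.

```python
import importlib


class DependencyMissingError(RuntimeError):
    """Raised when an optional dependency is needed but not installed."""


def require_module(name):
    """Import a module on demand, converting ImportError into a clear error."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise DependencyMissingError(
            f"{name} is required for this operation but is not installed"
        ) from exc


def blake3_hex(data):
    # blake3 is imported only when hashing is actually requested, so
    # importing this module never fails just because blake3 is absent.
    blake3 = require_module("blake3")
    return blake3.blake3(data).hexdigest()
```

Code paths that never hash, such as a fast scan with compute_hashes disabled, then work without the package installed.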
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extract focused helper functions to eliminate the try-finally block that
wrapped ~50 lines just for logging. The new helpers (_collect_paths_for_roots,
_build_asset_specs, _insert_asset_specs) make seed_assets a simple linear flow.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extract helper functions to eliminate nested try-except blocks in scanner.py
and remove duplicated type-checking logic in asset_info.py. Simplify nested
conditionals in asset_management.py for clearer control flow.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>