Commit Graph

13 Commits

Author SHA1 Message Date
Luke Mino-Altherr
6b48144751 refactor(assets): merge AssetInfo and AssetCacheState into AssetReference
This change solves the basename collision bug by using UNIQUE(file_path) on the
unified asset_references table. Key changes:

Database:
- Migration 0005 merges asset_cache_states and asset_infos into asset_references
- AssetReference now contains: cache state fields (file_path, mtime_ns, needs_verify,
  is_missing, enrichment_level) plus info fields (name, owner_id, preview_id, etc.)
- AssetReferenceMeta replaces AssetInfoMeta
- AssetReferenceTag replaces AssetInfoTag
- UNIQUE constraint on file_path prevents duplicate entries for same file

Code:
- New unified query module: asset_reference.py (replaces asset_info.py, cache_state.py)
- Updated scanner, seeder, and services to use AssetReference
- Updated API routes to use reference_id instead of asset_info_id

Tests:
- All 175 unit tests updated and passing
- Integration tests require a server environment and were not run for this commit

Amp-Thread-ID: https://ampcode.com/threads/T-019c4fe8-9dcb-75ce-bea8-ea786343a581
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
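
As an illustration of why UNIQUE(file_path) addresses the basename collision described in the commit above, here is a minimal sketch using Python's built-in sqlite3 module. The trimmed-down schema and the add_reference() helper are assumptions made for this example; only the table name, a few column names, and the UNIQUE constraint come from the commit message.

```python
import sqlite3

# Hypothetical, trimmed-down schema: the real asset_references table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE asset_references (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        file_path TEXT NOT NULL UNIQUE,
        mtime_ns INTEGER,
        is_missing INTEGER NOT NULL DEFAULT 0
    )
    """
)


def add_reference(name: str, file_path: str, mtime_ns: int) -> bool:
    """Insert a row; return False if file_path is already registered."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO asset_references (name, file_path, mtime_ns) "
                "VALUES (?, ?, ?)",
                (name, file_path, mtime_ns),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by the UNIQUE constraint when the same path is inserted twice.
        return False


# Two files share a basename but live in different directories: both are kept.
first = add_reference("model.safetensors", "/models/a/model.safetensors", 1)
second = add_reference("model.safetensors", "/models/b/model.safetensors", 2)
# Re-inserting an existing path is rejected instead of creating a duplicate row.
duplicate = add_reference("model.safetensors", "/models/a/model.safetensors", 3)
row_count = conn.execute("SELECT COUNT(*) FROM asset_references").fetchone()[0]
```

Keying uniqueness on the full path rather than the name means identity follows the file on disk, so equal basenames in different directories no longer collide.
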
Luke Mino-Altherr
fd30787e98 Add disable/enable methods to AssetSeeder to respect --disable-assets-autoscan flag
- Added disable(), enable(), and is_disabled() methods to AssetSeeder
- start() now checks is_disabled() and returns early if disabled
- Updated main.py to call asset_seeder.disable() when CLI flag is set
- Fixes bypass where /object_info would trigger scans regardless of flag

Amp-Thread-ID: https://ampcode.com/threads/T-019c4f66-6773-72d2-bdfe-b55f5aa76021
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
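
The disable/enable gate described in the commit above could look roughly like the sketch below. SeederSketch and its scans_started counter are hypothetical stand-ins; only the method names disable(), enable(), is_disabled(), and start() come from the commit message.

```python
import threading


class SeederSketch:
    """Hypothetical stand-in for AssetSeeder showing only the disable gate."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._disabled = False
        self.scans_started = 0  # illustration only: counts scans that got through

    def disable(self) -> None:
        with self._lock:
            self._disabled = True

    def enable(self) -> None:
        with self._lock:
            self._disabled = False

    def is_disabled(self) -> bool:
        with self._lock:
            return self._disabled

    def start(self) -> bool:
        """Return True if a scan was started, False if it was skipped."""
        if self.is_disabled():
            return False  # every caller, including request handlers, hits this gate
        self.scans_started += 1  # stand-in for launching the real scan
        return True


seeder = SeederSketch()
seeder.disable()          # what a CLI flag handler might call at startup
skipped = seeder.start()  # returns False: no scan is triggered
seeder.enable()
started = seeder.start()  # returns True
```

Putting the check inside start() rather than at each call site is what closes a bypass like the /object_info one: any code path that requests a scan goes through the same gate.
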
Luke Mino-Altherr
4f29877939 Add lifecycle logging to asset seeder
Log pause, resume, cancel, and restart events

Amp-Thread-ID: https://ampcode.com/threads/T-019c4f56-3fe1-72cb-888a-3ac4ac99b3d7
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
Luke Mino-Altherr
b407a80d5a Improve asset scanner logging
- Add log when scanner start is requested and when skipped due to already running
- Remove noisy 'no mime_type' info log (expected during fast stub phase)

Amp-Thread-ID: https://ampcode.com/threads/T-019c4f56-3fe1-72cb-888a-3ac4ac99b3d7
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
Luke Mino-Altherr
b89e5de40e Add pause/resume/stop/restart controls to AssetSeeder
- Add PAUSED state to state machine
- Add pause() method - blocks scan at next checkpoint
- Add resume() method - unblocks paused scan
- Add stop() method - alias for cancel()
- Add restart() method - cancel + wait + start with same/overridden params
- Add _check_pause_and_cancel() helper for checkpoint locations
- Emit assets.seed.paused and assets.seed.resumed WebSocket events
- Update get_object_info to use async seeder instead of blocking seed_assets
- Scan all roots (models, input, output) on object_info, not just models

Amp-Thread-ID: https://ampcode.com/threads/T-019c4f2b-5801-711c-8d47-bd1525808d77
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
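
One common way to build the pause/resume checkpoint described in the commit above is with threading.Event. The sketch below assumes that approach; the PausableScan class and its processed list are hypothetical, and only the names pause(), resume(), cancel(), and _check_pause_and_cancel() are taken from the commit message.

```python
import threading
from typing import Iterable, List


class PausableScan:
    """Hypothetical worker that checks for pause and cancel between items."""

    def __init__(self) -> None:
        self._resume = threading.Event()
        self._resume.set()  # set means "not paused"
        self._cancel = threading.Event()
        self.processed: List[str] = []

    def pause(self) -> None:
        self._resume.clear()  # the worker blocks at its next checkpoint

    def resume(self) -> None:
        self._resume.set()

    def cancel(self) -> None:
        self._cancel.set()
        self._resume.set()  # wake a paused worker so it can observe the cancel

    def _check_pause_and_cancel(self) -> bool:
        """Block while paused; return True if the scan should stop."""
        self._resume.wait()
        return self._cancel.is_set()

    def run(self, paths: Iterable[str]) -> None:
        for path in paths:
            if self._check_pause_and_cancel():
                return
            self.processed.append(path)  # stand-in for scanning one file


scan = PausableScan()
scan.pause()
worker = threading.Thread(target=scan.run, args=(["a", "b"],))
worker.start()
worker.join(timeout=0.2)           # still blocked at the first checkpoint
blocked_while_paused = worker.is_alive() and scan.processed == []
scan.resume()
worker.join(timeout=2)
```

Having cancel() also set the resume event matters: without it, cancelling a paused scan would leave the worker blocked forever.
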
Luke Mino-Altherr
c7368205e3 feat: implement two-phase scanning architecture (fast + enrich)
Phase 1 (FAST): Creates stub records with filesystem metadata only
- Records path, size, and mtime only; file contents are not read
- Populates asset database quickly on startup

Phase 2 (ENRICH): Extracts metadata and computes hashes
- Safetensors header parsing, MIME types
- Optional blake3 hash computation
- Updates existing stub records

Changes:
- Add ScanPhase enum (FAST, ENRICH, FULL)
- Add enrichment_level column to AssetCacheState (0=stub, 1=metadata, 2=hashed)
- Add build_stub_specs() for fast scanning without metadata extraction
- Add get_unenriched_cache_states(), enrich_asset(), enrich_assets_batch()
- Add start_fast(), start_enrich() convenience methods to AssetSeeder
- Update start() to accept phase parameter (defaults to FULL)
- Split _run_scan() into _run_fast_phase() and _run_enrich_phase()
- Add migration 0003_add_enrichment_level.py
- Update tests for new architecture

Amp-Thread-ID: https://ampcode.com/threads/T-019c4eef-1568-778f-aede-38254728f848
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
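
The two phases described in the commit above can be sketched as follows. This is an assumption-laden illustration, not the project's code: StubRecord, build_stub(), and enrich() are hypothetical names, the ScanPhase member values are invented (the commit gives only the names), and hashlib.blake2b from the standard library stands in for blake3 so the example has no third-party dependency.

```python
import hashlib
import mimetypes
import os
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ScanPhase(Enum):
    FAST = "fast"      # member values are assumptions; the commit names only the members
    ENRICH = "enrich"
    FULL = "full"


@dataclass
class StubRecord:
    """Hypothetical record; enrichment_level follows 0=stub, 1=metadata, 2=hashed."""
    path: str
    size: int
    mtime_ns: int
    enrichment_level: int = 0
    mime_type: Optional[str] = None
    digest: Optional[str] = None


def build_stub(path: str) -> StubRecord:
    """FAST phase: filesystem metadata only, the file is never opened."""
    st = os.stat(path)
    return StubRecord(path=path, size=st.st_size, mtime_ns=st.st_mtime_ns)


def enrich(record: StubRecord, compute_hash: bool = False) -> StubRecord:
    """ENRICH phase: fill in metadata, and optionally a content hash."""
    record.mime_type = mimetypes.guess_type(record.path)[0]
    record.enrichment_level = max(record.enrichment_level, 1)
    if compute_hash:
        hasher = hashlib.blake2b()  # stand-in for blake3
        with open(record.path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                hasher.update(chunk)
        record.digest = hasher.hexdigest()
        record.enrichment_level = 2
    return record
```

The split pays off at startup: a FAST pass costs one stat() call per file, so the database can be populated quickly, while the expensive reads happen later against records that already exist.
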
Luke Mino-Altherr
bd17ee3dc9 Fix ruff linting issues
- Remove debug print statements
- Remove trailing whitespace on blank lines
- Remove unused pytest import

Amp-Thread-ID: https://ampcode.com/threads/T-019c3a8d-3b4f-75b4-8513-1c77914782f7
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
Luke Mino-Altherr
a2d26dece5 Add optional blake3 hashing during asset scanning
- Make blake3 import lazy in hashing.py (only imported when needed)
- Add compute_hashes parameter to AssetSeeder.start(), build_asset_specs(), and seed_assets()
- Fix missing tag clearing: include is_missing states in sync when update_missing_tags=True
- Clear is_missing flag on cache states when files are restored with matching mtime/size
- Fix validation error serialization in routes.py (use json.loads(ve.json()))

Amp-Thread-ID: https://ampcode.com/threads/T-019c3614-56d4-74a8-a717-19922d6dbbee
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
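
A lazy import of the kind described in the commit above might be written as in this sketch. The helper names _load_blake3() and hash_bytes() are hypothetical; the compute_hashes parameter name comes from the commit message, and blake3.blake3(data).hexdigest() is the hashing call provided by the third-party blake3 package.

```python
import importlib
from typing import Optional


def _load_blake3():
    """Import blake3 on first use; it is a third-party package and may be absent."""
    try:
        return importlib.import_module("blake3")
    except ImportError as exc:
        raise RuntimeError(
            "hashing was requested but the blake3 package is not installed"
        ) from exc


def hash_bytes(data: bytes, compute_hashes: bool = False) -> Optional[str]:
    """Return a hex digest, or None when hashing is turned off."""
    if not compute_hashes:
        return None  # the import is never attempted on this path
    return _load_blake3().blake3(data).hexdigest()


# With hashing off, this works even on a machine without blake3 installed.
no_hash = hash_bytes(b"example")
```

Deferring the import keeps the dependency optional: only callers that ask for hashes pay for it or need it installed.
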
Luke Mino-Altherr
93068c0d6d Fix concurrency issues in AssetSeeder
- Fix race in mark_missing_outside_prefixes: set state to RUNNING inside
  lock before operations, restore to IDLE in finally block to prevent
  concurrent start() calls

- Fix timing consistency: capture perf_counter before _update_progress
  for consistent event timing

Amp-Thread-ID: https://ampcode.com/threads/T-019c354b-e7d7-7309-aa0e-79e5e7dff2b7
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
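
The race fix described in the commit above follows a familiar pattern: claim the RUNNING state while holding the lock, then restore IDLE in a finally block. A minimal sketch, assuming a two-state machine and hypothetical names (GuardedSeeder, _try_claim, mark_missing):

```python
import threading
from enum import Enum
from typing import Callable


class State(Enum):
    IDLE = "idle"
    RUNNING = "running"


class GuardedSeeder:
    """Hypothetical sketch: only one operation may hold RUNNING at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state = State.IDLE

    def _try_claim(self) -> bool:
        # The check and the state change happen under one lock acquisition,
        # so two callers cannot both observe IDLE and proceed.
        with self._lock:
            if self._state is not State.IDLE:
                return False
            self._state = State.RUNNING
            return True

    def start(self) -> bool:
        return self._try_claim()  # a real seeder would launch its scan here

    def mark_missing(self, work: Callable[[], None]) -> bool:
        if not self._try_claim():
            return False
        try:
            work()  # runs outside the lock but with the state claimed
            return True
        finally:
            with self._lock:
                self._state = State.IDLE  # restored even if work() raises


seeder = GuardedSeeder()
results = []
# While mark_missing is running, a concurrent start() is refused.
ran = seeder.mark_missing(lambda: results.append(seeder.start()))
```

The finally block is the important part: without it, an exception in the operation would leave the state stuck at RUNNING and block every later start().
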
Luke Mino-Altherr
a51bbd0b25 feat: non-destructive asset pruning with is_missing flag
- Add is_missing column to AssetCacheState for soft-delete
- Replace hard-delete pruning with mark_cache_states_missing_outside_prefixes
- Auto-restore missing cache states when files are re-scanned
- Filter out missing cache states from queries by default
- Rename functions for clarity:
  - mark_cache_states_missing_outside_prefixes (was delete_cache_states_outside_prefixes)
  - get_unreferenced_unhashed_asset_ids (was get_orphaned_seed_asset_ids)
  - mark_assets_missing_outside_prefixes (was prune_orphaned_assets)
  - mark_missing_outside_prefixes_safely (was prune_orphans_safely)
- Add restore_cache_states_by_paths for explicit restoration
- Add cleanup_unreferenced_assets for explicit hard-delete when needed
- Update API endpoint /api/assets/prune to use new soft-delete behavior

This preserves user metadata (tags, etc.) when base directories change,
allowing assets to be restored when the original paths become available again.

Amp-Thread-ID: https://ampcode.com/threads/T-019c3114-bf28-73a9-a4d2-85b208fd5462
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
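
To make the soft-delete behavior in the commit above concrete, here is a small sqlite3 sketch. The table layout and the function names mark_missing_outside_prefixes(), restore_by_paths(), and visible_paths() are simplified assumptions for this example, not the project's actual schema or API.

```python
import sqlite3
from typing import Iterable, List

conn = sqlite3.connect(":memory:")
# Hypothetical minimal table; only the is_missing column is taken from the commit.
conn.execute(
    "CREATE TABLE cache_states ("
    "file_path TEXT PRIMARY KEY, is_missing INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO cache_states (file_path) VALUES (?)",
    [("/models/a.safetensors",), ("/old/b.safetensors",)],
)


def mark_missing_outside_prefixes(prefixes: Iterable[str]) -> int:
    """Flag rows under none of the prefixes; return how many were flagged.

    Prefixes are expected to end with a path separator so that "/models/"
    does not also match "/models2/".
    """
    prefixes = list(prefixes)
    rows = conn.execute(
        "SELECT file_path FROM cache_states WHERE is_missing = 0"
    ).fetchall()
    outside = [(p,) for (p,) in rows if not any(p.startswith(x) for x in prefixes)]
    with conn:
        conn.executemany(
            "UPDATE cache_states SET is_missing = 1 WHERE file_path = ?", outside
        )
    return len(outside)


def restore_by_paths(paths: Iterable[str]) -> None:
    """Clear the flag when files show up again."""
    with conn:
        conn.executemany(
            "UPDATE cache_states SET is_missing = 0 WHERE file_path = ?",
            [(p,) for p in paths],
        )


def visible_paths() -> List[str]:
    """Default query: rows flagged as missing are filtered out."""
    cur = conn.execute(
        "SELECT file_path FROM cache_states WHERE is_missing = 0 ORDER BY file_path"
    )
    return [p for (p,) in cur]
```

Because the row is flagged rather than deleted, anything attached to it (tags and other user metadata) is still there when the path becomes available again.
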
Luke Mino-Altherr
b4f5bb2faa refactor: make scanner helper functions public
Rename _sync_root_safely, _prune_orphans_safely, _collect_paths_for_roots,
_build_asset_specs, and _insert_asset_specs to remove underscore prefix
since they are used by seeder.py as part of the public API.

Amp-Thread-ID: https://ampcode.com/threads/T-019c3037-df32-7138-99d8-b4b824d896b3
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
Luke Mino-Altherr
4b3ad132cf Decouple orphan pruning from asset seeding
- Remove automatic pruning from scan loop to prevent partial scans from
  deleting assets belonging to other roots
- Add get_all_known_prefixes() helper to get prefixes for all root types
- Add prune_orphans() method to AssetSeeder for explicit pruning
- Add prune_first parameter to start() for optional pre-scan pruning
- Add POST /api/assets/prune endpoint for explicit pruning via API
- Update main.py startup to use prune_first=True for full startup scans
- Add tests for new prune_orphans functionality

Fixes issue where a models-only scan would delete all input/output assets.

Amp-Thread-ID: https://ampcode.com/threads/T-019c2ba0-e004-7229-81bf-452b2f7f57a1
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
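
The failure mode fixed by the commit above is easy to reproduce in miniature. In this sketch, find_orphans() and the directory layout are hypothetical; the point is only that orphan detection must compare against the prefixes of all roots, not just the roots covered by the current scan.

```python
from typing import Dict, Iterable, List


def find_orphans(known_paths: Iterable[str], prefixes: Iterable[str]) -> List[str]:
    """Return known paths that fall under none of the given prefixes."""
    prefixes = list(prefixes)
    return sorted(
        p for p in known_paths if not any(p.startswith(x) for x in prefixes)
    )


# Hypothetical layout with one directory per root type.
roots: Dict[str, str] = {
    "models": "/data/models/",
    "input": "/data/input/",
    "output": "/data/output/",
}
known = [
    "/data/models/sd.safetensors",
    "/data/input/photo.png",
    "/data/output/result.png",
    "/tmp/stale.bin",
]

# Pruning with only the scanned root's prefix treats input/output assets as orphans.
models_only = find_orphans(known, [roots["models"]])
# Pruning against every known prefix flags only the genuinely stray file.
all_roots = find_orphans(known, roots.values())
```

Running pruning as a separate, explicit step makes it natural to pass the full prefix list every time, regardless of which roots a given scan covers.
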
Luke Mino-Altherr
3fa52b37ef feat(assets): add background asset seeder for non-blocking startup
- Add AssetSeeder singleton class with thread management and cancellation
- Support IDLE/RUNNING/CANCELLING state machine with thread-safe access
- Emit WebSocket events for scan progress (started, progress, completed, cancelled, error)
- Update main.py to use non-blocking asset_seeder.start() at startup
- Add shutdown() call in finally block for graceful cleanup
- Update POST /api/assets/seed to return 202 Accepted, support ?wait=true
- Add GET /api/assets/seed/status and POST /api/assets/seed/cancel endpoints
- Update test helper to use ?wait=true for synchronous behavior
- Add 17 unit tests covering state transitions, cancellation, and thread safety
- Log scan configuration (models directory, input/output paths) at scan start

Amp-Thread-ID: https://ampcode.com/threads/T-019c2b45-e6e8-740a-b38b-b11daea8d094
Co-authored-by: Amp <amp@ampcode.com>
2026-02-24 11:34:44 -08:00
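
A background worker with the IDLE/RUNNING/CANCELLING state machine described in the commit above might be structured like this sketch. BackgroundSeeder, its events list (standing in for WebSocket events), and wait() are hypothetical; the state names and event names come from the commit message.

```python
import threading
from enum import Enum
from typing import Iterable, List, Optional


class State(Enum):
    IDLE = "IDLE"
    RUNNING = "RUNNING"
    CANCELLING = "CANCELLING"


class BackgroundSeeder:
    """Hypothetical sketch of a non-blocking scan with cooperative cancellation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state = State.IDLE
        self._cancel = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.events: List[str] = []  # stand-in for emitted WebSocket events

    def start(self, items: Iterable[object]) -> bool:
        """Launch a scan and return immediately; False if one is already active."""
        with self._lock:
            if self._state is not State.IDLE:
                return False
            self._state = State.RUNNING
            self._cancel.clear()
            self._thread = threading.Thread(
                target=self._run, args=(list(items),), daemon=True
            )
            self._thread.start()
        return True

    def cancel(self) -> bool:
        with self._lock:
            if self._state is not State.RUNNING:
                return False
            self._state = State.CANCELLING
            self._cancel.set()
        return True

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Join the worker; return True once it has finished."""
        thread = self._thread
        if thread is None:
            return True
        thread.join(timeout)
        return not thread.is_alive()

    def _run(self, items: List[object]) -> None:
        self.events.append("started")
        try:
            for _ in items:
                if self._cancel.is_set():  # cooperative cancellation point
                    self.events.append("cancelled")
                    return
            self.events.append("completed")
        finally:
            with self._lock:
                self._state = State.IDLE


seeder = BackgroundSeeder()
accepted = seeder.start(range(3))    # returns at once; the work runs in the thread
finished = seeder.wait(timeout=2)    # comparable in spirit to requesting ?wait=true
```

A start() that returns immediately is what lets an HTTP handler answer 202 Accepted, with a blocking wait offered only to callers that explicitly ask for it.
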