Use the RAM right up to the wire as the community is bit accustomed too.
This trades off headroom for the case where large chunky intermediates
arrive and potenitally hits pagefile/swap, but a lot of people have
"it just fits" workflows out there, so strike a compromise with
75->90%.
Disable the incative cache for all but the very high RAM users.
* openapi: add enum values + FeedbackRequest schema for cloud cutover (PR E)
Adds missing cloud-runtime enum values to vendor schemas that the
cloud runtime emits but vendor declared as plain strings.
Changes:
- JobEntry.status: enum [pending, in_progress, completed, failed, cancelled]
- JobDetailResponse.status: same enum
- BillingStatus: enum [awaiting_payment_method, pending_payment, paid,
payment_failed, inactive]
- FeedbackRequest schema added (with type enum)
- /api/feedback POST: requestBody now $refs FeedbackRequest
All cloud-runtime-emitted; no impact on OSS-local semantics.
Identified via Comfy-Org/cloud's TestCutoverSafe gate (BE-1106) as
the remaining schema-level divergences after PRs A-D landed and got
synced.
* openapi: add type enum to Workspace schema (cutover follow-up)
Cloud's Workspace runtime shape includes a 'type' field with enum
[personal, team] that vendor's Workspace was missing. Cloud handlers
reference the generated ingest.WorkspaceType Go enum.
Same kind of surgical addition as JobEntry.status / BillingStatus /
JobDetailResponse.status in this PR — adds cloud-runtime field to
existing vendor schema.
* openapi: rename cloud-side response schemas to match runtime (PR D)
Follow-up to the BE-1106 stack (#14060/61/63). Cloud's Go handlers
reference response schemas by name (e.g., ingest.WorkflowResponse,
ingest.SubscribeResponse), but vendor's matching operations were
declaring those responses against differently-named vendor-side
schemas (CloudWorkflow, BillingSubscription, etc.). After the stack
landed, schemas like WorkflowResponse exist in vendor but weren't
referenced by any path, so codegen pruned the unreferenced types.
This PR:
1. Updates 34 operation $refs in cloud-runtime paths to point to
the schema names cloud's handlers expect (e.g., CloudWorkflow →
WorkflowResponse on /api/workflows/{workflow_id}).
2. Adds 12 cloud-only schemas that weren't in vendor yet but are
referenced by these renames (e.g., SubscribeResponse,
CancelSubscriptionResponse, BillingOpStatusResponse). Each
copied verbatim from Comfy-Org/cloud's hand-written ingest spec
and tagged x-runtime: [cloud] with a [cloud-only] description
prefix.
Schema renames span the same domains as the operationId renames in
PR A: billing/subscriptions (7 schemas), workflows (5), userdata (3),
jobs (2), hub (2), history (2), auth/workspace (4), and misc cloud
endpoints (9).
Convergent safety check after this lands (against cloud's
TestCutoverSafe gate, BE-1106):
Pre-PR D: 205 missing handler refs
Post-PR D: 105 missing handler refs (-49%)
Cumulative since the original 938-ref baseline: -89%
The remaining 105 are a Phase 3 follow-up (response headers,
text/plain responses, codegen-derived enum sub-types, and a small
set of inline-response-schema operations that vendor declares
inline where cloud has named-schema $refs).
* openapi: drop PR-label comment from new schemas block
PR-internal labels don't belong in committed code — future readers
won't know what 'PR D' means and the marker stops being useful the
moment this PR merges.
* openapi: rename 55 cloud-side operationIds to match runtime handlers
For the 55 operations below, vendor's operationId did not match the
name cloud's runtime handlers expect. Generated types from vendor
therefore had different names (e.g. CreateSubscription200JSONResponse)
than what cloud handlers reference (Subscribe200JSONResponse), which
blocks the post-cutover combined-spec codegen.
All 55 renames target the cloud-runtime-authoritative name. Several
of these endpoints are shared concepts (queue, settings, userdata,
object_info) that OSS local also serves — the rename aligns vendor
with the longstanding cloud handler-side convention to unblock the
shared codegen. No request/response *shape* changes in this PR; only
operationId labels.
Notable categories:
- Billing/subscriptions: 7 renames (subscribe, getBillingPlans, ...)
- Workspace + workflows: 13 renames (createWorkflow, ...)
- Hub: 3 renames
- Auth/users: 5 renames
- Shared OSS surface (settings, queue, view, userdata): 12 renames
- Misc cloud-only: 15 renames
Identified via Comfy-Org/cloud's TestCutoverSafe build-safety gate
(BE-1106), which compares handler type references against codegen
output from the combined spec.
* fix(openapi): resolve getHistory operationId collision
Spectral flagged: both /api/history (OSS local) and /api/history_v2
(cloud) had operationId 'getHistory' after the rename. Rename vendor's
/api/history to 'getPromptHistory' to disambiguate. Cloud's runtime
denies /api/history at the overlay level so combined codegen is
unaffected by this change.
* openapi: add 41 cloud-runtime schemas to components.schemas (PR B of 3) (#14061)
* openapi: add 41 cloud-runtime schemas to components.schemas (cutover prep)
Adds schemas that exist in Comfy-Org/cloud's hand-written ingest spec
but not yet in this vendored OSS spec. All tagged x-runtime: [cloud]
per the field-drift convention and prefixed with [cloud-only] in the
description.
These schemas are referenced by cloud's Go handlers via the generated
ingest.<Schema> Go type names. Codegen from the vendored spec didn't
produce those types because the schemas weren't declared here. Adding
them unblocks the post-cutover combined-spec codegen.
Schemas added (alphabetical):
AssetDownloadResponse, AssetMetadataResponse, BillingBalanceResponse,
BillingPlansResponse, BillingStatusResponse, GetUserDataResponseFull,
HistoryDetailEntry, HistoryDetailResponse, HistoryResponse,
HubLabelInfo, HubProfileSummary, HubWorkflowListResponse,
HubWorkflowStatus, HubWorkflowSummary, HubWorkflowTemplateEntry,
JobStatusResponse, JobsListResponse, LabelRef, LogsResponse, Member,
OAuthRegisterBadRequestResponse, PendingInvite, Plan, PlanAvailability,
PlanAvailabilityReason, PlanSeatSummary, PreviewPlanInfo,
PreviewSubscribeResponse, PublishedWorkflowDetail, SecretResponse,
SubscriptionDuration, SubscriptionTier, UserDataResponseFull,
ValidationError, ValidationResult, WorkflowForkedFrom, WorkflowResponse,
WorkflowVersionContentResponse, WorkspaceAPIKeyInfo, WorkspaceSummary,
WorkspaceWithRole
Identified via Comfy-Org/cloud's TestCutoverSafe build-safety gate
(BE-1106). Companion to PR #14060 (operationId renames).
* fix(openapi): add BindingErrorResponse schema
OAuthRegisterBadRequestResponse references BindingErrorResponse but
that schema wasn't in the original add. Adding it now as a cloud-only
schema matching the cloud runtime's binding-error shape (single
'message' string field).
* openapi: add missing 4xx/5xx response bodies for cloud-emitting endpoints (#14063)
Vendor declares shared endpoints (e.g. /api/queue, /api/settings,
/api/assets/*, /api/billing/*) with success responses but is missing
many of the 4xx/5xx error response bodies that Comfy-Org/cloud's
runtime actually emits. Cloud's Go handlers reference the generated
ingest.Op<StatusCode>JSONResponse types for these missing statuses,
which currently fail to resolve when codegen runs against the
vendored spec.
This PR adds 237 response entries across 117 operations, restoring
the documented error responses that cloud emits. Bodies are copied
verbatim from Comfy-Org/cloud's hand-written ingest spec
(services/ingest/openapi.yaml) and reference a new ErrorResponse
schema also added in this PR (matches cloud's {code, message} runtime
shape, tagged x-runtime: [cloud]).
ErrorResponse is intentionally separate from the existing CloudError
schema. CloudError's shape ({error}) describes one runtime; cloud
emits a different shape ({code, message}). Existing CloudError refs
in vendor are untouched; new cloud-emitting error references use
ErrorResponse.
Identified via Comfy-Org/cloud's TestCutoverSafe build-safety gate
(BE-1106). Companion to PR #14060 (operationId renames) and PR #14061
(cloud-only schema additions).
* openapi: align response declarations with implementation (5 endpoints)
- POST /api/assets/download: replace 200 with 202 + tracking-task body
(endpoint runs asynchronously and returns task_id/status/message).
- POST /api/assets/export: same 200 → 202 + tracking-task body.
- POST /api/assets/from-workflow: change 201 → 200 (handler responds 200,
not 201; no Location header emitted).
- POST /api/feedback: change 200 → 201 (creates a feedback record).
- /api/jobs and /api/jobs/{job_id}: change timestamp fields from
type: number to type: integer + format: int64. Values are Unix
milliseconds — number causes oapi-codegen to emit float64, losing
precision and producing the wrong Go type. Affected fields:
create_time, update_time, execution_start_time, execution_end_time.
Verification: each change reflects what the endpoint observably returns;
no handler changes required. Backwards-compatible for existing clients
(integer is a subset of number; status code shifts within 2xx).
* openapi: align asset download/export 202 status enum with runtime + sibling schemas
CodeRabbit caught a vocabulary mismatch: the two new 202 response schemas
declared `[pending, running, completed, failed]` while the rest of the same
spec uses `[created, running, completed, failed]` for the identical task
lifecycle (download/export progress WebSocket events, /api/tasks, TaskEntry,
TaskResponse — 4 sites total). Cloud's runtime emits `created` on initial
creation (AssetDownloadResponseStatusCreated; task.Status sourced from the
DB enum whose initial value is Created). `pending` would have introduced a
fifth, contradictory vocabulary for the same lifecycle and pushed the spec
further from the implementation it is meant to align with.
Followup tracked separately: extract a shared TaskStatus enum so all five
sites move in lockstep instead of needing per-site edits.
Char-count guards on value/id can still let multibyte or escape-heavy
inputs blow past MAX_ENCODED_CURSOR_LENGTH once UTF-8 + escape expansion
+ base64url runs. A 512-character name of 'é' (2 bytes UTF-8) or '<'
(serializes to the 6-byte '<' escape) passes the char check, mints
a ~1500-byte cursor, then 400s when handed back on the next request.
Compute the final encoded form and reject it before returning if it
exceeds the wire cap. Adds regression tests for both inflation paths.
Six fact-checked findings from the multi-model review pass:
- Encoder/decoder length asymmetry: encode_cursor now rejects empty id,
oversized id (>128), oversized value (>512), and invalid order tokens
symmetrically with decode_cursor. Prevents the same server from minting
a cursor it then 400s on the next request (e.g. a filesystem-scanned
asset name >512 chars). The bad-order path now raises InvalidCursorError
(still subclasses ValueError) so route-layer handling stays uniform.
- Raw U+2028/U+2029 in cursor.py source: ripgrep treated those lines as
line-terminators, confirming the bytes were the actual separators. Any
editor save / autoformat / git tooling that normalizes invisibles would
silently break the encoder. Replaced with explicit /
Python escape sequences.
- set(seen) == set(names) hid ordering regressions: a cursor walk that
dropped a row at a page boundary or returned duplicates could pass.
Reworked the assertion to (1) reject duplicates, (2) require full
coverage, and (3) assert strict positional order for size sort, the
only field with a clock-independent ordering.
- Flaky time.sleep(0.05) between inserts: Windows CI clock resolution is
~15ms, so back-to-back inserts under load could collide and exercise
the tiebreaker instead of the documented path. Removed the sleep and
let the strengthened assertion above carry coverage / no-duplicates,
with size sort carrying strict order.
- Cursor error envelope diverged from the rest of routes.py: cursor 400s
emitted {error: {code, message}} while every other 400 in the file
emits {error: {code, message, details}} via _build_error_response.
Switched to _build_error_response and added the details field to the
AssetsApiError schema in openapi.yaml.
- "Byte-identity fixtures" only checked substring containment, defeating
the test class's stated purpose of pinning the wire format. Switched
to exact-bytes equality against an inline expected payload string per
fixture, so any whitespace / key-order / escape drift fails loudly.
Also dropped Go / json.Marshal references from docstrings — the byte
format is the contract, not the runtime that mints it.
The boundary test was building a cursor without the required `o` key, so
decode failed on the missing-order branch before reaching the µs-overflow
path the test is asserting. Both paths return 400 INVALID_CURSOR so the
assertion passed for the wrong reason. Add `o` to the payload and matching
`order=` to the request so the decode reaches the intended branch.
* ModelPatcherDyanmic: purge stale vbar allocs on force cast
* ModelPatcherDynamic: restore backups before load
If doing a clean reload, mutative changes (lora application) could be
applied on-top of the already loaded weight. Restore from backup
unconditionally so that the new load is clean.
Strip prose references to sibling Go implementations and external
ticket IDs from cursor.py, the cursor tests, the keyset integration
tests, asset_management's sort-field comment, and the legacy
prompt_id alias comment. Pure docstring/comment scrub — no behavior
or wire-format changes. x-runtime: [cloud] field annotations in
openapi.yaml are unchanged; those are the spec's structural
cross-runtime convention, not internal references.
The job_ids query parameter on GET /api/assets is tagged x-runtime:
[cloud] and only exists for cloud's variant of this endpoint. Cloud
removed all consumers and the cloud-side handler/codegen/tests in
Comfy-Org/cloud#3778. With cloud no longer accepting this parameter,
the [cloud-only] documentation here is wrong — drop it so the daily
sync to cloud/services/ingest/vendor/openapi.yaml propagates the
removal.
The operation at POST /api/assets/import was defined as `importAssets`
with a URL-list body shape, but no runtime actually serves that
operation at this path. The cloud runtime serves a different operation
here — `importPublishedAssets` — which imports published-workflow
assets into the caller's library by ID, not by URL.
Cloud's URL-based asset ingestion lives at separate paths
(POST /assets/download + GET /assets/remote-metadata) tracked
elsewhere; nothing in this PR affects that work.
Changes:
- Replace the operation at POST /api/assets/import with
`importPublishedAssets`, taking ImportPublishedAssetsRequest
(published_asset_ids + optional share_id) and returning
ImportPublishedAssetsResponse (list of AssetInfo).
- Remove the unused AssetImportRequest component schema (no other
references in the spec).
- Operation and schemas tagged x-runtime: [cloud] with [cloud-only]
description prefix, matching the existing convention for
cloud-runtime-only operations elsewhere in the spec.
Spectral lint passes (0 errors); the two hint-level findings on
the spec are pre-existing and unrelated.
No FE consumer references AssetImportRequest today; this is a pure
spec correction to match what the cloud runtime actually serves.
Add the OAuth 2.1 authorization flow and RFC 7591 Dynamic Client
Registration endpoints to the shared spec, alongside the existing
auth-tagged operations (/api/auth/session, /api/auth/token,
/.well-known/jwks.json). All tagged x-runtime: [cloud] with a
[cloud-only] description prefix, following the established
convention for cloud-runtime-only operations.
Endpoints:
- GET /.well-known/oauth-authorization-server (RFC 8414 metadata)
- GET /.well-known/oauth-protected-resource (RFC 9728 metadata)
- GET /oauth/authorize (consent challenge)
- POST /oauth/authorize (consent submission)
- POST /oauth/token (RFC 6749 §3.2)
- POST /oauth/register (RFC 7591 §3.1 DCR)
Component schemas added:
- OAuthAuthorizationServerMetadata
- OAuthProtectedResourceMetadata
- OAuthConsentChallenge, OAuthConsentChallengeWorkspace
- OAuthAuthorizeRedirectResponse
- OAuthTokenResponse, OAuthTokenError
- OAuthRegisterRequest, OAuthRegisterResponse, OAuthRegisterError
These endpoints are implemented in the cloud runtime today and
are called by browser frontends rendering the consent UI and by
MCP-spec-compliant clients (Claude Desktop, Cursor, etc.) doing
auto-discovery + self-registration. Documenting them in the
shared spec lets the cloud frontend generate types directly from
this spec instead of maintaining a parallel definition.
Spectral lints clean (0 errors). The hint-level findings on
OAuthTokenError / OAuthRegisterError ("standard error schema")
match the same hint on CloudError — these are protocol-specific
RFC-shaped errors, not generic application errors.
Cursor pagination hasn't shipped on either runtime yet — this PR is
still draft and cloud's mirror is just behind it — so there are no
legacy no-o cursors in the wild. Make o mandatory from day one
rather than landing permissive and tightening later.
decode_cursor now rejects any payload without o (or with a non-string
o) as malformed. CursorPayload.order becomes a required str. Tests
that constructed CursorPayload directly now pass order="desc";
test_legacy_cursor_without_order_accepted flips to
test_cursor_without_order_rejected.
* model_management: disable non-dynamic smart memory
Disable smart memory outright for non dynamic models.
This is a minor step towards deprecation of --disable-dynamic-vram
and the legacy ModelPatcher.
This is needed for estimate-free model development, where new models
can opt-out of supplying a memory estimate and not have to worry
about hard VRAM allocations due to legacy non-dynamic model patchers
This is also a general stability increase for a lot of stray use cases
where estimates may still be off and going forward we are not going
to accurately maintain such estimates.
* pinned_memory: implement with aimdo growable buffer
Use a single growable buffer so we can do threaded pre-warming on
pinned memory.
* mm: use aimdo to do transfer from disk to pin
Aimdo implements a faster threaded loader.
* Add stream host pin buffer for AIMDO casts
Introduce per-offload-stream HostBuffer reuse for pinned staging,
include it in cast buffer reset synchronization.
Defer actual casts that go via this pin path to a separate pass
such that the buffer can be allocated monolithically (to avoid
cudaHostRegister thrash).
* remove old pin path
* Implement JIT pinned memory pressure
Replace the predictive pin pressure mechanism with JIT PIN memory
pressure.
* LowVRAMPatch: change to two-phase visit
* lora: re-implement as inplace swiss-army-knife operation
* prepare for multiple pin sets
* implement pinned loras
* requirements: comfy-aimdo 0.4.0
* ops: remove unused arg
This was defeatured in aimdo iteration
* ops: sync the CPU with only the offload stream activity
This was syncing with the offload stream which itself is synced with the
compute stream, so this was syncing CPU with compute transitively. Define
the event to sync it more gently.
* pins: implement freeing intermediate for pinned memory
Pinning is more important than inactive intermediates and the stream
pin buffer is more important than even active intermediates.
* execution: implement pin eviction on RAM presure
Add back proper pin freeing on RAM pressure
* implement pin registration swaps
Uncap the windows pins from 50% by extending the pool and have a pressure
mechanism to move the pin reservations om demand.
This unfortunately implies a GPU sync to do the freeing so significant
hysterisis needs to be added to consolidate these pressure events.
* cli_args/execution: Implement lower background cache-ram threshold
Limit the amount of RAM background intermediates can use, so that
switching workflows doesn't degrade performance too much.
* make default
* bump aimdo
* model-patcher: force-cast tiny weights
Flux 2 gets crazy stalls due to a mix of tiny and giant weights
creating lopsided steam buffer rotations which creates stalls.
* ops: refactor in prep for chunking
* mm: delegate pin-on-the-way to aimdo
Aimdo is able to chunk and slice this on the way for better CPU->GPU
overlap. The main advantage is the ability to shorten the bus contention
window between previous weight transfer and the next weights vbar
fault.
* bump aimdo
* pinning updates
* specify hostbuf max allocation size
There a signs of virtual memory exhaustion on some linux systems when
throwing 128GB for every little piece. Pass the actual to save aimdo
from over-estimates
* tests: update execution tests for caching
The default caching changed to ram-cache so update these tests
accordingly.
Remove the LRU 0 test as this also falls through to RAM cache.
- Soften offset param prose: it's not deprecated, just not preferred for
sequential walks. Random-access UIs (jump-to-page, item count displays)
legitimately still want offset, so dropping the 'deprecated' framing
rather than promoting it to a machine-readable deprecated:true flag.
- Add explicit HTTP status assertions before every json() / next_cursor
read in test_list_cursor.py so a failing request surfaces as an HTTP
error instead of a confusing KeyError on a 4xx/5xx body.
Address three needs-judgment items from the cursor-review judge synthesis:
1. Cursor wire format now includes an "o" key carrying the sort
direction ("asc" / "desc") it was minted under. A request that
replays the cursor with a flipped `order` parameter is rejected
with 400 INVALID_CURSOR instead of silently walking the wrong
direction. Legacy cursors without "o" still decode (the binding
is best-effort until cloud mirrors the field — follow-up filed
separately).
2. JSON serialization now escapes `<`, `>`, `&`, U+2028, U+2029
to mirror Go's default `json.Marshal` behavior. Without this, an
asset name containing those characters produced different bytes on
Python vs cloud Go. The escaped form is what both runtimes emit.
3. Add direct query-layer tests for the keyset tiebreaker — the secondary
ORDER BY id branch was previously unexercised. Two scenarios: all
rows share a primary sort value, and mixed ties straddle page
boundaries. Both assert no row is dropped or duplicated across the
walk.
Wire-format note: Python cursors now differ from current cloud cursors
by exactly the "o" key. Cloud follow-up will bring the two back into
byte alignment.
- Mint next_cursor on every cursor-supported sort, not only when 'after'
was supplied. A first request (no 'after') previously returned
next_cursor=None, leaving cursor mode unreachable from a clean start.
- Over-fetch limit+1 so an exactly-full terminal page doesn't mint a
spurious cursor pointing at a phantom next page.
- Map crafted out-of-range microsecond cursors (OverflowError / OSError
in datetime construction) to 400 INVALID_CURSOR instead of leaking 500.
- Bump MAX_CURSOR_VALUE_LENGTH 256 -> 512 to match the AssetReference
name column max; without this, a long-named asset minted a cursor the
same server then refused on the next request. Cross-runtime byte
identity with cloud is unaffected because no cloud cursor ever carries
a value > 256 (cloud schema doesn't permit it).
- Return None from _encode_next_cursor when the boundary row carries a
NULL sort value (e.g. an Asset without size_bytes backfilled), instead
of silently encoding 0 and mis-positioning the keyset.
- Fix schemas_in.py comment so it matches actual handler behavior
(last_access_time + 'after' raises 400, does not fall back).
- Add AssetsApiError schema + 400 response to GET /api/assets in
openapi.yaml so generated clients know the INVALID_CURSOR envelope.
- Extend integration coverage: first-page mint, exact-multiple terminal
page, cursor walks for created_at/updated_at/size sorts, datetime
overflow surfaces as 400 not 500.
- Add unit coverage for datetime overflow and 512-char round-trip.
Adds integration tests for: full cursor walk, invalid-cursor 400,
sort/cursor mismatch 400, cursor-wins-over-offset, absent next_cursor
when no more results, and pagination stability across deletes.
list_assets_page accepts an opaque 'after' cursor and returns
next_cursor when more pages are available. The query applies a keyset
WHERE clause and a secondary ORDER BY id for deterministic tiebreak.
Cursor sort field is validated against the request sort, and a
last_access_time sort (OSS-only) falls back to offset/limit. Offset is
ignored whenever a cursor is supplied.
Port of cloud common/pagination/cursor.go. Wire format is base64url of
{"s", "v", "id"} JSON; times are Unix microseconds UTC to match
PostgreSQL timestamp precision.
Includes a byte-identity fixture pinned against the cloud Go wire
format so cross-runtime FE pagination can't silently drift.
Add 'after' query param and 'next_cursor' response field for keyset
pagination. Matches the cloud Go implementation (BE-893) so frontend
sees a unified contract across runtimes. Offset/limit remain as a
deprecated fallback.