Commit Graph

3 Commits

Author SHA1 Message Date
Matt Miller
22b25fcd26 fix: make job cancel atomic and best-effort
Addresses two cancel races/edge cases raised in review.

Targeted, atomic interrupt. cancel_job's interrupt callback now takes the
prompt id and returns whether it fired; the single-cancel route backs it
with the new PromptQueue.interrupt_if_running, which checks the running set
and signals the interrupt under the queue mutex. This closes the TOCTOU
window in which a pending job that starts executing between the snapshot
and the dequeue (or a running job that finishes between the snapshot and
the interrupt) could be missed or, worse, cause an unrelated prompt to be
interrupted. The per-prompt interrupt-flag reset in execute_async keeps a
finished job from leaking the interrupt onto its successor.
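A minimal sketch of the check-and-signal pattern, assuming the queue tracks running prompt ids in a set guarded by a single `threading.Lock`. `PromptQueue` and `interrupt_if_running` are named in the commit; the fields and the `begin_prompt`/`mark_finished` helpers are hypothetical, and this is not ComfyUI's actual class.

```python
import threading
from typing import Set


class PromptQueue:
    """Hypothetical stand-in showing only the locking pattern."""

    def __init__(self) -> None:
        self.mutex = threading.Lock()
        self._running: Set[str] = set()    # prompt ids currently executing
        self._interrupt_requested = False  # flag the executor would poll

    def begin_prompt(self, prompt_id: str) -> None:
        # Per-prompt reset (cf. execute_async): an interrupt aimed at the
        # previous job must not carry over to this one.
        with self.mutex:
            self._interrupt_requested = False
            self._running.add(prompt_id)

    def mark_finished(self, prompt_id: str) -> None:
        with self.mutex:
            self._running.discard(prompt_id)

    def interrupt_if_running(self, prompt_id: str) -> bool:
        # The membership check and the signal happen under one lock, so the
        # target cannot finish (and a successor cannot start) in between.
        with self.mutex:
            if prompt_id not in self._running:
                return False
            self._interrupt_requested = True
            return True
```

Because the check and the signal share the lock, a False result means the prompt was not running at that instant, so the caller never interrupts an unrelated prompt.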

Best-effort batch cancel. POST /api/jobs/cancel no longer fails the whole
batch with 404 when one id is unknown or already finished; such ids are
treated as no-ops, so "cancel all" still cancels the in-progress jobs even
if some finished between the client's snapshot and the request. Malformed
ids are still rejected with 400.
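A sketch of the best-effort semantics as a pure function, assuming a per-id `cancel_one` callable that returns True only when a cancellation actually took effect and assuming ids were already validated. `cancel_batch_best_effort`, `cancel_one`, and the per-id result dict are hypothetical; the real response shape may differ.

```python
from typing import Callable, Dict, Iterable


def cancel_batch_best_effort(
    job_ids: Iterable[str], cancel_one: Callable[[str], bool]
) -> Dict[str, bool]:
    # Assumes ids were already validated (malformed ids get a 400 earlier).
    # Each id is attempted independently; an unknown or already-finished id
    # reports False instead of failing the whole batch with a 404.
    return {job_id: cancel_one(job_id) for job_id in job_ids}


# Usage: job "b" finished between the client's snapshot and the request.
active = {"a", "c"}


def cancel_one(job_id: str) -> bool:
    if job_id in active:
        active.discard(job_id)
        return True
    return False  # unknown or finished: a no-op


result = cancel_batch_best_effort(["a", "b", "c"], cancel_one)
# result == {"a": True, "b": False, "c": True}
```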
2026-06-19 16:18:39 -07:00
Matt Miller
dabe0d56a4 fix: resolve review feedback on cancel endpoints
- Guard cancel_job() against TOCTOU: when dequeue() returns False, the
  pending job left the queue between the snapshot and the delete; return
  CANCEL_UNKNOWN so callers never report cancelled=True for a removal
  that did not happen.
- Validate each job_ids element in the batch cancel endpoint before
  any queue access; unhashable or non-UUID values now return 400
  instead of raising TypeError (500).
- Update batch HTTP tests to use canonical UUID ids (required now that
  the endpoint validates id format) and add tests for the new guards.
2026-06-18 11:04:09 -07:00
Matt Miller
f982d011d9 Add jobs-namespace cancel endpoints
Add two cancel endpoints under the jobs namespace so a job can be
cancelled by id without the caller having to know whether the job is
running or pending, or to branch between /interrupt and /queue.

- POST /api/jobs/{job_id}/cancel cancels one job by id. Idempotent: an
  already-finished or unknown id returns 200 {"cancelled": false} rather
  than an error.
- POST /api/jobs/cancel takes {"job_ids": [...]} and cancels a batch.
  Fail-fast: if any id is unknown the request returns 404 listing the
  unknown ids and cancels nothing (no partial side effects).
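A sketch of the fail-fast rule as a pure function: every id is checked against the known set before anything is cancelled. `cancel_batch_fail_fast`, `known_ids`, and the `unknown_job_ids` response key are hypothetical. Note that 22b25fcd26 later replaces this with best-effort semantics, because a job finishing between the client's snapshot and the request would otherwise turn "cancel all" into a 404.

```python
from typing import Callable, Collection, List, Tuple


def cancel_batch_fail_fast(
    job_ids: List[str],
    known_ids: Collection[str],
    cancel_one: Callable[[str], bool],
) -> Tuple[int, dict]:
    # Check every id before touching the queue so a bad id means no side effects.
    unknown = [job_id for job_id in job_ids if job_id not in known_ids]
    if unknown:
        return 404, {"unknown_job_ids": unknown}
    return 200, {"cancelled": [job_id for job_id in job_ids if cancel_one(job_id)]}
```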

Both are state-agnostic and map onto the existing queue mechanics: a
running job is interrupted (same path as /interrupt), a pending job is
dequeued (same path as /queue {"delete": [...]}). The cancel logic lives
in comfy_execution.jobs as pure, unit-tested helpers; the server handlers
are thin wrappers. openapi.yaml documents both routes.
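A sketch of the state-agnostic dispatch as a pure helper with injected callbacks, in the spirit of the helpers described for comfy_execution.jobs. The signature, the snapshot collections, and the bool-returning callbacks are assumptions (the bool-returning interrupt matches the later change in 22b25fcd26), not the module's actual API.

```python
from typing import Callable, Collection


def cancel_job(
    job_id: str,
    running_ids: Collection[str],
    pending_ids: Collection[str],
    interrupt: Callable[[str], bool],
    dequeue: Callable[[str], bool],
) -> bool:
    """Return True only if a cancellation actually took effect."""
    if job_id in running_ids:
        # Running job: same path as /interrupt; the callback says whether it fired.
        return interrupt(job_id)
    if job_id in pending_ids:
        # Pending job: same path as /queue {"delete": [...]}. Trust dequeue's
        # result, since the job may have left the queue after the snapshot.
        return dequeue(job_id)
    # Unknown or already finished: idempotent no-op (200 {"cancelled": false}).
    return False
```

Keeping the helper free of server and queue objects is what makes it unit-testable with plain sets and lambdas.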
2026-06-15 18:25:38 -07:00