Defense-in-depth over GET→POST alone: reject the three CORS-safelisted simple-form Content-Types (x-www-form-urlencoded, multipart/form-data, text/plain) on 16 no-body POST handlers (glob + legacy) to block <form method=POST> CSRF that bypasses method-only gating. Move comfyui_switch_version to a JSON body so the preflight requirement applies. Split db_mode/policy/update/channel_url_list into GET(read) + POST(write). Tighten do_fix (high → high+) and gate three previously-ungated config setters at middle. Resynchronize openapi.yaml (27 paths, 30 operations, ComfyUISwitchVersionParams as a shared $ref component). Add E2E harness variants, Playwright config, CSRF/secgate suites, 39-endpoint coverage, and a CHANGELOG. Breaking: legacy per-op POST routes (install/uninstall/fix/disable/update/ reinstall/abort_current) are removed; callers already use queue/batch. Legacy /manager/notice (v1) is removed; /v2/manager/notice is retained. Reported-by: XlabAI Team of Tencent Xuanwu Lab CVSS: 8.1 (AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H)
15 KiB
GUIDE_E2E_TEST — ComfyUI-Manager E2E Test Guide
Auto-generated by pair-scaffold (test-guide mode). Domain: api (primary) with mixed web (Playwright UI) + cli (cm-cli subprocess) elements. Date: 2026-04-21.
This guide consolidates session-wide E2E knowledge accumulated through the
WI-A..WI-ZZ sweep. It is the entry point for writing, running, and auditing
end-to-end tests against the ComfyUI-Manager backend (glob + legacy) and
the legacy management UI. Companion registry of reusable test patterns:
.claude/scaffold/e2e-methods.yml.
1. Testing Strategy
Three test surfaces cover different contract layers. Pick the surface that matches the contract you are validating — do not default to "everything".
| Surface | Location | Transport | Authoritative for |
|---|---|---|---|
| pytest E2E | tests/e2e/*.py |
Direct HTTP via requests against running aiohttp server |
Endpoint contract (status, schema, side-effect, CSRF) |
| Playwright UI (real) | tests/playwright/legacy-ui-*.spec.ts |
Browser click → real backend | UI↔backend wiring against actual server state |
| Playwright UI (mock) | tests/playwright/legacy-ui-mock-*.spec.ts |
Browser click → page.route() intercept |
UI→API request shape (URL / method / payload) without backend mutation |
| CLI subprocess | tests/cli/test_uv_compile.py |
subprocess.run() against cm-cli |
CLI command contract (args, exit code, stdout/stderr) |
Layering rule: when both pytest and Playwright can cover a behavior, use
pytest for backend contract and Playwright for UI wiring. Mock Playwright is
acceptable ONLY when real execution is infeasible (security gate, destructive
operation, long-running network) — flag it in the test name suffix
(-mock / WI-WW-mock) and document the honesty boundary (§6 below).
2. Test Categories
| Category | Description | Location | Tools |
|---|---|---|---|
| pytest E2E direct | Endpoint contract via HTTP | tests/e2e/ |
requests + pytest |
| pytest E2E negative | CSRF / 400 / input validation | tests/e2e/test_e2e_csrf*.py |
requests |
| Playwright real | UI click → real backend | tests/playwright/legacy-ui-*.spec.ts |
@playwright/test |
| Playwright mock | UI click → page.route mocked response |
tests/playwright/legacy-ui-mock-*.spec.ts |
@playwright/test page.route |
| CLI subprocess | cm-cli end-to-end | tests/cli/ |
pytest + subprocess.run |
3. Critical Environment Requirements
MUST read before writing any E2E test. These are not optional; every one cost at least one debugging cycle in the sweep.
3.1 Editable install requirement
After editing source under comfyui_manager/, re-run uv pip install -e .
inside the E2E venv. A running server uses whatever source the venv
registered at startup — stale source means stale behaviour. After commit
99caef55 (CSRF state-changing-endpoint conversion), skipping this step
caused POST routes to return 405 because the test venv still had the
pre-conversion routing table.
3.2 Legacy vs glob mutex
comfyui_manager/__init__.py:14-45 loads EITHER the glob manager OR the
legacy manager — never both. Use the fixture script that matches the
contract under test:
| Script | Manager | Extra flags |
|---|---|---|
tests/e2e/scripts/start_comfyui.sh |
glob | none |
tests/e2e/scripts/start_comfyui_legacy.sh |
legacy | --enable-manager-legacy-ui |
tests/e2e/scripts/start_comfyui_strict.sh |
glob | patches config.ini to security_level = strong (for negative-path 403 tests on middle / middle+ gates) |
tests/e2e/scripts/start_comfyui_permissive.sh |
legacy + relaxed | patches config.ini to security_level = normal- and sets ENABLE_LEGACY_UI=1 (for positive-path tests on high+ gates: comfyui_switch_version, install/git_url, install/pip) |
Mixing them in the same suite is an error — pytest modules that exercise
legacy-only endpoints MUST live in test_e2e_legacy_*.py so the fixture can
select the legacy script.
3.3 Port namespace + PID files
All fixture scripts write $LOG_DIR/comfyui.${PORT}.pid. Default ports:
| Manager | Default PORT | PID file |
|---|---|---|
| glob | 8188 | $LOG_DIR/comfyui.8188.pid |
| legacy | 8199 | $LOG_DIR/comfyui.8199.pid |
| strict | 8188 | $LOG_DIR/comfyui.8188.pid |
Because the PID file is per-port, glob + legacy suites can run concurrently
if they pick distinct ports. Never hard-code 8188 / 8199 in your
assertions — read PORT from env.
3.4 Safety env var — manager_requirements skip
start_comfyui.sh exports COMFYUI_MANAGER_SKIP_MANAGER_REQUIREMENTS=1
(added in WI-WW/WI-YY). Any install/update code path that would otherwise
reinstall manager_requirements.txt sees this flag and skips. Real install
tests (e.g. test_e2e_legacy_real_ops.py::TestInstallModelRealDownload)
depend on this — without it the test server would re-pin its own
dependencies mid-test and fail randomly.
3.5 E2E_ROOT
$E2E_ROOT is the venv root + artifact directory produced by
tests/e2e/scripts/setup_e2e_env.sh. All fixture scripts source from it.
Export it once per shell session, or pass it per-command:
export E2E_ROOT=</path/to/your/e2e-root> # one-time
Portability: never hard-code the absolute path inside test code. Read
from os.environ["E2E_ROOT"] or let the fixture script resolve it.
4. Fixture Patterns
4.1 comfyui module-scoped fixture
tests/e2e/conftest.py defines a module-scoped comfyui fixture that:
- Invokes
start_comfyui.shwith the current PORT. - Blocks until the server accepts requests (or timeout).
- Yields the server URL for the test.
- Tears down with
stop_comfyui.sh(kills the PID in the.pidfile).
4.2 Fixture variants
| Variant | Fixture script | Extra setup |
|---|---|---|
| glob (default) | start_comfyui.sh |
PORT=8188 |
| legacy | start_comfyui_legacy.sh |
PORT=8199, ENABLE_LEGACY_UI=1 |
| strict | start_comfyui_strict.sh |
Patches config.ini to security_level = strong (backup at config.ini.before-strict); fixture MUST restore on teardown. Used for negative-path 403 assertions on middle / middle+ gates. |
| permissive | start_comfyui_permissive.sh |
Patches config.ini to security_level = normal- (backup at config.ini.before-permissive) and sets ENABLE_LEGACY_UI=1; fixture MUST restore on teardown. Used for positive-path execution of high+ gated endpoints (comfyui_switch_version, install/git_url, install/pip) with hardcoded trusted inputs. |
4.3 Example invocation
E2E_ROOT=$E2E_ROOT $E2E_ROOT/venv/bin/python -m pytest \
tests/e2e/test_e2e_legacy_endpoints.py -v --timeout=300
Use the venv's interpreter explicitly ($E2E_ROOT/venv/bin/python) — not
the system python — so the test runs against the editable install from
§3.1.
5. Test Design Patterns
Distilled from the sweep. Match the pattern to your contract; do not
invent new patterns without recording them in
.claude/scaffold/e2e-methods.yml.
5.1 Positive-path
Status 200 + response schema + side-effect verification. Do not stop at status 200 — assert at least one schema field AND one observable side effect (disk artifact, queue entry, config change).
5.2 Negative-path
Status 400 / 403 / 405 + body message substring. CSRF-reject tests live
in test_e2e_csrf*.py and are parametrized over the full endpoint list.
5.3 Async queue polling
Install / update endpoints enqueue work and return immediately. Poll
/v2/manager/queue/status or the expected disk artifact until completion.
Never time.sleep() with a hard-coded interval — use expect.poll (TS)
or a polling helper with a timeout.
5.4 Teardown mandatory for mutation tests
Install / save / apply tests MUST restore the original state. Use
try/finally or a pytest fixture. A failing test that leaves a modified
config.ini or an installed custom_node poisons every subsequent test in
the suite.
5.5 @pytest.mark.xfail for known bugs
When a test documents a bug that is not yet fixed, mark it
@pytest.mark.xfail(reason=...). When the fix lands, flip to xfail(strict=True)
or remove the mark and include the fix commit hash in the removal commit
message (see test_reinstall_with_uv_compile / commit ddccefbc for the
pattern).
5.6 Playwright selector stability
Prefer title attribute — e.g. select[title^="Configure the channel"] —
over class selectors (shared across multiple combos). Label-based
(:has(span.text-muted:text-is("Channel"))) is second-best. Class-only
is brittle. See reports/legacy-ui-channel-combo-dom-mapping.md for the
channel-combo case study.
5.7 Playwright 2-hop mock
For fail-state UI scenarios: mock GET (inject a fake pack that puts the UI
into the failure state) + intercept POST (capture the payload the failure
handler fires). (The prior legacy-ui-mock-install.spec.ts::wi-015
exemplar was superseded by a real-E2E pre-seeded broken-pack test in
tests/e2e/test_e2e_legacy_real_ops.py::TestImportFailInfoReal — use
this technique only when real backend reproduction is infeasible.)
5.8 Playwright page.request.post fallback
For structurally unreachable UI triggers — e.g. the idle-state
queue/reset Stop button is display: none — use the browser-context
HTTP client to bypass the UI. Document the reason in the spec comment so
future readers don't "fix" it by forcing a click.
6. Security Gate Matrix
comfyui_manager/glob/utils/security_utils.py:20-26 gates endpoints by
risk level. The default security_level=normal allows middle+ but
rejects high+ with 403.
| Gate | Local only? | security_level must be in |
|---|---|---|
middle+ |
optional | {WEAK, NORMAL, NORMAL_} (default allows) |
high+ |
required | {WEAK, NORMAL_} (default NORMAL ⇒ 403) |
high+ endpoints (comfyui_switch_version, install/git_url,
install/pip, install_model with non-safetensors, etc.) require
start_comfyui_permissive.sh to be tested positively. That harness
backs up config.ini, patches security_level = normal-, sets
ENABLE_LEGACY_UI=1, and the pytest fixture restores the config on
teardown. Use hardcoded trusted inputs only (never user-derived) — the
default-security 403 contract is the positive-path security behavior
we want to preserve in production. The counterpart
start_comfyui_strict.sh harness (security_level = strong) is used to
exercise the 403 negative path for middle / middle+ gates. See the
WI-YY infeasibility log for which high+ endpoints still need harness
work.
Honesty boundary for mock-based closures: rows in
reports/api-coverage-matrix.md marked Y (WI-WW-mock) assert UI→API
wiring only — URL + method + payload shape. They do NOT assert backend
handler behavior; that is pytest's job. A regression that kept the UI
firing correctly but broke the backend would not be caught by the mock
test alone. Document the boundary in the spec comment.
7. Audit & Coverage Artifacts
Tracked reports under reports/. Update these when you add or retire
tests.
| Artifact | Purpose | Maintenance |
|---|---|---|
reports/e2e_verification_audit.md |
Per-test verdict matrix (✅ PASS / ⚠️ WEAK / ❌ INADEQUATE / N/A) with Summary table, per-file sections, TOTAL counts | Any new or retired test → update both the per-file section and Summary; then run scripts/verify_audit_counts.py |
reports/api-coverage-matrix.md |
39-endpoint × pytest + Playwright matrix | Update when a new endpoint is added or a coverage cell flips |
reports/test-bloat-inventory.md |
10-code bloat sweep (B1-BA), 127-test baseline | Run /pair-sweep test bloat identification for a re-baseline when churn exceeds 10 tests |
Verification script: scripts/verify_audit_counts.py parses
e2e_verification_audit.md and validates that the Summary row counts
match the per-file section counts and the TOTAL line. It MUST exit 0
before a doc-sync WI completes.
8. Test Bloat Checklist (B1-BA)
Use these codes when reviewing new tests. Sources:
reports/test-bloat-inventory.md + the sweep methodology in
.claude/pair-working/sessions/sweep-endpoint-coverage/scratch/goal-report-bloat-sweep.md.
| Code | Type | Criterion |
|---|---|---|
| B1 Redundant | Duplicate assertion | Same assertion or contract already covered by another test |
| B2 Over-parametrized | Boundary noise | Cases beyond boundary + equivalence class contribute zero |
| B3 Mixed-concerns | Multi-concern | One test mixes 3+ unrelated assertions |
| B4 Dead | Non-existent feature | Validates a removed / non-existent endpoint or API |
| B5 Smoke-only | Superficial | Only checks status 200 + dict type, no substantive validation |
| B6 Setup-heavy | Over-fixtured | 50+ lines of setup/teardown vs few assertions |
| B7 Stale-skip | Obsolete skip | pytest.skip reason no longer valid |
| B8 Title-mismatch | Misleading name | Function name does not match what it actually validates |
| B9 Copy-paste | DRY violation | ≥80% duplication with another test; helper/parametrization possible |
| BA Impl-detail | Implementation-coupled | Validates internal implementation, not contract |
Triage priority: B4 ⇒ remove. B1/B9 ⇒ parametrize or remove the duplicate. B8 ⇒ rename or refactor the assertion. B5/BA ⇒ strengthen or remove. B6/B7 ⇒ clean up. B2/B3 ⇒ case-by-case.
9. Running Tests
9.1 pytest E2E — glob manager
pytest tests/e2e/ -v --timeout=300
9.2 pytest E2E — legacy-only endpoints
pytest tests/e2e/test_e2e_legacy_endpoints.py tests/e2e/test_e2e_csrf_legacy.py -v
9.3 Playwright (legacy UI)
PORT=8199 npx playwright test tests/playwright/
9.4 CLI subprocess
pytest tests/cli/ -v
9.5 Audit verification
python3 scripts/verify_audit_counts.py
Exits 0 when reports/e2e_verification_audit.md is internally consistent
(Summary ↔ per-file ↔ TOTAL). Exit non-zero → drift between sections.
9.6 Full suite (reference invocation)
# pytest all, with explicit venv
E2E_ROOT=$E2E_ROOT $E2E_ROOT/venv/bin/python -m pytest tests/e2e/ -v --timeout=300
# Playwright, legacy UI mode
PORT=8199 npx playwright test tests/playwright/
# Audit verification
python3 scripts/verify_audit_counts.py
Appendix — Companion Files
.claude/scaffold/e2e-methods.yml— living registry of the test patterns described in §5 plus the bloat codes from §8. Append entries when you establish a new pattern.reports/e2e_verification_audit.md— per-test verdict matrix.reports/api-coverage-matrix.md— endpoint × coverage matrix.reports/test-bloat-inventory.md— 127-test sweep baseline.reports/legacy-ui-channel-combo-dom-mapping.md— DOM selector case study for the Channel combo.