ComfyUI-Manager/docs/guide/GUIDE_E2E_TEST.md
Dr.Lt.Data 4410ebc6a6
Some checks are pending
Publish to PyPI / build-and-publish (push) Waiting to run
Python Linting / Run Ruff (push) Waiting to run
fix(security): harden CSRF with Content-Type gate and expand E2E coverage (#2818)
Defense-in-depth over GET→POST alone: reject the three CORS-safelisted
simple-form Content-Types (x-www-form-urlencoded, multipart/form-data,
text/plain) on 16 no-body POST handlers (glob + legacy) to block
<form method=POST> CSRF that bypasses method-only gating. Move
comfyui_switch_version to a JSON body so the preflight requirement applies.
Split db_mode/policy/update/channel_url_list into GET(read) + POST(write).
Tighten do_fix (high → high+) and gate three previously-ungated config
setters at middle. Resynchronize openapi.yaml (27 paths, 30 operations,
ComfyUISwitchVersionParams as a shared $ref component). Add E2E harness
variants, Playwright config, CSRF/secgate suites, 39-endpoint coverage,
and a CHANGELOG.

Breaking: legacy per-op POST routes (install/uninstall/fix/disable/update/
reinstall/abort_current) are removed; callers already use queue/batch.
Legacy /manager/notice (v1) is removed; /v2/manager/notice is retained.

Reported-by: XlabAI Team of Tencent Xuanwu Lab
CVSS: 8.1 (AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H)
2026-04-22 05:04:30 +09:00

350 lines
15 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# GUIDE_E2E_TEST — ComfyUI-Manager E2E Test Guide
> Auto-generated by pair-scaffold (test-guide mode). Domain: **api** (primary)
> with mixed **web** (Playwright UI) + **cli** (cm-cli subprocess) elements.
> Date: 2026-04-21.
This guide consolidates session-wide E2E knowledge accumulated through the
WI-A..WI-ZZ sweep. It is the entry point for writing, running, and auditing
end-to-end tests against the ComfyUI-Manager backend (glob + legacy) and
the legacy management UI. Companion registry of reusable test patterns:
`.claude/scaffold/e2e-methods.yml`.
---
## 1. Testing Strategy
Three test surfaces cover different contract layers. Pick the surface that
matches the contract you are validating — do not default to "everything".
| Surface | Location | Transport | Authoritative for |
|---------|----------|-----------|-------------------|
| **pytest E2E** | `tests/e2e/*.py` | Direct HTTP via `requests` against running aiohttp server | Endpoint contract (status, schema, side-effect, CSRF) |
| **Playwright UI (real)** | `tests/playwright/legacy-ui-*.spec.ts` | Browser click → real backend | UI↔backend wiring against actual server state |
| **Playwright UI (mock)** | `tests/playwright/legacy-ui-mock-*.spec.ts` | Browser click → `page.route()` intercept | UI→API request shape (URL / method / payload) without backend mutation |
| **CLI subprocess** | `tests/cli/test_uv_compile.py` | `subprocess.run()` against `cm-cli` | CLI command contract (args, exit code, stdout/stderr) |
**Layering rule**: when both pytest and Playwright can cover a behavior, use
pytest for backend contract and Playwright for UI wiring. Mock Playwright is
acceptable ONLY when real execution is infeasible (security gate, destructive
operation, long-running network) — flag it in the test name suffix
(`-mock` / `WI-WW-mock`) and document the honesty boundary (§6 below).
---
## 2. Test Categories
| Category | Description | Location | Tools |
|----------|-------------|----------|-------|
| pytest E2E direct | Endpoint contract via HTTP | `tests/e2e/` | `requests` + `pytest` |
| pytest E2E negative | CSRF / 400 / input validation | `tests/e2e/test_e2e_csrf*.py` | `requests` |
| Playwright real | UI click → real backend | `tests/playwright/legacy-ui-*.spec.ts` | `@playwright/test` |
| Playwright mock | UI click → `page.route` mocked response | `tests/playwright/legacy-ui-mock-*.spec.ts` | `@playwright/test` `page.route` |
| CLI subprocess | cm-cli end-to-end | `tests/cli/` | `pytest` + `subprocess.run` |
---
## 3. Critical Environment Requirements
**MUST read before writing any E2E test.** These are not optional; every one
cost at least one debugging cycle in the sweep.
### 3.1 Editable install requirement
After editing source under `comfyui_manager/`, re-run `uv pip install -e .`
inside the E2E venv. A running server uses whatever source the venv
registered at startup — stale source means stale behaviour. After commit
`99caef55` (CSRF state-changing-endpoint conversion), skipping this step
caused POST routes to return 405 because the test venv still had the
pre-conversion routing table.
### 3.2 Legacy vs glob mutex
`comfyui_manager/__init__.py:14-45` loads EITHER the glob manager OR the
legacy manager — never both. Use the fixture script that matches the
contract under test:
| Script | Manager | Extra flags |
|--------|---------|-------------|
| `tests/e2e/scripts/start_comfyui.sh` | glob | none |
| `tests/e2e/scripts/start_comfyui_legacy.sh` | legacy | `--enable-manager-legacy-ui` |
| `tests/e2e/scripts/start_comfyui_strict.sh` | glob | patches `config.ini` to `security_level = strong` (for negative-path `403` tests on `middle` / `middle+` gates) |
| `tests/e2e/scripts/start_comfyui_permissive.sh` | legacy + relaxed | patches `config.ini` to `security_level = normal-` and sets `ENABLE_LEGACY_UI=1` (for positive-path tests on `high+` gates: `comfyui_switch_version`, `install/git_url`, `install/pip`) |
Mixing them in the same suite is an error — pytest modules that exercise
legacy-only endpoints MUST live in `test_e2e_legacy_*.py` so the fixture can
select the legacy script.
### 3.3 Port namespace + PID files
All fixture scripts write `$LOG_DIR/comfyui.${PORT}.pid`. Default ports:
| Manager | Default PORT | PID file |
|---------|:---:|----------|
| glob | 8188 | `$LOG_DIR/comfyui.8188.pid` |
| legacy | 8199 | `$LOG_DIR/comfyui.8199.pid` |
| strict | 8188 | `$LOG_DIR/comfyui.8188.pid` |
Because the PID file is per-port, glob + legacy suites can run concurrently
if they pick distinct ports. Never hard-code `8188` / `8199` in your
assertions — read `PORT` from env.
### 3.4 Safety env var — manager_requirements skip
`start_comfyui.sh` exports `COMFYUI_MANAGER_SKIP_MANAGER_REQUIREMENTS=1`
(added in WI-WW/WI-YY). Any install/update code path that would otherwise
reinstall `manager_requirements.txt` sees this flag and skips. Real install
tests (e.g. `test_e2e_legacy_real_ops.py::TestInstallModelRealDownload`)
depend on this — without it the test server would re-pin its own
dependencies mid-test and fail randomly.
### 3.5 E2E_ROOT
`$E2E_ROOT` is the venv root + artifact directory produced by
`tests/e2e/scripts/setup_e2e_env.sh`. All fixture scripts source from it.
Export it once per shell session, or pass it per-command:
```bash
export E2E_ROOT=</path/to/your/e2e-root> # one-time
```
**Portability**: never hard-code the absolute path inside test code. Read
from `os.environ["E2E_ROOT"]` or let the fixture script resolve it.
---
## 4. Fixture Patterns
### 4.1 `comfyui` module-scoped fixture
`tests/e2e/conftest.py` defines a module-scoped `comfyui` fixture that:
1. Invokes `start_comfyui.sh` with the current PORT.
2. Blocks until the server accepts requests (or timeout).
3. Yields the server URL for the test.
4. Tears down with `stop_comfyui.sh` (kills the PID in the `.pid` file).
### 4.2 Fixture variants
| Variant | Fixture script | Extra setup |
|---------|---------------|-------------|
| glob (default) | `start_comfyui.sh` | PORT=8188 |
| legacy | `start_comfyui_legacy.sh` | PORT=8199, `ENABLE_LEGACY_UI=1` |
| strict | `start_comfyui_strict.sh` | Patches `config.ini` to `security_level = strong` (backup at `config.ini.before-strict`); fixture MUST restore on teardown. Used for negative-path `403` assertions on `middle` / `middle+` gates. |
| permissive | `start_comfyui_permissive.sh` | Patches `config.ini` to `security_level = normal-` (backup at `config.ini.before-permissive`) and sets `ENABLE_LEGACY_UI=1`; fixture MUST restore on teardown. Used for positive-path execution of `high+` gated endpoints (`comfyui_switch_version`, `install/git_url`, `install/pip`) with hardcoded trusted inputs. |
### 4.3 Example invocation
```bash
E2E_ROOT=$E2E_ROOT $E2E_ROOT/venv/bin/python -m pytest \
tests/e2e/test_e2e_legacy_endpoints.py -v --timeout=300
```
Use the venv's interpreter explicitly (`$E2E_ROOT/venv/bin/python`) — not
the system `python` — so the test runs against the editable install from
§3.1.
---
## 5. Test Design Patterns
Distilled from the sweep. Match the pattern to your contract; do not
invent new patterns without recording them in
`.claude/scaffold/e2e-methods.yml`.
### 5.1 Positive-path
Status 200 + response schema + side-effect verification. Do not stop at
status 200 — assert at least one schema field AND one observable side
effect (disk artifact, queue entry, config change).
### 5.2 Negative-path
Status 400 / 403 / 405 + body message substring. CSRF-reject tests live
in `test_e2e_csrf*.py` and are parametrized over the full endpoint list.
### 5.3 Async queue polling
Install / update endpoints enqueue work and return immediately. Poll
`/v2/manager/queue/status` or the expected disk artifact until completion.
Never `time.sleep()` with a hard-coded interval — use `expect.poll` (TS)
or a polling helper with a timeout.
### 5.4 Teardown mandatory for mutation tests
Install / save / apply tests MUST restore the original state. Use
`try/finally` or a pytest fixture. A failing test that leaves a modified
config.ini or an installed custom_node poisons every subsequent test in
the suite.
### 5.5 `@pytest.mark.xfail` for known bugs
When a test documents a bug that is not yet fixed, mark it
`@pytest.mark.xfail(reason=...)`. When the fix lands, flip to `xfail(strict=True)`
or remove the mark and include the fix commit hash in the removal commit
message (see `test_reinstall_with_uv_compile` / commit `ddccefbc` for the
pattern).
### 5.6 Playwright selector stability
Prefer title attribute — e.g. `select[title^="Configure the channel"]`
over class selectors (shared across multiple combos). Label-based
(`:has(span.text-muted:text-is("Channel"))`) is second-best. Class-only
is brittle. See `reports/legacy-ui-channel-combo-dom-mapping.md` for the
channel-combo case study.
### 5.7 Playwright 2-hop mock
For fail-state UI scenarios: mock GET (inject a fake pack that puts the UI
into the failure state) + intercept POST (capture the payload the failure
handler fires). (The prior `legacy-ui-mock-install.spec.ts::wi-015`
exemplar was superseded by a real-E2E pre-seeded broken-pack test in
`tests/e2e/test_e2e_legacy_real_ops.py::TestImportFailInfoReal` — use
this technique only when real backend reproduction is infeasible.)
### 5.8 Playwright `page.request.post` fallback
For structurally unreachable UI triggers — e.g. the idle-state
`queue/reset` Stop button is `display: none` — use the browser-context
HTTP client to bypass the UI. Document the reason in the spec comment so
future readers don't "fix" it by forcing a click.
---
## 6. Security Gate Matrix
`comfyui_manager/glob/utils/security_utils.py:20-26` gates endpoints by
risk level. The default `security_level=normal` allows middle+ but
rejects high+ with 403.
| Gate | Local only? | security_level must be in |
|------|:---:|---------------------------|
| `middle+` | optional | `{WEAK, NORMAL, NORMAL_}` (default allows) |
| `high+` | required | `{WEAK, NORMAL_}` (default `NORMAL` ⇒ 403) |
**high+ endpoints** (`comfyui_switch_version`, `install/git_url`,
`install/pip`, install_model with non-safetensors, etc.) require
`start_comfyui_permissive.sh` to be tested positively. That harness
backs up `config.ini`, patches `security_level = normal-`, sets
`ENABLE_LEGACY_UI=1`, and the pytest fixture restores the config on
teardown. Use hardcoded trusted inputs only (never user-derived) — the
default-security `403` contract is the positive-path security behavior
we want to preserve in production. The counterpart
`start_comfyui_strict.sh` harness (`security_level = strong`) is used to
exercise the 403 negative path for `middle` / `middle+` gates. See the
WI-YY infeasibility log for which high+ endpoints still need harness
work.
**Honesty boundary for mock-based closures**: rows in
`reports/api-coverage-matrix.md` marked `Y (WI-WW-mock)` assert UI→API
wiring only — URL + method + payload shape. They do NOT assert backend
handler behavior; that is pytest's job. A regression that kept the UI
firing correctly but broke the backend would not be caught by the mock
test alone. Document the boundary in the spec comment.
---
## 7. Audit & Coverage Artifacts
Tracked reports under `reports/`. Update these when you add or retire
tests.
| Artifact | Purpose | Maintenance |
|----------|---------|-------------|
| `reports/e2e_verification_audit.md` | Per-test verdict matrix (✅ PASS / ⚠️ WEAK / ❌ INADEQUATE / N/A) with Summary table, per-file sections, TOTAL counts | Any new or retired test → update both the per-file section and Summary; then run `scripts/verify_audit_counts.py` |
| `reports/api-coverage-matrix.md` | 39-endpoint × pytest + Playwright matrix | Update when a new endpoint is added or a coverage cell flips |
| `reports/test-bloat-inventory.md` | 10-code bloat sweep (B1-BA), 127-test baseline | Run `/pair-sweep test bloat identification` for a re-baseline when churn exceeds 10 tests |
**Verification script**: `scripts/verify_audit_counts.py` parses
`e2e_verification_audit.md` and validates that the Summary row counts
match the per-file section counts and the TOTAL line. It MUST exit 0
before a doc-sync WI completes.
---
## 8. Test Bloat Checklist (B1-BA)
Use these codes when reviewing new tests. Sources:
`reports/test-bloat-inventory.md` + the sweep methodology in
`.claude/pair-working/sessions/sweep-endpoint-coverage/scratch/goal-report-bloat-sweep.md`.
| Code | Type | Criterion |
|------|------|-----------|
| **B1** Redundant | Duplicate assertion | Same assertion or contract already covered by another test |
| **B2** Over-parametrized | Boundary noise | Cases beyond boundary + equivalence class contribute zero |
| **B3** Mixed-concerns | Multi-concern | One test mixes 3+ unrelated assertions |
| **B4** Dead | Non-existent feature | Validates a removed / non-existent endpoint or API |
| **B5** Smoke-only | Superficial | Only checks status 200 + dict type, no substantive validation |
| **B6** Setup-heavy | Over-fixtured | 50+ lines of setup/teardown vs few assertions |
| **B7** Stale-skip | Obsolete skip | `pytest.skip` reason no longer valid |
| **B8** Title-mismatch | Misleading name | Function name does not match what it actually validates |
| **B9** Copy-paste | DRY violation | ≥80% duplication with another test; helper/parametrization possible |
| **BA** Impl-detail | Implementation-coupled | Validates internal implementation, not contract |
**Triage priority**: B4 ⇒ remove. B1/B9 ⇒ parametrize or remove the
duplicate. B8 ⇒ rename or refactor the assertion. B5/BA ⇒ strengthen or
remove. B6/B7 ⇒ clean up. B2/B3 ⇒ case-by-case.
---
## 9. Running Tests
### 9.1 pytest E2E — glob manager
```bash
pytest tests/e2e/ -v --timeout=300
```
### 9.2 pytest E2E — legacy-only endpoints
```bash
pytest tests/e2e/test_e2e_legacy_endpoints.py tests/e2e/test_e2e_csrf_legacy.py -v
```
### 9.3 Playwright (legacy UI)
```bash
PORT=8199 npx playwright test tests/playwright/
```
### 9.4 CLI subprocess
```bash
pytest tests/cli/ -v
```
### 9.5 Audit verification
```bash
python3 scripts/verify_audit_counts.py
```
Exits 0 when `reports/e2e_verification_audit.md` is internally consistent
(Summary ↔ per-file ↔ TOTAL). Exit non-zero → drift between sections.
### 9.6 Full suite (reference invocation)
```bash
# pytest all, with explicit venv
E2E_ROOT=$E2E_ROOT $E2E_ROOT/venv/bin/python -m pytest tests/e2e/ -v --timeout=300
# Playwright, legacy UI mode
PORT=8199 npx playwright test tests/playwright/
# Audit verification
python3 scripts/verify_audit_counts.py
```
---
## Appendix — Companion Files
- `.claude/scaffold/e2e-methods.yml` — living registry of the test
patterns described in §5 plus the bloat codes from §8. Append entries
when you establish a new pattern.
- `reports/e2e_verification_audit.md` — per-test verdict matrix.
- `reports/api-coverage-matrix.md` — endpoint × coverage matrix.
- `reports/test-bloat-inventory.md` — 127-test sweep baseline.
- `reports/legacy-ui-channel-combo-dom-mapping.md` — DOM selector case
study for the Channel combo.