Update AGENTS.md with more stuff. (#14725)

2026-07-03 13:19:23 +08:00 · 2026-07-01 18:55:13 -07:00 · 2026-07-01 18:55:13 -07:00 · 92594ca84c
commit 92594ca84c
parent 2c935de1b1
1 changed files with 97 additions and 1 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@ -11,7 +11,8 @@
 - Delete obsolete code aggressively when newer infrastructure makes it useless.
  Remove dead fallbacks, migration paths, unused options, debug prints, and
  compatibility branches that are no longer needed. Do not leave dead branches,
-  unreachable code, or functions that are never called.
+  unreachable code, or functions that are never called. If code is not
+  necessary for the current behavior, remove it.
 - Revert or disable problematic behavior quickly when it breaks users. It is
  better to remove a broken feature path than keep a complicated partial fix.
 - Preserve existing APIs, node names, model-loading behavior, file layout, and
@ -85,6 +86,14 @@
  not change a shared method to return extra values, alternate shapes, or
  sentinel wrappers for one implementation unless the shared interface is
  explicitly updated.
+- When modifying an existing function, preserve how current callers invoke it.
+  Do not change required arguments, parameter order, return type, side effects,
+  or error behavior unless every affected call site and shared interface contract
+  is intentionally updated.
+- Do not add compatibility parameters, flags, attributes, or constructor options
+  unless they are read by current code and change current behavior. Remove
+  pass-through or stored-but-unused values instead of preserving upstream or
+  deprecated API baggage.
 - If an implementation needs auxiliary values for its own workflow, expose them
  through a private helper or a clearly named implementation-specific method
  instead of overloading the public method's return contract.
@ -111,6 +120,11 @@
 - Do not add unnecessary `try`/`except` blocks. Use them for optional dependency,
  platform, or backend capability detection only when the program has a useful
  fallback. Prefer specific exception types when changing new code.
+- Remove any workarounds for PyTorch versions that ComfyUI no longer officially
+  supports. Deprecated workarounds include catching an exception and rerunning
+  the same op with the input cast to float. If a workaround does not have a
+  comment naming the exact PyTorch version or versions that still need it,
+  remove it.
 - Let unsupported model formats, invalid quantization metadata, and bad states
  fail with clear errors instead of silently producing lower quality output.
 - Match the existing local style in the file you edit. This codebase tolerates
@ -129,8 +143,87 @@
  adding parallel code paths. Use `comfy.quant_ops`, `comfy.model_management`,
  `comfy.memory_management`, `comfy.pinned_memory`, `comfy_aimdo`, and
  `comfy-kitchen` helpers where they already solve the problem.
+- Use optimized comfy-kitchen ops in places where they improve performance
+  without changing the expected dtype, device, memory, or interface behavior.
+- All models should use the optimized attention function selected by ComfyUI.
+  Treat optimized backend functions, dispatch helpers, and capability-selected
+  callables as opaque. Higher-level code must not inspect function identity,
+  names, modules, or implementation details to decide behavior.
+- Apply the same opacity rule to similar patterns beyond attention: callers
+  should depend on the documented interface and result contract, not on which
+  backend implementation was selected underneath.
+- Do not use custom inference ops that only duplicate an existing op while
+  upcasting to float32, such as custom RMSNorm variants. Use the generic ComfyUI
+  ops and/or native torch ops instead.
+- If a model class `__init__` has an `operations` parameter, assume
+  `operations` is never `None`. Do not add fallback branches or default torch
+  ops for a missing `operations` object.
+- Do not add unnecessary parameters to model, model block, or model ops related
+  classes. Constructor and forward signatures should carry only values that are
+  actually needed by that object for inference.
+- Reuse existing model classes, blocks, ops, and helper modules when appropriate.
+  Before implementing a new version of a model component, search the existing
+  model code for a class or helper that already provides the behavior.
+- Avoid adding `einops` usage in core inference code. Use native torch tensor
+  ops such as `reshape`, `view`, `permute`, `transpose`, `flatten`, `unflatten`,
+  `unsqueeze`, and `squeeze` instead.
+- Do not use tensors as general-purpose Python data structures. Keep metadata,
+  bookkeeping, counters, flags, shape math, padding math, index planning, memory
+  estimates, and control-flow decisions in plain Python values unless the data
+  must participate directly in tensor computation. Avoid creating temporary
+  tensors just to use tensor methods for scalar or structural calculations.
 - Avoid unnecessary casts and transfers. Preserve the intended compute dtype,
  storage dtype, bias dtype, and original tensor shape metadata.
+- Assume inputs to the main model forward are already in the compute dtype by
+  default, except integer inputs such as some model timestep tensors. Do not add
+  defensive or convenience casts in model code; it is better for invalid dtype
+  plumbing to error clearly than to hide it with unnecessary casts.
+- Raw model parameters that are not owned by an op and may be initialized in a
+  dtype different from the compute dtype should be cast at use in forward or
+  inference code with `comfy.ops.cast_to_input` or
+  `comfy.model_management.cast_to` to avoid dtype mismatches.
+- Model code should not care what dtype it is initialized in, and model
+  `__init__` methods should not contain workarounds for specific dtypes. Dtype
+  workaround code, such as making a model work with fp16 compute, belongs in the
+  execution or model-management layer that owns compute policy.
+- Model code should not perform unnecessary device-to-CPU or CPU-to-device
+  transfers. New allocations must be created on the correct device and dtype;
+  never allocate on CPU and then move to GPU, or allocate in one dtype and then
+  convert to another.
+- Model code itself should not perform memory management. Loading, unloading,
+  offloading, device movement, VRAM policy, cache lifetime, and cleanup belong
+  in the relevant model-management and execution layers, not inside model
+  implementations.
+- Do not add global, module-level, class-level, singleton, or model-owned stores
+  for tensors or other large memory that persist across executions. Temporary
+  caches must be scoped to a single execution or forward/encode/decode call:
+  allocate them in the owning top-level call, pass them explicitly through the
+  call stack, and let them be discarded when that call returns.
+- Follow the Wan VAE temporal cache pattern for temporary caches: create a local
+  cache such as `feat_map` for the encode/decode operation, pass it into the
+  blocks that need it, and do not retain it on the model or in global state.
+- In model init code, prefer `torch.empty` for parameter/buffer placeholders
+  that are populated from the model state dict instead of zero-initializing with
+  `torch.zeros` or similar. If an allocation is not loaded from the state dict
+  and is useless for inference, do not include it.
+- `nn.Parameter` tensors that are stored in and populated from the model state
+  dict should be initialized with `torch.empty`, not with zero, random, or
+  otherwise meaningful initialization.
+- Model initialization should describe module structure, not fabricate
+  checkpoint-owned tensor contents. Parameters and buffers that are loaded from
+  the state dict must not be manually initialized, reassigned, or filled with
+  fallback values unless that value is actually used when no checkpoint key
+  exists.
+- When slicing large tensors, copy the slice if the sliced tensor's lifetime
+  exceeds the current function scope. Do not keep a long-lived view into a large
+  backing tensor when a smaller copy would release memory sooner.
+- Use fused or compound torch operations such as `addcmul` when they naturally
+  match the math. Reducing Python and torch dispatch overhead is a valid
+  optimization when it does not obscure the code or change dtype/device
+  behavior.
+- Avoid caches that persist across different executions as much as possible.
+  Persistent caches are acceptable only when they use a very minimal amount of
+  memory and have a clear ownership and invalidation story.
 - When optimizing, favor small measurable changes: fewer allocations, fewer
  device transfers, less peak memory, better batching, or use of a faster
  existing backend op.
@ -141,6 +234,9 @@
  `CATEGORY`, and registration through the local mapping used by that file.
 - Keep node changes backward compatible by default. Add inputs with sensible
  defaults and avoid changing output types unless the request requires it.
+- Node-level code must not patch model code directly. Any node behavior that
+  modifies, wraps, hooks, or changes model behavior must go through the model
+  patcher class instead of reaching into model internals.
 - The official mascot of ComfyUI is a very cute anime girl with massive fennec
  ears, a big fluffy tail, long blonde wavy hair, and blue eyes. Feel free to
  use her in ComfyUI materials, UI text, examples, tests, generated assets, or