mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-07-15 02:49:18 +08:00

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

xmarre 07c96b7238 Update README with performance-oriented WSL setup		2026-06-22 15:50:02 +02:00

ComfyUI Global Memory Trim

Global native heap trimming for ComfyUI on Linux/WSL.

This custom node repo installs a small global execution patch when ComfyUI loads custom nodes. The patch can call Python gc.collect() and glibc malloc_trim(0) before and/or after node execution. It is meant for workflows that repeatedly create large CPU image/video buffers through PyTorch, NumPy, OpenCV, Pillow, or native custom nodes and then stall or wedge under WSL2 memory pressure.
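The before/after behavior described above can be pictured as a wrapper around a callable. This is only a sketch: the README does not show which ComfyUI function the patch hooks, so `with_trim`, `fake_node`, and the `trim` callback are hypothetical stand-ins rather than the repo's actual code.

```python
import functools
import gc


def with_trim(func, trim, before=False, after=True, run_gc=True):
    """Wrap func so that trim() runs before and/or after each call."""

    def _collect_and_trim():
        if run_gc:
            gc.collect()  # drop unreachable Python objects first
        trim()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if before:
            _collect_and_trim()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if the wrapped call raises.
            if after:
                _collect_and_trim()

    return wrapper


# Stand-ins for a node function and a trim callback (both hypothetical).
calls = []


def fake_node(x):
    calls.append("node")
    return x * 2


wrapped = with_trim(fake_node, trim=lambda: calls.append("trim"),
                    before=True, after=True)
result = wrapped(21)
```

With both `before` and `after` enabled, the trim callback runs on each side of the wrapped call, which is why `BEFORE=1` is the more aggressive setting.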

It also provides two optional workflow nodes:

Global Memory Trim Now: manually run a trim and return RSS metrics.
Global Memory Trim Status: return current config and last trim result.

The global patch does not require adding either node to your workflow.
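On Linux, process RSS of the kind these nodes report can be read from `/proc/self/status`. The following is a minimal sketch under that assumption; the README does not say how the repo measures RSS, and `parse_rss_mb` and `current_rss_mb` are hypothetical names.

```python
import re


def parse_rss_mb(status_text):
    """Return VmRSS in MiB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)) / 1024.0


def current_rss_mb():
    """Return this process's RSS in MiB, or None where /proc is unavailable."""
    try:
        with open("/proc/self/status", "r", encoding="utf-8",
                  errors="replace") as fh:
            return parse_rss_mb(fh.read())
    except OSError:
        return None  # not Linux, or /proc is not mounted
```

Taking a reading before and after a trim gives the "RSS before/after" pair that the log option refers to.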

Why this exists

Some WSL2 workloads can stall when native libraries repeatedly allocate and free large CPU buffers. The Python objects may be gone, but glibc arenas can retain the freed pages. Under a WSL memory cap, that can trigger heavy reclaim or what looks like a hard VM stall. malloc_trim(0) asks glibc to return free heap pages to the OS.
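For reference, the two calls involved can be made from Python with `ctypes`. This is a minimal sketch, not the repo's code: it assumes glibc is loadable as `libc.so.6`, and the helper names `load_malloc_trim` and `trim_native_heap` are hypothetical.

```python
import ctypes
import gc
import sys


def load_malloc_trim():
    """Return glibc's malloc_trim as a callable, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None
    try:
        libc = ctypes.CDLL("libc.so.6")
        fn = libc.malloc_trim
    except (OSError, AttributeError):
        return None  # e.g. a libc without malloc_trim
    fn.argtypes = [ctypes.c_size_t]
    fn.restype = ctypes.c_int
    return fn


def trim_native_heap(run_gc=True):
    """Return 1 if memory was released, 0 if not, None if unsupported."""
    fn = load_malloc_trim()
    if fn is None:
        return None
    if run_gc:
        gc.collect()  # free unreachable Python objects before trimming
    return fn(0)  # pad=0: release as much free heap memory as possible


result = trim_native_heap()
```

`malloc_trim` returns 1 when some memory was actually handed back to the system and 0 otherwise, so a 0 is not an error.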

This repo is intentionally CPU/native-heap focused. It does not directly free CUDA VRAM, unload ComfyUI models, delete ComfyUI caches, or change workflow outputs.

Installation

From your ComfyUI directory:

git clone https://github.com/xmarre/ComfyUI-Global-Memory-Trim custom_nodes/ComfyUI-Global-Memory-Trim

Or copy this folder into:

ComfyUI/custom_nodes/ComfyUI-Global-Memory-Trim

Restart ComfyUI. On startup you should see a log line similar to:

Installed global memory trim patch: enabled=True before=False after=True ...

Performance-oriented WSL setup

This is the current practical setup I use for a large WSL2 ComfyUI workflow with heavy model switching, Flux/SDXL/SeedVR2/detailer passes, and large CPU image buffers.

The important parts are:

Keep ComfyUI on --highvram for performance.
Disable async weight offload and pinned memory on WSL.
Do not force --disable-cuda-malloc here; the normal CUDA allocator path avoids the VRAM over-reservation/overflow seen with the native allocator path in this workflow.
Keep PYTORCH_CUDA_ALLOC_CONF unset.
Use glibc trim thresholds and the global trim hook to reduce CPU/native heap retention.
If you use the patched SeedVR2 import-probe workaround and want the higher-quality 7B path, keep SeedVR2 BF16 forced on.

#!/usr/bin/env bash
set -e

_hold_terminal_on_failure() {
  local rc=$?
  if [ "$rc" -ne 0 ]; then
    printf '\nComfyUI launcher exited with status %d\n' "$rc" >&2
    printf 'Dropping into interactive shell so the terminal stays open.\n' >&2
    exec bash -i
  fi
}
trap _hold_terminal_on_failure EXIT

source ~/miniconda3/etc/profile.d/conda.sh
conda activate comfy312

# Native/CPU heap behavior. These do not free CUDA VRAM directly.
export MALLOC_MMAP_THRESHOLD_=65536
export MALLOC_TRIM_THRESHOLD_=65536

# Global trim hook.
# BEFORE=1 is more aggressive and can help before large model/node transitions.
# LOG=1 is useful while validating. Set it to 0 once stable.
export COMFYUI_GLOBAL_TRIM=1
export COMFYUI_GLOBAL_TRIM_AFTER=1
export COMFYUI_GLOBAL_TRIM_BEFORE=1
export COMFYUI_GLOBAL_TRIM_GC=1
export COMFYUI_GLOBAL_TRIM_INTERVAL=1
export COMFYUI_GLOBAL_TRIM_LOG=1
export COMFYUI_GLOBAL_TRIM_MIN_RSS_MB=8192

# Optional, workflow-specific: keep SeedVR2 on BF16 without running an import-time CUDA probe.
export SEEDVR2_FORCE_BFLOAT16=1
unset SEEDVR2_IMPORT_BFLOAT16_PROBE

# Do not force PyTorch's allocator through the environment.
unset PYTORCH_CUDA_ALLOC_CONF

# Optional, workflow-specific memory reduction for SuperBeasts.
export SUPERBEASTS_SPCA_RETURN_RESIDUALS=false
export SUPERBEASTS_HDR_MALLOC_TRIM=true

export PYTHONFAULTHANDLER=1

cd ~/ComfyUI

set +e
python main.py \
  --listen 0.0.0.0 \
  --port 8188 \
  --fast fp16_accumulation \
  --highvram \
  --use-pytorch-cross-attention \
  --disable-async-offload \
  --disable-pinned-memory \
  "$@"
status=$?
set -e

exit "$status"
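The environment expectations of the launcher above can also be checked from Python before starting ComfyUI. This is a hypothetical helper, not part of the repo; it assumes that `COMFYUI_GLOBAL_TRIM=0` is how the patch is disabled, and `preflight_warnings` is an illustrative name.

```python
def preflight_warnings(env):
    """Return warnings for settings that differ from the launcher above."""
    warnings = []
    if env.get("PYTORCH_CUDA_ALLOC_CONF"):
        warnings.append(
            "PYTORCH_CUDA_ALLOC_CONF is set; this setup expects it unset")
    for name in ("MALLOC_MMAP_THRESHOLD_", "MALLOC_TRIM_THRESHOLD_"):
        if name not in env:
            warnings.append(
                f"{name} is not set; glibc will use its default threshold")
    # Assumption: "0" is the value that disables the global patch.
    if env.get("COMFYUI_GLOBAL_TRIM", "1") == "0":
        warnings.append("COMFYUI_GLOBAL_TRIM=0 disables the global trim patch")
    return warnings


# Example with an environment that still forces the PyTorch allocator.
example_env = {
    "MALLOC_MMAP_THRESHOLD_": "65536",
    "MALLOC_TRIM_THRESHOLD_": "65536",
    "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
}
example_warnings = preflight_warnings(example_env)
```

Passing `os.environ` instead of the example dictionary checks the current shell environment.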

After validating stability

Once the workflow is stable, reduce log overhead first:

export COMFYUI_GLOBAL_TRIM_LOG=0

Then, if performance still needs tuning, test one change at a time:

export COMFYUI_GLOBAL_TRIM_BEFORE=0

or:

export COMFYUI_GLOBAL_TRIM_INTERVAL=2

If wedges return, restore the previous value.
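To see why a larger COMFYUI_GLOBAL_TRIM_INTERVAL reduces overhead, here is a sketch of how an interval and an RSS floor could gate trims. It is an illustration only: `TrimGate` is a hypothetical name, and whether the real patch counts opportunities that the RSS floor skips is an assumption made here.

```python
class TrimGate:
    """Decide whether a given trim opportunity should actually trim."""

    def __init__(self, interval=1, min_rss_mb=0):
        self.interval = max(1, int(interval))
        self.min_rss_mb = float(min_rss_mb)
        self._opportunities = 0

    def should_trim(self, rss_mb):
        # Assumption: every opportunity advances the counter,
        # including ones later rejected by the RSS floor.
        self._opportunities += 1
        if self._opportunities % self.interval != 0:
            return False
        # A floor of 0 means "always"; an unknown RSS (None) does not block.
        if (self.min_rss_mb > 0 and rss_mb is not None
                and rss_mb < self.min_rss_mb):
            return False
        return True


gate = TrimGate(interval=2, min_rss_mb=8192)
decisions = [gate.should_trim(rss)
             for rss in (9000, 9000, 4000, 4000, 9000, 9000)]
# decisions == [False, True, False, False, False, True]
```

With an interval of 2, at most every second opportunity trims, and the fourth one is skipped because RSS is below the 8192 MB floor.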

Conservative diagnostic WSL setup

For reproducing or isolating CPU/native heap stalls, use the more conservative version below. It clamps the native CPU thread pools to a single thread, disables OpenCV's OpenCL runtime, and limits glibc to one arena. This can improve WSL stability but may slow CPU-heavy nodes.

export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OPENCV_OPENCL_RUNTIME=disabled

export MALLOC_ARENA_MAX=1
export MALLOC_MMAP_THRESHOLD_=65536
export MALLOC_TRIM_THRESHOLD_=65536

export COMFYUI_GLOBAL_TRIM=1
export COMFYUI_GLOBAL_TRIM_AFTER=1
export COMFYUI_GLOBAL_TRIM_BEFORE=0
export COMFYUI_GLOBAL_TRIM_GC=1
export COMFYUI_GLOBAL_TRIM_INTERVAL=1
export COMFYUI_GLOBAL_TRIM_LOG=0
export COMFYUI_GLOBAL_TRIM_MIN_RSS_MB=8192

Use this when the problem is clearly CPU/native memory pressure rather than VRAM pressure.

Configuration

All configuration is via environment variables.

| Variable | Default | Meaning |
| --- | --- | --- |
| `COMFYUI_GLOBAL_TRIM` | `1` | Enable/disable the global patch. |
| `COMFYUI_GLOBAL_TRIM_AFTER` | `1` | Trim after node execution. |
| `COMFYUI_GLOBAL_TRIM_BEFORE` | `0` | Also trim before node execution. More aggressive, useful for testing or fragile WSL setups. |
| `COMFYUI_GLOBAL_TRIM_GC` | `1` | Run `gc.collect()` before `malloc_trim(0)`. |
| `COMFYUI_GLOBAL_TRIM_INTERVAL` | `1` | Trim every N trim opportunities. Use `2`, `4`, etc. to reduce overhead. |
| `COMFYUI_GLOBAL_TRIM_MIN_RSS_MB` | `0` | Only trim when process RSS is at least this value. `0` means always. |
| `COMFYUI_GLOBAL_TRIM_LOG` | `0` | Log every trim with RSS before/after. Very noisy; enable only while diagnosing. |
| `COMFYUI_GLOBAL_TRIM_WARN_NO_LIBC` | `1` | Warn when glibc `malloc_trim` cannot be loaded. |
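As an illustration of how these variables and defaults fit together, the sketch below reads them into a single object. It is not the repo's parser: `TrimConfig`, `load_config`, and the accepted truthy spellings (`1`, `true`, `yes`, `on`) are assumptions made for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TrimConfig:
    enabled: bool = True
    after: bool = True
    before: bool = False
    run_gc: bool = True
    interval: int = 1
    min_rss_mb: int = 0
    log: bool = False
    warn_no_libc: bool = True


def _flag(env, name, default):
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    # Assumption: these spellings count as "on"; anything else is "off".
    return raw.strip().lower() in ("1", "true", "yes", "on")


def _int(env, name, default, minimum):
    try:
        return max(minimum, int(env.get(name, default)))
    except ValueError:
        return default  # fall back on unparsable input


def load_config(env=None):
    """Build a TrimConfig from environment variables (defaults as documented)."""
    env = os.environ if env is None else env
    p = "COMFYUI_GLOBAL_TRIM"
    return TrimConfig(
        enabled=_flag(env, p, True),
        after=_flag(env, p + "_AFTER", True),
        before=_flag(env, p + "_BEFORE", False),
        run_gc=_flag(env, p + "_GC", True),
        interval=_int(env, p + "_INTERVAL", 1, minimum=1),
        min_rss_mb=_int(env, p + "_MIN_RSS_MB", 0, minimum=0),
        log=_flag(env, p + "_LOG", False),
        warn_no_libc=_flag(env, p + "_WARN_NO_LIBC", True),
    )
```

Calling `load_config({})` yields the documented defaults, which makes it easy to compare a launcher's exports against them.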

Notes

Linux/WSL only. On non-Linux platforms the patch becomes a no-op.
malloc_trim(0) only returns already-free native heap pages. It does not free live tensors, ComfyUI outputs, model weights, or Python objects that are still referenced.
This is not a VRAM fixer. It targets CPU/native heap retention.
--disable-cuda-malloc can change CUDA allocator behavior and may increase VRAM reservation/fragmentation in some workflows. Do not assume it is the safer choice; use it only if you specifically need it.
--disable-async-offload and --disable-pinned-memory can be useful on WSL when async offload/pinned-memory paths cause wedges.
COMFYUI_GLOBAL_TRIM_LOG=1 is diagnostic only. Turn it off for normal use.

License

MIT