ComfyUI/comfy
rattus e154da83b1
Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116)
* memory_management: Add direct-to-GPU read mode

Make destination optional (or make it optionally GPU) and use aimdo
to file_read direct to GPU.

* ops: Remove stream pin buffers and use aimdo reads

This consumed too much RAM, and it's better to just take the hit on
the CPU syncing back the stream on a short ring buffer. Aimdo
implements this, so just rip the stream pin buffer out of comfy.

* model_management: allow active pin registration movement

It's better to just let the active model load past the pin limit as
pins and let the pins move around. This saves the HDD and SATA
people disk traffic while only costing a few GPU syncs.

* utils: use aimdo file handle

This opens the file on Windows with more favourable flags.

* mp: only count the model proper for loaded_ram and vram

Exclude live loras from the numbers to avoid the case where the reported
loaded memory exceeds the size of the model.

This caused me confusion in the Kijai visualizer when it looked fully
loaded but was hitting disk due to this accounting discrepancy.

* utils: add bit reverse utility

Useful for max-scattering something ordered.
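
A minimal sketch of what such a utility might look like. The name
`bit_reverse` and its signature are assumptions made for illustration,
not the code added in this commit. Reversing the bits of consecutive
indices sends neighbours far apart, which is what makes it handy for
scattering an ordered sequence.

```python
def bit_reverse(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value` (hypothetical helper)."""
    result = 0
    for _ in range(bits):
        # Shift the result left and append the current lowest bit of value.
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# Consecutive 3-bit indices map to positions that are spread far apart.
scattered = [bit_reverse(i, 3) for i in range(8)]
print(scattered)  # [0, 4, 2, 6, 1, 5, 3, 7]
```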

* pinned_memory: Implement offload balancing

Use a max scatter algorithm to prioritize pins of the same size such
that when doing a little bit of offloading it gets scattered, allowing
the prefetcher to more evenly swallow the offload.
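
The idea can be sketched roughly as follows, assuming a single group of
equally sized pins indexed in load order. `pick_offload` and
`bit_reverse` are hypothetical names, and the real prioritization in
pinned_memory.py may differ. Walking the indices in bit-reversed order
means a small offload lands on widely separated pins rather than on a
contiguous run.

```python
def bit_reverse(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value` (hypothetical helper)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def pick_offload(num_pins: int, num_to_offload: int) -> list[int]:
    """Choose which same-size pins to offload, spread across the sequence.

    Assumption: pins are indexed 0..num_pins-1 in the order the
    prefetcher will need them.
    """
    bits = max(1, (num_pins - 1).bit_length())
    # Bit-reversed walk over the next power of two, skipping out-of-range slots.
    order = [r for r in (bit_reverse(i, bits) for i in range(1 << bits))
             if r < num_pins]
    return sorted(order[:num_to_offload])


print(pick_offload(8, 2))  # [0, 4]
print(pick_offload(8, 4))  # [0, 2, 4, 6]
```

A naive choice would offload pins 0 to 3 back to back; the scattered
choice leaves a resident pin between each offloaded one.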

* comfy-aimdo 0.4.7

Aimdo 0.4.7 implements VRAM buffer exhaustion prediction to avoid
early speculative loading of weights that definitely won't fit once
inference gets further in.

* model-prefetch: consolidate pin ensures on the sync point

This could happen mid prefetch block, causing a sync of the entire
block and losing overlap. Get ahead of the problem with a free-down
at the natural compute stream sync point.

* mm: Put a 2GB min on the pin ceiling

This is reasonably bad if it starts causing swap pressure, more so than
during normal RAM-cache proceedings. Clamp it.

* add --fast-disk
2026-05-30 15:20:04 -04:00
audio_encoders Fix fp16 audio encoder models (#12811) 2026-03-06 18:20:07 -05:00
background_removal Add support for BiRefNet background remove model (CORE-46) (#12747) 2026-05-08 17:59:24 +08:00
cldm Add better error message for common error. (#10846) 2025-11-23 04:55:22 -05:00
comfy_types Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
extra_samplers Uni pc sampler now works with audio and video models. 2025-01-18 05:27:58 -05:00
image_encoders feat: Support MoGe (CORE-168) (#13878) 2026-05-15 10:34:56 +08:00
k_diffusion feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
ldm Speed up anima a bit on nvidia. (#14181) 2026-05-29 22:47:10 -07:00
sd1_tokenizer Silence clip tokenizer warning. (#8934) 2025-07-16 14:42:07 -04:00
t2i_adapter Controlnet refactor. 2024-06-27 18:43:11 -04:00
taesd Add high quality preview support for Flux2 latents (#13496) 2026-04-29 19:37:30 -04:00
text_encoders feat: Support NVIDIA PixelDiT and PiD (CORE-201) (#14103) 2026-05-26 17:50:14 -07:00
weight_adapter MPDynamic: force load flux img_in weight (Fixes flux1 canny+depth lora crash) (#12446) 2026-02-15 20:30:09 -05:00
bg_removal_model.py Fix background removal mask output shape (#14171) 2026-05-29 09:14:32 -07:00
cli_args.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
clip_config_bigg.json Fix potential issue with non clip text embeddings. 2024-07-30 14:41:13 -04:00
clip_model.py Support the siglip 2 naflex model as a clip vision model. (#11831) 2026-01-12 17:05:54 -05:00
clip_vision_config_g.json Add support for clip g vision model to CLIPVisionLoader. 2023-08-18 11:13:29 -04:00
clip_vision_config_h.json Add support for unCLIP SD2.x models. 2023-04-01 23:19:15 -04:00
clip_vision_config_vitl_336_llava.json Support llava clip vision model. 2025-03-06 00:24:43 -05:00
clip_vision_config_vitl_336.json support clip-vit-large-patch14-336 (#4042) 2024-07-17 13:12:50 -04:00
clip_vision_config_vitl.json Add support for unCLIP SD2.x models. 2023-04-01 23:19:15 -04:00
clip_vision_siglip2_base_naflex.json Support the siglip 2 naflex model as a clip vision model. (#11831) 2026-01-12 17:05:54 -05:00
clip_vision_siglip_384.json Support new flux model variants. 2024-11-21 08:38:23 -05:00
clip_vision_siglip_512.json Support 512 siglip model. 2025-04-05 07:01:01 -04:00
clip_vision.py Reduce RAM usage, fix VRAM OOMs, and fix Windows shared memory spilling with adaptive model loading (#11845) 2026-02-01 01:01:11 -05:00
conds.py Cleanups to the last PR. (#12646) 2026-02-26 01:30:31 -05:00
context_windows.py feat: Context windows - add causal_window_fix to improve blending of context windows (CORE-100) (#13563) 2026-05-05 16:40:53 -07:00
controlnet.py MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
deploy_environment.py Add deploy environment header (Comfy-Env) to partner node API calls (#13425) 2026-05-04 20:17:56 -07:00
diffusers_convert.py Remove useless code. 2025-01-24 06:15:54 -05:00
diffusers_load.py load_unet -> load_diffusion_model with a model_options argument. 2024-08-12 23:20:57 -04:00
float.py float: use CK stochastic rounding cuda kernel (#13971) 2026-05-28 19:23:42 -07:00
gligen.py Remove some useless code. (#8812) 2025-07-06 07:07:39 -04:00
hooks.py Fix typos (#10986) 2026-05-08 17:14:45 +08:00
latent_formats.py feat: Support NVIDIA PixelDiT and PiD (CORE-201) (#14103) 2026-05-26 17:50:14 -07:00
lora_convert.py Use torch RMSNorm for flux models and refactor hunyuan video code. (#12432) 2026-02-13 15:35:13 -05:00
lora.py Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
memory_management.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
model_base.py Support context window for PiD and fix lq_latent rounding (#14136) 2026-05-27 12:08:06 -07:00
model_detection.py feat: Support NVIDIA PixelDiT and PiD (CORE-201) (#14103) 2026-05-26 17:50:14 -07:00
model_management.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
model_patcher.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
model_prefetch.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
model_sampling.py feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
multigpu.py MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
nested_tensor.py WIP way to support multi multi dimensional latents. (#10456) 2025-10-23 21:21:14 -04:00
ops.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
options.py Only parse command line args when main.py is called. 2023-09-13 11:38:20 -04:00
patcher_extension.py Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
pinned_memory.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
pixel_space_convert.py Changes to the previous radiance commit. (#9851) 2025-09-13 18:03:34 -04:00
quant_ops.py Enable triton comfy kitchen via cli-arg (#12730) 2026-05-03 14:07:21 -04:00
rmsnorm.py feat: Gemma4 text generation support (CORE-30) (#13376) 2026-05-02 22:46:15 -04:00
sample.py Initial work to make downscale_ratio_temporal work. (#13972) 2026-05-18 23:01:43 -04:00
sampler_helpers.py MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
samplers.py MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
sd1_clip_config.json Fix potential issue with non clip text embeddings. 2024-07-30 14:41:13 -04:00
sd1_clip.py feat: Support Qwen3.5 text generation models (#12771) 2026-03-25 22:48:28 -04:00
sd.py feat: Support NVIDIA PixelDiT and PiD (CORE-201) (#14103) 2026-05-26 17:50:14 -07:00
sdxl_clip.py Add a T5TokenizerOptions node to set options for the T5 tokenizer. (#7803) 2025-04-25 19:36:00 -04:00
supported_models_base.py Fix some custom nodes. (#11134) 2025-12-05 18:25:31 -05:00
supported_models.py Add memory usage factor for lens model. (#14124) 2026-05-26 18:06:51 -07:00
utils.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00