ComfyUI/comfy
Rattus 56d526c133 ops/mp: implement aimdo
Implement a model patcher and caster for aimdo.

A new ModelPatcher implementation that backs onto comfy-aimdo to implement varying model load levels which can be adjusted during model use. The patcher defers all load work, lazily loading the model during use (e.g. on the first step of a ksampler), and automatically negotiates a load level during inference to maximize VRAM usage without OOMing. If inference requires more VRAM than is available, weights are offloaded to make space before the OOM happens.
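
The deferred-load and load-level-negotiation idea can be sketched roughly as below. This is an illustrative toy, not the comfy-aimdo API: `LazyPatcher`, `load_level`, and the fixed 8 GiB model size are all made up for the example.

```python
# Hypothetical sketch of the deferred-load / negotiated-load-level idea.
# Names (LazyPatcher, load_level, ensure_loaded) are illustrative only,
# not the actual comfy-aimdo interface.

class LazyPatcher:
    def __init__(self, model, free_vram_bytes):
        self.model = model
        self.free_vram = free_vram_bytes
        self.load_level = 1.0   # fraction of weights resident on GPU
        self.loaded = False

    def model_size(self):
        return 8 * 1024**3  # pretend the model is 8 GiB

    def _vram_needed(self, level):
        return int(self.model_size() * level)

    def ensure_loaded(self):
        # Called lazily on first use (e.g. the first ksampler step),
        # rather than eagerly when the model object is created.
        if self.loaded:
            return
        # Negotiate the load level downward until the resident working
        # set fits in free VRAM; weights past the cutoff stay offloaded
        # and are cast onto the GPU on demand.
        while self._vram_needed(self.load_level) > self.free_vram:
            self.load_level -= 0.125
        self.loaded = True
```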

As for loading the weights onto the GPU, that happens via comfy_cast_weights, which is now used in all cases. cast_bias_weight checks whether the VBAR assigned to the model has space for the weight (based on the same load priority semantics as the original ModelPatcher). If it does, the VRAM returned by the aimdo allocator is used as the GPU-side parameter. The caster is responsible for populating the weight data. This is done using the usual offload_stream (which means asynchronous loads now overlap first-use compute).
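
The cast-time decision can be sketched as follows. This is a stand-in, not the real comfy code: `Vbar` and the list arguments are hypothetical, and real weights would be tensors filled asynchronously on the offload stream rather than byte counts.

```python
# Illustrative sketch of the cast-time placement decision: if the model's
# VBAR still has room, the weight lives in allocator-provided VRAM and is
# populated asynchronously; otherwise it is streamed in per use.

class Vbar:
    """Hypothetical per-model VRAM budget."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def reserve(self, nbytes):
        # Reserve space if it fits; weights are cast in priority order,
        # so the VBAR fills following the same load priority semantics
        # as the original ModelPatcher.
        if self.used + nbytes <= self.capacity:
            self.used += nbytes
            return True
        return False

def cast_bias_weight(vbar, weight_nbytes, resident, transient):
    if vbar.reserve(weight_nbytes):
        resident.append(weight_nbytes)   # stays in VRAM, loaded async
    else:
        transient.append(weight_nbytes)  # cast on the fly each use
```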

Pinning works a little differently. When a weight is detected during load as unable to fit, a pin is allocated at cast time and the weight as used by the layer is DMAd back to the pin using the GPU DMA TX engine, again on the asynchronous offload streams. This means you get to pin the LoRA-modified and requantized weights, which can be a major speedup for offload+quantize+LoRA use cases. This works around the JIT LoRA + FP8 exclusion and brings FP8MM to heavy-offloading users (who probably really need it with their more modest GPUs). There is a performance risk in that a CPU+RAM patch has been replaced with a GPU+RAM patch, but my initial performance results look good. Most users are likely to have a GPU that outruns their CPU in these workloads.
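
A minimal sketch of that pin-on-cast flow, assuming hypothetical helper names (`apply_patch_on_gpu`, `pin_weight`); plain lists stand in for tensors, and the copy stands in for the pinned-buffer allocation plus device-to-host DMA.

```python
# Hypothetical sketch: the patched/requantized weight is produced on the
# GPU first, then DMA'd back into a pinned host buffer, so later loads
# replay the already-patched bytes instead of re-patching on the CPU.

def apply_patch_on_gpu(weight, lora_delta):
    # Stand-in for the GPU-side LoRA apply + requantize.
    return [w + d for w, d in zip(weight, lora_delta)]

def pin_weight(gpu_weight, pinned_pool):
    # Stand-in for a cuda_host_register'd buffer receiving an async
    # device-to-host DMA on the offload stream.
    pinned = list(gpu_weight)
    pinned_pool.append(pinned)
    return pinned
```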

Some common code is added to consolidate a layer's tensors for aimdo mapping, pinning, and DMA transfers. interpret_gathered_like() allows unpacking a raw buffer as a set of tensors. This is used consistently to bundle weights, quantization metadata (the QuantizedTensor bits) and biases into one payload for DMA during load, reducing CUDA overhead a little. Quantization metadata was missing async offload in some cases; this is now added. The change also pins quantization metadata and consolidates cuda_host_register calls (which can be expensive).
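
The gathered-buffer idea can be sketched like this. It is an assumption about what an interpret_gathered_like()-style helper does, not the actual implementation; flat lists stand in for tensors and element counts for byte sizes/shapes.

```python
# Sketch: reinterpret one contiguous buffer as a sequence of tensors with
# the same sizes as a reference set, so weight + quant metadata + bias can
# travel as a single DMA payload and then be unpacked into views.

def interpret_gathered_like(buffer, reference):
    views, offset = [], 0
    for ref in reference:
        n = len(ref)
        views.append(buffer[offset:offset + n])
        offset += n
    assert offset == len(buffer), "buffer size must match references"
    return views
```

One pin/DMA over the bundled payload replaces several per-tensor transfers, which is where the reduced CUDA overhead comes from.
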
2026-01-23 16:52:31 +10:00
audio_encoders
cldm
comfy_types
extra_samplers
image_encoders Add Hunyuan 3D 2.1 Support (#8714) 2025-09-04 20:36:20 -04:00
k_diffusion
ldm qwen_image: propagate attention mask. (#11966) 2026-01-22 20:02:31 -05:00
sd1_tokenizer
t2i_adapter
taesd Support LTX2 tiny vae (taeltx_2) (#11929) 2026-01-21 23:03:51 -05:00
text_encoders Support loading flux 2 klein checkpoints saved with SaveCheckpoint. (#12033) 2026-01-22 18:20:48 -05:00
weight_adapter
checkpoint_pickle.py
cli_args.py
clip_config_bigg.json
clip_model.py
clip_vision_config_g.json
clip_vision_config_h.json
clip_vision_config_vitl_336_llava.json
clip_vision_config_vitl_336.json
clip_vision_config_vitl.json
clip_vision_siglip2_base_naflex.json
clip_vision_siglip_384.json
clip_vision_siglip_512.json
clip_vision.py Add image sizes to clip vision outputs. (#11923) 2026-01-16 23:02:28 -05:00
conds.py
context_windows.py
controlnet.py
diffusers_convert.py
diffusers_load.py
float.py
gligen.py
hooks.py
latent_formats.py
lora_convert.py
lora.py
memory_management.py mm: Implement cast buffer allocations 2026-01-23 16:52:31 +10:00
model_base.py Reduce RAM and compute time in model saving with Loras 2026-01-23 16:52:31 +10:00
model_detection.py Support the Anima model. (#12012) 2026-01-21 19:44:28 -05:00
model_management.py ops/mp: implement aimdo 2026-01-23 16:52:31 +10:00
model_patcher.py ops/mp: implement aimdo 2026-01-23 16:52:31 +10:00
model_sampling.py
nested_tensor.py
ops.py ops/mp: implement aimdo 2026-01-23 16:52:31 +10:00
options.py
patcher_extension.py
pinned_memory.py pinned_memory: add python 2026-01-23 16:52:31 +10:00
pixel_space_convert.py
quant_ops.py
rmsnorm.py
sample.py
sampler_helpers.py
samplers.py mp: wrap get_free_memory 2026-01-23 16:52:31 +10:00
sd1_clip_config.json
sd1_clip.py
sd.py mp: wrap get_free_memory 2026-01-23 16:52:31 +10:00
sdxl_clip.py
supported_models_base.py
supported_models.py Support loading flux 2 klein checkpoints saved with SaveCheckpoint. (#12033) 2026-01-22 18:20:48 -05:00
utils.py move string_to_seed to utils.py 2026-01-23 16:52:31 +10:00