EasyAI code hosting platform

mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-05-25 16:37:23 +08:00

Rattus 6a8255f0c5 ops/mp: implement aimdo		2026-01-27 18:56:10 +10:00

Implement a model patcher and caster for aimdo.

A new ModelPatcher implementation which backs onto comfy-aimdo to implement varying model load levels that can be adjusted during model use. The patcher defers all load processes so that the model is loaded lazily during use (e.g. the first step of a ksampler) and automatically negotiates a load level during inference to maximize VRAM usage without OOMing. If inference requires more VRAM than is available, weights are offloaded to make space before the OOM happens.

Loading the weight onto the GPU happens via comfy_cast_weights, which is now used in all cases. cast_bias_weight checks whether the VBAR assigned to the model has space for the weight (based on the same load priority semantics as the original ModelPatcher). If it does, the VRAM returned by the aimdo allocator is used as the GPU-side parameter. The caster is responsible for populating the weight data. This is done using the usual offload_stream (which means we now have asynchronous load overlapping first-use compute).

Pinning works a little differently. When a weight is detected during load as unable to fit, a pin is allocated at the time of casting and the weight as used by the layer is DMA'd back to the pin using the GPU DMA TX engine, also using the asynchronous offload streams. This means you get to pin the LoRA-modified and requantized weights, which can be a major speedup for offload+quantize+LoRA use cases. This works around the JIT LoRA + FP8 exclusion and brings FP8MM to heavy offloading users (who probably really need it with more modest GPUs). There is a performance risk in that a CPU+RAM patch has been replaced with a GPU+RAM patch, but my initial performance results look good. Most users are likely to have a GPU that outruns their CPU in these woods.

Some common code is written to consolidate a layer's tensors for aimdo mapping, pinning, and DMA transfers. interpret_gathered_like() allows unpacking a raw buffer as a set of tensors. This is used consistently to bundle and pack weights, quantization metadata (QuantizedTensor bits) and biases into one payload for DMA in the load process, reducing CUDA overhead a little. Some quantization metadata was missing async offload in some cases; this is now added. This also pins quantization metadata and reduces the number of cuda_host_register calls (which can be expensive).
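The commit message describes packing a layer's weight, quantization metadata and bias into one raw buffer and then reinterpreting that buffer as separate tensors. The following is only a rough illustration of that idea, not the code from this commit: it uses the Python standard library instead of torch, and the names `gather_into_buffer`, `interpret_packed`, the layout tuple and the 16-byte alignment are all hypothetical choices made for the sketch.

```python
from array import array


def gather_into_buffer(tensors, alignment=16):
    """Pack typed arrays into one contiguous bytearray.

    Returns the buffer and a layout list of (offset, typecode, length).
    The alignment value is an assumption made for this sketch.
    """
    layout = []
    offset = 0
    for t in tensors:
        # Round the start of each tensor up to the alignment boundary.
        offset = -(-offset // alignment) * alignment
        layout.append((offset, t.typecode, len(t)))
        offset += len(t) * t.itemsize

    buf = bytearray(offset)
    for (off, _, _), t in zip(layout, tensors):
        raw = t.tobytes()
        buf[off:off + len(raw)] = raw
    return buf, layout


def interpret_packed(buf, layout):
    """Reinterpret the raw buffer as typed views without copying."""
    view = memoryview(buf)
    out = []
    for off, typecode, length in layout:
        nbytes = length * array(typecode).itemsize
        # Slice the byte view, then cast it to the element type.
        out.append(view[off:off + nbytes].cast(typecode))
    return out


# Hypothetical layer: uint8 quantized weight, float32 scale, float32 bias.
weight = array("B", [1, 2, 3, 4, 5])
scale = array("f", [0.5])
bias = array("f", [1.0, -2.0])

buf, layout = gather_into_buffer([weight, scale, bias])
w_view, s_view, b_view = interpret_packed(buf, layout)
```

Because the views alias the single buffer, one transfer of `buf` would move all three pieces at once, which is the overhead saving the commit message refers to.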
..
audio_encoders	Support the HuMo model. (#9903 )	2025-09-17 00:12:48 -04:00
cldm	Add better error message for common error. (#10846 )	2025-11-23 04:55:22 -05:00
comfy_types	LoRA Trainer: LoRA training node in weight adapter scheme (#8446 )	2025-06-13 19:25:59 -04:00
extra_samplers	Uni pc sampler now works with audio and video models.	2025-01-18 05:27:58 -05:00
image_encoders	Add Hunyuan 3D 2.1 Support (#8714 )	2025-09-04 20:36:20 -04:00
k_diffusion	Fix noise with ancestral samplers when inferencing on cpu. (#11528 )	2025-12-26 22:03:01 -05:00
ldm	wan-vae: Switch off feature cache for single frame (#12090 )	2026-01-26 19:40:19 -05:00
sd1_tokenizer	Silence clip tokenizer warning. (#8934 )	2025-07-16 14:42:07 -04:00
t2i_adapter	Controlnet refactor.	2024-06-27 18:43:11 -04:00
taesd	Support LTX2 tiny vae (taeltx_2) (#11929 )	2026-01-21 23:03:51 -05:00
text_encoders	Fix mistral 3 tokenizer code failing on latest transformers version and other breakage. (#12095 )	2026-01-26 11:39:00 -05:00
weight_adapter	[Weight-adapter/Trainer] Bypass forward mode in Weight adapter system (#11958 )	2026-01-24 22:56:22 -05:00
checkpoint_pickle.py	Remove pytorch_lightning dependency.	2023-06-13 10:11:33 -04:00
cli_args.py	Add most basic Asset support for models (#11315 )	2026-01-08 22:21:51 -05:00
clip_config_bigg.json	Fix potential issue with non clip text embeddings.	2024-07-30 14:41:13 -04:00
clip_model.py	Support the siglip 2 naflex model as a clip vision model. (#11831 )	2026-01-12 17:05:54 -05:00
clip_vision_config_g.json	Add support for clip g vision model to CLIPVisionLoader.	2023-08-18 11:13:29 -04:00
clip_vision_config_h.json	Add support for unCLIP SD2.x models.	2023-04-01 23:19:15 -04:00
clip_vision_config_vitl_336_llava.json	Support llava clip vision model.	2025-03-06 00:24:43 -05:00
clip_vision_config_vitl_336.json	support clip-vit-large-patch14-336 (#4042 )	2024-07-17 13:12:50 -04:00
clip_vision_config_vitl.json	Add support for unCLIP SD2.x models.	2023-04-01 23:19:15 -04:00
clip_vision_siglip2_base_naflex.json	Support the siglip 2 naflex model as a clip vision model. (#11831 )	2026-01-12 17:05:54 -05:00
clip_vision_siglip_384.json	Support new flux model variants.	2024-11-21 08:38:23 -05:00
clip_vision_siglip_512.json	Support 512 siglip model.	2025-04-05 07:01:01 -04:00
clip_vision.py	Add image sizes to clip vision outputs. (#11923 )	2026-01-16 23:02:28 -05:00
conds.py	Add some warnings and prevent crash when cond devices don't match. (#9169 )	2025-08-04 04:20:12 -04:00
context_windows.py	Add handling for vace_context in context windows (#11386 )	2025-12-30 14:40:42 -08:00
controlnet.py	Fix Race condition in --async-offload that can cause corruption (#10501 )	2025-10-29 17:17:46 -04:00
diffusers_convert.py	Remove useless code.	2025-01-24 06:15:54 -05:00
diffusers_load.py	load_unet -> load_diffusion_model with a model_options argument.	2024-08-12 23:20:57 -04:00
float.py	Optimize nvfp4 lora applying. (#11866 )	2026-01-14 00:49:38 -05:00
gligen.py	Remove some useless code. (#8812 )	2025-07-06 07:07:39 -04:00
hooks.py	New Year ruff cleanup. (#11595 )	2026-01-01 22:06:14 -05:00
latent_formats.py	Make empty latent node work with other models. (#12062 )	2026-01-24 19:23:20 -05:00
lora_convert.py	Implement the USO subject identity lora. (#9674 )	2025-09-01 18:54:02 -04:00
lora.py	Support ModelScope-Trainer/DiffSynth LoRA format for Flux.2 Klein models (#12042 )	2026-01-23 15:27:49 -05:00
memory_management.py	mm: Implement cast buffer allocations	2026-01-27 18:56:10 +10:00
model_base.py	Reduce RAM and compute time in model saving with Loras	2026-01-27 18:56:10 +10:00
model_detection.py	Only enable fp16 on z image models that actually support it. (#12065 )	2026-01-24 22:32:28 -05:00
model_management.py	ops/mp: implement aimdo	2026-01-27 18:56:10 +10:00
model_patcher.py	ops/mp: implement aimdo	2026-01-27 18:56:10 +10:00
model_sampling.py	Refactor model sampling sigmas code. (#10250 )	2025-10-08 17:49:02 -04:00
nested_tensor.py	WIP way to support multi multi dimensional latents. (#10456 )	2025-10-23 21:21:14 -04:00
ops.py	ops/mp: implement aimdo	2026-01-27 18:56:10 +10:00
options.py	Only parse command line args when main.py is called.	2023-09-13 11:38:20 -04:00
patcher_extension.py	Fix order of inputs nested merge_nested_dicts (#10362 )	2025-10-15 16:47:26 -07:00
pinned_memory.py	pinned_memory: add python	2026-01-27 18:56:10 +10:00
pixel_space_convert.py	Changes to the previous radiance commit. (#9851 )	2025-09-13 18:03:34 -04:00
quant_ops.py	Optimize nvfp4 lora applying. (#11866 )	2026-01-14 00:49:38 -05:00
rmsnorm.py	Add warning when using old pytorch. (#9347 )	2025-08-15 00:22:26 -04:00
sample.py	Make regular empty latent node work properly on flux 2 variants. (#12050 )	2026-01-23 19:50:48 -05:00
sampler_helpers.py	skip_load_model -> force_full_load (#11390 )	2025-12-17 23:29:32 -05:00
samplers.py	mp: wrap get_free_memory	2026-01-27 18:56:10 +10:00
sd1_clip_config.json	Fix potential issue with non clip text embeddings.	2024-07-30 14:41:13 -04:00
sd1_clip.py	Fix mistral 3 tokenizer code failing on latest transformers version and other breakage. (#12095 )	2026-01-26 11:39:00 -05:00
sd.py	mp: wrap get_free_memory	2026-01-27 18:56:10 +10:00
sdxl_clip.py	Add a T5TokenizerOptions node to set options for the T5 tokenizer. (#7803 )	2025-04-25 19:36:00 -04:00
supported_models_base.py	Fix some custom nodes. (#11134 )	2025-12-05 18:25:31 -05:00
supported_models.py	Only enable fp16 on z image models that actually support it. (#12065 )	2026-01-24 22:32:28 -05:00
utils.py	move string_to_seed to utils.py	2026-01-27 18:56:10 +10:00