Commit Graph

4668 Commits

Author SHA1 Message Date
Rattus
74584f69c6 fixes to pinning rework 2026-01-31 01:12:55 +10:00
Rattus
46f9ac1967 bump aimdo 2026-01-29 23:50:00 +10:00
Rattus
b1eb25b5c1 Go back to pre-pins
Post-pins don't really work for low-spec users, and you are more likely
to recycle your model with a different LoRA than to really care about
that tiny little bit of perf from a pre-computed LoRA. Do it the old way.
2026-01-29 23:48:27 +10:00
Rattus
bc80f784d8 Fix ram freeing logic 2026-01-29 23:48:00 +10:00
Rattus
8067cb4f93 mm: don't clear_cache with mempools
Two things:

* pyt2.7 crashes if you try to clear_cache in the presence of mempools.
* mempools never actually clear_cache anyway, because the mempool itself
is considered a ref.

Guard the code accordingly and remove the useless clear_cache calls.

The offload stream resizer will need some fixing.
2026-01-29 01:42:38 +10:00
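The guard described above can be sketched as a small decision helper. This is a hypothetical, framework-free sketch, not ComfyUI's actual code: `maybe_clear_cache` and `active_mempools` are illustrative names, and the real call would be something like `torch.cuda.empty_cache()`.

```python
# Sketch of the guard: only invoke the cache-clearing entry point when no
# memory pools are active, since (a) clearing with an active mempool can
# crash on some PyTorch builds, and (b) the mempool holds a reference to
# its blocks anyway, so nothing would actually be freed.

def should_clear_cache(active_mempools: int) -> bool:
    """True only when clearing is both safe and useful."""
    return active_mempools == 0

def maybe_clear_cache(clear_fn, active_mempools: int) -> bool:
    """Call clear_fn (e.g. torch.cuda.empty_cache) only when safe.

    Returns True if the cache was actually cleared."""
    if should_clear_cache(active_mempools):
        clear_fn()
        return True
    return False
```

Usage: call sites that previously cleared unconditionally route through the guard, so mempool-backed allocations are simply left alone.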
Rattus
f8f9a89f6e bump aimdo to 1.4 2026-01-27 18:58:52 +10:00
Rattus
dff1ee9351 free dynamic pins properly 2026-01-27 18:57:40 +10:00
Rattus
101367b0da mm: redefine free memory for Windows
As commented.
2026-01-27 18:57:40 +10:00
Rattus
cd085314f9 ops: don't discard pins
It's more likely that the user will rerun their workflow and want
whatever pins are in place, so remove this. Pins still have to respect
RAM pressure per model anyway.
2026-01-27 18:57:40 +10:00
Rattus
04141efe54 mm: Don't GPU-load models
Aimdo will do this on demand as zero-copy. Remove the special case for
vram > ram.
2026-01-27 18:57:40 +10:00
Rattus
2a76ec6e03 fix missing import 2026-01-27 18:57:40 +10:00
Rattus
f98c86ce9d add missing signature set for non comfy 2026-01-27 18:57:40 +10:00
Rattus
4c875a2a8f fix syncs
Fix these syncs to conditionalize properly for CPU and to always run in
exception flows.
2026-01-27 18:57:40 +10:00
Rattus
8bb291ba17 disable async pin population 2026-01-27 18:57:40 +10:00
Rattus
355172fe7e remove bad pyt2.4 versions gate 2026-01-27 18:57:40 +10:00
Rattus
ede3d4b966 MPDynamic: Add support for model-defined dtype
If the model defines a dtype that is different from what is in the state
dict, respect that at load time. This is done as part of the casting
process.
2026-01-27 18:57:40 +10:00
Rattus
36c76527de ops: fix __init__ return 2026-01-27 18:57:40 +10:00
Rattus
d1778d8085 archive the model defined dtypes
Scan created models and save off the dtypes as defined by the model
creation process. This is needed for assign=True, which will override
the dtypes.
2026-01-27 18:57:40 +10:00
Rattus
49809b7b2d mp: big bump on the VBAR sizes
Now that the model-defined dtype is decoupled from the state_dict
dtypes, we need to be able to handle worst-case casts between the SD and
the VBAR.
2026-01-27 18:57:40 +10:00
Rattus
12263b7fbf ruff 2026-01-27 18:57:40 +10:00
Rattus
e54440a0c7 nodes_model_patch: fix copy-paste coding error 2026-01-27 18:57:40 +10:00
Rattus
f3854f6d2e mp: handle blank __new__ call
This is needed for deepcopy construction. We shouldn't really have deep
copies of MP or MPDynamic; however, there is a stray one in some
controlnet flows.
2026-01-27 18:57:40 +10:00
Rattus
322d917991 mm: remove left over hooks draft code
This is phase 2
2026-01-27 18:57:40 +10:00
Rattus
607d15cad6 execution: remove per-node gc.collect()
This isn't worth it, and the likelihood of inference leaving behind a
complex data structure with cyclic references is low. Remove it.

We could replace it with a conditional on nodes that actually touch the
GPU, which might be a win.
2026-01-27 18:57:40 +10:00
Rattus
cecf8c55f2 implement lightweight safetensors with READ mmap
The mmap as used by safetensors is hardcoded to CoW, which forcibly
consumes Windows commit charge even on a zero-copy load. RIP. Implement
safetensors in PyTorch itself with a READ mmap so we don't get
commit-charged for all our open models.
2026-01-27 18:57:40 +10:00
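The read-only mapping idea can be illustrated with the stdlib alone. This is a minimal sketch against the published safetensors file layout (8-byte little-endian header length, JSON header with per-tensor `dtype`/`shape`/`data_offsets`, then a raw byte buffer); the function names and the fp32-only handling are illustrative, not the commit's actual implementation.

```python
# Sketch: open a safetensors file via mmap.ACCESS_READ (read-only pages,
# no copy-on-write), so Windows takes no commit charge for the mapping.
import json
import mmap
import os
import struct
import tempfile

def write_tiny_safetensors(path):
    # One fp32 tensor "w" of shape [2] -> 8 bytes of data.
    data = struct.pack("<2f", 1.0, 2.0)
    header = json.dumps({"w": {"dtype": "F32", "shape": [2],
                               "data_offsets": [0, len(data)]}}).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # header length, LE u64
        f.write(header)
        f.write(data)

def read_safetensors_readonly(path):
    with open(path, "rb") as f:
        # ACCESS_READ maps the file read-only: zero-copy, and no CoW
        # commit charge for the whole model file.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (hlen,) = struct.unpack_from("<Q", mm, 0)
    header = json.loads(mm[8:8 + hlen])
    base = 8 + hlen
    out = {}
    for name, info in header.items():
        b, e = info["data_offsets"]
        out[name] = (info["dtype"], info["shape"],
                     bytes(mm[base + b:base + e]))
    mm.close()
    return out

def demo():
    path = os.path.join(tempfile.gettempdir(), "tiny_demo.safetensors")
    write_tiny_safetensors(path)
    out = read_safetensors_readonly(path)
    os.remove(path)
    return out
```

In the real change, the raw bytes would be wrapped as torch tensors over the mapping rather than copied out as here.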
Rattus
2f29e215f3 ops: defer creation of the parameters until state dict load
If running on Windows, defer creation of the layer parameters until the
state dict is loaded. This avoids a massive spike in Windows commit
charge when a model is created but not loaded.

This problem doesn't exist on Linux, as Linux allows RAM overcommit;
Windows does not. Before the dynamic memory work this was also a
non-issue, as every non-quant model would immediately RAM-load and need
the memory anyway.

Make the workaround Windows-specific, as there may be someone out there
with some train-from-scratch workflow (which this might break), and
assume said someone is on Linux.
2026-01-27 18:57:40 +10:00
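The deferral pattern above can be sketched in plain Python. `DeferredParam` is a hypothetical stand-in, not ComfyUI's real class: the point is only that construction records the shape while backing storage is allocated at state-dict load time, gated on the platform.

```python
# Sketch: record only the parameter's shape at construction; materialise
# backing storage when the state dict is actually loaded. An unused model
# then never contributes to Windows commit charge.
import sys

class DeferredParam:
    def __init__(self, shape, defer=None):
        self.shape = shape
        self.data = None
        # Only Windows needs the deferral: Linux overcommits RAM, so an
        # eager allocation there costs no physical memory until written.
        if defer is None:
            defer = sys.platform == "win32"
        if not defer:
            self.materialise()

    def materialise(self):
        n = 1
        for d in self.shape:
            n *= d
        self.data = bytearray(4 * n)  # pretend fp32 storage

    def load_(self, raw: bytes):
        """Called at state-dict load; allocates on first touch."""
        if self.data is None:
            self.materialise()
        self.data[:] = raw
```

In torch terms the analogous trick is keeping parameters unmaterialised (e.g. on the meta device) until `load_state_dict(..., assign=True)` supplies real storage.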
Rattus
b0580b8393 remove junk arg 2026-01-27 18:57:40 +10:00
Rattus
5684c678da aimdo version bump 2026-01-27 18:57:38 +10:00
Rattus
3908056730 main: Rework aimdo into process
Be more tolerant of unsupported platforms and fall back properly.
Fixes a crash when CUDA is not installed at all.
2026-01-27 18:57:21 +10:00
Rattus
f3021770a4 sampling: improve progress meter accuracy for dynamic loading 2026-01-27 18:57:21 +10:00
Rattus
0983fb88cc clip: support assign load when taking clip from a ckpt 2026-01-27 18:57:21 +10:00
Rattus
9f701f69dc sd: empty cache on tiler fallback
This is needed for aimdo, where the cache can't self-recover from
fragmentation. It is a good thing to do after an OOM anyway, so make it
unconditional.
2026-01-27 18:57:21 +10:00
Rattus
01ca403bed ruff 2026-01-27 18:57:21 +10:00
Rattus
7a18963a33 misc cleanup 2026-01-27 18:57:19 +10:00
Rattus
e8c9977973 add missing del on unpin 2026-01-27 18:56:53 +10:00
Rattus
e2d62b8f80 write better tx commentary 2026-01-27 18:56:53 +10:00
Rattus
ff434ea98c mm: fix sync
Sync before deleting anything.
2026-01-27 18:56:53 +10:00
Rattus
04bf6ef0de main: Go live with --fast dynamic_vram
Add the optional command line switch --fast dynamic_vram.

This is mutually exclusive with --high-vram and --gpu-only, which
contradict aimdo's underlying feature.

Add an appropriate installation warning and a startup message, and match
the comfy debug level when configuring aimdo.

Add the comfy-aimdo pip requirement. This will safely stub to a no-op on
unsupported platforms.
2026-01-27 18:56:50 +10:00
Rattus
469d7a62de execution: add aimdo primary pytorch cache integration
We need general PyTorch cache defragmentation at an appropriate level
for aimdo. Do it here on a per-node basis, which has a reasonable chance
of purging stale shapes out of the PyTorch caching allocator and saving
VRAM without costing too much garbage-collector thrash.

This looks like a lot of GC, but aimdo never fails allocations out of
PyTorch and saves the PyTorch allocator from ever needing to defrag on
demand; it just needs an oil change every now and then, so we do it
here. Doing it here also means the PyTorch temporaries are cleared from
the Task Manager VRAM figure, so user anxiety can go down a little when
they see their VRAM drop back at the end of a workflow in line with
inference usage (rather than assuming a full VRAM leak).
2026-01-27 18:56:10 +10:00
Rattus
c862c42311 models: Use CoreModelPatcher
Use CoreModelPatcher for all internal ModelPatcher implementations. This drives
conditional use of the aimdo feature, while making sure custom node packs get
to keep ModelPatcher unchanged for the moment.
2026-01-27 18:56:10 +10:00
Rattus
6a8255f0c5 ops/mp: implement aimdo
Implement a model patcher and caster for aimdo.

A new ModelPatcher implementation which backs onto comfy-aimdo to implement varying model load levels that can be adjusted during model use. The patcher defers all load processing so the model loads lazily during use (e.g. the first step of a KSampler) and automatically negotiates a load level during inference to maximize VRAM usage without OOMing. If inference requires more VRAM than is available, weights are offloaded to make space before the OOM happens.

As for loading the weights onto the GPU, that happens via comfy_cast_weights, which is now used in all cases. cast_bias_weight checks whether the VBAR assigned to the model has space for the weight (based on the same load-priority semantics as the original ModelPatcher). If it does, the VRAM returned by the aimdo allocator is used as the GPU-side parameter. The caster is responsible for populating the weight data. This is done using the usual offload_stream (which means we now have asynchronous loads overlapping first-use compute).

Pinning works a little differently. When a weight is detected during load as unable to fit, a pin is allocated at cast time and the weight as used by the layer is DMAd back to the pin using the GPU DMA TX engine, also on the asynchronous offload streams. This means you get to pin the LoRA-modified and requantized weights, which can be a major speedup for offload+quantize+LoRA use cases. This works around the JIT LoRA + FP8 exclusion and brings FP8MM to heavy-offloading users (who probably really need it, having more modest GPUs). There is a performance risk in that a CPU+RAM patch has been replaced with a GPU+RAM patch, but my initial performance results look good. Most users are likely to have a GPU that outruns their CPU in these woods.

Some common code is written to consolidate a layer's tensors for aimdo mapping, pinning, and DMA transfers. interpret_gathered_like() allows unpacking a raw buffer as a set of tensors. This is used consistently to bundle weights, quantization metadata (the QuantizedTensor bits) and biases into one payload for DMA in the load process, reducing CUDA overhead a little. Some quantization metadata was missing async offload in some cases, which is now added. This also pins quantization metadata and consolidates the number of cuda_host_register calls (which can be expensive).
2026-01-27 18:56:10 +10:00
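The gather-then-reinterpret idea behind interpret_gathered_like() can be sketched with plain bytes and memoryviews. This is a hedged stand-in, assuming only what the message states: several per-layer tensors are packed into one contiguous buffer so a single DMA transfer can move them, then unpacked as zero-copy views at known offsets. The real code works on torch tensors, not byte strings.

```python
# Sketch: pack weight/bias/quant-metadata buffers into one payload so one
# transfer (and one host-register call) covers the whole layer, then
# reconstruct zero-copy views over the packed buffer.

def gather(tensors):
    """Pack byte buffers contiguously.

    Returns (buffer, layout) where layout is [(offset, length), ...]."""
    layout, out, off = [], bytearray(), 0
    for t in tensors:
        layout.append((off, len(t)))
        out += t
        off += len(t)
    return bytes(out), layout

def interpret_gathered_like(buf, layout):
    """Unpack a raw gathered buffer as views, without copying."""
    mv = memoryview(buf)
    return [mv[off:off + n] for off, n in layout]
```

The same layout is used on both sides of the transfer, so the receiver can reinterpret the single payload back into weight, bias, and metadata tensors.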
Rattus
594b472ca9 mp: add mode for non-comfy weight prioritization
Non-comfy weights don't get async offload and have a few other
performance limitations. Load them at top priority accordingly.
2026-01-27 18:56:10 +10:00
Rattus
13a7b68ad7 mp/mm: API expansions for dynamic loading
Add two API expansions: a flag for whether a model patcher is dynamic,
and a very basic RAM-freeing system.

Implement the semantics of the dynamic model patcher, which never frees
VRAM ahead of time for the sake of another dynamic model patcher.

At the same time, add an API for clearing out pins on a reservation
heuristic of model size x2, as pins consume RAM in their own right in
the dynamic patcher.

This is actually less about OOMing RAM and more about performance: with
assign=True load semantics there needs to be plenty of headroom for the
OS to load models into disk cache on demand, so err on the side of
kicking old pins out.
2026-01-27 18:56:10 +10:00
Rattus
b6fd3dc2eb mp: wrap get_free_memory
Dynamic loading needs to adjust these numbers based on future movements,
so wrap this in an MP API.
2026-01-27 18:56:10 +10:00
Rattus
3c2ce0d58d pinned_memory: add python
Add a Python module for managing pinned memory at the weight/bias module
level. This allocates, pins, and attaches a tensor to a module as the
pin for that module. It does not set the weight; it just allocates a
single RAM buffer for population and bulk DMA transfer.
2026-01-27 18:56:10 +10:00
Rattus
37567cb0d1 move string_to_seed to utils.py
This needs to be visible to ops, which may want to do stochastic
rounding on the fly.
2026-01-27 18:56:10 +10:00
Rattus
a08aed2d7e mm: Implement cast buffer allocations 2026-01-27 18:56:10 +10:00
Rattus
db78623796 ops: Do bias dtype conversion on compute stream
For consistency with weights.
2026-01-27 18:56:10 +10:00
Rattus
daaeb5c96c Reduce RAM and compute time in model saving with LoRAs
Get the model saving logic away from force_patch_weights and instead do
the patching JIT during safetensors saving.

First, switch off force_patch_weights in the load-for-save, which avoids
creating CPU-side tensors with LoRAs calculated.

Then, at save time, wrap the tensor to catch safetensors' call to .to()
and patch it live.

This avoids ever having a LoRA-calculated copy of offloaded weights on
the CPU.

Also take advantage of the presence of the GPU when doing this LoRA
calculation. The former force_patch_weights would just do everything on
the CPU. It's generally faster to go to the GPU and back even if it's
just a LoRA application.
2026-01-27 18:56:10 +10:00
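The save-time trick above can be sketched without torch. `LazyPatchedTensor` and `save_all` are hypothetical names: the point is that the saver is handed a thin wrapper whose `.to()` hook applies the LoRA patch just-in-time, so a fully patched CPU copy of every weight never exists at once.

```python
# Sketch: wrap each weight so the serializer's .to() call triggers the
# LoRA patch lazily, one tensor at a time, instead of pre-patching the
# whole model with force_patch_weights. Plain Python lists stand in for
# torch tensors here.

class LazyPatchedTensor:
    def __init__(self, base, patch_fn):
        self.base = base          # unpatched weight (may stay offloaded)
        self.patch_fn = patch_fn  # e.g. computes W + lora_up @ lora_down

    def to(self, device="cpu"):
        # Invoked by the saver; patch just-in-time. In the real change
        # this runs on the GPU and copies the result back, which is
        # usually faster than patching on the CPU.
        return self.patch_fn(self.base)

def save_all(wrapped):
    # Stand-in for the safetensors save loop, which pulls each tensor
    # to CPU via .to() right before writing it out.
    return {name: t.to("cpu") for name, t in wrapped.items()}
```

Each weight is materialised, written, and dropped in turn, bounding peak RAM at roughly one patched tensor rather than the whole model.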
comfyanonymous
09725967cf ComfyUI version v0.11.0
2026-01-26 23:08:01 -05:00