Commit Graph

442 Commits

Author SHA1 Message Date
patientx
6420b47885
Merge branch 'comfyanonymous:master' into master 2025-12-07 14:10:24 +03:00
comfyanonymous
50ca97e776
Speed up lora compute and lower memory usage by doing it in fp16. (#11161) 2025-12-06 18:36:20 -05:00
patientx
f073f115e0
Merge branch 'comfyanonymous:master' into master 2025-11-29 18:19:41 +03:00
rattus
0ff0457892
mm: wrap the raw stream in context manager (#10958)
The documentation for torch.foo.Stream suggests that using it directly as a
with: context manager only starts at version 2.7. Use the old API for
backwards compatibility.
2025-11-28 16:38:12 -05:00
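As a rough illustration of the backwards-compatible route described in the commit above (helper name and structure are my own, not the actual model management code), the raw stream can be wrapped in the long-standing torch.cuda.stream() context manager instead of being used directly as a with: target:

import torch

def run_on_offload_stream(fn, *args):
    # Hypothetical helper: instead of "with stream:" (only documented from
    # torch 2.7 onwards), wrap the raw stream in torch.cuda.stream(), which
    # has been available for many releases.
    stream = torch.cuda.Stream()
    with torch.cuda.stream(stream):
        out = fn(*args)
    # make the default stream wait for the work queued on the side stream
    torch.cuda.current_stream().wait_stream(stream)
    return out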
comfyanonymous
f55c98a89f
Disable offload stream when torch compile. (#10961) 2025-11-28 16:16:46 -05:00
patientx
4094cbf867
Merge branch 'comfyanonymous:master' into master 2025-11-28 13:43:35 +03:00
comfyanonymous
9d8a817985
Enable async offloading by default on Nvidia. (#10953)
Add --disable-async-offload to disable it.

If this causes OOMs that go away when you use --disable-async-offload, please
report it.
2025-11-27 17:46:12 -05:00
patientx
da822b5057
Merge branch 'comfyanonymous:master' into master 2025-11-27 14:43:23 +03:00
rattus
f17251bec6
Account for the VRAM cost of weight offloading (#10733)
* mm: default to 0 for NUM_STREAMS

Don't count the compute stream as an offload stream. This makes async
offload accounting easier.

* mm: remove 128MB minimum

This is from a previous offloading system requirement. Remove it to make the
behaviour of the loader and the partial unloader consistent.

* mp: order the module list by offload expense

Calculate an approximate temporary VRAM cost to offload a weight and
primarily order the module load list by that. In the simple case this is just
the weight's own size, but with Loras, a weight with a lora consumes
considerably more VRAM to do the Lora application on-the-fly.

This will slightly prioritize lora weights, but is really there for proper
VRAM offload accounting.

* mp: Account for the VRAM cost of weight offloading

When checking the VRAM headroom, assume that the weight needs to be
offloaded, and only load it if there is space for both the load and the
offload cost times the number of streams.

As the weights are ordered from largest to smallest by offload cost, this is
guaranteed to fit in VRAM (tm), as all weights that follow will be smaller.

Make the partial unload aware of this system as well by saving the budget for
offload VRAM to the model state and accounting accordingly. It's possible
that partial unload increases the size of the largest offloaded weights, and
thus needs to unload a little bit more than asked to accommodate the bigger
temp buffers.

Honor the existing code's floor on model weight loading of 128MB by having
the patcher honor this separately, without regard to offloading. Otherwise,
when MM specifies its 128MB minimum, MP will see the biggest weights and
budget that 128MB to the offload buffer alone and load nothing, which isn't
the intent of these minimums. The same clamp applies in case of partial
offload of the currently loading model.
2025-11-27 01:03:03 -05:00
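As a hedged sketch of the accounting described in the commit above (every name below is hypothetical, not the real model management code), a weight only gets loaded when the free VRAM covers both the weight itself and one offload temp buffer per stream, with modules ordered by their estimated offload cost:

def estimate_offload_cost(module, has_lora):
    # Rough estimate of the temporary VRAM needed to offload one weight.
    # A plain weight roughly costs its own size; a weight with a lora needs
    # extra room to apply the lora on the fly (the factor is illustrative).
    size = sum(p.numel() * p.element_size() for p in module.parameters())
    return size * 3 if has_lora else size

def can_load(weight_size, offload_cost, free_vram, num_streams):
    # Only load if there is room for the weight itself plus one offload
    # temp buffer per stream, as the commit above describes.
    return weight_size + offload_cost * num_streams <= free_vram

Ordering the module list from largest to smallest estimated offload cost is what makes a check like this sufficient: every weight that follows needs at most the offload space already budgeted.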
patientx
de43f7b30d
Merge branch 'comfyanonymous:master' into master 2025-11-25 14:05:26 +03:00
comfyanonymous
b6805429b9
Allow pinning quantized tensors. (#10873) 2025-11-25 02:48:20 -05:00
patientx
00609a5102
Merge branch 'comfyanonymous:master' into master 2025-11-13 00:55:19 +03:00
rattus
18e7d6dba5
mm/mp: always unload re-used but modified models (#10724)
The partial unloader path in the model re-use flow skips straight to the
actual unload without any check of the patching UUID. This means that if you
do an upscale flow with a model patch on an existing model, it will not apply
your patches.

Fix by delaying the partial_unload until after the uuid checks. This is done
by making partial_unload a mode of partial_load where extra_mem is negative.
2025-11-12 16:19:53 -05:00
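The following is a purely illustrative sketch of that idea (none of these names are the real ComfyUI functions): expressing partial_unload as a partial_load with negative extra_mem means the patch/uuid check always runs before any unload decision.

def partial_load(model, extra_mem):
    # Hypothetical sketch: negative extra_mem means "free this much VRAM",
    # i.e. a partial unload routed through the same entry point.
    if model.patch_uuid != model.applied_uuid:
        model.apply_patches()            # re-patch before any unload decision
    if extra_mem < 0:
        model.unload_weights(-extra_mem)
    else:
        model.load_weights(extra_mem)

def partial_unload(model, mem_to_free):
    # a partial unload is just a partial_load with negative extra_mem
    partial_load(model, -mem_to_free)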
patientx
644778be49
Merge branch 'comfyanonymous:master' into master 2025-11-12 17:55:30 +03:00
comfyanonymous
1199411747
Don't pin tensor if not a torch.nn.parameter.Parameter (#10718) 2025-11-11 19:33:30 -05:00
patientx
3662d0a2ce
Merge branch 'comfyanonymous:master' into master 2025-11-10 14:05:53 +03:00
comfyanonymous
dea899f221
Unload weights if vram usage goes up between runs. (#10690) 2025-11-09 18:51:33 -05:00
patientx
8e02689534
Merge branch 'comfyanonymous:master' into master 2025-11-07 20:30:21 +03:00
comfyanonymous
a1a70362ca
Only unpin tensor if it was pinned by ComfyUI (#10677) 2025-11-07 11:15:05 -05:00
patientx
d29dbbd829
Merge branch 'comfyanonymous:master' into master 2025-11-07 14:27:13 +03:00
rattus
cf97b033ee
mm: guard against double pin and unpin explicitly (#10672)
As commented, if you let cuda be the one to detect the double pin/unpin, it
actually creates an async GPU error.
2025-11-06 21:20:48 -05:00
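A minimal sketch of what guarding explicitly can look like, assuming a simple set of data pointers for bookkeeping; do_pin and do_unpin stand in for whatever actually registers and unregisters the host memory and are hypothetical:

_pinned_ptrs = set()  # data pointers we pinned ourselves (illustrative bookkeeping)

def safe_pin(tensor, do_pin):
    # Check our own bookkeeping first: letting CUDA detect a double pin
    # surfaces later as an async GPU error.
    ptr = tensor.data_ptr()
    if ptr in _pinned_ptrs:
        return False
    do_pin(tensor)
    _pinned_ptrs.add(ptr)
    return True

def safe_unpin(tensor, do_unpin):
    ptr = tensor.data_ptr()
    if ptr not in _pinned_ptrs:
        return False  # never unpin something we did not pin ourselves
    do_unpin(tensor)
    _pinned_ptrs.discard(ptr)
    return True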
patientx
3ab45ae725
Merge branch 'comfyanonymous:master' into master 2025-11-06 15:35:41 +03:00
comfyanonymous
09dc24c8a9
Pinned mem also seems to work on AMD. (#10658) 2025-11-05 19:11:15 -05:00
comfyanonymous
1d69245981
Enable pinned memory by default on Nvidia. (#10656)
Removed the --fast pinned_memory flag.

You can use --disable-pinned-memory to disable it. Please report if it
causes any issues.
2025-11-05 18:08:13 -05:00
patientx
84faf45f09
Merge branch 'comfyanonymous:master' into master 2025-11-05 13:07:02 +03:00
comfyanonymous
7f3e4d486c
Limit amount of pinned memory on windows to prevent issues. (#10638) 2025-11-04 17:37:50 -05:00
patientx
7907b8d6be
Merge branch 'comfyanonymous:master' into master 2025-10-30 03:16:55 +03:00
rattus
ab7ab5be23
Fix Race condition in --async-offload that can cause corruption (#10501)
* mm: factor out the current stream getter

Make this a reusable function.

* ops: sync the offload stream with the consumption of w&b

This sync is necessary as pytorch will queue cuda async frees on the same
stream that created the tensor. In the case of async offload, this will be on
the offload stream.

Weights and biases can go out of scope in python, which then triggers the
pytorch garbage collector to queue the free operation on the offload stream,
possibly before the compute stream has used the weight. This causes a
use-after-free on the weight data, leading to total corruption of some
workflows.

So sync the offload stream with the compute stream after the weight
has been used so the free has to wait for the weight to be used.

cast_bias_weight is extended in a backwards-compatible way, with the new
behaviour opted into via a defaulted parameter. This handles custom node
packs that call cast_bias_weight and disables async-offload for them (as they
do not handle the race).

The pattern is now:

cast_bias_weight(... , offloadable=True) #This might be offloaded
thing(weight, bias, ...)
uncast_bias_weight(...)

* controlnet: adopt new cast_bias_weight synchronization scheme

This is necessary for safe async weight offloading.

* mm: sync the last stream in the queue, not the next

Currently this peeks ahead to sync the next stream in the queue of streams
with the compute stream. This doesn't allow a lot of parallelization, as the
end result is that you can only get one weight load ahead regardless of how
many streams you have.

Rotate the loop logic here to synchronize the end of the queue before
returning the next stream. This allows weights to be loaded ahead of the
compute stream's position.
2025-10-29 17:17:46 -04:00
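The sketch below is a hedged, self-contained illustration of the synchronization pattern named above; the real cast_bias_weight/uncast_bias_weight are ComfyUI ops helpers, and this body is a stand-in, not the actual implementation:

import torch

def use_offloaded_weight(layer, x, compute_stream, offload_stream):
    # Cast on the offload stream, consume on the compute stream, then make
    # the offload stream wait so the async frees pytorch queues there cannot
    # race the consumer.
    with torch.cuda.stream(offload_stream):
        weight = layer.weight.to("cuda", non_blocking=True)   # "cast_bias_weight"
        bias = layer.bias.to("cuda", non_blocking=True) if layer.bias is not None else None

    compute_stream.wait_stream(offload_stream)  # weight must be ready before use
    with torch.cuda.stream(compute_stream):
        out = torch.nn.functional.linear(x, weight, bias)      # "thing(weight, bias, ...)"

    # "uncast_bias_weight": the offload stream waits on the compute stream,
    # so freeing the casted weight cannot become a use-after-free.
    offload_stream.wait_stream(compute_stream)
    return out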
patientx
d8528ac31e
Merge branch 'comfyanonymous:master' into master 2025-10-29 12:42:07 +03:00
comfyanonymous
3fa7a5c04a
Speed up offloading using pinned memory. (#10526)
To enable this feature use: --fast pinned_memory
2025-10-29 00:21:01 -04:00
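For context, a minimal sketch of why pinned memory speeds this up (illustrative helpers, not the actual ComfyUI code): copies between page-locked host memory and the GPU can be issued with non_blocking=True and genuinely overlap with compute.

import torch

def stash_to_host(weight):
    # Allocate a pinned (page-locked) host buffer and stash the weight there.
    host = torch.empty_like(weight, device="cpu").pin_memory()
    host.copy_(weight)
    return host

def load_back_async(host_buffer, offload_stream):
    with torch.cuda.stream(offload_stream):
        # non_blocking copies only truly overlap when the source is pinned
        return host_buffer.to("cuda", non_blocking=True)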
patientx
8590e1f713
Merge branch 'comfyanonymous:master' into master 2025-10-26 14:29:29 +03:00
comfyanonymous
098a352f13
Add warning for torch-directml usage (#10482)
Added a warning message about the state of torch-directml.
2025-10-25 20:05:22 -04:00
comfyanonymous
426cde37f1
Remove useless function (#10472) 2025-10-24 19:56:51 -04:00
patientx
d4bcb93575
Merge branch 'comfyanonymous:master' into master 2025-10-22 11:34:33 +03:00
comfyanonymous
9cdc64998f
Only disable cudnn on newer AMD GPUs. (#10437) 2025-10-21 19:15:23 -04:00
patientx
5bf1c8be44
Merge branch 'comfyanonymous:master' into master 2025-10-21 03:49:14 +03:00
comfyanonymous
2c2aa409b0
Log message for cudnn disable on AMD. (#10418) 2025-10-20 15:43:24 -04:00
patientx
657a7872ab
Merge branch 'comfyanonymous:master' into master 2025-10-19 15:20:17 +03:00
comfyanonymous
5b80addafd
Turn off cuda malloc by default when --fast autotune is turned on. (#10393) 2025-10-18 22:35:46 -04:00
patientx
26589a3a0b
Merge branch 'comfyanonymous:master' into master 2025-10-15 12:18:21 +03:00
comfyanonymous
1c10b33f9b
gfx942 doesn't support fp8 operations. (#10348) 2025-10-15 00:21:11 -04:00
comfyanonymous
c8674bc6e9
Enable RDNA4 pytorch attention on ROCm 7.0 and up. (#10332) 2025-10-13 21:19:03 -04:00
patientx
fa7942933b
Merge branch 'comfyanonymous:master' into master 2025-10-12 13:56:39 +03:00
comfyanonymous
a125cd84b0
Improve AMD performance. (#10302)
I honestly have no idea why this improves things but it does.
2025-10-12 00:28:01 -04:00
patientx
258da26c98
Merge branch 'comfyanonymous:master' into master 2025-09-25 15:08:16 +03:00
Guy Niv
c8d2117f02
Fix memory leak by properly detaching model finalizer (#9979)
When unloading models in load_models_gpu(), the model finalizer was not
being explicitly detached, leading to a memory leak. This caused a linear
increase in memory consumption over time as models were repeatedly loaded
and unloaded.

This change prevents orphaned finalizer references from accumulating in
memory during model switching operations.
2025-09-24 22:35:12 -04:00
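As a hedged illustration of the fix described above (class and method names are invented for the example; the real code lives in ComfyUI's model management), weakref.finalize objects expose detach(), which is the standard way to drop a finalizer so explicit unloads don't leave orphaned references behind:

import weakref

class LoadedModel:
    def __init__(self, model):
        self.model = model
        # run cleanup when the loaded model is garbage collected
        self._finalizer = weakref.finalize(self, LoadedModel._cleanup, id(model))

    @staticmethod
    def _cleanup(model_id):
        print(f"releasing resources for model {model_id}")

    def unload(self):
        # detach the finalizer on explicit unload; otherwise every
        # load/unload cycle leaves an orphaned finalizer reference behind
        self._finalizer.detach()
        self.model = None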
patientx
c62e820d45
Merge branch 'comfyanonymous:master' into master 2025-09-20 01:51:06 +03:00
DELUXA
8d6653fca6
Enable fp8 ops by default on gfx1200 (#9926) 2025-09-18 19:50:37 -04:00
patientx
b46622ffa5
Merge branch 'comfyanonymous:master' into master 2025-09-08 11:14:04 +03:00
comfyanonymous
fb763d4333
Fix amd_min_version crash when cpu device. (#9754) 2025-09-07 21:16:29 -04:00