Commit Graph

1748 Commits

Author SHA1 Message Date
Jedrzej Kosinski
61133af772 Add '--flipflop-offload' startup argument 2025-10-13 21:10:44 -07:00
Jedrzej Kosinski
586a8de8da Merge branch 'master' into flipflop-stream 2025-10-13 21:04:37 -07:00
comfyanonymous
3374e900d0
Faster workflow cancelling. (#10301) 2025-10-13 23:43:53 -04:00
comfyanonymous
dfff7e5332
Better memory estimation for the SD/Flux VAE on AMD. (#10334) 2025-10-13 22:37:19 -04:00
comfyanonymous
e4ea393666
Fix loading old stable diffusion ckpt files on newer numpy. (#10333) 2025-10-13 22:18:58 -04:00
comfyanonymous
c8674bc6e9
Enable RDNA4 pytorch attention on ROCm 7.0 and up. (#10332) 2025-10-13 21:19:03 -04:00
rattus128
95ca2e56c8
WAN2.2: Fix cache VRAM leak on error (#10308)
Same change pattern as 7e8dd275c2
applied to WAN2.2

If this suffers an exception (such as a VRAM OOM), it will leave the
encode() and decode() methods, which skips the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
to this object, which in turn references large tensors from the failed
execution.

The feature cache is currently set up as a class variable on the
encoder/decoder; however, the encode and decode functions always clear
it on both entry and exit of normal execution.

It's likely the design intent is for this to be usable as a streaming
encoder where the input comes in batches, but the functions as they
are today don't support that.

So simplify by bringing the cache back to a local variable, so that if
it does VRAM OOM, the cache itself becomes properly collectible
garbage once the encode()/decode() functions disappear from the stack.
2025-10-13 15:23:11 -04:00
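The shape of this fix, as a minimal Python sketch (the class and method
names here are illustrative stand-ins, not the actual WAN VAE code):

```python
class WanDecoder:
    """Illustrative stand-in for the WAN VAE decoder."""

    def _run_blocks(self, x, feat_cache):
        feat_cache.append(x)  # stand-in for caching large tensors
        return x

    # Before: the cache lives on the object. If _run_blocks() raises
    # (e.g. a VRAM OOM), the final clear never runs, and anything still
    # holding this decoder also holds the cached tensors.
    def decode_leaky(self, x):
        self.feat_cache = []
        out = self._run_blocks(x, self.feat_cache)
        self.feat_cache = []  # skipped when an exception unwinds
        return out

    # After: the cache is a local. When an exception unwinds decode(),
    # the frame is torn down and the cache becomes collectible garbage.
    def decode(self, x):
        feat_cache = []
        return self._run_blocks(x, feat_cache)
```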
comfyanonymous
e693e4db6a
Always set diffusion model to eval() mode. (#10331) 2025-10-13 14:57:27 -04:00
comfyanonymous
a125cd84b0
Improve AMD performance. (#10302)
I honestly have no idea why this improves things but it does.
2025-10-12 00:28:01 -04:00
comfyanonymous
84e9ce32c6
Implement the mmaudio VAE. (#10300) 2025-10-11 22:57:23 -04:00
comfyanonymous
f1dd6e50f8
Fix bug with applying loras on fp8 scaled without fp8 ops. (#10279) 2025-10-09 19:02:40 -04:00
comfyanonymous
139addd53c
More surgical fix for #10267 (#10276) 2025-10-09 16:37:35 -04:00
comfyanonymous
6e59934089
Refactor model sampling sigmas code. (#10250) 2025-10-08 17:49:02 -04:00
comfyanonymous
8aea746212
Implement gemma 3 as a text encoder. (#10241)
Not useful yet.
2025-10-06 22:08:08 -04:00
comfyanonymous
195e0b0639
Remove useless code. (#10223) 2025-10-05 15:41:19 -04:00
Jedrzej Kosinski
5329180fce Made flipflop consider partial_unload and partial_offload, and added flip+flop to mem counters 2025-10-03 16:21:01 -07:00
Jedrzej Kosinski
0fdd327c2f Merge branch 'master' into flipflop-stream 2025-10-03 14:32:56 -07:00
Finn-Hecker
93d859cfaa
Fix type annotation syntax in MotionEncoder_tc __init__ (#10186)
## Summary
Fixed incorrect type hint syntax in `MotionEncoder_tc.__init__()` parameter list.

## Changes
- Line 647: Changed `num_heads=int` to `num_heads: int` 
- This corrects the parameter annotation from a default value assignment to proper type hint syntax

## Details
The parameter was using assignment syntax (`=`) instead of type annotation syntax (`:`), which would incorrectly set the default value to the `int` class itself rather than annotating the expected type.
2025-10-03 14:32:19 -07:00
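For illustration, the difference between the two spellings (a minimal
sketch, not the full MotionEncoder_tc signature):

```python
# Before: "=" gives the parameter a *default value* of the int class
# itself and no type annotation at all.
def __init__(self, num_heads=int):
    ...

# After: ":" annotates the expected type; the parameter has no default.
def __init__(self, num_heads: int):
    ...
```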
Jedrzej Kosinski
ee01002e63 Add flipflop support to (base) WAN; fix issue with loras being applied to flipflop weights on CPU instead of GPU; left some timing functions in, as the lora application time could use some reduction 2025-10-02 22:02:50 -07:00
Jedrzej Kosinski
831c3cf05e Add a temporary workaround for an odd number of blocks not producing expected results 2025-10-02 20:29:11 -07:00
Jedrzej Kosinski
0d8e8abd90 Default to smaller blocks getting flipflopped first 2025-10-02 18:00:21 -07:00
Jedrzej Kosinski
d5001ed90e Make flux support flipflop 2025-10-02 17:53:22 -07:00
Jedrzej Kosinski
8d7b22b720 Fixed FlipFlopModule.execute_blocks having hardcoded strings from Qwen 2025-10-02 17:49:43 -07:00
Jedrzej Kosinski
6d3ec9fcf3 Simplified flipflop setup by adding FlipFlopModule.execute_blocks helper 2025-10-02 16:46:37 -07:00
Jedrzej Kosinski
c4420b6a41 Change log string slightly 2025-10-02 15:34:35 -07:00
Jedrzej Kosinski
a282586995 Merge branch 'master' into flipflop-stream 2025-10-02 15:03:26 -07:00
Jedrzej Kosinski
0df61b5032 Fix improper index slicing for flipflop get blocks, add extra log message 2025-10-01 21:21:36 -07:00
Jedrzej Kosinski
7c896c5567 Initial automatic support for flipflop within ModelPatcher - only Qwen Image diffusion_model uses FlipFlopModule currently 2025-10-01 20:13:50 -07:00
rattus128
4965c0e2ac
WAN: Fix cache VRAM leak on error (#10141)
If this suffers an exception (such as a VRAM OOM), it will leave the
encode() and decode() methods, which skips the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
to this object, which in turn references large tensors from the failed
execution.

The feature cache is currently set up as a class variable on the
encoder/decoder; however, the encode and decode functions always clear
it on both entry and exit of normal execution.

It's likely the design intent is for this to be usable as a streaming
encoder where the input comes in batches, but the functions as they
are today don't support that.

So simplify by bringing the cache back to a local variable, so that if
it does VRAM OOM, the cache itself becomes properly collectible
garbage once the encode()/decode() functions disappear from the stack.
2025-10-01 18:42:16 -04:00
rattus128
911331c06c
sd: fix VAE tiled fallback VRAM leak (#10139)
When the VAE catches a VRAM OOM, it launches the fallback logic
straight from the exception context.

Python, however, keeps references to the entire call stack that caused
the exception, including any local variables, for the sake of
exception reporting and debugging. In the case of tensors, this can
hold onto references to GBs of VRAM and prevent the allocator from
freeing them.

So drop the except context completely before going back to the VAE via
the tiler, by getting out of the except block with nothing but a flag.

This greatly increases the reliability of the tiler fallback,
especially on low-VRAM cards: with the bug, if the leak happened to
retain more than the headroom needed for a single tile, the tiler
fallback would OOM and fail the flow.
2025-10-01 18:40:28 -04:00
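The pattern described above, as a minimal sketch (decode_full and
decode_tiled are hypothetical stand-ins, not the actual comfy VAE API):

```python
import torch

def vae_decode(vae, samples):
    needs_tiled = False
    try:
        return vae.decode_full(samples)
    except torch.cuda.OutOfMemoryError:
        # Keep this block minimal: while inside it, the in-flight
        # exception's traceback pins every frame below us, including
        # the tensor locals of the failed decode.
        needs_tiled = True
    # Out here the exception and the frames it pinned are released, so
    # the failed attempt's VRAM can be freed before the retry allocates.
    if needs_tiled:
        return vae.decode_tiled(samples)
```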
comfyanonymous
a6f83a4a1a
Support the new hunyuan vae. (#10150) 2025-10-01 17:19:13 -04:00
Jedrzej Kosinski
01f4512bf8 In-progress commit on making flipflop async weight streaming native; gave the "loaded partially"/"loaded completely" log messages labels, because having to memorize their meaning for dev work is annoying 2025-09-30 23:08:08 -07:00
Jedrzej Kosinski
8a8162e8da Fix percentage logic, begin adding elements to ModelPatcher to track flip flop compatibility 2025-09-29 22:49:12 -07:00
Jedrzej Kosinski
0e966dcf85 Merge branch 'master' into flipflop-stream 2025-09-27 21:13:26 -07:00
rattus128
653ceab414
Reduce Peak WAN inference VRAM usage - part II (#10062)
* flux: math: Use addcmul_ to avoid an expensive VRAM intermediate

The rope process can be the VRAM peak, and the intermediate allocated
for the addition result before the original is released can OOM.
addcmul_ it.

* wan: Delete the self attention output before cross attention

This saves VRAM when the cross attention and FFN are the VRAM peak.
2025-09-27 18:14:16 -04:00
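For the first change, a minimal sketch of the difference, assuming
equally shaped PyTorch tensors (the shapes below are illustrative, not
the actual flux rope code):

```python
import torch

a = torch.randn(2, 1024, 64)
b = torch.randn_like(a)
c = torch.randn_like(a)

# Before: materializes b * c and then a + (b * c) as fresh tensors
# before the old `a` can be released -- a transient allocation spike.
a = a + b * c

# After: a fused in-place multiply-add accumulates directly into `a`,
# avoiding the intermediate entirely.
a.addcmul_(b, c)
```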
Jedrzej Kosinski
196954ab8c
Add 'input_cond' and 'input_uncond' to the args dictionary passed into sampler_cfg_function (#10044) 2025-09-26 19:55:03 -07:00
comfyanonymous
1e098d6132
Don't add template to qwen2.5vl when template is in prompt. (#10043)
Make the hunyuan image refiner template_end 36.
2025-09-26 18:34:17 -04:00
Jedrzej Kosinski
6b240b0bce Refactored old flip flop into a new implementation that allows for controlling the percentage of blocks getting flip flopped, converted nodes to v3 schema 2025-09-25 22:41:41 -07:00
Jedrzej Kosinski
f9fbf902d5 Added missing Qwen block params, further subdivided blocks function 2025-09-25 17:49:39 -07:00
Jedrzej Kosinski
f083720eb4 Refactored FlipFlopTransformer.__call__ to fully separate out actions between flip and flop 2025-09-25 16:16:51 -07:00
Jedrzej Kosinski
84e73f2aa5 Brought over flip flop prototype from contentis' fork, limiting it to only Qwen to ease the process of adapting it to be a native feature 2025-09-25 16:15:46 -07:00
Guy Niv
c8d2117f02
Fix memory leak by properly detaching model finalizer (#9979)
When unloading models in load_models_gpu(), the model finalizer was
not being explicitly detached, leading to a memory leak. This caused
memory consumption to increase linearly over time as models were
repeatedly loaded and unloaded.

This change prevents orphaned finalizer references from accumulating
in memory during model switching operations.
2025-09-24 22:35:12 -04:00
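One way such a finalizer leak arises and is fixed, sketched with
weakref.finalize (LoadedModel and its methods are illustrative, not
the actual comfy model management code):

```python
import weakref

class LoadedModel:
    def __init__(self, model):
        self.model = model
        # weakref.finalize keeps every registered finalizer alive in an
        # internal registry until it either fires or is detached.
        self._finalizer = weakref.finalize(self, LoadedModel._on_collect)

    @staticmethod
    def _on_collect():
        pass  # release VRAM, drop caches, etc.

    def unload(self):
        # Explicit unload already performs the cleanup, so detach the
        # finalizer; otherwise every load/unload cycle leaves another
        # live finalizer (plus whatever it captured) behind.
        self._finalizer.detach()
        self.model = None
```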
comfyanonymous
fccab99ec0
Fix issue with .view() in HuMo. (#10014) 2025-09-24 20:09:42 -04:00
comfyanonymous
1fee8827cb
Support for qwen edit plus model. Use the new TextEncodeQwenImageEditPlus. (#9986) 2025-09-22 16:49:48 -04:00
comfyanonymous
d1d9eb94b1
Lower wan memory estimation value a bit. (#9964)
The previous PR reduced the peak memory requirement.
2025-09-20 22:09:35 -04:00
Kohaku-Blueleaf
7be2b49b6b
Fix LoRA Trainer bugs with FP8 models. (#9854)
* Fix adapter weight init

* Fix fp8 model training

* Avoid inference tensor
2025-09-20 21:24:48 -04:00
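For the "avoid inference tensor" point, a minimal sketch of the usual
workaround (assuming the problem is a weight created under
torch.inference_mode()):

```python
import torch

with torch.inference_mode():
    w = torch.randn(4, 4)  # w is an "inference tensor"

# Inference tensors cannot participate in autograd, so training on w
# directly raises a RuntimeError. Cloning outside inference mode yields
# a normal tensor that can require gradients.
w_train = w.clone().requires_grad_(True)
```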
comfyanonymous
e8df53b764
Update WanAnimateToVideo to more easily extend videos. (#9959) 2025-09-19 18:48:56 -04:00
comfyanonymous
dc95b6acc0
Basic WIP support for the wan animate model. (#9939) 2025-09-19 03:07:17 -04:00
comfyanonymous
24b0fce099
Do padding of audio embed in model for humo for more flexibility. (#9935) 2025-09-18 19:54:16 -04:00
DELUXA
8d6653fca6
Enable fp8 ops by default on gfx1200 (#9926) 2025-09-18 19:50:37 -04:00