Problem:
After PR #10276 (commit 139addd5) introduced convert_func/set_func for
proper fp8 weight scaling during LoRA application, users with SageAttention
enabled experience 100% reproducible crashes (Exception 0xC0000005
ACCESS_VIOLATION) during KSampler execution.
Root Cause:
PR #10276 added fp8 weight transformations (scale up -> apply LoRA -> scale
down) to fix LoRA quality with Wan 2.1/2.2 14B fp8 models. These
transformations:
1. Convert weights to float32 and create copies (new memory addresses)
2. Invalidate tensor metadata that SageAttention cached
3. Break SageAttention's internal memory references
4. Cause an access violation when SageAttention dereferences the stale pointers
SageAttention expects weights to remain at their original memory
addresses, with no transformations between caching and usage.
Solution:
Add a conditional bypass in LowVramPatch.__call__ that detects when
SageAttention is active (via the --use-sage-attention flag) and skips
the convert_func/set_func calls (sketched below). This preserves
SageAttention's memory-reference stability while keeping the PR #10276
benefits for users who don't enable SageAttention.
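For reference, a minimal sketch of the guard. This is simplified, not
the exact comfy/model_patcher.py code: it assumes the
--use-sage-attention flag surfaces as args.use_sage_attention, and
comfy.lora.calculate_weight stands in for the existing LoRA
application path.

```python
import torch
import comfy.lora
from comfy.cli_args import args


class LowVramPatch:
    def __init__(self, key, patches, convert_func=None, set_func=None):
        self.key = key
        self.patches = patches
        self.convert_func = convert_func
        self.set_func = set_func

    def __call__(self, weight):
        # With SageAttention active, skip the fp8 scale-up/scale-down round
        # trip: convert_func copies the weight into a fresh float32 tensor,
        # which moves it to a new allocation and invalidates the addresses
        # SageAttention has cached.
        use_fp8_scaling = self.convert_func is not None and not args.use_sage_attention

        if use_fp8_scaling:
            weight = self.convert_func(weight.to(dtype=torch.float32, copy=True), inplace=True)

        out = comfy.lora.calculate_weight(self.patches[self.key], weight, self.key)

        if use_fp8_scaling and self.set_func is not None:
            # Scale back down / requantize via the hook from PR #10276.
            return self.set_func(out)
        return out
```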
Trade-offs:
- When SageAttention is enabled with fp8 models + LoRAs, LoRAs are
  applied to the raw fp8-scaled weights instead of the properly
  rescaled weights
- Potential quality impact unknown (no issues observed in testing)
- Only affects users who explicitly enable SageAttention flag
- Users without SageAttention continue to benefit from PR #10276
Testing Completed:
- RTX 5090, CUDA 12.8, PyTorch 2.7.0, SageAttention 2.1.1
- Wan 2.2 fp8 models with multiple LoRAs
- Crash eliminated, ~40% SageAttention performance benefit preserved
- No visual quality degradation observed
- Non-SageAttention workflows unaffected
Testing Requested:
- Other GPU architectures (RTX 4090, 3090, etc.)
- Different CUDA/PyTorch version combinations
- fp8 LoRA quality comparison with SageAttention enabled
- Edge cases: mixed fp8/non-fp8 workflows
Files Changed:
- comfy/model_patcher.py: LowVramPatch.__call__ method
Related:
- Issue: SageAttention incompatibility with fp8 weight scaling
- Original PR: #10276 (fp8 LoRA quality fix for Wan models)
- SageAttention: https://github.com/thu-ml/SageAttention
## Summary
Fixed incorrect type hint syntax in `MotionEncoder_tc.__init__()` parameter list.
## Changes
- Line 647: Changed `num_heads=int` to `num_heads: int`
- This corrects the parameter annotation from a default value assignment to proper type hint syntax
## Details
The parameter was using assignment syntax (`=`) instead of type annotation syntax (`:`), which would incorrectly set the default value to the `int` class itself rather than annotating the expected type.
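A minimal illustration of the difference, using standalone classes rather than the actual `MotionEncoder_tc`:

```python
class Buggy:
    # Assignment syntax: the default value becomes the `int` class itself,
    # so Buggy() silently constructs with num_heads=<class 'int'>.
    def __init__(self, num_heads=int):
        self.num_heads = num_heads


class Fixed:
    # Annotation syntax: num_heads is a required argument typed as int.
    def __init__(self, num_heads: int):
        self.num_heads = num_heads


print(Buggy().num_heads)   # <class 'int'> -- not a number
print(Fixed(8).num_heads)  # 8
```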
If this suffers an exception (such as a VRAM OOM), it leaves the
encode() and decode() methods without cleaning up the WAN feature
cache. The comfy node cache then ultimately keeps a reference to this
object, which in turn holds references to large tensors from the
failed execution.
The feature cache is currently set up as a class variable on the
encoder/decoder; however, the encode and decode functions always clear
it on both entry and exit during normal execution.
The design intent is likely for this to be usable as a streaming
encoder where the input arrives in batches, but the functions as they
are today don't support that.
So simplify by making the cache a local variable again, so that if a
VRAM OOM does occur, the cache itself is properly garbage collected
when the encode()/decode() functions disappear from the stack.
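A rough sketch of the shape of the change, with hypothetical
class/block names rather than the real WAN VAE code:

```python
class Encoder:
    def __init__(self, blocks):
        self.blocks = blocks

    def encode(self, x):
        # Local feature cache: if any block raises (e.g. a CUDA OOM), the
        # cache is collectible as soon as this frame unwinds, instead of
        # lingering on the long-lived Encoder object that the node cache
        # keeps alive.
        feat_cache = {}
        for block in self.blocks:
            x = block(x, feat_cache)
        return x
```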
When the VAE catches this VRAM OOM, it launches the fallback logic
straight from the exception context.
Python, however, keeps references to the entire call stack that raised
the exception, including any local variables, for the sake of
exception reporting and debugging. In the case of tensors, this can
hold on to GBs of VRAM and prevent the allocator from freeing them.
So drop the except context completely before going back to the VAE
via the tiler, by getting out of the except block with nothing but
a flag.
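The pattern, sketched with a hypothetical decode wrapper
(decode_full/decode_tiled are placeholders, not the real comfy.sd.VAE
methods):

```python
import torch


def decode(vae, samples):
    oom = False
    try:
        return vae.decode_full(samples)  # placeholder for the non-tiled path
    except torch.cuda.OutOfMemoryError:
        oom = True  # keep nothing but a flag; no reference to the exception

    if oom:
        # Outside the except block, the exception and its traceback (which
        # pin the local tensors of every frame in the failed call) are gone,
        # so the tiled fallback gets the headroom it needs.
        return vae.decode_tiled(samples)  # placeholder for the tiled path
```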
This greatly increases the reliability of the tiler fallback,
especially on low-VRAM cards: with the bug, if the leak happened to
exceed the headroom needed for a single tile, the tiler fallback would
itself OOM and fail the flow.
* flux: math: Use addcmul_ to avoid an expensive VRAM intermediate
The rope process can be the VRAM peak, and the temporary allocated for
the addition result (before the original can be released) can OOM.
Fuse the multiply-add in place with addcmul_.
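For illustration (not the actual flux rope math):

```python
import torch

x = torch.randn(8, 4096, 128)
a = torch.randn(8, 4096, 128)
b = torch.randn(8, 4096, 128)

# Before: `x = x + a * b` materializes the full-size product `a * b` plus a
# new tensor for the sum while the old `x` is still alive.
# After: one fused in-place op, no extra full-size intermediates.
x.addcmul_(a, b)
```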
* wan: Delete the self attention before cross attention
This saves VRAM when the cross attention and the FFN are what form the
VRAM peak.
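Roughly this shape (a hypothetical block forward, not the real WAN
code):

```python
def block_forward(x, self_attn, cross_attn, ffn, context):
    y = self_attn(x)
    x = x + y
    del y  # drop the self-attention output before the cross attention / FFN peak
    x = x + cross_attn(x, context)
    x = x + ffn(x)
    return x
```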
When unloading models in load_models_gpu(), the model finalizer was not
being explicitly detached, leading to a memory leak. This caused
memory consumption to grow linearly over time as models were
repeatedly loaded and unloaded.
This change prevents orphaned finalizer references from accumulating in
memory during model switching operations.
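A generic sketch of the idea (not the actual load_models_gpu() code):
weakref.finalize objects stay registered, and keep their callback
arguments alive, until they either run or are explicitly detached.

```python
import weakref


class LoadedModel:
    def __init__(self, model):
        self.model = model
        # Registered finalizers live until they fire or are detached.
        self._finalizer = weakref.finalize(self, self._cleanup)

    @staticmethod
    def _cleanup():
        pass  # release device memory, caches, etc.

    def unload(self):
        # Without this detach, a finalizer (and whatever its callback
        # references) is left behind for every load/unload cycle.
        self._finalizer.detach()
        self.model = None
```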
* flux: Do the xq and xk ropes one at a time
This was doing independent, interleaved tensor math on the q and k
tensors, holding more than the minimum number of intermediates in
VRAM. On a bad day, it would VRAM OOM on the xk intermediates.
Do everything for q and then everything for k, so torch can garbage
collect all of q's intermediates before k allocates its own.
This reduces peak VRAM usage for some WAN2.2 inferences (at least).
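A sketch of the ordering change (close in spirit to, but not copied
from, the flux apply_rope code):

```python
import torch


def apply_rope_sequential(xq, xk, freqs_cis):
    # All of q first: its float32 intermediates can be collected before any
    # of k's are allocated, instead of both sets coexisting at the peak.
    xq_ = xq.float().reshape(*xq.shape[:-1], -1, 1, 2)
    xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
    xq_out = xq_out.reshape(*xq.shape).type_as(xq)
    del xq_

    # Then all of k.
    xk_ = xk.float().reshape(*xk.shape[:-1], -1, 1, 2)
    xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
    xk_out = xk_out.reshape(*xk.shape).type_as(xk)
    return xq_out, xk_out
```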
* wan: Optimize qkv intermediates on attention
As commented in the code. The former logic computed independent pieces
of Q, K and V in parallel, which held more inference intermediates in
VRAM and spiked VRAM usage. Fully roping Q and garbage collecting its
intermediates before touching K reduces the peak inference VRAM usage.
* Initial Chroma Radiance support
* Minor Chroma Radiance cleanups
* Update Radiance nodes to ensure latents/images are on the intermediate device
* Fix Chroma Radiance memory estimation.
* Increase Chroma Radiance memory usage factor
* Increase Chroma Radiance memory usage factor once again
* Ensure images are multiples of 16 for Chroma Radiance
Add batch dimension and fix channels when necessary in ChromaRadianceImageToLatent node
* Tile Chroma Radiance NeRF to reduce memory consumption, update memory usage factor
* Update Radiance to support conv nerf final head type.
* Allow setting NeRF embedder dtype for Radiance
Bump Radiance nerf tile size to 32
Support EasyCache/LazyCache on Radiance (maybe)
* Add ChromaRadianceStubVAE node
* Crop Radiance image inputs to multiples of 16 instead of erroring to be in line with existing VAE behavior
* Convert Chroma Radiance nodes to V3 schema.
* Add ChromaRadianceOptions node and backend support.
Cleanups/refactoring to reduce code duplication with Chroma.
* Fix overriding the NeRF embedder dtype for Chroma Radiance
* Minor Chroma Radiance cleanups
* Move Chroma Radiance to its own directory in ldm
Minor code cleanups and tooltip improvements
* Fix Chroma Radiance embedder dtype overriding
* Remove Radiance dynamic nerf_embedder dtype override feature
* Unbork Radiance NeRF embedder init
* Remove Chroma Radiance image conversion and stub VAE nodes
Add a chroma_radiance option to the VAELoader builtin node which uses comfy.sd.PixelspaceConversionVAE
Add a PixelspaceConversionVAE to comfy.sd for converting BHWC 0..1 <-> BCHW -1..1
* Looking into a @wrap_attn decorator that checks transformer_options for an 'optimized_attention_override' entry (a rough sketch of the pattern follows after this list)
* Created logging code for this branch so that it can be used to track down all the code paths where transformer_options would need to be added
* Fix memory usage issue with inspect
* Made WAN attention receive transformer_options, test node added to wan to test out attention override later
* Added **kwargs to all attention functions so transformer_options could potentially be passed through
* Make sure wrap_attn doesn't make itself recurse infinitely, attempt to load SageAttention and FlashAttention if not enabled so that they can be marked as available or not, create registry for available attention
* Turn off attention logging for now, make AttentionOverrideTestNode have a dropdown with available attention (this is a test node only)
* Make flux work with optimized_attention_override
* Add logs to verify optimized_attention_override is passed all the way into attention function
* Make Qwen work with optimized_attention_override
* Made hidream work with optimized_attention_override
* Made wan patches_replace work with optimized_attention_override
* Made SD3 work with optimized_attention_override
* Made HunyuanVideo work with optimized_attention_override
* Made Mochi work with optimized_attention_override
* Made LTX work with optimized_attention_override
* Made StableAudio work with optimized_attention_override
* Made optimized_attention_override work with ACE Step
* Made Hunyuan3D work with optimized_attention_override
* Make CosmosPredict2 work with optimized_attention_override
* Made CosmosVideo work with optimized_attention_override
* Made Omnigen 2 work with optimized_attention_override
* Made StableCascade work with optimized_attention_override
* Made AuraFlow work with optimized_attention_override
* Made Lumina work with optimized_attention_override
* Made Chroma work with optimized_attention_override
* Made SVD work with optimized_attention_override
* Fix WanI2VCrossAttention so that it expects to receive transformer_options
* Fixed Wan2.1 Fun Camera transformer_options passthrough
* Fixed WAN 2.1 VACE transformer_options passthrough
* Add optimized to get_attention_function
* Disable attention logs for now
* Remove attention logging code
* Remove _register_core_attention_functions, as we wouldn't want someone to call that, just in case
* Satisfy ruff
* Remove AttentionOverrideTest node, that's something to cook up for later
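For reference, a rough sketch of the decorator pattern described in
the first bullet above; the real implementation in this branch differs
in its details (registry of available attention functions, recursion
guard, SageAttention/FlashAttention probing, etc.):

```python
import functools


def wrap_attn(attn_func):
    @functools.wraps(attn_func)
    def wrapper(*args, transformer_options=None, **kwargs):
        override = None
        if transformer_options is not None:
            override = transformer_options.get("optimized_attention_override")
        if override is not None:
            # Hand the override the original (unwrapped) attention function so
            # it can fall back to it without re-entering this wrapper.
            return override(attn_func, *args, transformer_options=transformer_options, **kwargs)
        # Attention functions in this branch accept **kwargs, so passing
        # transformer_options through is harmless when no override is set.
        return attn_func(*args, transformer_options=transformer_options, **kwargs)
    return wrapper
```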