* Add support for sage attention 3 in comfyui, enable via new cli arg
--use-sage-attiention3
* Fix some bugs found in PR review. The N dimension at which Sage
Attention 3 takes effect is reduced to 1024 (although the improvement is
not significant at this scale).
* Remove the Sage Attention3 switch, but retain the attention function
registration.
* Fix a ruff check issue in attention.py
* hunyuan upsampler: rework imports
Remove the transitive import of VideoConv3d and Resnet and takes these
from actual implementation source.
* model: remove unused give_pre_end
According to git grep, this is not used now, and was not used in the
initial commit that introduced it (see below).
This semantic is difficult to implement temporal roll VAE for (and would
defeat the purpose). Rather than implement the complex if, just delete
the unused feature.
(venv) rattus@rattus-box2:~/ComfyUI$ git log --oneline
220afe33 (HEAD) Initial commit.
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py: resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py: self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py: if self.give_pre_end:
(venv) rattus@rattus-box2:~/ComfyUI$ git co origin/master
Previous HEAD position was 220afe33 Initial commit.
HEAD is now at 9d8a8179 Enable async offloading by default on Nvidia. (#10953)
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py: resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py: self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py: if self.give_pre_end:
* move refiner VAE temporal roller to core
Move the carrying conv op to the common VAE code and give it a better
name. Roll the carry implementation logic for Resnet into the base
class and scrap the Hunyuan specific subclass.
* model: Add temporal roll to main VAE decoder
If there are no attention layers, its a standard resnet and VideoConv3d
is asked for, substitute in the temporal rolloing VAE algorithm. This
reduces VAE usage by the temporal dimension (can be huge VRAM savings).
* model: Add temporal roll to main VAE encoder
If there are no attention layers, its a standard resnet and VideoConv3d
is asked for, substitute in the temporal rolling VAE algorithm. This
reduces VAE usage by the temporal dimension (can be huge VRAM savings).
* Looking into a @wrap_attn decorator to look for 'optimized_attention_override' entry in transformer_options
* Created logging code for this branch so that it can be used to track down all the code paths where transformer_options would need to be added
* Fix memory usage issue with inspect
* Made WAN attention receive transformer_options, test node added to wan to test out attention override later
* Added **kwargs to all attention functions so transformer_options could potentially be passed through
* Make sure wrap_attn doesn't make itself recurse infinitely, attempt to load SageAttention and FlashAttention if not enabled so that they can be marked as available or not, create registry for available attention
* Turn off attention logging for now, make AttentionOverrideTestNode have a dropdown with available attention (this is a test node only)
* Make flux work with optimized_attention_override
* Add logs to verify optimized_attention_override is passed all the way into attention function
* Make Qwen work with optimized_attention_override
* Made hidream work with optimized_attention_override
* Made wan patches_replace work with optimized_attention_override
* Made SD3 work with optimized_attention_override
* Made HunyuanVideo work with optimized_attention_override
* Made Mochi work with optimized_attention_override
* Made LTX work with optimized_attention_override
* Made StableAudio work with optimized_attention_override
* Made optimized_attention_override work with ACE Step
* Made Hunyuan3D work with optimized_attention_override
* Make CosmosPredict2 work with optimized_attention_override
* Made CosmosVideo work with optimized_attention_override
* Made Omnigen 2 work with optimized_attention_override
* Made StableCascade work with optimized_attention_override
* Made AuraFlow work with optimized_attention_override
* Made Lumina work with optimized_attention_override
* Made Chroma work with optimized_attention_override
* Made SVD work with optimized_attention_override
* Fix WanI2VCrossAttention so that it expects to receive transformer_options
* Fixed Wan2.1 Fun Camera transformer_options passthrough
* Fixed WAN 2.1 VACE transformer_options passthrough
* Add optimized to get_attention_function
* Disable attention logs for now
* Remove attention logging code
* Remove _register_core_attention_functions, as we wouldn't want someone to call that, just in case
* Satisfy ruff
* Remove AttentionOverrideTest node, that's something to cook up for later
* fix attention OOM in xformers
* allow passing attention mask in flux attention
* allow an attn_mask in flux
* attn masks can be done using replace patches instead of a separate dict
* fix return types
* fix return order
* enumerate
* patch the right keys
* arg names
* fix a silly bug
* fix xformers masks
* replace match with if, elif, else
* mask with image_ref_size
* remove unused import
* remove unused import 2
* fix pytorch/xformers attention
This corrects a weird inconsistency with skip_reshape.
It also allows masks of various shapes to be passed, which will be
automtically expanded (in a memory-efficient way) to a size that is
compatible with xformers or pytorch sdpa respectively.
* fix mask shapes