* hunyuan upsampler: rework imports
Remove the transitive imports of VideoConv3d and Resnet and take these
from the actual implementation source.
* model: remove unused give_pre_end
According to git grep, this is not used now, and was not used in the
initial commit that introduced it (see below).
This semantic is difficult to support in a temporal roll VAE (and
supporting it would defeat the purpose). Rather than implement the
complex conditional, just delete the unused feature. A sketch of what
the flag did follows the grep output below.
(venv) rattus@rattus-box2:~/ComfyUI$ git log --oneline
220afe33 (HEAD) Initial commit.
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py: resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py: self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py: if self.give_pre_end:
(venv) rattus@rattus-box2:~/ComfyUI$ git co origin/master
Previous HEAD position was 220afe33 Initial commit.
HEAD is now at 9d8a8179 Enable async offloading by default on Nvidia. (#10953)
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py: resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py: self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py: if self.give_pre_end:
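For context, a minimal sketch of the decoder tail the flag controlled,
reconstructed from the standard LDM-style decoder rather than copied
from this repo (class and argument names here are illustrative):
give_pre_end=True returned the hidden state before the final norm/conv
instead of the decoded output, which a rolling decoder cannot provide
without holding the full temporal activation.

import torch
import torch.nn as nn

class DecoderTail(nn.Module):
    # Illustrative sketch of the decode tail, not the repo's code.
    def __init__(self, block_in, out_ch, give_pre_end=False):
        super().__init__()
        self.give_pre_end = give_pre_end
        self.norm_out = nn.GroupNorm(32, block_in)
        self.conv_out = nn.Conv2d(block_in, out_ch, kernel_size=3, padding=1)

    def forward(self, h):
        if self.give_pre_end:
            return h  # the branch this change removes
        h = self.norm_out(h)
        h = torch.nn.functional.silu(h)
        return self.conv_out(h)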
* move refiner VAE temporal roller to core
Move the carrying conv op to the common VAE code and give it a better
name. Roll the carry implementation logic for Resnet into the base
class and scrap the Hunyuan-specific subclass.
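A minimal sketch of the carry idea (names and details are illustrative,
not the actual ComfyUI op): a temporally causal conv that remembers the
trailing frames of the previous chunk and prepends them to the next
one, so processing a video in temporal slices matches a full pass.

import torch
import torch.nn as nn

class CarryConv3d(nn.Conv3d):
    # Illustrative only: assumes the conv is built with no temporal
    # padding (e.g. padding=(0, 1, 1) for a 3x3x3 kernel) so the carried
    # frames supply the temporal context instead.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.carry = None
        self.pad_t = self.kernel_size[0] - 1

    def forward(self, x, first_chunk=True):
        if self.pad_t == 0:
            return super().forward(x)  # no temporal extent, nothing to carry
        if first_chunk or self.carry is None:
            # replicate the first frame to pad the very first chunk
            context = x[:, :, :1].repeat(1, 1, self.pad_t, 1, 1)
        else:
            context = self.carry
        # keep the trailing frames for the next chunk; detach so the
        # autograd graph is not kept alive across chunks
        self.carry = x[:, :, -self.pad_t:].detach()
        return super().forward(torch.cat([context, x], dim=2))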
* model: Add temporal roll to main VAE decoder
If there are no attention layers, it is a standard resnet, and
VideoConv3d is asked for, substitute in the temporal rolling VAE
algorithm. This reduces VAE memory usage by a factor of the temporal
dimension (which can be a huge VRAM saving).
* model: Add temporal roll to main VAE encoder
If there are no attention layers, it is a standard resnet, and
VideoConv3d is asked for, substitute in the temporal rolling VAE
algorithm. This reduces VAE memory usage by a factor of the temporal
dimension (which can be a huge VRAM saving).
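To illustrate where the saving comes from (a sketch that assumes the
conv blocks carry state between calls as in the snippet above, and that
first_chunk is how the first slice is signalled; both names are
illustrative): the model only ever sees a few latent frames at a time,
so peak activation memory scales with the chunk size rather than with
the full temporal length.

import torch

def decode_rolled(decoder, latent, chunk_frames=4):
    # latent: [B, C, T, H, W]; decode a few temporal frames at a time
    outputs = []
    for i in range(0, latent.shape[2], chunk_frames):
        outputs.append(decoder(latent[:, :, i:i + chunk_frames],
                               first_chunk=(i == 0)))
    return torch.cat(outputs, dim=2)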
* Looking into a @wrap_attn decorator that looks for an 'optimized_attention_override' entry in transformer_options
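A rough sketch of the decorator idea (the real code path and the
override's signature may differ): if the caller placed an
'optimized_attention_override' in transformer_options, hand control to
it, passing the wrapped function along so the override can fall back to
the default implementation.

import functools

def wrap_attn(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        transformer_options = kwargs.get("transformer_options")
        if transformer_options is not None:
            override = transformer_options.get("optimized_attention_override")
            if override is not None:
                # hand off to the override, passing the wrapped function
                # so it can fall back to the default implementation
                return override(func, *args, **kwargs)
        return func(*args, **kwargs)
    return wrapper

The recursion-guard commit below implies the real wrapper also has to
make sure an override that calls back into attention does not re-enter
itself; this sketch omits that.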
* Created logging code for this branch so that it can be used to track down all the code paths where transformer_options would need to be added
* Fix memory usage issue with inspect
* Made WAN attention receive transformer_options, test node added to wan to test out attention override later
* Added **kwargs to all attention functions so transformer_options could potentially be passed through
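The pattern looks roughly like this (the parameter list is illustrative
of the idea, not copied from the repo):

def attention_basic(q, k, v, heads, mask=None, attn_precision=None,
                    skip_reshape=False, transformer_options=None, **kwargs):
    # **kwargs absorbs anything extra a caller forwards, so routing
    # transformer_options through every call site cannot break a backend
    # that never declared it
    ...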
* Make sure wrap_attn doesn't make itself recurse infinitely; attempt to load SageAttention and FlashAttention even if not enabled so that they can be marked as available or not; create a registry of available attention functions
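A sketch of the registry/probing idea (assumed names, except
get_attention_function which a later commit mentions): optional
backends are imported once just to record whether they are usable, so
UI code can offer only the installed options.

REGISTERED_ATTENTION_FUNCTIONS = {}

def register_attention_function(name, func):
    REGISTERED_ATTENTION_FUNCTIONS[name] = func

def get_attention_function(name, default=None):
    return REGISTERED_ATTENTION_FUNCTIONS.get(name, default)

try:
    from sageattention import sageattn  # probe the optional backend
    SAGE_ATTENTION_IS_AVAILABLE = True
except ImportError:
    SAGE_ATTENTION_IS_AVAILABLE = False

try:
    from flash_attn import flash_attn_func
    FLASH_ATTENTION_IS_AVAILABLE = True
except ImportError:
    FLASH_ATTENTION_IS_AVAILABLE = False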
* Turn off attention logging for now, make AttentionOverrideTestNode have a dropdown with available attention (this is a test node only)
* Make flux work with optimized_attention_override
* Add logs to verify optimized_attention_override is passed all the way into attention function
* Make Qwen work with optimized_attention_override
* Made hidream work with optimized_attention_override
* Made wan patches_replace work with optimized_attention_override
* Made SD3 work with optimized_attention_override
* Made HunyuanVideo work with optimized_attention_override
* Made Mochi work with optimized_attention_override
* Made LTX work with optimized_attention_override
* Made StableAudio work with optimized_attention_override
* Made optimized_attention_override work with ACE Step
* Made Hunyuan3D work with optimized_attention_override
* Make CosmosPredict2 work with optimized_attention_override
* Made CosmosVideo work with optimized_attention_override
* Made Omnigen 2 work with optimized_attention_override
* Made StableCascade work with optimized_attention_override
* Made AuraFlow work with optimized_attention_override
* Made Lumina work with optimized_attention_override
* Made Chroma work with optimized_attention_override
* Made SVD work with optimized_attention_override
* Fix WanI2VCrossAttention so that it expects to receive transformer_options
* Fixed Wan2.1 Fun Camera transformer_options passthrough
* Fixed WAN 2.1 VACE transformer_options passthrough
* Add optimized to get_attention_function
* Disable attention logs for now
* Remove attention logging code
* Remove _register_core_attention_functions, as we wouldn't want someone to call that, just in case
* Satisfy ruff
* Remove AttentionOverrideTest node, that's something to cook up for later
- add xet support and add the xet cache to manageable directories
- xet is enabled by default
- fix logging to the root logger in various places
- improve logging about model unloading and loading
- TorchCompileNode now supports the VAE (see the sketch after this list)
- a missing torchaudio now causes less noise in the logs
- feature flags assume everything is supported in the distributed progress context
- fix progress notifications
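On the TorchCompileNode/VAE point above, a minimal sketch of what
compiling the VAE amounts to (first_stage_model as the inner torch
module is an assumption about ComfyUI's VAE wrapper; the node would do
the equivalent with its own backend/option plumbing):

import torch

def compile_vae(vae, backend="inductor"):
    # swap the inner torch module for a compiled version
    vae.first_stage_model = torch.compile(vae.first_stage_model, backend=backend)
    return vae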