ltx: vae: implement chunked encoder + CPU IO chunking (Big VRAM reductions) (#13062)

* ltx: vae: add cache state to downsample block

* ltx: vae: Add time stride awareness to causal_conv_3d

* ltx: vae: Automate truncation for encoder

Other VAEs just truncate without error. Do the same.

* sd/ltx: Make chunked_io a flag in its own right

Taking this bi-direcitonal, so make it a for-purpose named flag.

* ltx: vae: implement chunked encoder + CPU IO chunking

People are doing things with big frame counts in LTX including V2V
flows. Implement the time-chunked encoder to keep the VRAM down, with
the converse of the new CPU pre-allocation technique, where the chunks
are brought from the CPU JIT.

* ltx: vae-encode: round chunk sizes more strictly

Only powers of 2 and multiple of 8 are valid due to cache slicing.
This commit is contained in:
rattus 2026-03-19 10:01:12 -07:00 committed by GitHub
parent fabed694a2
commit 6589562ae3
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

Diff Content Not Available