Roll the convolution through time in chunks of 2 latent frames, using a
FIFO queue to carry frames across the convolution seams.
Added encoder support, lowered the chunk size to 1 latent frame to save
more VRAM, and made it work for Hunyuan Image 3.0 (since the code is shared).
Fixed names and cleaned up the code.
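
The rolling idea can be illustrated with a minimal sketch. It is not the
actual implementation: it assumes a causal 1-D convolution over plain Python
floats rather than 3-D convolutions over tensors, and the names
causal_conv_full, causal_conv_rolled, and chunk_size are hypothetical. It
only shows that keeping the last few input frames in a FIFO lets chunked
processing reproduce the full-sequence result at the seams.

```python
from collections import deque


def causal_conv_full(frames, kernel):
    """Reference: causal 1-D convolution over the whole sequence at once."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(frames)  # zero-pad the past only
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(frames))
    ]


def causal_conv_rolled(frames, kernel, chunk_size=1):
    """Same convolution, processed chunk by chunk along the time axis.

    Illustrative assumption: a FIFO holds the last k-1 input frames so each
    chunk sees the context it needs across the seam.
    """
    k = len(kernel)
    # FIFO of the most recent k-1 frames; starts out as the zero padding.
    seam = deque([0.0] * (k - 1), maxlen=k - 1)
    out = []
    for start in range(0, len(frames), chunk_size):
        chunk = list(frames[start:start + chunk_size])
        window = list(seam) + chunk  # seam context followed by new frames
        out.extend(
            sum(w * window[t + i] for i, w in enumerate(kernel))
            for t in range(len(chunk))
        )
        seam.extend(chunk)  # oldest frames fall off the front of the FIFO
    return out


frames = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.25, 0.5, 0.25]
# Chunks of 2 frames and chunks of 1 frame both match the full pass.
assert causal_conv_rolled(frames, kernel, chunk_size=2) == causal_conv_full(frames, kernel)
assert causal_conv_rolled(frames, kernel, chunk_size=1) == causal_conv_full(frames, kernel)
```

In this sketch only one chunk plus k-1 cached frames is live at a time, so a
smaller chunk_size lowers peak memory at the cost of more iterations.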