Fix noise/latent tensor mismatch when latent is nested but noise is not

When using LTXAV (audio+video) workflows, latent_image is a NestedTensor but noise may be a regular tensor. Calling unbind() on a non-nested noise tensor splits it along dim=0 instead of into per-component tensors, so the packed noise no longer matches the packed latent and noise_scaling fails with a shape mismatch. Check whether noise is nested before unbinding. If it is not, treat it as the first (video) component and pad each additional component (e.g. audio) with zero noise. This is semantically correct because those components don't need denoising in the video sampler.
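The failure mode can be reproduced in plain PyTorch. The sketch below is illustrative only. It assumes a tuple of per-component tensors as a stand-in for what `latent_image.unbind()` returns on the nested latent, and the video/audio shapes are made up. It shows that `unbind()` on a regular tensor follows dim 0, so the number and shapes of the pieces are unrelated to the latent's components.

```python
import torch

# Illustrative shapes only (assumption): one video and one audio component.
video = torch.randn(1, 4, 2, 8, 8)
audio = torch.randn(1, 8, 5, 16)

# Stand-in (assumption) for the nested latent after unbind(): one tensor per component.
latent_components = (video, audio)

# Non-nested noise that only covers the video component.
noise = torch.randn_like(video)

# unbind() on a plain tensor splits along dim 0: the piece count is noise.shape[0]
# and each piece loses that dimension, so nothing lines up with the audio component.
noise_pieces = noise.unbind()

print(len(latent_components), len(noise_pieces))  # 2 1
print(noise_pieces[0].shape)                       # torch.Size([4, 2, 8, 8])
```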
2026-06-19 22:39:24 +08:00 · 2026-04-07 06:07:26 -04:00 · 2026-04-07 06:07:26 -04:00 · 2beca418ad
commit 2beca418ad
parent b615af1c65
1 changed files with 10 additions and 2 deletions
--- a/comfy/samplers.py
+++ b/comfy/samplers.py
@@ -1006,8 +1006,16 @@ class CFGGuider:
             return latent_image
         if latent_image.is_nested:
-            latent_image, latent_shapes = comfy.utils.pack_latents(latent_image.unbind())
+            li_tensors = latent_image.unbind()
-            noise, _ = comfy.utils.pack_latents(noise.unbind())
+            if noise.is_nested:
+                n_tensors = noise.unbind()
+            else:
+                # Noise only covers video -- pad remaining components (audio) with zeros
+                n_tensors = [noise]
+                for i in range(1, len(li_tensors)):
+                    n_tensors.append(torch.zeros_like(li_tensors[i]))
+            latent_image, latent_shapes = comfy.utils.pack_latents(li_tensors)
+            noise, _ = comfy.utils.pack_latents(n_tensors)
         else:
             latent_shapes = [latent_image.shape]
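The patched branch can be sketched outside ComfyUI as below. This is a minimal, hypothetical sketch. `pack_sketch` is an assumed stand-in for `comfy.utils.pack_latents` (flatten each component, concatenate, record shapes), and the real helper's layout may differ. A list or tuple plays the role of nested noise. Like the patch, it assumes a plain noise tensor already has the first (video) component's shape.

```python
import torch


def pack_sketch(tensors):
    # Hypothetical stand-in for comfy.utils.pack_latents (assumption):
    # flatten every component, concatenate, and remember the original shapes.
    shapes = [t.shape for t in tensors]
    return torch.cat([t.reshape(-1) for t in tensors]), shapes


def pack_latent_and_noise(li_tensors, noise):
    """li_tensors stands in for latent_image.unbind(). noise is either a
    sequence of per-component tensors (nested stand-in) or one plain tensor
    that covers only the first (video) component."""
    if isinstance(noise, (list, tuple)):
        n_tensors = list(noise)
    else:
        # Noise only covers video: pad the remaining components (audio) with zeros.
        n_tensors = [noise] + [torch.zeros_like(t) for t in li_tensors[1:]]
    latent, latent_shapes = pack_sketch(li_tensors)
    packed_noise, _ = pack_sketch(n_tensors)
    return latent, packed_noise, latent_shapes


video = torch.randn(1, 4, 2, 8, 8)  # 512 elements (illustrative shape)
audio = torch.randn(1, 8, 5, 16)    # 640 elements (illustrative shape)
latent, packed_noise, shapes = pack_latent_and_noise((video, audio), torch.randn_like(video))
print(latent.shape, packed_noise.shape)  # torch.Size([1152]) torch.Size([1152])
```

Padding with `torch.zeros_like` keeps the packed noise the same length as the packed latent, and the zeros add no noise to the audio portion. Packing `noise.unbind()` directly, as the old code did, would yield only 512 elements here.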