Commit Graph

4 Commits

Author SHA1 Message Date
Jedrzej Kosinski
aeb3c77ae9 Cube3D: route VAE decode through managed comfy.sd.VAE.decode
Stop fighting ComfyUI's model management. VAEDecodeCube was manually
calling load_models_gpu + .to(vae.device), and the VAE forced
disable_offload=True because the node bypassed the managed decode path.

Now CubeShapeVAE.decode(samples) is the entry point that comfy.sd.VAE.decode
calls, so loading/device/dtype are handled automatically (like Hunyuan3Dv2):
- removed disable_offload=True (let the offload system manage weights)
- removed manual load_models_gpu + .to(device) from the node
- process_output set to identity (the default clamps to [0,1] in place,
  which would destroy the occupancy isosurface)
- decode() pre-inverts VAE.decode's trailing movedim(1,-1), so the node
  receives the grid logits unchanged (output parity preserved)
- memory_used_decode sized by num_tokens (shape[-1]) for the new latent layout
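The two decode-path details above can be illustrated at the shape level. This is a minimal sketch, assuming torch.movedim semantics (the source dim is removed and reinserted at the destination index) and an isosurface extracted at level 0 on raw logits. `movedim_shape`, `round_trip_shape`, `clamp01`, `identity` and `has_sign_change` are hypothetical helpers, not ComfyUI functions, and `clamp01` models only a [0,1] clamp, not any other rescaling a default process_output may apply.

```python
def movedim_shape(shape, source, destination):
    """Return the shape torch.movedim(x, source, destination) would produce."""
    ndim = len(shape)
    src, dst = source % ndim, destination % ndim
    dims = list(shape)
    moved = dims.pop(src)
    dims.insert(dst, moved)
    return tuple(dims)


def round_trip_shape(grid_shape):
    # decode() pre-inverts VAE.decode's trailing movedim(1, -1) ...
    pre_inverted = movedim_shape(grid_shape, -1, 1)
    # ... so the managed path's movedim(1, -1) restores the original layout.
    return movedim_shape(pre_inverted, 1, -1)


def clamp01(values):
    # Models a post-process that clamps every value into [0, 1].
    return [min(max(v, 0.0), 1.0) for v in values]


def identity(values):
    # Identity process_output: logits pass through untouched.
    return list(values)


def has_sign_change(values, level=0.0):
    # A level-0 surface lies between neighbours whose values straddle the level.
    return any((a - level) * (b - level) < 0 for a, b in zip(values, values[1:]))


logits = [-3.2, -0.7, 0.4, 2.5]  # illustrative occupancy logits along one grid line
print(round_trip_shape((1, 32, 48, 64)))  # (1, 32, 48, 64)
print(has_sign_change(identity(logits)))  # True
print(has_sign_change(clamp01(logits)))   # False
```

After clamping, every negative logit becomes 0, so no neighbouring pair straddles level 0 any more and the surface crossing disappears.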

Amp-Thread-ID: https://ampcode.com/threads/T-019ec361-addb-70d8-a74b-438ce8a1e096
Co-authored-by: Amp <amp@ampcode.com>
2026-06-14 23:28:22 -07:00
Jedrzej Kosinski
a6c7397b71 Cube3D: use channels-first 1D latent (B,1,L) like Hunyuan3Dv2
Replaces the dummy trailing-dim latent with a channels-first 1D latent
(B, 1, num_tokens) and a dedicated latent_formats.Cube3D
(latent_channels=1, latent_dimensions=1). This mirrors the (B, C, L)
convention of Hunyuan3Dv2, the existing native 3D model, and keeps
fix_empty_latent_channels from truncating the token sequence (for empty
latents it narrows dim=1 to latent_channels). No core sampler changes are
required: encode_model_conds sees a valid noise.shape[2].
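A simplified model of why the layout change matters. It assumes the old latent put the token axis on dim 1, e.g. (B, num_tokens, 1), and it models only the narrowing behavior described above. `narrow_empty_latent_shape` and the token count of 1024 are illustrative, not taken from ComfyUI or the model config.

```python
NUM_TOKENS = 1024    # illustrative token count (assumption, not from the config)
LATENT_CHANNELS = 1  # latent_formats.Cube3D: latent_channels=1


def narrow_empty_latent_shape(shape, latent_channels):
    """Hypothetical model of the empty-latent fix-up: dim 1 narrowed to latent_channels."""
    if shape[1] > latent_channels:
        return (shape[0], latent_channels) + tuple(shape[2:])
    return tuple(shape)


old_layout = (2, NUM_TOKENS, 1)  # assumed tokens-on-dim-1 layout with a dummy trailing dim
new_layout = (2, 1, NUM_TOKENS)  # channels-first (B, 1, L)

print(narrow_empty_latent_shape(old_layout, LATENT_CHANNELS))  # (2, 1, 1): tokens lost
print(narrow_empty_latent_shape(new_layout, LATENT_CHANNELS))  # (2, 1, 1024): intact
# encode_model_conds reads noise.shape[2]; in the new layout that is the token count.
print(new_layout[2])  # 1024
```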

- latent_formats.Cube3D added; wired into supported_models.Cube3D
- EmptyCubeLatent emits (B, 1, num_tokens)
- sample_cube takes T from x.shape[-1], returns (B, 1, T), and repeats
  conditioning to the latent batch size
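A rough outline of the batch handling in sample_cube, written over plain lists rather than tensors. `repeat_to_batch`, `sample_cube_shape` and the `generate_tokens` callback are hypothetical stand-ins for the real sampler and its autoregressive loop.

```python
import math


def repeat_to_batch(cond, batch_size):
    """Tile (or trim) a list of per-sample conditioning entries to batch_size items."""
    reps = math.ceil(batch_size / len(cond))
    return (cond * reps)[:batch_size]


def sample_cube_shape(latent_shape, cond, generate_tokens):
    """Hypothetical outline: read T from the last dim, run the AR loop, emit (B, 1, T)."""
    batch, _, num_tokens = latent_shape
    cond = repeat_to_batch(cond, batch)  # match conditioning to the latent batch size
    rows = [generate_tokens(c, num_tokens) for c in cond]
    # Wrap each token row in a singleton channel dim to get (B, 1, T).
    return [[row] for row in rows]


def stub_generator(cond_entry, num_tokens):
    # Placeholder for the real autoregressive loop: emits a constant token id.
    return [len(cond_entry)] * num_tokens


out = sample_cube_shape((3, 1, 4), ["a"], stub_generator)
print(len(out), len(out[0]), len(out[0][0]))  # 3 1 4
```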

Amp-Thread-ID: https://ampcode.com/threads/T-019ec361-addb-70d8-a74b-438ce8a1e096
Co-authored-by: Amp <amp@ampcode.com>
2026-06-14 23:14:17 -07:00
Jedrzej Kosinski
871f7bc390 Cube3D: fix graph integration (3D latent, VAE device, fp32 cond, scikit-image)
Amp-Thread-ID: https://ampcode.com/threads/T-019ec361-addb-70d8-a74b-438ce8a1e096
Co-authored-by: Amp <amp@ampcode.com>
2026-06-14 22:59:11 -07:00
Jedrzej Kosinski
01a8783bee Add native Roblox Cube3D text-to-3D support
Cube3D is an autoregressive VQ-token shape model (DualStreamRoformer) plus a
VQ-VAE shape tokenizer (OneDAutoEncoder), not a diffusion model. It is wired in
natively, following the Causal-WAN AR-video pattern: the GPT loads as a normal
MODEL and generation runs through a dedicated 'cube' sampler instead of KSampler.
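Unlike a diffusion sampler, which refines a continuous latent over several denoising steps, an AR shape model emits one discrete VQ token per step, and the tokenizer maps the ids back to codebook vectors before decoding. The toy sketch below uses greedy decoding and a hand-made codebook purely for illustration; `generate_tokens`, `codebook_lookup` and the `next_logits` callback are hypothetical and stand in for DualStreamRoformer and OneDAutoEncoder.

```python
from typing import Callable, List, Sequence

Codebook = Sequence[Sequence[float]]


def generate_tokens(next_logits: Callable[[List[int]], List[float]], num_tokens: int) -> List[int]:
    """Greedy autoregressive loop: each step conditions on all previously emitted ids."""
    tokens: List[int] = []
    for _ in range(num_tokens):
        logits = next_logits(tokens)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens


def codebook_lookup(tokens: Sequence[int], codebook: Codebook) -> List[List[float]]:
    """First stage of a VQ-VAE decode: map each token id to its codebook vector."""
    return [list(codebook[t]) for t in tokens]


def toy_next_logits(prefix: List[int]) -> List[float]:
    # Toy "model": prefers token (position mod 3) over a 3-entry vocabulary.
    return [1.0 if i == len(prefix) % 3 else 0.0 for i in range(3)]


toy_codebook = [[0.0, 0.0], [1.0, 0.5], [-1.0, 2.0]]
ids = generate_tokens(toy_next_logits, 4)
print(ids)  # [0, 1, 2, 0]
print(codebook_lookup(ids, toy_codebook))
```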

- comfy/ldm/cube/gpt.py: DualStreamRoformer port (dual-stream RoPE attention,
  per-head RMSNorm, SwiGLU, KV cache; rope_theta=10000).
- comfy/ldm/cube/vae.py: OneDAutoEncoder decode path (codebook lookup, decoder,
  occupancy decoder, dense-grid extraction + skimage marching cubes).
- model_detection/supported_models/model_base: register shape_gpt as Cube3D MODEL
  (dims inferred from the state dict; apply_model guarded to point at SamplerCube).
- sd.py: detect shape_tokenizer and build CubeShapeVAE.
- k_diffusion/sampling.py: sample_cube autoregressive sampler (decaying CFG +
  optional top-p), faithful to upstream Engine.run_gpt.
- comfy_extras/nodes_cube.py: EmptyCubeLatent, CubeCodebookPatch (inject VQ
  codebook into wte), SamplerCube, VAEDecodeCube (-> MESH).
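The decaying-CFG plus top-p step in sample_cube can be sketched as follows. The linear decay schedule and the guided form `cond + gamma * (cond - uncond)` (equivalently `(1 + gamma) * cond - gamma * uncond`) are assumptions about upstream Engine.run_gpt, not confirmed by this log; top-p here is standard nucleus filtering. All function names are hypothetical.

```python
import math
import random
from typing import List, Optional


def decayed_cfg_scale(base_scale: float, step: int, num_steps: int) -> float:
    """Assumed linear decay: base_scale at step 0, reaching 0 at step == num_steps."""
    return base_scale * (num_steps - step) / num_steps


def guided_logits(cond: List[float], uncond: List[float], gamma: float) -> List[float]:
    """CFG on logits: push the conditional prediction away from the unconditional one."""
    return [c + gamma * (c - u) for c, u in zip(cond, uncond)]


def top_p_filter(logits: List[float], top_p: float) -> List[float]:
    """Keep the smallest set of most-probable tokens whose total mass reaches top_p."""
    m = max(logits)
    probs = [math.exp(x - m) for x in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]


def sample_token(logits: List[float], top_p: Optional[float], rng: random.Random) -> int:
    """Sample one token id from softmax(logits), optionally after top-p filtering."""
    if top_p is not None:
        logits = top_p_filter(logits, top_p)
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]  # filtered entries get weight 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


gamma = decayed_cfg_scale(3.0, step=0, num_steps=10)
logits = guided_logits([2.0, 1.0, 0.0], [1.0, 1.0, 1.0], gamma)
print(logits)  # [5.0, 1.0, -3.0]
print(sample_token(logits, top_p=0.9, rng=random.Random(0)))  # 0
```

With gamma at 0 on the final step, the guided logits reduce to the plain conditional prediction, so under this schedule guidance is strongest on early tokens and fades out by the end of the sequence.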

Reuses CLIP-L conditioning, CFGGuider/SamplerCustomAdvanced, and SaveGLB.

Amp-Thread-ID: https://ampcode.com/threads/T-019ec361-addb-70d8-a74b-438ce8a1e096
Co-authored-by: Amp <amp@ampcode.com>
2026-06-14 20:21:37 -07:00