Replace cube's bespoke complex-number RoPE (torch.polar / view_as_complex) with
ComfyUI's shared Flux rotary embedding (comfy.ldm.flux.math):
* precompute_freqs_cis now returns Flux's real rotation freqs via rope().
* apply_rotary_emb applies them via apply_rope1, which at inference dispatches to
comfy-kitchen's optimized apply_rope kernel (comfy.quant_ops.ck). q and k are
still rotated separately to preserve the decode-time position asymmetry.
The pairing convention (adjacent dims) and rotation math are identical, so token
outputs are unchanged. The only numerical difference is that rope() computes the
rotation angles in fp64 before casting to fp32 (cube's original used fp32), so output
now matches upstream to fp32 rounding (~1e-6 on rotated q/k in a standalone check)
rather than bit-for-bit. Greedy argmax token selection is unaffected.
Deviation note: this is a deliberate, documented divergence from a strict upstream
port, taken to gain the shared optimized kernel. Needs GPU parity re-validation on the
2x4090 box (kosin-X570-AORUS-ULTRA) before merge.
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019f013b-5892-71b9-af6b-c2ef28c67d2b
The vendored Lorensen table emits the opposite base winding from skimage, so
the upstream-style faces[:, [2,1,0]] flip produced inward-facing normals
(negative mesh volume). Drop the flip so normals point outward (positive
volume), matching the upstream output orientation.
Amp-Thread-ID: https://ampcode.com/threads/T-019ec361-addb-70d8-a74b-438ce8a1e096
Co-authored-by: Amp <amp@ampcode.com>
scikit-image was added solely for Cube3D's VAEDecodeCube. Replace it with a
vendored, vectorized pure-PyTorch marching cubes (classic Lorensen tables) in
comfy/ldm/cube/marching_cubes.py. This is the same algorithm family as upstream
cube's default warp.MarchingCubes backend, so geometry is closer to upstream's
default than skimage's Lewiner fallback was.
Validated against skimage method='lorensen': identical face count and surface
(nearest-neighbour distance ~3.8e-6, float precision) on sphere/torus fields.
Vertices are welded (shared grid edges interpolate identically) for a clean
indexed mesh. requirements.txt no longer needs scikit-image.
Amp-Thread-ID: https://ampcode.com/threads/T-019ec361-addb-70d8-a74b-438ce8a1e096
Co-authored-by: Amp <amp@ampcode.com>
Stop fighting ComfyUI's model management. VAEDecodeCube was manually
calling load_models_gpu + .to(vae.device) and the VAE forced
disable_offload=True because it bypassed the managed decode path.
Now CubeShapeVAE.decode(samples) is the entry point that comfy.sd.VAE.decode
calls, so loading/device/dtype are handled automatically (like Hunyuan3Dv2):
- removed disable_offload=True (let the offload system manage weights)
- removed manual load_models_gpu + .to(device) from the node
- process_output set to identity (default clamps [0,1] in-place and would
destroy the occupancy isosurface)
- decode() pre-inverts VAE.decode's trailing movedim(1,-1) so the node
receives grid logits unchanged (parity preserved)
- memory_used_decode sized by num_tokens (shape[-1]) for the new latent layout
Amp-Thread-ID: https://ampcode.com/threads/T-019ec361-addb-70d8-a74b-438ce8a1e096
Co-authored-by: Amp <amp@ampcode.com>
Cube3D is an autoregressive VQ-token shape model (DualStreamRoformer) plus a
VQ-VAE shape tokenizer (OneDAutoEncoder), not a diffusion model. It is wired
natively following the Causal-WAN AR-video pattern: the GPT loads as a normal
MODEL and generation runs through a dedicated 'cube' sampler instead of KSampler.
- comfy/ldm/cube/gpt.py: DualStreamRoformer port (dual-stream RoPE attention,
per-head RMSNorm, SwiGLU, KV cache; rope_theta=10000).
- comfy/ldm/cube/vae.py: OneDAutoEncoder decode path (codebook lookup, decoder,
occupancy decoder, dense-grid extraction + skimage marching cubes).
- model_detection/supported_models/model_base: register shape_gpt as Cube3D MODEL
(dims inferred from state dict; apply_model guarded to point at SamplerCube).
- sd.py: detect shape_tokenizer and build CubeShapeVAE.
- k_diffusion/sampling.py: sample_cube autoregressive sampler (decaying CFG +
optional top-p), faithful to upstream Engine.run_gpt.
- comfy_extras/nodes_cube.py: EmptyCubeLatent, CubeCodebookPatch (inject VQ
codebook into wte), SamplerCube, VAEDecodeCube (-> MESH).
Reuses CLIP-L conditioning, CFGGuider/SamplerCustomAdvanced, and SaveGLB.
Amp-Thread-ID: https://ampcode.com/threads/T-019ec361-addb-70d8-a74b-438ce8a1e096
Co-authored-by: Amp <amp@ampcode.com>