Cube3D is an autoregressive VQ-token shape model (DualStreamRoformer) plus a
VQ-VAE shape tokenizer (OneDAutoEncoder), not a diffusion model. It is wired
natively following the Causal-WAN AR-video pattern: the GPT loads as a normal
MODEL and generation runs through a dedicated 'cube' sampler instead of KSampler.
- comfy/ldm/cube/gpt.py: DualStreamRoformer port (dual-stream RoPE attention,
per-head RMSNorm, SwiGLU, KV cache; rope_theta=10000).
- comfy/ldm/cube/vae.py: OneDAutoEncoder decode path (codebook lookup, decoder,
occupancy decoder, dense-grid extraction + skimage marching cubes).
- model_detection/supported_models/model_base: register shape_gpt as Cube3D MODEL
(dims inferred from state dict; apply_model guarded to point at SamplerCube).
- sd.py: detect shape_tokenizer and build CubeShapeVAE.
- k_diffusion/sampling.py: sample_cube autoregressive sampler (decaying CFG +
optional top-p), faithful to upstream Engine.run_gpt.
- comfy_extras/nodes_cube.py: EmptyCubeLatent, CubeCodebookPatch (inject VQ
codebook into wte), SamplerCube, VAEDecodeCube (-> MESH).
Reuses CLIP-L conditioning, CFGGuider/SamplerCustomAdvanced, and SaveGLB.
Amp-Thread-ID: https://ampcode.com/threads/T-019ec361-addb-70d8-a74b-438ce8a1e096
Co-authored-by: Amp <amp@ampcode.com>
* Initial HiDream01-image support
* Cleanup nodes
* Cleaner handling of empty placeholder models
* Remove snap_to_predefined, prefer tooltip for the trained resolutions
* Add model and block wrappers
* Fix shift tooltip
* Add node to work around the patch tile issue
Experimental, runs multiple passes with the patch grid offset and blends with various different methods.
* Qwen35 vision rotary_pos_emb cast fix
* Fix embedding layout type
* Some small optimizations
* Cleanup, don't need this fallback
* Prefix KV cache, cleanup
Bit of speed, reduce redundant code
* Get rid of redundant custom sampler, refactor noise scaling
Our existing lcm sampler is mathematically same, just added the missing options to it instead and a node to control them. Refactored the noise scaling and fix it for the stochastic samplers, add a generic node to control the initial noise scale.
* Update nodes_hidream_o1.py
* Fix some cache validation cases
* Keep existing sampling params
* Remove redundant video vision path
* Replace some numpy ops with torch
* Fx RoPE index for batch size > 1
* Prefer torch preprocessing
* Rename block_type to be compatible with existing patch nodes
* Fixes and tweaks
* Use `torch.special.expm1`
This function provides greater precision than `exp(x) - 1` for small values of `x`.
Found with TorchFix https://github.com/pytorch-labs/torchfix/
* Use non-alias
* Update sampling.py
* Update samplers.py
* my bad
* "fix" the sampler
* Update samplers.py
* i named it wrong
* minor sampling improvements
mainly using a dynamic rho value (hey this sounds a lot like smea!!!)
* revert rho change
rho? r? its just 1/2