EasyAI Code Hosting Platform

wangbo/ComfyUI

mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-06-28 18:59:27 +08:00

Commit Graph

Author	SHA1	Message	Date

Jedrzej Kosinski

94bcb5701e

Cube3D: reuse shared Flux RoPE (comfy-kitchen optimized kernel)

Python Linting / Run Ruff (push) Has been cancelled

Python Linting / Run Pylint (push) Has been cancelled

Replace cube's bespoke complex-number RoPE (torch.polar / view_as_complex) with
ComfyUI's shared Flux rotary embedding (comfy.ldm.flux.math):
  * precompute_freqs_cis now returns Flux's real rotation freqs via rope().
  * apply_rotary_emb applies them via apply_rope1, which at inference dispatches to
    comfy-kitchen's optimized apply_rope kernel (comfy.quant_ops.ck). q and k are
    still rotated separately to preserve the decode-time position asymmetry.
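
To illustrate why a complex-number RoPE and a real-valued rotation can be interchangeable, here is a minimal pure-Python sketch. It is not the ComfyUI or cube code: `rope_angles`, `rotate_complex`, and `rotate_real` are hypothetical helper names, and the angle schedule `pos / theta ** (i / dim)` is the common RoPE convention, assumed here rather than taken from the commit.

```python
import cmath
import math


def rope_angles(pos, dim, theta=10000.0):
    """One rotation angle per adjacent pair of dims (dim must be even).

    Assumption: the usual RoPE schedule, not copied from the commit.
    """
    return [pos / theta ** (i / dim) for i in range(0, dim, 2)]


def rotate_complex(x, angles):
    """Treat (x[2j], x[2j+1]) as a complex number and multiply by e^(i*angle)."""
    out = []
    for j, a in enumerate(angles):
        z = complex(x[2 * j], x[2 * j + 1]) * cmath.rect(1.0, a)
        out += [z.real, z.imag]
    return out


def rotate_real(x, angles):
    """Apply the equivalent 2x2 rotation matrix to each adjacent pair."""
    out = []
    for j, a in enumerate(angles):
        c, s = math.cos(a), math.sin(a)
        x0, x1 = x[2 * j], x[2 * j + 1]
        out += [c * x0 - s * x1, s * x0 + c * x1]
    return out
```

Calling both functions on the same vector, for example `rotate_complex(x, rope_angles(3, 4))` and `rotate_real(x, rope_angles(3, 4))`, should agree up to floating-point rounding, because multiplying by a unit complex number is the same operation as a 2x2 rotation on the (real, imaginary) pair.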

The pairing convention (adjacent dims) and rotation math are identical, so token
outputs are unchanged. The only numerical difference is that rope() computes the
rotation angles in fp64 before casting to fp32 (cube's original used fp32), so output
now matches upstream to fp32 rounding (~1e-6 on rotated q/k in a standalone check)
rather than bit-for-bit. Greedy argmax token selection is unaffected.
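
The precision effect described above can be explored with a small stdlib-only sketch. It emulates fp32 by rounding Python floats through `struct`, which only approximates what a real fp32 tensor pipeline does; the helper names (`f32`, `cos_fp32_pipeline`, `cos_fp64_then_cast`, `max_abs_diff`) and the angle schedule are assumptions for illustration, not the code from either implementation.

```python
import math
import struct


def f32(x):
    """Round an fp64 Python float to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cos_fp32_pipeline(pos, dim, theta=10000.0):
    # Emulated all-fp32 path: round after every intermediate operation.
    out = []
    for i in range(0, dim, 2):
        inv_freq = f32(1.0 / f32(theta ** f32(i / dim)))
        out.append(f32(math.cos(f32(pos * inv_freq))))
    return out


def cos_fp64_then_cast(pos, dim, theta=10000.0):
    # Angles and cosines computed in fp64, rounded to fp32 once at the end.
    return [f32(math.cos(pos / theta ** (i / dim))) for i in range(0, dim, 2)]


def max_abs_diff(pos, dim):
    """Largest per-element gap between the two emulated paths."""
    a = cos_fp32_pipeline(pos, dim)
    b = cos_fp64_then_cast(pos, dim)
    return max(abs(x - y) for x, y in zip(a, b))
```

Evaluating `max_abs_diff` at a few positions gives a feel for how the gap stays at the level of fp32 rounding for small positions; the exact figure on real hardware depends on the kernel and is what the parity check mentioned in the commit would measure.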

Deviation note: this is a deliberate, documented divergence from a strict upstream
port, taken to gain the shared optimized kernel. Needs GPU parity re-validation on the
2x4090 box (kosin-X570-AORUS-ULTRA) before merge.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019f013b-5892-71b9-af6b-c2ef28c67d2b

2026-06-25 18:15:15 -07:00

Jedrzej Kosinski

01a8783bee

Add native Roblox Cube3D text-to-3D support

Cube3D is an autoregressive VQ-token shape model (DualStreamRoformer) plus a
VQ-VAE shape tokenizer (OneDAutoEncoder), not a diffusion model. It is wired
natively following the Causal-WAN AR-video pattern: the GPT loads as a normal
MODEL and generation runs through a dedicated 'cube' sampler instead of KSampler.

- comfy/ldm/cube/gpt.py: DualStreamRoformer port (dual-stream RoPE attention,
  per-head RMSNorm, SwiGLU, KV cache; rope_theta=10000).
- comfy/ldm/cube/vae.py: OneDAutoEncoder decode path (codebook lookup, decoder,
  occupancy decoder, dense-grid extraction + skimage marching cubes).
- model_detection/supported_models/model_base: register shape_gpt as Cube3D MODEL
  (dims inferred from state dict; apply_model guarded to point at SamplerCube).
- sd.py: detect shape_tokenizer and build CubeShapeVAE.
- k_diffusion/sampling.py: sample_cube autoregressive sampler (decaying CFG +
  optional top-p), faithful to upstream Engine.run_gpt.
- comfy_extras/nodes_cube.py: EmptyCubeLatent, CubeCodebookPatch (inject VQ
  codebook into wte), SamplerCube, VAEDecodeCube (-> MESH).
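
The "decaying CFG + optional top-p" step named in the list above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the `sample_cube` implementation: the linear decay schedule, the `(1 + s) * cond - s * uncond` combination, the default `base_scale=3.0`, and all function names are hypothetical choices, since the commit message does not spell them out.

```python
import math
import random


def decayed_scale(base_scale, step, total_steps):
    """Assumed schedule: guidance decays linearly from base_scale to 0."""
    return base_scale * (total_steps - step) / total_steps


def apply_cfg(cond, uncond, scale):
    """Assumed CFG form: push conditional logits away from unconditional ones."""
    return [(1.0 + scale) * c - scale * u for c, u in zip(cond, uncond)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize the surviving probabilities so they sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample_token(cond, uncond, step, total_steps,
                 base_scale=3.0, top_p=None, rng=random):
    logits = apply_cfg(cond, uncond, decayed_scale(base_scale, step, total_steps))
    if top_p is None:
        # Greedy decoding: pick the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    nucleus = top_p_filter(softmax(logits), top_p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]
```

In an autoregressive loop, `sample_token` would be called once per generated position with the conditional and unconditional logits for that step; passing `rng=random.Random(seed)` keeps top-p sampling reproducible.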

Reuses CLIP-L conditioning, CFGGuider/SamplerCustomAdvanced, and SaveGLB.

Amp-Thread-ID: https://ampcode.com/threads/T-019ec361-addb-70d8-a74b-438ce8a1e096
Co-authored-by: Amp <amp@ampcode.com>

2026-06-14 20:21:37 -07:00

2 Commits