ComfyUI/comfy/ldm/cube
Jedrzej Kosinski 94bcb5701e
Cube3D: reuse shared Flux RoPE (comfy-kitchen optimized kernel)
Replace cube's bespoke complex-number RoPE (torch.polar / view_as_complex) with
ComfyUI's shared Flux rotary embedding (comfy.ldm.flux.math):
  * precompute_freqs_cis now returns Flux's real rotation freqs via rope().
  * apply_rotary_emb applies them via apply_rope1, which at inference dispatches to
    comfy-kitchen's optimized apply_rope kernel (comfy.quant_ops.ck). q and k are
    still rotated separately to preserve the decode-time position asymmetry.
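
A minimal pure-Python sketch of why q and k carry different positions at decode time. It is an illustration, not the code in gpt.py: the helper names (rope_angles, rotate, dot), the base theta=10000 and the toy vectors are all assumptions. It assumes a KV cache where the single new query sits at position t while cached keys keep positions 0..t, and it checks that the attention score depends only on the position offset.

```python
import math

def rope_angles(pos, head_dim, theta=10000.0):
    # One angle per adjacent pair (2i, 2i+1). theta=10000 is an assumed base.
    return [pos * theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def rotate(vec, pos, theta=10000.0):
    # Rotate each adjacent pair of dims by that pair's angle at this position.
    out = []
    for i, ang in enumerate(rope_angles(pos, len(vec), theta)):
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(ang), math.sin(ang)
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy vectors (hypothetical values, head_dim = 4).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Decode step at position t: the new query is rotated with position t only,
# while each cached key is rotated with the position it was written at.
t = 5
scores = [dot(rotate(q, t), rotate(k, j)) for j in range(t + 1)]

# The score depends only on the offset between query and key positions.
same_offset_a = dot(rotate(q, 5), rotate(k, 3))
same_offset_b = dot(rotate(q, 7), rotate(k, 5))
```

Because the query and key position ranges differ during incremental decoding, a single shared rotation call over both tensors would apply the wrong positions to one of them, which is one plausible reading of the asymmetry mentioned above.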

The pairing convention (adjacent dims) and rotation math are identical, so generated
tokens are unchanged. The only numerical difference is that rope() computes the
rotation angles in fp64 before casting to fp32 (cube's original computed them in fp32),
so rotated q/k now match upstream to within fp32 rounding (~1e-6 in a standalone check)
rather than bit-for-bit. Greedy argmax token selection is unaffected.
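
The equivalence of the two formulations can be sketched in pure Python. This is a hedged illustration rather than the actual torch code: rope_complex and rope_real are hypothetical helpers, theta=10000 is an assumed base, and everything runs in Python floats (fp64), so the fp32 rounding difference described above is not reproduced here.

```python
import cmath
import math

def rope_complex(vec, pos, theta=10000.0):
    # Complex form: treat each adjacent pair (2i, 2i+1) as one complex number
    # and multiply it by the unit phasor exp(i * angle).
    d = len(vec)
    out = []
    for i in range(d // 2):
        ang = pos * theta ** (-2.0 * i / d)
        z = complex(vec[2 * i], vec[2 * i + 1]) * cmath.rect(1.0, ang)
        out += [z.real, z.imag]
    return out

def rope_real(vec, pos, theta=10000.0):
    # Real form: apply the 2x2 rotation [[cos, -sin], [sin, cos]] to each
    # adjacent pair.
    d = len(vec)
    out = []
    for i in range(d // 2):
        ang = pos * theta ** (-2.0 * i / d)
        c, s = math.cos(ang), math.sin(ang)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out += [c * x0 - s * x1, s * x0 + c * x1]
    return out

# Toy input (hypothetical values, head_dim = 6) at an arbitrary position.
x = [0.25, -0.5, 1.5, 2.0, -1.0, 0.75]
a = rope_complex(x, pos=11)
b = rope_real(x, pos=11)
max_diff = max(abs(u - v) for u, v in zip(a, b))
```

With the same pairing convention, both forms compute (x0*cos - x1*sin, x0*sin + x1*cos) per pair, so any remaining difference between implementations comes from the precision in which the angles are computed.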

Deviation note: this is a deliberate, documented divergence from a strict upstream
port, taken to gain the shared optimized kernel. Needs GPU parity re-validation on the
2x4090 box (kosin-X570-AORUS-ULTRA) before merge.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019f013b-5892-71b9-af6b-c2ef28c67d2b
2026-06-25 18:15:15 -07:00
gpt.py             Cube3D: reuse shared Flux RoPE (comfy-kitchen optimized kernel)   2026-06-25 18:15:15 -07:00
marching_cubes.py  Cube3D: vendor dependency-free marching cubes, drop scikit-image  2026-06-14 23:44:20 -07:00
vae.py             Cube3D: fix mesh winding for vendored marching cubes              2026-06-14 23:48:03 -07:00