ComfyUI/comfy
Jedrzej Kosinski 94bcb5701e
Cube3D: reuse shared Flux RoPE (comfy-kitchen optimized kernel)
Replace cube's bespoke complex-number RoPE (torch.polar / view_as_complex) with
ComfyUI's shared Flux rotary embedding (comfy.ldm.flux.math):
  * precompute_freqs_cis now returns Flux's real rotation freqs via rope().
  * apply_rotary_emb applies them via apply_rope1, which at inference dispatches to
    comfy-kitchen's optimized apply_rope kernel (comfy.quant_ops.ck). q and k are
    still rotated separately to preserve the decode-time position asymmetry.
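As an illustration of why the swap is safe, here is a minimal pure-Python sketch (not the code in comfy.ldm.flux.math or the cube model) showing that multiplying adjacent-dim pairs by a complex phase gives the same result as applying a real 2x2 rotation to each pair. The helper names, the toy vector, and the theta=10000 base are assumptions made for this example only.

```python
import cmath
import math

def rope_angles(pos, dim, theta=10000.0):
    """Angle for each adjacent pair i: pos * theta**(-2i/dim).
    theta=10000 is an assumed base, chosen only for illustration."""
    return [pos * theta ** (-2.0 * i / dim) for i in range(dim // 2)]

def rotate_complex(x, angles):
    """Complex-number form: treat (x[2i], x[2i+1]) as one complex value
    and multiply it by exp(1j * angle)."""
    out = []
    for i, a in enumerate(angles):
        z = complex(x[2 * i], x[2 * i + 1]) * cmath.rect(1.0, a)
        out += [z.real, z.imag]
    return out

def rotate_real(x, angles):
    """Real form: apply the rotation matrix [[cos, -sin], [sin, cos]]
    to each adjacent pair."""
    out = []
    for i, a in enumerate(angles):
        c, s = math.cos(a), math.sin(a)
        x0, x1 = x[2 * i], x[2 * i + 1]
        out += [c * x0 - s * x1, s * x0 + c * x1]
    return out

x = [0.5, -1.0, 2.0, 0.25]           # toy query/key vector, dim = 4
angles = rope_angles(pos=3, dim=4)
via_complex = rotate_complex(x, angles)
via_real = rotate_real(x, angles)
max_diff = max(abs(p - q) for p, q in zip(via_complex, via_real))
```

Rotating q and k separately, each with the angles for its own position, fits either form, since both helpers take the angles as an argument.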

The pairing convention (adjacent dims) and rotation math are identical, so the
generated tokens are unchanged. The only numerical difference is that rope() computes
the rotation angles in fp64 before casting to fp32 (cube's original used fp32), so the
rotated q/k now match upstream to within fp32 rounding (~1e-6 in a standalone check)
rather than bit-for-bit. Greedy argmax token selection is unaffected.
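The precision difference can be sketched without torch by emulating fp32 rounding with the struct module. This is a rough, hypothetical model of the two paths (the function names, the position, and theta=10000 are assumptions, and real fp32 kernels may round intermediate steps differently), meant only to show that the angles agree to within fp32 rounding rather than exactly.

```python
import struct

def f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def angles_fp64_then_cast(pos, dim, theta=10000.0):
    """Compute the angle in fp64, then cast once to fp32."""
    return [f32(pos * theta ** (-2.0 * i / dim)) for i in range(dim // 2)]

def angles_fp32(pos, dim, theta=10000.0):
    """Approximate an all-fp32 path by rounding after every step."""
    out = []
    for i in range(dim // 2):
        freq = f32(theta ** f32(-2.0 * i / dim))
        out.append(f32(f32(pos) * freq))
    return out

hi = angles_fp64_then_cast(100, 8)
lo = angles_fp32(100, 8)
max_angle_diff = max(abs(a - b) for a, b in zip(hi, lo))
```

Any gap between the two lists comes only from where the rounding happens, which is the kind of difference the parity re-validation would need to tolerate.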

Deviation note: this is a deliberate, documented divergence from a strict upstream
port, taken to gain the shared optimized kernel. Needs GPU parity re-validation on the
2x4090 box (kosin-X570-AORUS-ULTRA) before merge.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019f013b-5892-71b9-af6b-c2ef28c67d2b
2026-06-25 18:15:15 -07:00
audio_encoders Fix fp16 audio encoder models (#12811) 2026-03-06 18:20:07 -05:00
background_removal Some cast/dtype fixes for the birefnet and dino3 models. (#14217) 2026-06-01 14:35:26 -07:00
cldm Add better error message for common error. (#10846) 2025-11-23 04:55:22 -05:00
comfy_types Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
extra_samplers Uni pc sampler now works with audio and video models. 2025-01-18 05:27:58 -05:00
image_encoders Depth anything 3 (Core-135) (#13853) 2026-06-10 09:28:24 +08:00
k_diffusion Cube3D: use channels-first 1D latent (B,1,L) like Hunyuan3Dv2 2026-06-14 23:14:17 -07:00
ldm Cube3D: reuse shared Flux RoPE (comfy-kitchen optimized kernel) 2026-06-25 18:15:15 -07:00
sd1_tokenizer Silence clip tokenizer warning. (#8934) 2025-07-16 14:42:07 -04:00
t2i_adapter Controlnet refactor. 2024-06-27 18:43:11 -04:00
taesd Add high quality preview support for Flux2 latents (#13496) 2026-04-29 19:37:30 -04:00
text_encoders Allow custom templates with Ideogram4 TE (#14374) 2026-06-09 21:11:05 +08:00
weight_adapter MPDynamic: force load flux img_in weight (Fixes flux1 canny+depth lora crash) (#12446) 2026-02-15 20:30:09 -05:00
bg_removal_model.py Fix background removal mask output shape (#14171) 2026-05-29 09:14:32 -07:00
cli_args.py add --high-ram option (#14437) 2026-06-12 07:53:33 -07:00
clip_config_bigg.json Fix potential issue with non clip text embeddings. 2024-07-30 14:41:13 -04:00
clip_model.py Support the siglip 2 naflex model as a clip vision model. (#11831) 2026-01-12 17:05:54 -05:00
clip_vision_config_g.json
clip_vision_config_h.json
clip_vision_config_vitl_336_llava.json Support llava clip vision model. 2025-03-06 00:24:43 -05:00
clip_vision_config_vitl_336.json support clip-vit-large-patch14-336 (#4042) 2024-07-17 13:12:50 -04:00
clip_vision_config_vitl.json
clip_vision_siglip2_base_naflex.json Support the siglip 2 naflex model as a clip vision model. (#11831) 2026-01-12 17:05:54 -05:00
clip_vision_siglip_384.json Support new flux model variants. 2024-11-21 08:38:23 -05:00
clip_vision_siglip_512.json Support 512 siglip model. 2025-04-05 07:01:01 -04:00
clip_vision.py Some cast/dtype fixes for the birefnet and dino3 models. (#14217) 2026-06-01 14:35:26 -07:00
conds.py Cleanups to the last PR. (#12646) 2026-02-26 01:30:31 -05:00
context_windows.py feat: Context windows - add causal_window_fix to improve blending of context windows (CORE-100) (#13563) 2026-05-05 16:40:53 -07:00
controlnet.py MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
deploy_environment.py Add deploy environment header (Comfy-Env) to partner node API calls (#13425) 2026-05-04 20:17:56 -07:00
diffusers_convert.py Remove useless code. 2025-01-24 06:15:54 -05:00
diffusers_load.py load_unet -> load_diffusion_model with a model_options argument. 2024-08-12 23:20:57 -04:00
float.py float: use CK stochastic rounding cuda kernel (#13971) 2026-05-28 19:23:42 -07:00
gligen.py Remove some useless code. (#8812) 2025-07-06 07:07:39 -04:00
hooks.py Fix typos (#10986) 2026-05-08 17:14:45 +08:00
latent_formats.py Cube3D: use channels-first 1D latent (B,1,L) like Hunyuan3Dv2 2026-06-14 23:14:17 -07:00
lora_convert.py Use torch RMSNorm for flux models and refactor hunyuan video code. (#12432) 2026-02-13 15:35:13 -05:00
lora.py Add LoRA key mapping for LTXV/LTXAV models (#14349) 2026-06-09 09:57:58 -04:00
memory_management.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
model_base.py Add native Roblox Cube3D text-to-3D support 2026-06-14 20:21:37 -07:00
model_detection.py Cube3D: document convention deviations + drop unused VAE flag (review aid) 2026-06-14 23:58:14 -07:00
model_management.py add --high-ram option (#14437) 2026-06-12 07:53:33 -07:00
model_patcher.py [Trainer/bug] Ensure model is not inference mode (CORE-72) (#13400) 2026-06-09 23:07:47 -04:00
model_prefetch.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
model_sampling.py feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
multigpu.py fix (MultiGPU): prevent freeze on manual abort when using MultiGPU CFG Split (#14235) 2026-06-02 10:05:24 -07:00
nested_tensor.py WIP way to support multi multi dimensional latents. (#10456) 2025-10-23 21:21:14 -04:00
ops.py add --high-ram option (#14437) 2026-06-12 07:53:33 -07:00
options.py
patcher_extension.py Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
pinned_memory.py Fix interoperation with external source of pinned memory pressure (#14252) 2026-06-05 08:39:35 -07:00
pixel_space_convert.py Changes to the previous radiance commit. (#9851) 2025-09-13 18:03:34 -04:00
quant_ops.py Enable triton comfy kitchen via cli-arg (#12730) 2026-05-03 14:07:21 -04:00
rmsnorm.py feat: Gemma4 text generation support (CORE-30) (#13376) 2026-05-02 22:46:15 -04:00
sample.py Revert "Add SeedVR2 support (CORE-6) (#14110)" (#14359) 2026-06-08 18:00:20 -04:00
sampler_helpers.py MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
samplers.py fix(multigpu): replace hardcoded torch.cuda.set_device with device-agnostic set_torch_device (#14191) 2026-05-30 21:18:42 -04:00
sd1_clip_config.json Fix potential issue with non clip text embeddings. 2024-07-30 14:41:13 -04:00
sd1_clip.py feat: Support Qwen3.5 text generation models (#12771) 2026-03-25 22:48:28 -04:00
sd.py Cube3D: document convention deviations + drop unused VAE flag (review aid) 2026-06-14 23:58:14 -07:00
sdxl_clip.py Add a T5TokenizerOptions node to set options for the T5 tokenizer. (#7803) 2025-04-25 19:36:00 -04:00
supported_models_base.py Revert "Add SeedVR2 support (CORE-6) (#14110)" (#14359) 2026-06-08 18:00:20 -04:00
supported_models.py Cube3D: document convention deviations + drop unused VAE flag (review aid) 2026-06-14 23:58:14 -07:00
utils.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00