ComfyUI/comfy/ldm
huangfeice e29384be0d Add JoyImageEditPlus multi-image edit support (unify onto Plus-style forward)
JoyImageEditPlus is the multi-image (1-6 reference images) variant of
JoyImageEdit, trained from the same base. Its diffusers transformer has the
same weight structure as the single-image variant (894 keys, none renamed)
but injects references differently. The single-image variant uses a
slot-stack: references and noise are stacked into a 6D tensor and rotated on
the frame dim, which forces all items to share one resolution. Plus instead
patchifies each reference independently and concatenates the results on the
sequence dim with per-image temporal-offset 3D RoPE, which allows references
at different resolutions.
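
A minimal sketch of the per-image temporal-offset position layout, in plain
Python with no tensors. The helper name component_positions and the offset
rule (each component's temporal index starts where the previous component's
frames ended, while spatial indices restart at 0) are assumptions for
illustration, not the code in comfy/ldm/joyimage/model.py.

```python
from itertools import product


def component_positions(grids):
    """Return one (t, h, w) position per token, concatenated over components.

    grids: list of (frames, height, width) token-grid sizes, target first.
    Assumption: the temporal index of each component is shifted by the total
    number of frames of the components before it; h and w restart at 0.
    """
    positions = []
    t_offset = 0
    for frames, height, width in grids:
        for t, h, w in product(range(frames), range(height), range(width)):
            positions.append((t + t_offset, h, w))
        t_offset += frames
    return positions


# Target grid of 1x2x2 tokens plus two references at different resolutions.
pos = component_positions([(1, 2, 2), (1, 1, 2), (1, 2, 1)])

# With one reference at the target's resolution, the layout matches a single
# grid with two frames (under the ordering assumed in this sketch).
same_res = component_positions([(1, 2, 2), (1, 2, 2)])
stacked = component_positions([(2, 2, 2)])
```

Because every component gets its own spatial grid, nothing in this layout
requires the references to share the target's height or width.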

Since the single-image port is not yet upstream, this unifies both variants
onto the Plus-style forward rather than keeping two paths; single-image is now
the ref=1 special case. Verified numerically: at ref=1 with equal resolution
the new path's RoPE is bit-identical to the old slot-stack layout, and the
transformer output matches the diffusers Plus reference in fp32, including
the different-resolution case.

ComfyUI runs cond/uncond in one forward with a shared reference configuration,
so the diffusers Plus batched RoPE, padding attention_mask, and dedicated
attention processor are unnecessary here: the unified forward reuses the
existing unbatched _apply_rotary_emb and JoyImageAttention. Confirmed
equivalent to the diffusers batched+mask path for a single sample.

- comfy/ldm/joyimage/model.py: forward takes ref_latents and builds
  components=[target, ref0, ...]; per-component patchify + temporal-offset
  RoPE; output keeps only the target segment. Old single-grid RoPE removed.
- comfy/model_base.py: JoyImage drops the slot-stack / frame-rotation /
  shape-equality path in _apply_model, passing ref_latents straight to the
  transformer. Guidance-rescale and the reference_latents requirement are kept.
- comfy/text_encoders/joyimage.py: the image template emits one vision block
  per reference (N = image count); N=1 is byte-for-byte the old template.
- comfy_extras/nodes_joyimage.py: add TextEncodeJoyImageEditPlus with optional
  image1..image6 inputs, each bucket-resized and VAE-encoded into the
  reference_latents list.
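
The sequence bookkeeping behind components=[target, ref0, ...] can be
sketched as follows. The helper names token_count and segment_bounds and the
(1, 2, 2) patch size are hypothetical, chosen only to illustrate how the
target segment is recovered after concatenation; the real forward operates
on tensors.

```python
def token_count(latent_shape, patch=(1, 2, 2)):
    """Number of tokens produced by patchifying one (frames, height, width) latent.

    The (1, 2, 2) patch size is an assumption made for this sketch.
    """
    frames, height, width = latent_shape
    pt, ph, pw = patch
    return (frames // pt) * (height // ph) * (width // pw)


def segment_bounds(target_shape, ref_shapes, patch=(1, 2, 2)):
    """Return (start, end) token ranges for [target, ref0, ...] in order."""
    bounds = []
    start = 0
    for shape in [target_shape] + list(ref_shapes):
        n = token_count(shape, patch)
        bounds.append((start, start + n))
        start += n
    return bounds


# One target latent and two references at different resolutions.
bounds = segment_bounds((1, 64, 64), [(1, 32, 48), (1, 64, 32)])

# Stand-in for the transformer output: one entry per token in the sequence.
sequence = list(range(bounds[-1][1]))

# Only the target segment is kept; the reference tokens are dropped.
target_start, target_end = bounds[0]
target_tokens = sequence[target_start:target_end]
```

Since the target is always the first component, its segment starts at token
0 and the slice length depends only on the target's own shape.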

Detection, supported_models, and sd.py need no changes: the identical weight
structure routes both variants through image_model="joyimage".
2026-07-01 18:36:43 +08:00
ace Support Ace Step 1.5 XL model. (#13317) 2026-04-07 03:13:47 -04:00
anima Fix anima LLM adapter forward when manual cast (#12504) 2026-02-17 07:56:44 -08:00
audio Disable sage attention in stable audio dit and VAE. (#14148) 2026-05-27 20:35:03 -04:00
aura Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
cascade cascade: remove dead weight init code (#13026) 2026-03-17 20:59:10 -04:00
chroma Implement NAG on all the models based on the Flux code. (#12500) 2026-02-16 23:30:34 -05:00
chroma_radiance Radiance: support variant with nonzero txt_ids (#14206) 2026-06-01 22:07:48 -07:00
cogvideo Cogvideox (#13402) 2026-04-29 19:30:08 -04:00
cosmos Speed up ernie model by a bit on nvidia and use higher quality rope. (#14192) 2026-05-30 17:53:37 -07:00
depth_anything_3 Depth anything 3 (Core-135) (#13853) 2026-06-10 09:28:24 +08:00
ernie Speed up ernie model by a bit on nvidia and use higher quality rope. (#14192) 2026-05-30 17:53:37 -07:00
flux Remove old useless no comfy kitchen fallback. (#14245) 2026-06-02 17:52:41 -07:00
genmo Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
hidream Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
hidream_o1 feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
hunyuan3d Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
hunyuan3dv2_1 MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
hunyuan_video Implement NAG on all the models based on the Flux code. (#12500) 2026-02-16 23:30:34 -05:00
hydit Change cosmos and hydit models to use the native RMSNorm. (#7934) 2025-05-04 06:26:20 -04:00
ideogram4 Fix potential dtype issue with ideogram 4. (#14436) 2026-06-12 07:51:12 -07:00
joyimage Add JoyImageEditPlus multi-image edit support (unify onto Plus-style forward) 2026-07-01 18:36:43 +08:00
kandinsky5 Fix qwen scaled fp8 not working with kandinsky. Make basic t2i wf work. (#11162) 2025-12-06 17:50:10 -08:00
lens Lens: some cleanup (#14112) 2026-05-26 10:32:53 +03:00
lightricks fix: cross-attention AdaLN scale, shift, sigma parameters calculation (#14097) 2026-05-25 20:07:09 -07:00
lumina Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
mmaudio/vae Implement the mmaudio VAE. (#10300) 2025-10-11 22:57:23 -04:00
models Add support for small flux.2 decoder (#13314) 2026-04-07 03:44:18 -04:00
modules Revert "Add SeedVR2 support (CORE-6) (#14110)" (#14359) 2026-06-08 18:00:20 -04:00
moge Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
omnigen Use comfy kitchen apply rope in omnigen2 model. (#14442) 2026-06-13 09:38:39 +08:00
pixart Remove windows line endings. (#8866) 2025-07-11 02:37:51 -04:00
pixeldit Support context window for PiD and fix lq_latent rounding (#14136) 2026-05-27 12:08:06 -07:00
qwen_image fix: Add back apply_rotary_emb for Qwen Image (#14364) 2026-06-09 11:55:49 +08:00
rt_detr CORE-13 feat: Support RT-DETRv4 detection model (#12748) 2026-03-28 23:34:10 -04:00
sam3 Improve SAM3 large input handling (#13767) 2026-05-07 17:18:28 -07:00
supir feat: SUPIR model support (CORE-17) (#13250) 2026-04-18 23:02:01 -04:00
triposplat feat: Add TripoSplat support (#14210) 2026-06-01 07:01:50 -07:00
wan feat: Add Bernini-R model support (Wan video) (CORE-279) (#14216) 2026-06-10 07:47:34 +08:00
colormap.py Depth anything 3 (Core-135) (#13853) 2026-06-10 09:28:24 +08:00
common_dit.py add RMSNorm to comfy.ops 2025-04-14 18:00:33 -04:00
util.py New Year ruff cleanup. (#11595) 2026-01-01 22:06:14 -05:00