mirror of
https://github.com/comfyanonymous/ComfyUI.git
synced 2026-07-03 21:20:49 +08:00
JoyImageEditPlus is the multi-image (1-6 reference images) variant of JoyImageEdit, trained from the same base. Its diffusers transformer shares byte-identical weight structure with the single-image variant (894 keys, zero rename) but injects references differently: instead of the single-image slot-stack (stack refs + noise into a 6D tensor and rotate on the frame dim, which forces all items to share resolution), each reference is independently patchified and concatenated on the sequence dim with per-image temporal-offset 3D RoPE, allowing references at different resolutions. Since the single-image port is not yet upstream, this unifies both variants onto the Plus-style forward rather than keeping two paths; single-image is now the ref=1 special case. Verified numerically: at ref=1 with equal resolution the new path's RoPE is bit-identical to the old slot-stack layout, and the transformer output matches the diffusers Plus reference (fp32, incl. the different-resolution case). ComfyUI runs cond/uncond in one forward with a shared reference configuration, so the diffusers Plus batched RoPE, padding attention_mask, and dedicated attention processor are unnecessary here: the unified forward reuses the existing unbatched _apply_rotary_emb and JoyImageAttention. Confirmed equivalent to the diffusers batched+mask path for a single sample. - comfy/ldm/joyimage/model.py: forward takes ref_latents and builds components=[target, ref0, ...]; per-component patchify + temporal-offset RoPE; output keeps only the target segment. Old single-grid RoPE removed. - comfy/model_base.py: JoyImage drops the slot-stack / frame-rotation / shape-equality path in _apply_model, passing ref_latents straight to the transformer. Guidance-rescale and the reference_latents requirement are kept. - comfy/text_encoders/joyimage.py: the image template emits one vision block per reference (N = image count); N=1 is byte-for-byte the old template. - comfy_extras/nodes_joyimage.py: add TextEncodeJoyImageEditPlus with optional image1..image6 inputs, each bucket-resized and VAE-encoded into the reference_latents list. Detection, supported_models, and sd.py need no changes: the identical weight structure routes both variants through image_model="joyimage". |
||
|---|---|---|
| .. | ||
| ace | ||
| anima | ||
| audio | ||
| aura | ||
| cascade | ||
| chroma | ||
| chroma_radiance | ||
| cogvideo | ||
| cosmos | ||
| depth_anything_3 | ||
| ernie | ||
| flux | ||
| genmo | ||
| hidream | ||
| hidream_o1 | ||
| hunyuan3d | ||
| hunyuan3dv2_1 | ||
| hunyuan_video | ||
| hydit | ||
| ideogram4 | ||
| joyimage | ||
| kandinsky5 | ||
| lens | ||
| lightricks | ||
| lumina | ||
| mmaudio/vae | ||
| models | ||
| modules | ||
| moge | ||
| omnigen | ||
| pixart | ||
| pixeldit | ||
| qwen_image | ||
| rt_detr | ||
| sam3 | ||
| supir | ||
| triposplat | ||
| wan | ||
| colormap.py | ||
| common_dit.py | ||
| util.py | ||