mirror of
https://github.com/comfyanonymous/ComfyUI.git
synced 2026-07-03 21:20:49 +08:00
JoyImageEditPlus is the multi-image (1-6 reference images) variant of JoyImageEdit, trained from the same base. Its diffusers transformer has a weight structure identical to that of the single-image variant (894 keys, no renames) but injects references differently. The single-image variant uses a slot-stack: it stacks the references and the noise into a 6D tensor and rotates on the frame dim, which forces all items to share one resolution. The Plus variant instead patchifies each reference independently and concatenates the results on the sequence dim with per-image temporal-offset 3D RoPE, which allows references at different resolutions.

Since the single-image port is not yet upstream, this change unifies both variants onto the Plus-style forward rather than keeping two paths; single-image is now the ref=1 special case.

Verified numerically: at ref=1 with equal resolution, the new path's RoPE is bit-identical to the old slot-stack layout, and the transformer output matches the diffusers Plus reference (fp32, including the different-resolution case).

ComfyUI runs cond/uncond in one forward with a shared reference configuration, so the diffusers Plus batched RoPE, padding attention_mask, and dedicated attention processor are unnecessary here: the unified forward reuses the existing unbatched _apply_rotary_emb and JoyImageAttention. This was confirmed equivalent to the diffusers batched+mask path for a single sample.

- comfy/ldm/joyimage/model.py: forward takes ref_latents and builds components=[target, ref0, ...]; per-component patchify + temporal-offset RoPE; output keeps only the target segment. Old single-grid RoPE removed.
- comfy/model_base.py: JoyImage drops the slot-stack / frame-rotation / shape-equality path in _apply_model and passes ref_latents straight to the transformer. Guidance-rescale and the reference_latents requirement are kept.
- comfy/text_encoders/joyimage.py: the image template emits one vision block per reference (N = image count); N=1 is byte-for-byte the old template.
- comfy_extras/nodes_joyimage.py: add TextEncodeJoyImageEditPlus with optional image1..image6 inputs, each bucket-resized and VAE-encoded into the reference_latents list.

Detection, supported_models, and sd.py need no changes: the identical weight structure routes both variants through image_model="joyimage".

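
The per-component layout described in the commit message can be illustrated with a small, self-contained sketch. This is not the code in comfy/ldm/joyimage/model.py: the helper names `build_position_ids` and `keep_target` are hypothetical, and the offset rule (each component's temporal index starts where the previous component ended) is an assumption made for illustration. It only shows why components with different patch grids can share one sequence and how the target segment is recovered afterwards.

```python
from typing import List, Tuple

Grid = Tuple[int, int, int]  # (frames, height, width), counted in patches


def build_position_ids(grids: List[Grid]) -> Tuple[List[Tuple[int, int, int]], List[int]]:
    """Return per-token (t, h, w) ids for all components and each segment's length.

    Assumption (illustrative only): every component gets its own temporal range,
    starting where the previous component ended, so no two components share a
    temporal position even when their spatial grids differ.
    """
    ids: List[Tuple[int, int, int]] = []
    lengths: List[int] = []
    t_offset = 0
    for frames, height, width in grids:
        for t in range(frames):
            for h in range(height):
                for w in range(width):
                    ids.append((t_offset + t, h, w))
        lengths.append(frames * height * width)
        t_offset += frames
    return ids, lengths


def keep_target(tokens: list, lengths: List[int]) -> list:
    """The target is component 0, so its tokens are the first lengths[0] entries."""
    return tokens[: lengths[0]]


# components = [target, ref0, ref1]; the references use different resolutions.
grids = [(1, 4, 4), (1, 2, 3), (1, 3, 2)]
ids, lengths = build_position_ids(grids)
target_tokens = keep_target(list(range(len(ids))), lengths)
```

With a single reference whose grid equals the target's, the sequence reduces to two equally sized segments at consecutive temporal positions, which is the ref=1 special case the commit message refers to.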
| | ||
|---|---|---|
| .. | ||
| ace_lyrics_tokenizer | ||
| byt5_tokenizer | ||
| hydit_clip_tokenizer | ||
| llama_tokenizer | ||
| qwen25_tokenizer | ||
| qwen35_tokenizer | ||
| t5_pile_tokenizer | ||
| t5_tokenizer | ||
| ace15.py | ||
| ace_text_cleaners.py | ||
| ace.py | ||
| anima.py | ||
| aura_t5.py | ||
| bert.py | ||
| byt5_config_small_glyph.json | ||
| cogvideo.py | ||
| cosmos.py | ||
| ernie.py | ||
| flux.py | ||
| gemma4.py | ||
| genmo.py | ||
| gpt_oss.py | ||
| hidream_o1.py | ||
| hidream.py | ||
| hunyuan_image.py | ||
| hunyuan_video.py | ||
| hydit_clip.json | ||
| hydit.py | ||
| ideogram4.py | ||
| jina_clip_2.py | ||
| joyimage.py | ||
| kandinsky5.py | ||
| llama.py | ||
| long_clipl.py | ||
| longcat_image.py | ||
| lt.py | ||
| lumina2.py | ||
| mt5_config_xl.json | ||
| newbie.py | ||
| omnigen2.py | ||
| ovis.py | ||
| pixart_t5.py | ||
| pixeldit.py | ||
| qwen3vl.py | ||
| qwen35.py | ||
| qwen_image.py | ||
| qwen_vl.py | ||
| sa3.py | ||
| sa_t5.py | ||
| sam3_clip.py | ||
| sd2_clip_config.json | ||
| sd2_clip.py | ||
| sd3_clip.py | ||
| spiece_tokenizer.py | ||
| t5_config_base.json | ||
| t5_config_xxl.json | ||
| t5_old_config_xxl.json | ||
| t5_pile_config_xl.json | ||
| t5.py | ||
| umt5_config_base.json | ||
| umt5_config_xxl.json | ||
| wan.py | ||
| z_image.py | ||