ComfyUI/comfy/text_encoders
huangfeice e29384be0d Add JoyImageEditPlus multi-image edit support (unify onto Plus-style forward)
JoyImageEditPlus is the multi-image (1-6 reference images) variant of
JoyImageEdit, trained from the same base. Its diffusers transformer shares
byte-identical weight structure with the single-image variant (894 keys, zero
renames) but injects references differently: instead of the single-image
slot-stack (stack refs + noise into a 6D tensor and rotate on the frame dim,
which forces all items to share resolution), each reference is independently
patchified and concatenated on the sequence dim with per-image temporal-offset
3D RoPE, allowing references at different resolutions.
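
As a rough illustration of the sequence-dim layout described above, the sketch
below assigns a (t, h, w) RoPE index to every patch token of
components=[target, ref0, ...]. It is not the code in
comfy/ldm/joyimage/model.py: the helper name build_positions, the patch size of
2, and the rule "component i sits at temporal index i" are all assumptions made
for the example.

```python
from typing import List, Tuple

Pos = Tuple[int, int, int]  # (t, h, w) index fed to a 3D RoPE


def build_positions(latent_sizes: List[Tuple[int, int]], patch: int = 2):
    """Return (positions, segments) for components = [target, ref0, ...].

    latent_sizes holds one (height, width) per component, in latent pixels.
    Assumption: component i is placed at temporal index i; the offset rule
    used by the real model may differ.
    """
    positions: List[Pos] = []
    segments: List[Tuple[int, int]] = []  # (start, end) token range per component
    for t, (height, width) in enumerate(latent_sizes):
        if height % patch or width % patch:
            raise ValueError("latent size must be divisible by the patch size")
        start = len(positions)
        # Each component is patchified on its own grid, so sizes may differ.
        for h in range(height // patch):
            for w in range(width // patch):
                positions.append((t, h, w))
        segments.append((start, len(positions)))
    return positions, segments


# Target plus two references, one of them at a different resolution.
positions, segments = build_positions([(64, 64), (32, 48), (64, 64)])

# Only the target segment (component 0) is kept from the transformer output.
target_start, target_end = segments[0]
target_positions = positions[target_start:target_end]
```

Because every component carries its own (h, w) grid and only the temporal index
separates them, nothing in this layout requires the references to share a
resolution with the target or with each other.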

Since the single-image port is not yet upstream, this unifies both variants
onto the Plus-style forward rather than keeping two paths; single-image is now
the ref=1 special case. Verified numerically: at ref=1 with equal resolution
the new path's RoPE is bit-identical to the old slot-stack layout, and the
transformer output matches the diffusers Plus reference (fp32, including the
different-resolution case).

ComfyUI runs cond/uncond in one forward with a shared reference configuration,
so the diffusers Plus batched RoPE, padding attention_mask, and dedicated
attention processor are unnecessary here: the unified forward reuses the
existing unbatched _apply_rotary_emb and JoyImageAttention. Confirmed
equivalent to the diffusers batched+mask path for a single sample.

- comfy/ldm/joyimage/model.py: forward takes ref_latents and builds
  components=[target, ref0, ...]; per-component patchify + temporal-offset
  RoPE; output keeps only the target segment. Old single-grid RoPE removed.
- comfy/model_base.py: JoyImage drops the slot-stack / frame-rotation /
  shape-equality path in _apply_model, passing ref_latents straight to the
  transformer. Guidance-rescale and the reference_latents requirement are kept.
- comfy/text_encoders/joyimage.py: the image template emits one vision block
  per reference (N = image count); N=1 is byte-for-byte the old template.
- comfy_extras/nodes_joyimage.py: add TextEncodeJoyImageEditPlus with optional
  image1..image6 inputs, each bucket-resized and VAE-encoded into the
  reference_latents list.
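
The last two items can be pictured with a small sketch. It is only an
illustration under stated assumptions: VISION_BLOCK is a made-up placeholder
string rather than the real template text in comfy/text_encoders/joyimage.py,
and collect_images, build_prompt, and old_single_image_prompt are hypothetical
helpers, not functions from the repository.

```python
from typing import Any, List, Optional

VISION_BLOCK = "<vision>"  # hypothetical placeholder, not the real template text
MAX_IMAGES = 6


def collect_images(**inputs: Optional[Any]) -> List[Any]:
    """Gather the optional image1..image6 inputs that were actually connected."""
    images = []
    for i in range(1, MAX_IMAGES + 1):
        image = inputs.get(f"image{i}")
        if image is not None:
            images.append(image)
    return images


def build_prompt(text: str, image_count: int) -> str:
    """Emit one vision block per reference image, followed by the user text."""
    if not 1 <= image_count <= MAX_IMAGES:
        raise ValueError("expected between 1 and 6 reference images")
    return VISION_BLOCK * image_count + text


def old_single_image_prompt(text: str) -> str:
    """Stand-in for the previous single-image template (assumed shape)."""
    return VISION_BLOCK + text


images = collect_images(image1="a", image3="c")
prompt = build_prompt("make the sky purple", len(images))
```

With a single connected image, build_prompt produces the same string as the
stand-in for the old template, which mirrors the N=1 compatibility claim above.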

Detection, supported_models, and sd.py need no changes: the identical weight
structure routes both variants through image_model="joyimage".
2026-07-01 18:36:43 +08:00
ace_lyrics_tokenizer Initial ACE-Step model implementation. (#7972) 2025-05-07 08:33:34 -04:00
byt5_tokenizer Support hunyuan image 2.1 regular model. (#9792) 2025-09-10 02:05:07 -04:00
hydit_clip_tokenizer Basic hunyuan dit implementation. (#4102) 2024-07-25 18:21:08 -04:00
llama_tokenizer Basic Hunyuan Video model support. 2024-12-16 19:35:40 -05:00
qwen25_tokenizer Update qwen tokenizer to add qwen 3 tokens. (#11029) 2025-12-01 17:13:48 -05:00
qwen35_tokenizer feat: Support Qwen3.5 text generation models (#12771) 2026-03-25 22:48:28 -04:00
t5_pile_tokenizer Better tokenizing code for AuraFlow. 2024-07-12 01:15:25 -04:00
t5_tokenizer Refactor: Move some code to the comfy/text_encoders folder. 2024-07-15 17:36:24 -04:00
ace15.py fix(ace15): handle missing lm_metadata in memory estimation during checkpoint export #12669 (#12686) 2026-02-28 01:18:40 -05:00
ace_text_cleaners.py Make japanese hiragana and katakana characters work with ACE. (#7997) 2025-05-08 03:32:36 -04:00
ace.py Make japanese hiragana and katakana characters work with ACE. (#7997) 2025-05-08 03:32:36 -04:00
anima.py Small cleanup and try to get qwen 3 work with the text gen. (#12537) 2026-02-19 22:42:28 -05:00
aura_t5.py More flexible long clip support. 2025-04-15 10:32:21 -04:00
bert.py P2 of qwen edit model. (#9412) 2025-08-18 22:38:34 -04:00
byt5_config_small_glyph.json Support hunyuan image 2.1 regular model. (#9792) 2025-09-10 02:05:07 -04:00
cogvideo.py Void model - pass 1 & 2 (CORE-38) (#13403) 2026-05-05 19:59:04 -07:00
cosmos.py Fix chroma fp8 te being treated as fp16. (#11795) 2026-01-10 14:40:42 -08:00
ernie.py Use ErnieTEModel_ not ErnieTEModel. (#13431) 2026-04-16 10:11:58 -04:00
flux.py Implement Ernie Image model. (#13369) 2026-04-11 22:29:31 -04:00
gemma4.py feat: Gemma4 text generation support (CORE-30) (#13376) 2026-05-02 22:46:15 -04:00
genmo.py Fix chroma fp8 te being treated as fp16. (#11795) 2026-01-10 14:40:42 -08:00
gpt_oss.py feat: Microsoft Lens support (CORE-248) (#14077) 2026-05-25 23:01:51 -07:00
hidream_o1.py feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
hidream.py Make old scaled fp8 format use the new mixed quant ops system. (#11000) 2025-12-05 14:35:42 -05:00
hunyuan_image.py Make old scaled fp8 format use the new mixed quant ops system. (#11000) 2025-12-05 14:35:42 -05:00
hunyuan_video.py Support loading flux 2 klein checkpoints saved with SaveCheckpoint. (#12033) 2026-01-22 18:20:48 -05:00
hydit_clip.json Basic hunyuan dit implementation. (#4102) 2024-07-25 18:21:08 -04:00
hydit.py Add a T5TokenizerOptions node to set options for the T5 tokenizer. (#7803) 2025-04-25 19:36:00 -04:00
ideogram4.py feat: Support text generation with Qwen3-VL (CORE-276) (#14298) 2026-06-17 08:12:44 +08:00
jina_clip_2.py Implement Jina CLIP v2 and NewBie dual CLIP (#11415) 2025-12-20 00:57:22 -05:00
joyimage.py Add JoyImageEditPlus multi-image edit support (unify onto Plus-style forward) 2026-07-01 18:36:43 +08:00
kandinsky5.py Fix qwen scaled fp8 not working with kandinsky. Make basic t2i wf work. (#11162) 2025-12-06 17:50:10 -08:00
llama.py feat: Support text generation with Qwen3-VL (CORE-276) (#14298) 2026-06-17 08:12:44 +08:00
long_clipl.py Cleanup. 2025-04-15 12:13:28 -04:00
longcat_image.py LongCat-Image edit (#13003) 2026-03-21 23:51:05 -04:00
lt.py feat: Gemma4 text generation support (CORE-30) (#13376) 2026-05-02 22:46:15 -04:00
lumina2.py feat: Gemma4 text generation support (CORE-30) (#13376) 2026-05-02 22:46:15 -04:00
mt5_config_xl.json Basic hunyuan dit implementation. (#4102) 2024-07-25 18:21:08 -04:00
newbie.py Only apply gemma quant config to gemma model for newbie. (#11436) 2025-12-20 01:02:43 -05:00
omnigen2.py Make old scaled fp8 format use the new mixed quant ops system. (#11000) 2025-12-05 14:35:42 -05:00
ovis.py Fix #11963 (#11982) 2026-01-19 22:32:40 -05:00
pixart_t5.py Fix chroma fp8 te being treated as fp16. (#11795) 2026-01-10 14:40:42 -08:00
pixeldit.py feat: Support NVIDIA PixelDiT and PiD (CORE-201) (#14103) 2026-05-26 17:50:14 -07:00
qwen3vl.py feat: Support text generation with Qwen3-VL (CORE-276) (#14298) 2026-06-17 08:12:44 +08:00
qwen35.py feat: Support text generation with Qwen3-VL (CORE-276) (#14298) 2026-06-17 08:12:44 +08:00
qwen_image.py Make old scaled fp8 format use the new mixed quant ops system. (#11000) 2025-12-05 14:35:42 -05:00
qwen_vl.py feat: Support text generation with Qwen3-VL (CORE-276) (#14298) 2026-06-17 08:12:44 +08:00
sa3.py Support Stable Audio 3 model. (#14010) 2026-05-20 11:34:22 -04:00
sa_t5.py More flexible long clip support. 2025-04-15 10:32:21 -04:00
sam3_clip.py feat: SAM (segment anything) 3.1 support (CORE-34) (#13408) 2026-04-23 00:07:43 -04:00
sd2_clip_config.json Fix potential issue with non clip text embeddings. 2024-07-30 14:41:13 -04:00
sd2_clip.py More flexible long clip support. 2025-04-15 10:32:21 -04:00
sd3_clip.py Make old scaled fp8 format use the new mixed quant ops system. (#11000) 2025-12-05 14:35:42 -05:00
spiece_tokenizer.py feat: Add basic text generation support with native models, initially supporting Gemma3 (#12392) 2026-02-18 20:49:43 -05:00
t5_config_base.json Refactor: Move some code to the comfy/text_encoders folder. 2024-07-15 17:36:24 -04:00
t5_config_xxl.json Refactor: Move some code to the comfy/text_encoders folder. 2024-07-15 17:36:24 -04:00
t5_old_config_xxl.json WIP support for Nvidia Cosmos 7B and 14B text to world (video) models. 2025-01-10 09:14:16 -05:00
t5_pile_config_xl.json AuraFlow model implementation. 2024-07-11 16:52:26 -04:00
t5.py P2 of qwen edit model. (#9412) 2025-08-18 22:38:34 -04:00
umt5_config_base.json Initial ACE-Step model implementation. (#7972) 2025-05-07 08:33:34 -04:00
umt5_config_xxl.json WIP support for Wan t2v model. 2025-02-25 17:20:35 -05:00
wan.py Make old scaled fp8 format use the new mixed quant ops system. (#11000) 2025-12-05 14:35:42 -05:00
z_image.py Enable embeddings for some qwen 3 models. (#12218) 2026-02-02 03:51:09 -05:00