mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-07-03 21:20:49 +08:00

huangfeice e29384be0d Add JoyImageEditPlus multi-image edit support (unify onto Plus-style forward)

JoyImageEditPlus is the multi-image (1-6 reference images) variant of JoyImageEdit, trained from the same base. Its diffusers transformer shares byte-identical weight structure with the single-image variant (894 keys, zero rename) but injects references differently: instead of the single-image slot-stack (stack refs + noise into a 6D tensor and rotate on the frame dim, which forces all items to share resolution), each reference is independently patchified and concatenated on the sequence dim with per-image temporal-offset 3D RoPE, allowing references at different resolutions.

Since the single-image port is not yet upstream, this unifies both variants onto the Plus-style forward rather than keeping two paths; single-image is now the ref=1 special case. Verified numerically: at ref=1 with equal resolution the new path's RoPE is bit-identical to the old slot-stack layout, and the transformer output matches the diffusers Plus reference (fp32, incl. the different-resolution case).

ComfyUI runs cond/uncond in one forward with a shared reference configuration, so the diffusers Plus batched RoPE, padding attention_mask, and dedicated attention processor are unnecessary here: the unified forward reuses the existing unbatched _apply_rotary_emb and JoyImageAttention. Confirmed equivalent to the diffusers batched+mask path for a single sample.

- comfy/ldm/joyimage/model.py: forward takes ref_latents and builds components=[target, ref0, ...]; per-component patchify + temporal-offset RoPE; output keeps only the target segment. Old single-grid RoPE removed.
- comfy/model_base.py: JoyImage drops the slot-stack / frame-rotation / shape-equality path in _apply_model, passing ref_latents straight to the transformer. Guidance-rescale and the reference_latents requirement are kept.
- comfy/text_encoders/joyimage.py: the image template emits one vision block per reference (N = image count); N=1 is byte-for-byte the old template.
- comfy_extras/nodes_joyimage.py: add TextEncodeJoyImageEditPlus with optional image1..image6 inputs, each bucket-resized and VAE-encoded into the reference_latents list.

Detection, supported_models, and sd.py need no changes: the identical weight structure routes both variants through image_model="joyimage".

2026-07-01 18:36:43 +08:00
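The per-component layout described in the commit message (components=[target, ref0, ...], each patchified on its own and concatenated on the sequence dim, with only the target segment kept at the end) can be illustrated with a small pure-Python sketch. This is not the code in comfy/ldm/joyimage/model.py. The function names, the patch size of 2, and the choice of using the component index as the temporal offset are all assumptions made for illustration; the commit message does not state the exact offset scheme.

```python
from typing import List, Sequence, Tuple

Pos = Tuple[int, int, int]  # (t, h, w) position id for one patch token


def component_positions(height: int, width: int, patch: int, t_offset: int) -> List[Pos]:
    """Position ids for one latent of size height x width, cut into patch x patch tokens.

    Hypothetical helper: every token of the component shares the same temporal index.
    """
    if height % patch or width % patch:
        raise ValueError("latent size must be divisible by the patch size")
    return [
        (t_offset, y, x)
        for y in range(height // patch)
        for x in range(width // patch)
    ]


def build_sequence(
    target_hw: Tuple[int, int],
    ref_hws: Sequence[Tuple[int, int]],
    patch: int = 2,  # assumed patch size, for illustration only
) -> Tuple[List[Pos], List[Tuple[int, int]]]:
    """Concatenate target and reference tokens on the sequence dimension.

    Returns the flat list of position ids and the (start, end) token range
    of each component. Assumption: temporal offset == component index.
    """
    components = [target_hw, *ref_hws]
    positions: List[Pos] = []
    segments: List[Tuple[int, int]] = []
    for index, (h, w) in enumerate(components):
        pos = component_positions(h, w, patch, t_offset=index)
        segments.append((len(positions), len(positions) + len(pos)))
        positions.extend(pos)
    return positions, segments


# One 8x8 target plus two references at different resolutions.
positions, segments = build_sequence((8, 8), [(8, 8), (4, 12)])

# After the transformer runs on the full sequence, only the target
# segment (component 0) would be kept as the output.
target_start, target_end = segments[0]
target_positions = positions[target_start:target_end]
```

Because each component carries its own grid of (h, w) ids, nothing in this layout requires the references to match the target's resolution, which is the property the slot-stack approach lacked.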
ace_lyrics_tokenizer	Initial ACE-Step model implementation. (#7972 )	2025-05-07 08:33:34 -04:00
byt5_tokenizer	Support hunyuan image 2.1 regular model. (#9792 )	2025-09-10 02:05:07 -04:00
hydit_clip_tokenizer	Basic hunyuan dit implementation. (#4102 )	2024-07-25 18:21:08 -04:00
llama_tokenizer	Basic Hunyuan Video model support.	2024-12-16 19:35:40 -05:00
qwen25_tokenizer	Update qwen tokenizer to add qwen 3 tokens. (#11029 )	2025-12-01 17:13:48 -05:00
qwen35_tokenizer	feat: Support Qwen3.5 text generation models (#12771 )	2026-03-25 22:48:28 -04:00
t5_pile_tokenizer	Better tokenizing code for AuraFlow.	2024-07-12 01:15:25 -04:00
t5_tokenizer	Refactor: Move some code to the comfy/text_encoders folder.	2024-07-15 17:36:24 -04:00
ace15.py	fix(ace15): handle missing lm_metadata in memory estimation during checkpoint export #12669 (#12686 )	2026-02-28 01:18:40 -05:00
ace_text_cleaners.py	Make japanese hiragana and katakana characters work with ACE. (#7997 )	2025-05-08 03:32:36 -04:00
ace.py	Make japanese hiragana and katakana characters work with ACE. (#7997 )	2025-05-08 03:32:36 -04:00
anima.py	Small cleanup and try to get qwen 3 work with the text gen. (#12537 )	2026-02-19 22:42:28 -05:00
aura_t5.py	More flexible long clip support.	2025-04-15 10:32:21 -04:00
bert.py	P2 of qwen edit model. (#9412 )	2025-08-18 22:38:34 -04:00
byt5_config_small_glyph.json	Support hunyuan image 2.1 regular model. (#9792 )	2025-09-10 02:05:07 -04:00
cogvideo.py	Void model - pass 1 & 2 (CORE-38) (#13403 )	2026-05-05 19:59:04 -07:00
cosmos.py	Fix chroma fp8 te being treated as fp16. (#11795 )	2026-01-10 14:40:42 -08:00
ernie.py	Use `ErnieTEModel_` not `ErnieTEModel`. (#13431 )	2026-04-16 10:11:58 -04:00
flux.py	Implement Ernie Image model. (#13369 )	2026-04-11 22:29:31 -04:00
gemma4.py	feat: Gemma4 text generation support (CORE-30) (#13376 )	2026-05-02 22:46:15 -04:00
genmo.py	Fix chroma fp8 te being treated as fp16. (#11795 )	2026-01-10 14:40:42 -08:00
gpt_oss.py	feat: Microsoft Lens support (CORE-248) (#14077 )	2026-05-25 23:01:51 -07:00
hidream_o1.py	feat: Support HiDream-O1-Image (CORE-187) (#13817 )	2026-05-11 20:35:53 -07:00
hidream.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
hunyuan_image.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
hunyuan_video.py	Support loading flux 2 klein checkpoints saved with SaveCheckpoint. (#12033 )	2026-01-22 18:20:48 -05:00
hydit_clip.json	Basic hunyuan dit implementation. (#4102 )	2024-07-25 18:21:08 -04:00
hydit.py	Add a T5TokenizerOptions node to set options for the T5 tokenizer. (#7803 )	2025-04-25 19:36:00 -04:00
ideogram4.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298 )	2026-06-17 08:12:44 +08:00
jina_clip_2.py	Implement Jina CLIP v2 and NewBie dual CLIP (#11415 )	2025-12-20 00:57:22 -05:00
joyimage.py	Add JoyImageEditPlus multi-image edit support (unify onto Plus-style forward)	2026-07-01 18:36:43 +08:00
kandinsky5.py	Fix qwen scaled fp8 not working with kandinsky. Make basic t2i wf work. (#11162 )	2025-12-06 17:50:10 -08:00
llama.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298 )	2026-06-17 08:12:44 +08:00
long_clipl.py	Cleanup.	2025-04-15 12:13:28 -04:00
longcat_image.py	LongCat-Image edit (#13003 )	2026-03-21 23:51:05 -04:00
lt.py	feat: Gemma4 text generation support (CORE-30) (#13376 )	2026-05-02 22:46:15 -04:00
lumina2.py	feat: Gemma4 text generation support (CORE-30) (#13376 )	2026-05-02 22:46:15 -04:00
mt5_config_xl.json	Basic hunyuan dit implementation. (#4102 )	2024-07-25 18:21:08 -04:00
newbie.py	Only apply gemma quant config to gemma model for newbie. (#11436 )	2025-12-20 01:02:43 -05:00
omnigen2.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
ovis.py	Fix #11963 (#11982 )	2026-01-19 22:32:40 -05:00
pixart_t5.py	Fix chroma fp8 te being treated as fp16. (#11795 )	2026-01-10 14:40:42 -08:00
pixeldit.py	feat: Support NVIDIA PixelDiT and PiD (CORE-201) (#14103 )	2026-05-26 17:50:14 -07:00
qwen3vl.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298 )	2026-06-17 08:12:44 +08:00
qwen35.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298 )	2026-06-17 08:12:44 +08:00
qwen_image.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
qwen_vl.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298 )	2026-06-17 08:12:44 +08:00
sa3.py	Support Stable Audio 3 model. (#14010 )	2026-05-20 11:34:22 -04:00
sa_t5.py	More flexible long clip support.	2025-04-15 10:32:21 -04:00
sam3_clip.py	feat: SAM (segment anything) 3.1 support (CORE-34) (#13408 )	2026-04-23 00:07:43 -04:00
sd2_clip_config.json	Fix potential issue with non clip text embeddings.	2024-07-30 14:41:13 -04:00
sd2_clip.py	More flexible long clip support.	2025-04-15 10:32:21 -04:00
sd3_clip.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
spiece_tokenizer.py	feat: Add basic text generation support with native models, initially supporting Gemma3 (#12392 )	2026-02-18 20:49:43 -05:00
t5_config_base.json	Refactor: Move some code to the comfy/text_encoders folder.	2024-07-15 17:36:24 -04:00
t5_config_xxl.json	Refactor: Move some code to the comfy/text_encoders folder.	2024-07-15 17:36:24 -04:00
t5_old_config_xxl.json	WIP support for Nvidia Cosmos 7B and 14B text to world (video) models.	2025-01-10 09:14:16 -05:00
t5_pile_config_xl.json	AuraFlow model implementation.	2024-07-11 16:52:26 -04:00
t5.py	P2 of qwen edit model. (#9412 )	2025-08-18 22:38:34 -04:00
umt5_config_base.json	Initial ACE-Step model implementation. (#7972 )	2025-05-07 08:33:34 -04:00
umt5_config_xxl.json	WIP support for Wan t2v model.	2025-02-25 17:20:35 -05:00
wan.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
z_image.py	Enable embeddings for some qwen 3 models. (#12218 )	2026-02-02 03:51:09 -05:00