EasyAI Code Hosting Platform

mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-07-03 21:20:49 +08:00

huangfeice 5260e18cdf Add JoyImageEdit native model support	2026-06-17 18:53:36 +08:00

JoyImageEdit is an image-edit diffusion transformer from JD (jd-opensource), Apache 2.0. This adds native ComfyUI support so it loads and runs like other edit models (load checkpoint -> TextEncode + ReferenceLatent -> KSampler -> VAEDecode), with no diffusers dependency.

Architecture:
- Transformer (comfy/ldm/joyimage/model.py): dual-stream (img/txt) DiT with a Conv3d patch embed (patch_size [1,2,2]), Wan-style learnable modulation, and 3D RoPE (rope_dim_list [16,56,56]). All attention goes through comfy.ldm.modules.attention.optimized_attention.
- Text encoder (comfy/text_encoders/{qwen3_vl,joyimage}.py): a reusable Qwen3-VL multimodal stack (vision tower + LM) in qwen3_vl.py, plus a thin JoyImage-specific layer (prompt templates, drop_idx, tokenizer, te() factory) in joyimage.py that depends on it. text_dim 4096.
- VAE: reuses the existing Wan 2.1 latent format (AutoencoderKLWan), no new latent format.
- Edit conditioning: reuses the reference_latents mechanism. Reference and noise latents are stacked on a new n-slot dimension and rotated at the model boundary (model_base.JoyImage), so the transformer stays 5D-in/5D-out. Guidance-rescale is built into the CFG path.

Model wiring:
- model_base.JoyImage uses ModelType.FLOW with sampling_settings multiplier=1000 (the time embedding is trained on t in [0,1000]) and shift=1.5; FLOW's linear time_snr_shift matches the diffusers FlowMatchEuler sigma schedule.
- model_detection sniffs the transformer state-dict (double_blocks., condition_embedder., 5D img_in Conv3d) to route image_model="joyimage".
- supported_models.JoyImage and the CLIPLoader "joyimage" type register it.

User-facing node TextEncodeJoyImageEdit (comfy_extras/nodes_joyimage.py) bucket-resizes the input image to the nearest 1024-base bucket, encodes the prompt with the image, and emits both the conditioning and the bucketed image so the same pixels feed VAEEncode and the negative encode (JoyImage requires noise and reference latents to share spatial dims).
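
The sampling settings named in the commit message (shift=1.5, multiplier=1000) can be illustrated with a small standalone sketch. It assumes the linear shift has the form alpha * t / (1 + (alpha - 1) * t), which is the form used by static shifting in flow-matching Euler schedulers. The helper names `shifted_sigmas` and `timestep` and the uniform base grid are hypothetical and chosen for illustration only; this is not the code added by the commit.

```python
# Minimal sketch of a linear flow shift, assuming the form
# alpha * t / (1 + (alpha - 1) * t). Helper names and the uniform
# base grid below are illustrative assumptions, not ComfyUI's code.

SHIFT = 1.5        # shift value quoted in the commit message
MULTIPLIER = 1000  # time embedding trained on t in [0, 1000]


def time_snr_shift(alpha: float, t: float) -> float:
    """Map a sigma t in [0, 1] through the linear shift with factor alpha."""
    if alpha == 1.0:
        return t
    return alpha * t / (1 + (alpha - 1) * t)


def shifted_sigmas(num_steps: int, shift: float = SHIFT) -> list[float]:
    """Uniform sigmas from 1.0 down to 1/num_steps, then shifted (assumed grid)."""
    base = [1.0 - i / num_steps for i in range(num_steps)]
    return [time_snr_shift(shift, s) for s in base]


def timestep(sigma: float, multiplier: float = MULTIPLIER) -> float:
    """Scale a sigma in [0, 1] to the range the time embedding expects."""
    return sigma * multiplier


if __name__ == "__main__":
    for s in shifted_sigmas(4):
        print(f"sigma={s:.4f} timestep={timestep(s):.1f}")
```

With shift greater than 1, the endpoints 0 and 1 stay fixed while intermediate sigmas move toward 1, so more of the sampling steps are spent at higher noise levels.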
ace_lyrics_tokenizer	Initial ACE-Step model implementation. (#7972 )	2025-05-07 08:33:34 -04:00
byt5_tokenizer	Support hunyuan image 2.1 regular model. (#9792 )	2025-09-10 02:05:07 -04:00
hydit_clip_tokenizer
llama_tokenizer
qwen25_tokenizer	Update qwen tokenizer to add qwen 3 tokens. (#11029 )	2025-12-01 17:13:48 -05:00
qwen35_tokenizer	feat: Support Qwen3.5 text generation models (#12771 )	2026-03-25 22:48:28 -04:00
t5_pile_tokenizer
t5_tokenizer
ace15.py	fix(ace15): handle missing lm_metadata in memory estimation during checkpoint export #12669 (#12686 )	2026-02-28 01:18:40 -05:00
ace_text_cleaners.py	Make japanese hiragana and katakana characters work with ACE. (#7997 )	2025-05-08 03:32:36 -04:00
ace.py	Make japanese hiragana and katakana characters work with ACE. (#7997 )	2025-05-08 03:32:36 -04:00
anima.py	Small cleanup and try to get qwen 3 work with the text gen. (#12537 )	2026-02-19 22:42:28 -05:00
aura_t5.py	More flexible long clip support.	2025-04-15 10:32:21 -04:00
bert.py	P2 of qwen edit model. (#9412 )	2025-08-18 22:38:34 -04:00
byt5_config_small_glyph.json	Support hunyuan image 2.1 regular model. (#9792 )	2025-09-10 02:05:07 -04:00
cogvideo.py	Void model - pass 1 & 2 (CORE-38) (#13403 )	2026-05-05 19:59:04 -07:00
cosmos.py	Fix chroma fp8 te being treated as fp16. (#11795 )	2026-01-10 14:40:42 -08:00
ernie.py	Use `ErnieTEModel_` not `ErnieTEModel`. (#13431 )	2026-04-16 10:11:58 -04:00
flux.py	Implement Ernie Image model. (#13369 )	2026-04-11 22:29:31 -04:00
gemma4.py	feat: Gemma4 text generation support (CORE-30) (#13376 )	2026-05-02 22:46:15 -04:00
genmo.py	Fix chroma fp8 te being treated as fp16. (#11795 )	2026-01-10 14:40:42 -08:00
gpt_oss.py	feat: Microsoft Lens support (CORE-248) (#14077 )	2026-05-25 23:01:51 -07:00
hidream_o1.py	feat: Support HiDream-O1-Image (CORE-187) (#13817 )	2026-05-11 20:35:53 -07:00
hidream.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
hunyuan_image.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
hunyuan_video.py	Support loading flux 2 klein checkpoints saved with SaveCheckpoint. (#12033 )	2026-01-22 18:20:48 -05:00
hydit_clip.json
hydit.py	Add a T5TokenizerOptions node to set options for the T5 tokenizer. (#7803 )	2025-04-25 19:36:00 -04:00
ideogram4.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298 )	2026-06-17 08:12:44 +08:00
jina_clip_2.py	Implement Jina CLIP v2 and NewBie dual CLIP (#11415 )	2025-12-20 00:57:22 -05:00
joyimage.py	Add JoyImageEdit native model support	2026-06-17 18:53:36 +08:00
kandinsky5.py	Fix qwen scaled fp8 not working with kandinsky. Make basic t2i wf work. (#11162 )	2025-12-06 17:50:10 -08:00
llama.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298 )	2026-06-17 08:12:44 +08:00
long_clipl.py	Cleanup.	2025-04-15 12:13:28 -04:00
longcat_image.py	LongCat-Image edit (#13003 )	2026-03-21 23:51:05 -04:00
lt.py	feat: Gemma4 text generation support (CORE-30) (#13376 )	2026-05-02 22:46:15 -04:00
lumina2.py	feat: Gemma4 text generation support (CORE-30) (#13376 )	2026-05-02 22:46:15 -04:00
mt5_config_xl.json
newbie.py	Only apply gemma quant config to gemma model for newbie. (#11436 )	2025-12-20 01:02:43 -05:00
omnigen2.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
ovis.py	Fix #11963 (#11982 )	2026-01-19 22:32:40 -05:00
pixart_t5.py	Fix chroma fp8 te being treated as fp16. (#11795 )	2026-01-10 14:40:42 -08:00
pixeldit.py	feat: Support NVIDIA PixelDiT and PiD (CORE-201) (#14103 )	2026-05-26 17:50:14 -07:00
qwen3_vl.py	Add JoyImageEdit native model support	2026-06-17 18:53:36 +08:00
qwen3vl.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298 )	2026-06-17 08:12:44 +08:00
qwen35.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298 )	2026-06-17 08:12:44 +08:00
qwen_image.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
qwen_vl.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298 )	2026-06-17 08:12:44 +08:00
sa3.py	Support Stable Audio 3 model. (#14010 )	2026-05-20 11:34:22 -04:00
sa_t5.py	More flexible long clip support.	2025-04-15 10:32:21 -04:00
sam3_clip.py	feat: SAM (segment anything) 3.1 support (CORE-34) (#13408 )	2026-04-23 00:07:43 -04:00
sd2_clip_config.json
sd2_clip.py	More flexible long clip support.	2025-04-15 10:32:21 -04:00
sd3_clip.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
spiece_tokenizer.py	feat: Add basic text generation support with native models, initially supporting Gemma3 (#12392 )	2026-02-18 20:49:43 -05:00
t5_config_base.json
t5_config_xxl.json
t5_old_config_xxl.json
t5_pile_config_xl.json
t5.py	P2 of qwen edit model. (#9412 )	2025-08-18 22:38:34 -04:00
umt5_config_base.json	Initial ACE-Step model implementation. (#7972 )	2025-05-07 08:33:34 -04:00
umt5_config_xxl.json
wan.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
z_image.py	Enable embeddings for some qwen 3 models. (#12218 )	2026-02-02 03:51:09 -05:00