EasyAI Code Hosting Platform

mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-07-03 21:20:49 +08:00


huangfeice e96bd48e2d Adapt JoyImageEdit text encoder onto upstream Qwen3-VL stack

Upstream merged native Qwen3-VL support (#14298), adding comfy/text_encoders/qwen3vl.py plus helpers in qwen_vl.py / llama.py / qwen35.py. The JoyImage port previously shipped its own duplicate Qwen3-VL implementation (comfy/text_encoders/qwen3_vl.py); that duplication is now removed and the JoyImage text encoder rides on the upstream stack.

- Delete comfy/text_encoders/qwen3_vl.py.
- Rewrite comfy/text_encoders/joyimage.py to subclass upstream comfy.text_encoders.qwen3vl. The JoyImage checkpoint is a stock qwen3vl_8b, so only JoyImage-specific behavior is overridden:
  * Qwen3VL8B_JoyImage.forward builds the 3D MRoPE position ids and injects deepstack visual features on the conditioning path. Upstream Qwen3VL only does this inside generate() via build_image_inputs; SDClipModel.forward never passes those kwargs. The JoyImage node feeds an image through the encoder (clip.tokenize(prompt, images=[..])), so the override reuses build_image_inputs to reproduce the multimodal conditioning that Llama2_.forward already accepts kwargs for.
  * preprocess_embed keeps JoyImage's bicubic+clamp image preprocessing (process_qwen3vl_image) instead of upstream's bilinear path, to preserve validated DiT numerics.
  * JoyImageTokenizer keeps the JoyImage system-prompt templates, suppresses the Qwen3 <think> block, and raises on image-placeholder count mismatch.
  * JoyImageTEModel keeps the drop_idx=34 system-prompt strip and the pre-final-norm layer tap (layer="hidden", layer_idx=-1).
- sd.py QWEN3VL_8B_JOYIMAGE branch: apply the same state-dict prefix remap the sibling QWEN3VL branch uses (model.language_model.->model., model.visual.->visual., lm_head.->model.lm_head.) so the checkpoint loads into the upstream Qwen3VL namespace, then use the module-level llama_detect. Detection ordering is preserved: the JoyImage discriminator is checked before the generic Qwen3-VL deepstack key.

No changes to llama.py / qwen3vl.py / qwen_vl.py / qwen35.py.	2026-06-17 21:29:33 +08:00
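The sd.py change in this commit hinges on a three-rule state-dict prefix remap. The standalone sketch below shows one way such a remap can work: rules are tried in order, the first matching prefix is rewritten, and unmatched keys pass through unchanged. It is not the code in sd.py. The function name `remap_state_dict_prefixes`, the collision check, and the example tensor names are assumptions made for illustration; only the three prefix pairs come from the commit message.

```python
# Prefix pairs as listed in the commit message; checked in order, first match wins.
# The function below is a hypothetical illustration, not ComfyUI's sd.py code.
JOYIMAGE_PREFIX_REMAP = (
    ("model.language_model.", "model."),
    ("model.visual.", "visual."),
    ("lm_head.", "model.lm_head."),
)


def remap_state_dict_prefixes(state_dict, rules=JOYIMAGE_PREFIX_REMAP):
    """Return a new dict whose keys have their first matching prefix rewritten.

    Keys that match no rule are copied through unchanged. Raises KeyError if two
    source keys would land on the same target key (an assumed safety check).
    """
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in rules:
            if key.startswith(old):
                new_key = new + key[len(old):]
                break  # only the first matching rule is applied
        if new_key in out:
            raise KeyError(f"prefix remap collision on {new_key!r}")
        out[new_key] = value
    return out


if __name__ == "__main__":
    # Made-up key names standing in for checkpoint tensors.
    sd = {
        "model.language_model.layers.0.mlp.up_proj.weight": 1,
        "model.visual.blocks.0.attn.qkv.weight": 2,
        "lm_head.weight": 3,
    }
    print(sorted(remap_state_dict_prefixes(sd)))
    # ['model.layers.0.mlp.up_proj.weight', 'model.lm_head.weight', 'visual.blocks.0.attn.qkv.weight']
```

Applying only the first matching rule keeps the remap from chaining; for example, a rewritten `model.lm_head.` key is never fed back through the rules.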
ace_lyrics_tokenizer	Initial ACE-Step model implementation. (#7972)	2025-05-07 08:33:34 -04:00
byt5_tokenizer	Support hunyuan image 2.1 regular model. (#9792)	2025-09-10 02:05:07 -04:00
hydit_clip_tokenizer
llama_tokenizer
qwen25_tokenizer	Update qwen tokenizer to add qwen 3 tokens. (#11029)	2025-12-01 17:13:48 -05:00
qwen35_tokenizer	feat: Support Qwen3.5 text generation models (#12771)	2026-03-25 22:48:28 -04:00
t5_pile_tokenizer
t5_tokenizer
ace15.py	fix(ace15): handle missing lm_metadata in memory estimation during checkpoint export #12669 (#12686)	2026-02-28 01:18:40 -05:00
ace_text_cleaners.py	Make japanese hiragana and katakana characters work with ACE. (#7997)	2025-05-08 03:32:36 -04:00
ace.py	Make japanese hiragana and katakana characters work with ACE. (#7997)	2025-05-08 03:32:36 -04:00
anima.py	Small cleanup and try to get qwen 3 work with the text gen. (#12537)	2026-02-19 22:42:28 -05:00
aura_t5.py	More flexible long clip support.	2025-04-15 10:32:21 -04:00
bert.py	P2 of qwen edit model. (#9412)	2025-08-18 22:38:34 -04:00
byt5_config_small_glyph.json	Support hunyuan image 2.1 regular model. (#9792)	2025-09-10 02:05:07 -04:00
cogvideo.py	Void model - pass 1 & 2 (CORE-38) (#13403)	2026-05-05 19:59:04 -07:00
cosmos.py	Fix chroma fp8 te being treated as fp16. (#11795)	2026-01-10 14:40:42 -08:00
ernie.py	Use `ErnieTEModel_` not `ErnieTEModel`. (#13431)	2026-04-16 10:11:58 -04:00
flux.py	Implement Ernie Image model. (#13369)	2026-04-11 22:29:31 -04:00
gemma4.py	feat: Gemma4 text generation support (CORE-30) (#13376)	2026-05-02 22:46:15 -04:00
genmo.py	Fix chroma fp8 te being treated as fp16. (#11795)	2026-01-10 14:40:42 -08:00
gpt_oss.py	feat: Microsoft Lens support (CORE-248) (#14077)	2026-05-25 23:01:51 -07:00
hidream_o1.py	feat: Support HiDream-O1-Image (CORE-187) (#13817)	2026-05-11 20:35:53 -07:00
hidream.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000)	2025-12-05 14:35:42 -05:00
hunyuan_image.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000)	2025-12-05 14:35:42 -05:00
hunyuan_video.py	Support loading flux 2 klein checkpoints saved with SaveCheckpoint. (#12033)	2026-01-22 18:20:48 -05:00
hydit_clip.json
hydit.py	Add a T5TokenizerOptions node to set options for the T5 tokenizer. (#7803)	2025-04-25 19:36:00 -04:00
ideogram4.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298)	2026-06-17 08:12:44 +08:00
jina_clip_2.py	Implement Jina CLIP v2 and NewBie dual CLIP (#11415)	2025-12-20 00:57:22 -05:00
joyimage.py	Adapt JoyImageEdit text encoder onto upstream Qwen3-VL stack	2026-06-17 21:29:33 +08:00
kandinsky5.py	Fix qwen scaled fp8 not working with kandinsky. Make basic t2i wf work. (#11162)	2025-12-06 17:50:10 -08:00
llama.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298)	2026-06-17 08:12:44 +08:00
long_clipl.py	Cleanup.	2025-04-15 12:13:28 -04:00
longcat_image.py	LongCat-Image edit (#13003)	2026-03-21 23:51:05 -04:00
lt.py	feat: Gemma4 text generation support (CORE-30) (#13376)	2026-05-02 22:46:15 -04:00
lumina2.py	feat: Gemma4 text generation support (CORE-30) (#13376)	2026-05-02 22:46:15 -04:00
mt5_config_xl.json
newbie.py	Only apply gemma quant config to gemma model for newbie. (#11436)	2025-12-20 01:02:43 -05:00
omnigen2.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000)	2025-12-05 14:35:42 -05:00
ovis.py	Fix #11963 (#11982)	2026-01-19 22:32:40 -05:00
pixart_t5.py	Fix chroma fp8 te being treated as fp16. (#11795)	2026-01-10 14:40:42 -08:00
pixeldit.py	feat: Support NVIDIA PixelDiT and PiD (CORE-201) (#14103)	2026-05-26 17:50:14 -07:00
qwen3vl.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298)	2026-06-17 08:12:44 +08:00
qwen35.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298)	2026-06-17 08:12:44 +08:00
qwen_image.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000)	2025-12-05 14:35:42 -05:00
qwen_vl.py	feat: Support text generation with Qwen3-VL (CORE-276) (#14298)	2026-06-17 08:12:44 +08:00
sa3.py	Support Stable Audio 3 model. (#14010)	2026-05-20 11:34:22 -04:00
sa_t5.py	More flexible long clip support.	2025-04-15 10:32:21 -04:00
sam3_clip.py	feat: SAM (segment anything) 3.1 support (CORE-34) (#13408)	2026-04-23 00:07:43 -04:00
sd2_clip_config.json
sd2_clip.py	More flexible long clip support.	2025-04-15 10:32:21 -04:00
sd3_clip.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000)	2025-12-05 14:35:42 -05:00
spiece_tokenizer.py	feat: Add basic text generation support with native models, initially supporting Gemma3 (#12392)	2026-02-18 20:49:43 -05:00
t5_config_base.json
t5_config_xxl.json
t5_old_config_xxl.json
t5_pile_config_xl.json
t5.py	P2 of qwen edit model. (#9412)	2025-08-18 22:38:34 -04:00
umt5_config_base.json	Initial ACE-Step model implementation. (#7972)	2025-05-07 08:33:34 -04:00
umt5_config_xxl.json
wan.py	Make old scaled fp8 format use the new mixed quant ops system. (#11000)	2025-12-05 14:35:42 -05:00
z_image.py	Enable embeddings for some qwen 3 models. (#12218)	2026-02-02 03:51:09 -05:00
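The latest commit message at the top of this listing says joyimage.py keeps two small JoyImage-specific behaviours: JoyImageTokenizer raises when the number of image placeholders in the prompt does not match the number of images, and JoyImageTEModel strips the first drop_idx=34 positions (the system prompt) from the encoder output. The sketch below illustrates both under stated assumptions. The placeholder string is a hypothetical Qwen-VL-style marker, not necessarily the one joyimage.py uses; the function names are invented; and plain lists stand in for per-sequence tensors.

```python
# Hypothetical placeholder string (Qwen-VL style); the real template in joyimage.py may differ.
IMAGE_PLACEHOLDER = "<|vision_start|><|image_pad|><|vision_end|>"

# Leading positions treated as the system prompt (drop_idx=34 per the commit message).
SYSTEM_PROMPT_TOKENS = 34


def check_image_placeholders(prompt, images, placeholder=IMAGE_PLACEHOLDER):
    """Raise ValueError if the placeholder count differs from the number of images.

    Returns the number of placeholders found when the counts agree.
    """
    found = prompt.count(placeholder)
    if found != len(images):
        raise ValueError(
            f"prompt contains {found} image placeholder(s) "
            f"but {len(images)} image(s) were supplied"
        )
    return found


def strip_system_prompt(hidden_states, attention_mask, drop_idx=SYSTEM_PROMPT_TOKENS):
    """Drop the first drop_idx positions from one sequence's hidden states and mask.

    Lists stand in for tensors here; for a batched [batch, seq, dim] tensor the
    equivalent slice would be along the sequence axis, e.g. hidden[:, drop_idx:].
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have the same length")
    return hidden_states[drop_idx:], attention_mask[drop_idx:]


if __name__ == "__main__":
    prompt = f"Replace the sky with a sunset. {IMAGE_PLACEHOLDER}"
    print(check_image_placeholders(prompt, images=["img0"]))  # 1
    hidden, mask = strip_system_prompt(list(range(40)), [1] * 40)
    print(len(hidden), len(mask))  # 6 6
```

Failing loudly on a placeholder mismatch is preferable to silently misaligning visual features with text positions, which would corrupt the conditioning without an obvious error.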