Mirror of https://github.com/comfyanonymous/ComfyUI.git, synced 2026-07-03 21:20:49 +08:00

Upstream merged native Qwen3-VL support (#14298), adding comfy/text_encoders/qwen3vl.py plus helpers in qwen_vl.py / llama.py / qwen35.py. The JoyImage port previously shipped its own duplicate Qwen3-VL implementation (comfy/text_encoders/qwen3_vl.py); that duplication is now removed and the JoyImage text encoder rides on the upstream stack.

- Delete comfy/text_encoders/qwen3_vl.py.
- Rewrite comfy/text_encoders/joyimage.py to subclass upstream comfy.text_encoders.qwen3vl. The JoyImage checkpoint is a stock qwen3vl_8b, so only JoyImage-specific behavior is overridden:
  * Qwen3VL8B_JoyImage.forward builds the 3D MRoPE position ids and injects deepstack visual features on the conditioning path. Upstream Qwen3VL only does this inside generate() via build_image_inputs; SDClipModel.forward never passes those kwargs. The JoyImage node feeds an image through the encoder (clip.tokenize(prompt, images=[..])), so the override reuses build_image_inputs to reproduce the multimodal conditioning that Llama2_.forward already accepts kwargs for.
  * preprocess_embed keeps JoyImage's bicubic+clamp image preprocessing (process_qwen3vl_image) instead of upstream's bilinear path, to preserve validated DiT numerics.
  * JoyImageTokenizer keeps the JoyImage system-prompt templates, suppresses the Qwen3 <think> block, and raises on image-placeholder count mismatch.
  * JoyImageTEModel keeps the drop_idx=34 system-prompt strip and the pre-final-norm layer tap (layer="hidden", layer_idx=-1).
- sd.py QWEN3VL_8B_JOYIMAGE branch: apply the same state-dict prefix remap the sibling QWEN3VL branch uses (model.language_model. -> model., model.visual. -> visual., lm_head. -> model.lm_head.) so the checkpoint loads into the upstream Qwen3VL namespace, then use the module-level llama_detect. Detection ordering is preserved: the JoyImage discriminator is checked before the generic Qwen3-VL deepstack key.

No changes to llama.py / qwen3vl.py / qwen_vl.py / qwen35.py.

| Name | | |
|---|---|---|
| ace_lyrics_tokenizer | ||
| byt5_tokenizer | ||
| hydit_clip_tokenizer | ||
| llama_tokenizer | ||
| qwen25_tokenizer | ||
| qwen35_tokenizer | ||
| t5_pile_tokenizer | ||
| t5_tokenizer | ||
| ace15.py | ||
| ace_text_cleaners.py | ||
| ace.py | ||
| anima.py | ||
| aura_t5.py | ||
| bert.py | ||
| byt5_config_small_glyph.json | ||
| cogvideo.py | ||
| cosmos.py | ||
| ernie.py | ||
| flux.py | ||
| gemma4.py | ||
| genmo.py | ||
| gpt_oss.py | ||
| hidream_o1.py | ||
| hidream.py | ||
| hunyuan_image.py | ||
| hunyuan_video.py | ||
| hydit_clip.json | ||
| hydit.py | ||
| ideogram4.py | ||
| jina_clip_2.py | ||
| joyimage.py | ||
| kandinsky5.py | ||
| llama.py | ||
| long_clipl.py | ||
| longcat_image.py | ||
| lt.py | ||
| lumina2.py | ||
| mt5_config_xl.json | ||
| newbie.py | ||
| omnigen2.py | ||
| ovis.py | ||
| pixart_t5.py | ||
| pixeldit.py | ||
| qwen3vl.py | ||
| qwen35.py | ||
| qwen_image.py | ||
| qwen_vl.py | ||
| sa3.py | ||
| sa_t5.py | ||
| sam3_clip.py | ||
| sd2_clip_config.json | ||
| sd2_clip.py | ||
| sd3_clip.py | ||
| spiece_tokenizer.py | ||
| t5_config_base.json | ||
| t5_config_xxl.json | ||
| t5_old_config_xxl.json | ||
| t5_pile_config_xl.json | ||
| t5.py | ||
| umt5_config_base.json | ||
| umt5_config_xxl.json | ||
| wan.py | ||
| z_image.py | ||
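
The state-dict prefix remap named in the commit message can be illustrated with a small standalone sketch. This is not the code in sd.py: `PREFIX_REMAP` and `remap_prefixes` are hypothetical names introduced here, and only the three prefix pairs are taken from the commit message. It assumes a state dict is a plain mapping from key strings to tensors (integers stand in for tensors below).

```python
# Prefix pairs as listed in the commit message (old prefix -> new prefix).
# The dict and function names are illustrative, not ComfyUI identifiers.
PREFIX_REMAP = {
    "model.language_model.": "model.",
    "model.visual.": "visual.",
    "lm_head.": "model.lm_head.",
}


def remap_prefixes(state_dict, remap=PREFIX_REMAP):
    """Return a new dict with the first matching key prefix rewritten."""
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in remap.items():
            if key.startswith(old):
                # Rewrite once and stop, so a rewritten key is never
                # matched again by another rule.
                new_key = new + key[len(old):]
                break
        out[new_key] = value
    return out


if __name__ == "__main__":
    checkpoint = {
        "model.language_model.layers.0.weight": 1,
        "model.visual.blocks.0.weight": 2,
        "lm_head.weight": 3,
    }
    for name in remap_prefixes(checkpoint):
        print(name)
```

Each key is rewritten at most once, so a key that already sits in the target namespace (for example one starting with `model.lm_head.`) passes through unchanged.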