ComfyUI/comfy
huangfeice e96bd48e2d Adapt JoyImageEdit text encoder onto upstream Qwen3-VL stack
Upstream merged native Qwen3-VL support (#14298), adding
comfy/text_encoders/qwen3vl.py plus helpers in qwen_vl.py / llama.py /
qwen35.py. The JoyImage port previously shipped its own duplicate
Qwen3-VL implementation (comfy/text_encoders/qwen3_vl.py); that
duplication is now removed and the JoyImage text encoder rides on the
upstream stack.

- Delete comfy/text_encoders/qwen3_vl.py.
- Rewrite comfy/text_encoders/joyimage.py to subclass upstream
  comfy.text_encoders.qwen3vl. The JoyImage checkpoint is a stock
  qwen3vl_8b, so only JoyImage-specific behavior is overridden:
  * Qwen3VL8B_JoyImage.forward builds the 3D MRoPE position ids and
    injects deepstack visual features on the conditioning path. Upstream
    Qwen3VL does this only inside generate(), via build_image_inputs;
    SDClipModel.forward never passes those kwargs. Because the JoyImage
    node feeds an image through the encoder
    (clip.tokenize(prompt, images=[..])), the override reuses
    build_image_inputs to produce the kwargs that Llama2_.forward
    already accepts, reproducing the multimodal conditioning.
  * preprocess_embed keeps JoyImage's bicubic+clamp image preprocessing
    (process_qwen3vl_image) instead of upstream's bilinear path, to
    preserve validated DiT numerics.
  * JoyImageTokenizer keeps the JoyImage system-prompt templates,
    suppresses the Qwen3 <think> block, and raises on image-placeholder
    count mismatch.
  * JoyImageTEModel keeps the drop_idx=34 system-prompt strip and the
    pre-final-norm layer tap (layer="hidden", layer_idx=-1).
- sd.py QWEN3VL_8B_JOYIMAGE branch: apply the same state-dict prefix
  remap the sibling QWEN3VL branch uses (model.language_model. -> model.,
  model.visual. -> visual., lm_head. -> model.lm_head.) so the checkpoint
  loads into the upstream Qwen3VL namespace, then use the module-level
  llama_detect. Detection ordering is preserved: the JoyImage
  discriminator is checked before the generic Qwen3-VL deepstack key.
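
The prefix remap described in the sd.py bullet can be sketched as below. This is a minimal illustration, not the code in sd.py: the helper name remap_prefixes and the PREFIX_MAP constant are hypothetical, and only the three prefix pairs come from the commit message.

```python
# Hypothetical helper for illustration; not ComfyUI's actual API.
# The three prefix pairs are the ones named in the commit message.
PREFIX_MAP = (
    ("model.language_model.", "model."),
    ("model.visual.", "visual."),
    ("lm_head.", "model.lm_head."),
)


def remap_prefixes(state_dict, prefix_map=PREFIX_MAP):
    """Return a new dict with checkpoint key prefixes rewritten."""
    remapped = {}
    for key, value in state_dict.items():
        for old, new in prefix_map:
            if key.startswith(old):
                key = new + key[len(old):]
                break  # apply at most one rule per key
        remapped[key] = value
    return remapped


example = {
    "model.language_model.layers.0.weight": 1,
    "model.visual.blocks.0.bias": 2,
    "lm_head.weight": 3,
}
print(remap_prefixes(example))
```

Stopping after the first matching rule keeps a key that was already rewritten to model.lm_head. from being matched again by a later rule.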

No changes to llama.py / qwen3vl.py / qwen_vl.py / qwen35.py.
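
The image-placeholder check that JoyImageTokenizer performs can be sketched as follows. This is an assumption-laden illustration: the placeholder string "<|image_pad|>", the function name check_image_placeholders, and the error type are all hypothetical, and the real tokenizer may count placeholders differently.

```python
# Assumed placeholder token; the real JoyImage template may use another string.
IMAGE_PLACEHOLDER = "<|image_pad|>"


def check_image_placeholders(prompt, images, placeholder=IMAGE_PLACEHOLDER):
    """Raise if the prompt's placeholder count differs from the image count."""
    found = prompt.count(placeholder)
    if found != len(images):
        raise ValueError(
            f"prompt has {found} image placeholder(s) "
            f"but {len(images)} image(s) were supplied"
        )
    return found


# One placeholder, one image: the check passes.
check_image_placeholders("edit <|image_pad|> to add a hat", ["image_0"])
```

Failing early at tokenization time gives a clearer error than a shape mismatch later, when visual features are injected into the token sequence.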
2026-06-17 21:29:33 +08:00
audio_encoders Fix fp16 audio encoder models (#12811) 2026-03-06 18:20:07 -05:00
background_removal Some cast/dtype fixes for the birefnet and dino3 models. (#14217) 2026-06-01 14:35:26 -07:00
cldm Add better error message for common error. (#10846) 2025-11-23 04:55:22 -05:00
comfy_types Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
extra_samplers
image_encoders Depth anything 3 (Core-135) (#13853) 2026-06-10 09:28:24 +08:00
k_diffusion feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
ldm Add JoyImageEdit native model support 2026-06-17 18:53:36 +08:00
sd1_tokenizer Silence clip tokenizer warning. (#8934) 2025-07-16 14:42:07 -04:00
t2i_adapter
taesd Add high quality preview support for Flux2 latents (#13496) 2026-04-29 19:37:30 -04:00
text_encoders Adapt JoyImageEdit text encoder onto upstream Qwen3-VL stack 2026-06-17 21:29:33 +08:00
weight_adapter MPDynamic: force load flux img_in weight (Fixes flux1 canny+depth lora crash) (#12446) 2026-02-15 20:30:09 -05:00
bg_removal_model.py Fix background removal mask output shape (#14171) 2026-05-29 09:14:32 -07:00
cli_args.py Comfy Aimdo 0.4.10 + Dynamic --reserve-vram + --vram-headroom (#14480) 2026-06-15 07:54:36 -07:00
clip_config_bigg.json
clip_model.py Support the siglip 2 naflex model as a clip vision model. (#11831) 2026-01-12 17:05:54 -05:00
clip_vision_config_g.json
clip_vision_config_h.json
clip_vision_config_vitl_336_llava.json Support llava clip vision model. 2025-03-06 00:24:43 -05:00
clip_vision_config_vitl_336.json
clip_vision_config_vitl.json
clip_vision_siglip2_base_naflex.json Support the siglip 2 naflex model as a clip vision model. (#11831) 2026-01-12 17:05:54 -05:00
clip_vision_siglip_384.json
clip_vision_siglip_512.json Support 512 siglip model. 2025-04-05 07:01:01 -04:00
clip_vision.py Some cast/dtype fixes for the birefnet and dino3 models. (#14217) 2026-06-01 14:35:26 -07:00
conds.py Cleanups to the last PR. (#12646) 2026-02-26 01:30:31 -05:00
context_windows.py feat: Context windows - add causal_window_fix to improve blending of context windows (CORE-100) (#13563) 2026-05-05 16:40:53 -07:00
controlnet.py MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
deploy_environment.py Add deploy environment header (Comfy-Env) to partner node API calls (#13425) 2026-05-04 20:17:56 -07:00
diffusers_convert.py
diffusers_load.py
float.py float: use CK stochastic rounding cuda kernel (#13971) 2026-05-28 19:23:42 -07:00
gligen.py Remove some useless code. (#8812) 2025-07-06 07:07:39 -04:00
hooks.py Fix typos (#10986) 2026-05-08 17:14:45 +08:00
latent_formats.py Revert "Add SeedVR2 support (CORE-6) (#14110)" (#14359) 2026-06-08 18:00:20 -04:00
lora_convert.py Use torch RMSNorm for flux models and refactor hunyuan video code. (#12432) 2026-02-13 15:35:13 -05:00
lora.py Add LoRA key mapping for LTXV/LTXAV models (#14349) 2026-06-09 09:57:58 -04:00
memory_management.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
model_base.py Add JoyImageEdit native model support 2026-06-17 18:53:36 +08:00
model_detection.py Add JoyImageEdit native model support 2026-06-17 18:53:36 +08:00
model_management.py add --high-ram option (#14437) 2026-06-12 07:53:33 -07:00
model_patcher.py [Trainer/bug] Ensure model is not inference mode (CORE-72) (#13400) 2026-06-09 23:07:47 -04:00
model_prefetch.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
model_sampling.py feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
multigpu.py fix (MultiGPU): prevent freeze on manual abort when using MultiGPU CFG Split (#14235) 2026-06-02 10:05:24 -07:00
nested_tensor.py WIP way to support multi multi dimensional latents. (#10456) 2025-10-23 21:21:14 -04:00
ops.py add --high-ram option (#14437) 2026-06-12 07:53:33 -07:00
options.py
patcher_extension.py Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
pinned_memory.py Fix interoperation with external source of pinned memory pressure (#14252) 2026-06-05 08:39:35 -07:00
pixel_space_convert.py Changes to the previous radiance commit. (#9851) 2025-09-13 18:03:34 -04:00
quant_ops.py Enable triton comfy kitchen via cli-arg (#12730) 2026-05-03 14:07:21 -04:00
rmsnorm.py feat: Gemma4 text generation support (CORE-30) (#13376) 2026-05-02 22:46:15 -04:00
sample.py Revert "Add SeedVR2 support (CORE-6) (#14110)" (#14359) 2026-06-08 18:00:20 -04:00
sampler_helpers.py MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
samplers.py fix(multigpu): replace hardcoded torch.cuda.set_device with device-agnostic set_torch_device (#14191) 2026-05-30 21:18:42 -04:00
sd1_clip_config.json
sd1_clip.py feat: Support Qwen3.5 text generation models (#12771) 2026-03-25 22:48:28 -04:00
sd.py Adapt JoyImageEdit text encoder onto upstream Qwen3-VL stack 2026-06-17 21:29:33 +08:00
sdxl_clip.py Add a T5TokenizerOptions node to set options for the T5 tokenizer. (#7803) 2025-04-25 19:36:00 -04:00
supported_models_base.py Revert "Add SeedVR2 support (CORE-6) (#14110)" (#14359) 2026-06-08 18:00:20 -04:00
supported_models.py Add JoyImageEdit native model support 2026-06-17 18:53:36 +08:00
utils.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00