ComfyUI/comfy
huangfeice 5260e18cdf Add JoyImageEdit native model support
JoyImageEdit is an image-editing diffusion transformer from JD (jd-opensource),
released under the Apache 2.0 license. This commit adds native ComfyUI support so
it loads and runs like other edit models (load checkpoint -> TextEncode +
ReferenceLatent -> KSampler -> VAEDecode), with no diffusers dependency.

Architecture:
- Transformer (comfy/ldm/joyimage/model.py): dual-stream (img/txt) DiT with a
  Conv3d patch embed (patch_size [1,2,2]), Wan-style learnable modulation,
  and 3D RoPE (rope_dim_list [16,56,56]). All attention goes through
  comfy.ldm.modules.attention.optimized_attention.
- Text encoder (comfy/text_encoders/{qwen3_vl,joyimage}.py): a reusable
  Qwen3-VL multimodal stack (vision tower + LM) in qwen3_vl.py, plus a thin
  JoyImage-specific layer in joyimage.py (prompt templates, drop_idx,
  tokenizer, te() factory) built on top of it. The text embedding dimension
  (text_dim) is 4096.
- VAE: reuses the existing Wan 2.1 VAE (AutoencoderKLWan) and its latent
  format; no new latent format is added.
- Edit conditioning: reuses the reference_latents mechanism. Reference and
  noise latents are stacked on a new n-slot dimension and rotated at the model
  boundary (model_base.JoyImage), so the transformer stays 5D-in/5D-out.
  Guidance-rescale is built into the CFG path.
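The patch-embed and RoPE numbers above fit together roughly as follows: a [1,2,2] patch keeps the frame axis and halves each spatial latent axis, and the rope_dim_list entries sum to 128, presumably the per-head dimension that each token rotates, split across the (t, h, w) axes. The sketch below is a hypothetical plain-Python illustration, not code from comfy/ldm/joyimage/model.py; the function names and theta=10000 are assumptions. Assuming the Wan 2.1 VAE's 8x spatial downsampling, a 1024x1024 image becomes a 128x128 latent, i.e. 64x64 = 4096 image tokens per slot.

```python
from typing import List, Tuple

# Values taken from the commit message; everything else here is an assumption.
PATCH_SIZE = (1, 2, 2)        # (t, h, w) Conv3d patch size
ROPE_DIM_LIST = (16, 56, 56)  # rotary dims assigned to the t, h, w axes


def token_grid(latent_thw: Tuple[int, int, int],
               patch: Tuple[int, int, int] = PATCH_SIZE) -> Tuple[int, int, int]:
    """Number of patches along (t, h, w); latent dims must divide evenly."""
    t, h, w = latent_thw
    pt, ph, pw = patch
    if t % pt or h % ph or w % pw:
        raise ValueError("latent dims must be divisible by the patch size")
    return t // pt, h // ph, w // pw


def rope_frequencies(dim: int, theta: float = 10000.0) -> List[float]:
    """Standard RoPE inverse frequencies for one axis (theta is hypothetical)."""
    return [theta ** (-2 * i / dim) for i in range(dim // 2)]


def rope_angles(pos: Tuple[int, int, int]) -> List[float]:
    """Rotation angles for one token; each axis fills its own slice of the head dim."""
    angles: List[float] = []
    for p, dim in zip(pos, ROPE_DIM_LIST):
        angles.extend(p * f for f in rope_frequencies(dim))
    return angles  # one angle per rotated pair: 8 + 28 + 28 = 64


print(token_grid((1, 128, 128)))    # patch grid for a 1024x1024 image's latent
print(len(rope_angles((0, 3, 5))))  # rotated pairs per token
```

Giving the two spatial axes most of the head dimension (56 each versus 16 for t) reflects that a still-image edit model mostly needs to distinguish positions within a frame.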
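The commit does not spell out the rescale formula. Assuming it follows the common guidance-rescale form used by diffusers' `rescale_noise_cfg` (rescale the CFG result so its standard deviation matches the conditional prediction's, then blend with the plain CFG result), the per-sample arithmetic looks like this. Flat Python lists stand in for one sample's latent, and `cfg_with_rescale` is a hypothetical name.

```python
import math
from typing import List, Sequence


def _std(xs: Sequence[float]) -> float:
    """Population standard deviation of a flat list."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))


def cfg_with_rescale(cond: Sequence[float], uncond: Sequence[float],
                     scale: float, rescale: float) -> List[float]:
    """Classifier-free guidance followed by std-matching guidance rescale."""
    # Plain CFG: move from the unconditional toward the conditional prediction.
    cfg = [u + scale * (c - u) for c, u in zip(cond, uncond)]
    std_cfg = _std(cfg)
    if rescale == 0.0 or std_cfg == 0.0:
        return cfg
    # Scale the CFG output so its spread matches the conditional prediction.
    factor = _std(cond) / std_cfg
    # Blend the rescaled and unrescaled results by the rescale factor.
    return [rescale * (x * factor) + (1.0 - rescale) * x for x in cfg]


print(cfg_with_rescale([0.0, 2.0], [0.0, 0.0], scale=3.0, rescale=0.5))
```

Because only the ratio of the two standard deviations is used, it does not matter whether a population or sample standard deviation is computed, as long as both use the same convention.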

Model wiring:
- model_base.JoyImage uses ModelType.FLOW with sampling_settings
  multiplier=1000 (the time embedding is trained on t in [0,1000]) and
  shift=1.5; FLOW's linear time_snr_shift matches the diffusers
  FlowMatchEuler sigma schedule.
- model_detection sniffs the transformer state-dict (double_blocks.*,
  condition_embedder.*, 5D img_in Conv3d) to route image_model="joyimage".
- supported_models.JoyImage and the CLIPLoader "joyimage" type register it.
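For reference, ComfyUI's discrete-flow sampling maps a normalized time t to sigma with time_snr_shift(shift, t) and feeds the model sigma * multiplier; diffusers' FlowMatchEulerDiscreteScheduler applies the same shift * s / (1 + (shift - 1) * s) expression to its sigmas when dynamic shifting is off. The standalone re-implementation below illustrates that arithmetic with shift=1.5 and multiplier=1000. It is not imported from either library; `shifted_sigmas`, `model_timestep`, and the evenly spaced step grid are hypothetical.

```python
from typing import List

MULTIPLIER = 1000.0  # time embedding trained on t in [0, 1000] (from the commit)
SHIFT = 1.5          # sampling_settings shift (from the commit)


def time_snr_shift(alpha: float, t: float) -> float:
    """Static flow-matching shift: alpha * t / (1 + (alpha - 1) * t)."""
    if alpha == 1.0:
        return t
    return alpha * t / (1 + (alpha - 1) * t)


def shifted_sigmas(steps: int, shift: float = SHIFT) -> List[float]:
    """Shift an evenly spaced grid from 1 down to 0 (grid spacing is assumed)."""
    return [time_snr_shift(shift, 1 - i / steps) for i in range(steps + 1)]


def model_timestep(sigma: float) -> float:
    """Timestep passed to the time embedding: sigma scaled by the multiplier."""
    return sigma * MULTIPLIER


sigmas = shifted_sigmas(4)
print(sigmas)                                # 1.0 ... 0.0, bent toward high noise
print([model_timestep(s) for s in sigmas])  # values in [0, 1000]
```

With shift > 1 every intermediate sigma lies above the unshifted grid (t=0.5 maps to 0.6), so more of the schedule is spent at high noise levels, while the endpoints 1 and 0 are unchanged.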
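The detection rule described above can be sketched as a predicate over the state dict. To stay self-contained, this version takes a mapping from key name to shape instead of real tensors. The function name is hypothetical, and the concrete key names used in the usage lines (such as "img_in.proj.weight") are illustrative guesses; only the double_blocks.*, condition_embedder.*, and 5D img_in criteria come from the commit message.

```python
from typing import Mapping, Optional, Sequence


def sniff_image_model(shapes: Mapping[str, Sequence[int]],
                      prefix: str = "") -> Optional[str]:
    """Return "joyimage" if the keys/shapes match the described signature."""
    has_double_blocks = any(k.startswith(prefix + "double_blocks.") for k in shapes)
    has_condition_embedder = any(
        k.startswith(prefix + "condition_embedder.") for k in shapes
    )
    # A Conv3d weight is 5D: (out_ch, in_ch, kt, kh, kw); a Linear weight is 2D.
    has_conv3d_img_in = any(
        k.startswith(prefix + "img_in.") and k.endswith(".weight") and len(s) == 5
        for k, s in shapes.items()
    )
    if has_double_blocks and has_condition_embedder and has_conv3d_img_in:
        return "joyimage"
    return None


# Hypothetical key names and shapes, for illustration only.
joy_like = {
    "double_blocks.0.img_attn.qkv.weight": (9216, 3072),
    "condition_embedder.time_proj.weight": (3072, 256),
    "img_in.proj.weight": (3072, 16, 1, 2, 2),
}
flux_like = {
    "double_blocks.0.img_attn.qkv.weight": (9216, 3072),
    "img_in.weight": (3072, 64),
}
print(sniff_image_model(joy_like), sniff_image_model(flux_like))
```

Requiring all three signals matters because double_blocks.* alone also appears in other dual-stream DiTs such as Flux, which have a 2D Linear img_in and no condition_embedder.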

The user-facing node TextEncodeJoyImageEdit (comfy_extras/nodes_joyimage.py)
resizes the input image to the nearest 1024-base bucket, encodes the prompt
together with the image, and outputs both the conditioning and the bucketed
image, so the same pixels feed VAEEncode and the negative encode (JoyImage
requires the noise and reference latents to share spatial dimensions).
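A "nearest 1024-base bucket" step can be sketched as follows: enumerate (width, height) pairs whose area is close to 1024x1024, then pick the one whose aspect ratio is closest to the input's. The bucket table here is an assumption and not the node's actual list. The 512-2048 side range and the multiple of 32 are hypothetical choices; with 8x VAE downsampling and a 2x2 patch, sides need to be multiples of at least 16.

```python
import math
from typing import List, Tuple


def make_buckets(base: int = 1024, multiple: int = 32,
                 min_side: int = 512, max_side: int = 2048) -> List[Tuple[int, int]]:
    """(width, height) pairs with area near base*base and sides on a grid."""
    target_area = base * base
    buckets = set()
    for h in range(min_side, max_side + 1, multiple):
        # Choose the width that keeps the area closest to the target.
        w = round(target_area / h / multiple) * multiple
        if min_side <= w <= max_side:
            buckets.add((w, h))
    return sorted(buckets)


def nearest_bucket(width: int, height: int,
                   buckets: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Bucket whose log aspect ratio is closest to the input image's."""
    target = math.log(width / height)
    return min(buckets, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


buckets = make_buckets()
w, h = nearest_bucket(1920, 1080, buckets)
print((w, h), "-> latent", (w // 8, h // 8))  # assuming 8x VAE downsampling
```

Comparing log aspect ratios treats portrait and landscape inputs symmetrically. Reusing the same resized pixels for VAEEncode and for both text encodes ensures the noise latent and the reference latent end up the same size.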
2026-06-17 18:53:36 +08:00
audio_encoders Fix fp16 audio encoder models (#12811) 2026-03-06 18:20:07 -05:00
background_removal Some cast/dtype fixes for the birefnet and dino3 models. (#14217) 2026-06-01 14:35:26 -07:00
cldm Add better error message for common error. (#10846) 2025-11-23 04:55:22 -05:00
comfy_types Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
extra_samplers
image_encoders Depth anything 3 (Core-135) (#13853) 2026-06-10 09:28:24 +08:00
k_diffusion feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
ldm Add JoyImageEdit native model support 2026-06-17 18:53:36 +08:00
sd1_tokenizer
t2i_adapter
taesd Add high quality preview support for Flux2 latents (#13496) 2026-04-29 19:37:30 -04:00
text_encoders Add JoyImageEdit native model support 2026-06-17 18:53:36 +08:00
weight_adapter MPDynamic: force load flux img_in weight (Fixes flux1 canny+depth lora crash) (#12446) 2026-02-15 20:30:09 -05:00
bg_removal_model.py Fix background removal mask output shape (#14171) 2026-05-29 09:14:32 -07:00
cli_args.py Comfy Aimdo 0.4.10 + Dynamic --reserve-vram + --vram-headroom (#14480) 2026-06-15 07:54:36 -07:00
clip_config_bigg.json
clip_model.py Support the siglip 2 naflex model as a clip vision model. (#11831) 2026-01-12 17:05:54 -05:00
clip_vision_config_g.json
clip_vision_config_h.json
clip_vision_config_vitl_336_llava.json
clip_vision_config_vitl_336.json
clip_vision_config_vitl.json
clip_vision_siglip2_base_naflex.json Support the siglip 2 naflex model as a clip vision model. (#11831) 2026-01-12 17:05:54 -05:00
clip_vision_siglip_384.json
clip_vision_siglip_512.json
clip_vision.py Some cast/dtype fixes for the birefnet and dino3 models. (#14217) 2026-06-01 14:35:26 -07:00
conds.py Cleanups to the last PR. (#12646) 2026-02-26 01:30:31 -05:00
context_windows.py feat: Context windows - add causal_window_fix to improve blending of context windows (CORE-100) (#13563) 2026-05-05 16:40:53 -07:00
controlnet.py MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
deploy_environment.py Add deploy environment header (Comfy-Env) to partner node API calls (#13425) 2026-05-04 20:17:56 -07:00
diffusers_convert.py
diffusers_load.py
float.py float: use CK stochastic rounding cuda kernel (#13971) 2026-05-28 19:23:42 -07:00
gligen.py
hooks.py Fix typos (#10986) 2026-05-08 17:14:45 +08:00
latent_formats.py Revert "Add SeedVR2 support (CORE-6) (#14110)" (#14359) 2026-06-08 18:00:20 -04:00
lora_convert.py Use torch RMSNorm for flux models and refactor hunyuan video code. (#12432) 2026-02-13 15:35:13 -05:00
lora.py Add LoRA key mapping for LTXV/LTXAV models (#14349) 2026-06-09 09:57:58 -04:00
memory_management.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
model_base.py Add JoyImageEdit native model support 2026-06-17 18:53:36 +08:00
model_detection.py Add JoyImageEdit native model support 2026-06-17 18:53:36 +08:00
model_management.py add --high-ram option (#14437) 2026-06-12 07:53:33 -07:00
model_patcher.py [Trainer/bug] Ensure model is not inference mode (CORE-72) (#13400) 2026-06-09 23:07:47 -04:00
model_prefetch.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00
model_sampling.py feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
multigpu.py fix (MultiGPU): prevent freeze on manual abort when using MultiGPU CFG Split (#14235) 2026-06-02 10:05:24 -07:00
nested_tensor.py WIP way to support multi multi dimensional latents. (#10456) 2025-10-23 21:21:14 -04:00
ops.py add --high-ram option (#14437) 2026-06-12 07:53:33 -07:00
options.py
patcher_extension.py Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
pinned_memory.py Fix interoperation with external source of pinned memory pressure (#14252) 2026-06-05 08:39:35 -07:00
pixel_space_convert.py Changes to the previous radiance commit. (#9851) 2025-09-13 18:03:34 -04:00
quant_ops.py Enable triton comfy kitchen via cli-arg (#12730) 2026-05-03 14:07:21 -04:00
rmsnorm.py feat: Gemma4 text generation support (CORE-30) (#13376) 2026-05-02 22:46:15 -04:00
sample.py Revert "Add SeedVR2 support (CORE-6) (#14110)" (#14359) 2026-06-08 18:00:20 -04:00
sampler_helpers.py MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
samplers.py fix(multigpu): replace hardcoded torch.cuda.set_device with device-agnostic set_torch_device (#14191) 2026-05-30 21:18:42 -04:00
sd1_clip_config.json
sd1_clip.py feat: Support Qwen3.5 text generation models (#12771) 2026-03-25 22:48:28 -04:00
sd.py Add JoyImageEdit native model support 2026-06-17 18:53:36 +08:00
sdxl_clip.py
supported_models_base.py Revert "Add SeedVR2 support (CORE-6) (#14110)" (#14359) 2026-06-08 18:00:20 -04:00
supported_models.py Add JoyImageEdit native model support 2026-06-17 18:53:36 +08:00
utils.py Threaded Loader performance fixes / improvements (+ Aimdo 0.4.6) (#14116) 2026-05-30 15:20:04 -04:00