ComfyUI/comfy/ldm
huangfeice 5260e18cdf Add JoyImageEdit native model support
JoyImageEdit is an image-edit diffusion transformer from JD (jd-opensource),
Apache 2.0. This adds native ComfyUI support so it loads and runs like other
edit models (load checkpoint -> TextEncode + ReferenceLatent -> KSampler ->
VAEDecode), with no diffusers dependency.

Architecture:
- Transformer (comfy/ldm/joyimage/model.py): dual-stream (img/txt) DiT with a
  Conv3d patch embed (patch_size [1,2,2]), Wan-style learnable modulation,
  and 3D RoPE (rope_dim_list [16,56,56]). All attention goes through
  comfy.ldm.modules.attention.optimized_attention.
- Text encoder (comfy/text_encoders/{qwen3_vl,joyimage}.py): a reusable
  Qwen3-VL multimodal stack (vision tower + LM) in qwen3_vl.py, plus a thin
  JoyImage-specific layer (prompt templates, drop_idx, tokenizer, te() factory)
  in joyimage.py that depends on it. text_dim 4096.
- VAE: reuses the existing Wan 2.1 latent format (AutoencoderKLWan), no new
  latent format.
- Edit conditioning: reuses the reference_latents mechanism. Reference and
  noise latents are stacked on a new n-slot dimension and rotated at the model
  boundary (model_base.JoyImage), so the transformer stays 5D-in/5D-out.
  Guidance-rescale is built into the CFG path.

Model wiring:
- model_base.JoyImage uses ModelType.FLOW with sampling_settings
  multiplier=1000 (the time embedding is trained on t in [0,1000]) and
  shift=1.5; FLOW's linear time_snr_shift matches the diffusers
  FlowMatchEuler sigma schedule.
- model_detection sniffs the transformer state-dict (double_blocks.*,
  condition_embedder.*, 5D img_in Conv3d) to route image_model="joyimage".
- supported_models.JoyImage and the CLIPLoader "joyimage" type register it.

User-facing node TextEncodeJoyImageEdit (comfy_extras/nodes_joyimage.py)
bucket-resizes the input image to the nearest 1024-base bucket, encodes the
prompt with the image, and emits both the conditioning and the bucketed image
so the same pixels feed VAEEncode and the negative encode (JoyImage requires
noise and reference latents to share spatial dims).
2026-06-17 18:53:36 +08:00
..
ace Support Ace Step 1.5 XL model. (#13317) 2026-04-07 03:13:47 -04:00
anima Fix anima LLM adapter forward when manual cast (#12504) 2026-02-17 07:56:44 -08:00
audio Disable sage attention in stable audio dit and VAE. (#14148) 2026-05-27 20:35:03 -04:00
aura Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
cascade cascade: remove dead weight init code (#13026) 2026-03-17 20:59:10 -04:00
chroma Implement NAG on all the models based on the Flux code. (#12500) 2026-02-16 23:30:34 -05:00
chroma_radiance Radiance: support variant with nonzero txt_ids (#14206) 2026-06-01 22:07:48 -07:00
cogvideo Cogvideox (#13402) 2026-04-29 19:30:08 -04:00
cosmos Speed up ernie model by a bit on nvidia and use higher quality rope. (#14192) 2026-05-30 17:53:37 -07:00
depth_anything_3 Depth anything 3 (Core-135) (#13853) 2026-06-10 09:28:24 +08:00
ernie Speed up ernie model by a bit on nvidia and use higher quality rope. (#14192) 2026-05-30 17:53:37 -07:00
flux Remove old useless no comfy kitchen fallback. (#14245) 2026-06-02 17:52:41 -07:00
genmo Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
hidream Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
hidream_o1 feat: Support HiDream-O1-Image (CORE-187) (#13817) 2026-05-11 20:35:53 -07:00
hunyuan3d Enable Runtime Selection of Attention Functions (#9639) 2025-09-12 18:07:38 -04:00
hunyuan3dv2_1 MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063) 2026-05-25 18:26:40 -07:00
hunyuan_video Implement NAG on all the models based on the Flux code. (#12500) 2026-02-16 23:30:34 -05:00
hydit Change cosmos and hydit models to use the native RMSNorm. (#7934) 2025-05-04 06:26:20 -04:00
ideogram4 Fix potential dtype issue with ideogram 4. (#14436) 2026-06-12 07:51:12 -07:00
joyimage Add JoyImageEdit native model support 2026-06-17 18:53:36 +08:00
kandinsky5 Fix qwen scaled fp8 not working with kandinsky. Make basic t2i wf work. (#11162) 2025-12-06 17:50:10 -08:00
lens Lens: some cleanup (#14112) 2026-05-26 10:32:53 +03:00
lightricks fix: cross-attention AdaLN scale, shift, sigma parameters calculation (#14097) 2026-05-25 20:07:09 -07:00
lumina Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
mmaudio/vae Implement the mmaudio VAE. (#10300) 2025-10-11 22:57:23 -04:00
models Add support for small flux.2 decoder (#13314) 2026-04-07 03:44:18 -04:00
modules Revert "Add SeedVR2 support (CORE-6) (#14110)" (#14359) 2026-06-08 18:00:20 -04:00
moge Remove useless annotations imports. (#14105) 2026-05-25 19:23:29 -07:00
omnigen Use comfy kitchen apply rope in omnigen2 model. (#14442) 2026-06-13 09:38:39 +08:00
pixart Remove windows line endings. (#8866) 2025-07-11 02:37:51 -04:00
pixeldit Support context window for PiD and fix lq_latent rounding (#14136) 2026-05-27 12:08:06 -07:00
qwen_image fix: Add back apply_rotary_emb for Qwen Image (#14364) 2026-06-09 11:55:49 +08:00
rt_detr CORE-13 feat: Support RT-DETRv4 detection model (#12748) 2026-03-28 23:34:10 -04:00
sam3 Improve SAM3 large input handling (#13767) 2026-05-07 17:18:28 -07:00
supir feat: SUPIR model support (CORE-17) (#13250) 2026-04-18 23:02:01 -04:00
triposplat feat: Add TripoSplat support (#14210) 2026-06-01 07:01:50 -07:00
wan feat: Add Bernini-R model support (Wan video) (CORE-279) (#14216) 2026-06-10 07:47:34 +08:00
colormap.py Depth anything 3 (Core-135) (#13853) 2026-06-10 09:28:24 +08:00
common_dit.py add RMSNorm to comfy.ops 2025-04-14 18:00:33 -04:00
util.py New Year ruff cleanup. (#11595) 2026-01-01 22:06:14 -05:00