EasyAI Code Hosting Platform

mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-07-03 21:20:49 +08:00


huangfeice 5260e18cdf Add JoyImageEdit native model support

JoyImageEdit is an image-edit diffusion transformer from JD (jd-opensource), licensed Apache 2.0. This adds native ComfyUI support so it loads and runs like other edit models (load checkpoint -> TextEncode + ReferenceLatent -> KSampler -> VAEDecode), with no diffusers dependency.

Architecture:
- Transformer (comfy/ldm/joyimage/model.py): dual-stream (img/txt) DiT with a Conv3d patch embed (patch_size [1,2,2]), Wan-style learnable modulation, and 3D RoPE (rope_dim_list [16,56,56]). All attention goes through comfy.ldm.modules.attention.optimized_attention.
- Text encoder (comfy/text_encoders/{qwen3_vl,joyimage}.py): a reusable Qwen3-VL multimodal stack (vision tower + LM) in qwen3_vl.py, plus a thin JoyImage-specific layer (prompt templates, drop_idx, tokenizer, te() factory) in joyimage.py that depends on it. text_dim is 4096.
- VAE: reuses the existing Wan 2.1 latent format (AutoencoderKLWan); no new latent format.
- Edit conditioning: reuses the reference_latents mechanism. Reference and noise latents are stacked on a new n-slot dimension and rotated at the model boundary (model_base.JoyImage), so the transformer stays 5D-in/5D-out. Guidance rescale is built into the CFG path.

Model wiring:
- model_base.JoyImage uses ModelType.FLOW with sampling_settings multiplier=1000 (the time embedding is trained on t in [0,1000]) and shift=1.5. FLOW's linear time_snr_shift matches the diffusers FlowMatchEuler sigma schedule.
- model_detection sniffs the transformer state dict (double_blocks., condition_embedder., 5D img_in Conv3d) to route image_model="joyimage".
- supported_models.JoyImage and the CLIPLoader "joyimage" type register it.

User-facing node TextEncodeJoyImageEdit (comfy_extras/nodes_joyimage.py) bucket-resizes the input image to the nearest 1024-base bucket and encodes the prompt with the image. It emits both the conditioning and the bucketed image, so the same pixels feed VAEEncode and the negative encode (JoyImage requires noise and reference latents to share spatial dims).	2026-06-17 18:53:36 +08:00
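The patch embed and 3D RoPE described above can be sketched in NumPy. This is a minimal illustration, not the code in comfy/ldm/joyimage/model.py. It rests on several assumptions:

- The Wan 2.1 VAE output has 16 latent channels and 8x spatial downsampling.
- The per-head dimension is 128, the sum of rope_dim_list.
- Rotation uses interleaved pairs.
- theta is 10000.

`token_grid`, `rope_angles`, and `apply_rope` are hypothetical helper names.

```python
import numpy as np

PATCH_SIZE = (1, 2, 2)        # (t, h, w) patch size of the Conv3d patch embed
ROPE_DIM_LIST = (16, 56, 56)  # rotary dims per axis (t, h, w); sums to 128 (assumed head dim)


def token_grid(latent_shape, patch_size=PATCH_SIZE):
    """Number of patches along (t, h, w) for a latent of shape (C, T, H, W)."""
    _, t, h, w = latent_shape
    pt, ph, pw = patch_size
    if t % pt or h % ph or w % pw:
        raise ValueError("latent dims must be divisible by the patch size")
    return t // pt, h // ph, w // pw


def rope_angles(grid, dims=ROPE_DIM_LIST, theta=10000.0):
    """Rotation angles of shape (num_tokens, sum(dims) // 2).

    Each axis gets its own slice of the head dimension; theta=10000 is an assumption.
    """
    axes = [np.arange(n) for n in grid]
    # (t, h, w) position id of every token, flattened in raster order
    ids = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    parts = []
    for axis, d in enumerate(dims):
        freqs = 1.0 / theta ** (np.arange(0, d, 2) / d)  # d // 2 frequencies for this axis
        parts.append(np.outer(ids[:, axis], freqs))
    return np.concatenate(parts, axis=-1)


def apply_rope(x, angles):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... of x by the given angles."""
    cos, sin = np.cos(angles), np.sin(angles)
    even, odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = even * cos - odd * sin
    out[..., 1::2] = even * sin + odd * cos
    return out


# A 1024x1024 image -> Wan 2.1 latent (16, 1, 128, 128) -> 1 x 64 x 64 patches
grid = token_grid((16, 1, 128, 128))
angles = rope_angles(grid)
q = np.random.default_rng(0).standard_normal((angles.shape[0], sum(ROPE_DIM_LIST)))
q_rot = apply_rope(q, angles)
```

Splitting the head dimension per axis means t, h, and w each rotate their own slice, so relative offsets along each axis are encoded independently. The sketch covers a single latent. The commit message does not describe how the extra reference slot is positioned, so that step is left out.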
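Guidance rescale is commonly implemented as in diffusers' `rescale_noise_cfg`. The CFG output is rescaled so that its per-sample standard deviation matches the conditional prediction, and the result is then blended with the plain CFG output. Below is a NumPy sketch of that standard formulation. Whether JoyImage's built-in CFG path matches it exactly, and which `guidance_rescale` value it uses, are assumptions.

```python
import numpy as np


def cfg_with_rescale(cond, uncond, cfg_scale, guidance_rescale=0.0):
    """Classifier-free guidance followed by guidance rescale.

    cond, uncond: model predictions with the batch on axis 0.
    guidance_rescale: 0 gives plain CFG, 1 fully matches the std of `cond`.
    """
    cfg = uncond + cfg_scale * (cond - uncond)
    if guidance_rescale == 0.0:
        return cfg
    axes = tuple(range(1, cond.ndim))  # per-sample statistics over all non-batch axes
    std_cond = cond.std(axis=axes, keepdims=True)
    std_cfg = cfg.std(axis=axes, keepdims=True)
    rescaled = cfg * (std_cond / std_cfg)  # bring the CFG output back to the std of cond
    return guidance_rescale * rescaled + (1.0 - guidance_rescale) * cfg


rng = np.random.default_rng(0)
cond = rng.standard_normal((2, 16, 1, 32, 32))
uncond = rng.standard_normal((2, 16, 1, 32, 32))
plain = cfg_with_rescale(cond, uncond, cfg_scale=5.0)
full = cfg_with_rescale(cond, uncond, cfg_scale=5.0, guidance_rescale=1.0)
```

NumPy's `std` defaults to the population estimate while PyTorch's defaults to the unbiased one. Only the ratio of the two standard deviations is used, so that difference cancels out.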
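The FLOW sampling settings (multiplier=1000, shift=1.5) can be checked with a few lines of plain Python. The shift formula `shift * t / (1 + (shift - 1) * t)` is what ComfyUI's `time_snr_shift` and the static shift in diffusers' `FlowMatchEulerDiscreteScheduler` both compute. The evenly spaced t grid is an illustrative assumption and is not one of ComfyUI's schedulers. `shifted_schedule` is a hypothetical helper.

```python
SHIFT = 1.5
MULTIPLIER = 1000  # the time embedding sees values in [0, 1000]


def time_snr_shift(shift, t):
    """Shift a flow-matching time t in [0, 1] toward the noisy end (identity when shift == 1)."""
    if shift == 1.0:
        return t
    return shift * t / (1 + (shift - 1) * t)


def shifted_schedule(steps, shift=SHIFT, multiplier=MULTIPLIER):
    """Sigmas for evenly spaced t from 1 to 0, and the timesteps the model receives."""
    ts = [1 - i / steps for i in range(steps + 1)]
    sigmas = [time_snr_shift(shift, t) for t in ts]
    timesteps = [s * multiplier for s in sigmas]
    return sigmas, timesteps


sigmas, timesteps = shifted_schedule(4)
# unshifted t:    1.0, 0.75, 0.5, 0.25, 0.0
# shifted sigma:  1.0, ~0.818, 0.6, ~0.333, 0.0
```

With shift > 1, every intermediate sigma is larger than its unshifted t, so more of the step budget is spent at high noise. The model's time input is simply sigma * 1000.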
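The detection rule can be illustrated with a shape-only check. This is a hypothetical sketch, not ComfyUI's model_detection code. It encodes only the three signals named in the commit message. The exact key `img_in.weight`, the `prefix` argument, and all example keys and sizes below are assumptions. The 5D check separates a Conv3d patch embed, whose weight is (out, in, kt, kh, kw), from a Linear patch embed, whose weight is 2D.

```python
def looks_like_joyimage(shapes, prefix=""):
    """Guess whether a state dict (key -> tensor shape) is a JoyImage transformer.

    Hypothetical rule built from the signals in the commit message:
    double_blocks.*, condition_embedder.*, and a 5D Conv3d img_in weight.
    """
    keys = shapes.keys()
    has_double_blocks = any(k.startswith(prefix + "double_blocks.") for k in keys)
    has_condition_embedder = any(k.startswith(prefix + "condition_embedder.") for k in keys)
    img_in = shapes.get(prefix + "img_in.weight")  # assumed key name
    return has_double_blocks and has_condition_embedder and img_in is not None and len(img_in) == 5


# Hypothetical key names and sizes, for illustration only
joy_like = {
    "double_blocks.0.img_attn.qkv.weight": (9216, 3072),
    "condition_embedder.time_proj.weight": (18432, 3072),
    "img_in.weight": (3072, 16, 1, 2, 2),  # Conv3d: (out, 16 latent channels, 1, 2, 2)
}
flux_like = {
    "double_blocks.0.img_attn.qkv.weight": (9216, 3072),
    "img_in.weight": (3072, 64),  # Linear patch embed: 2D weight
}
```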
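Bucket resizing can be sketched as picking, from a fixed list of resolutions whose area is close to 1024 x 1024, the one whose aspect ratio is closest to the input image. Two choices here are assumptions, not the node's actual table:

- The bucket list uses sides that are multiples of 64, between 512 and 2048.
- Aspect ratios are compared by log-ratio distance.

Any bucket table must at least keep both sides divisible by 16, because the Wan 2.1 VAE downsamples by 8 and the [1,2,2] patch embed by another 2 spatially.

```python
import math


def make_buckets(base=1024, step=64, min_side=512, max_side=2048):
    """Hypothetical bucket table: (width, height) pairs with area close to base * base."""
    target_area = base * base
    buckets = []
    for width in range(min_side, max_side + 1, step):
        # choose the height (a multiple of step) that best preserves the target area
        height = round(target_area / width / step) * step
        if min_side <= height <= max_side:
            buckets.append((width, height))
    return buckets


def nearest_bucket(width, height, buckets):
    """Bucket whose aspect ratio is closest to width / height (compared in log space)."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1] / aspect)))


buckets = make_buckets()
print(nearest_bucket(1920, 1080, buckets))  # (1344, 768)
print(nearest_bucket(1024, 1024, buckets))  # (1024, 1024)
```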
ace	Support Ace Step 1.5 XL model. (#13317)	2026-04-07 03:13:47 -04:00
anima	Fix anima LLM adapter forward when manual cast (#12504)	2026-02-17 07:56:44 -08:00
audio	Disable sage attention in stable audio dit and VAE. (#14148)	2026-05-27 20:35:03 -04:00
aura	Enable Runtime Selection of Attention Functions (#9639)	2025-09-12 18:07:38 -04:00
cascade	cascade: remove dead weight init code (#13026)	2026-03-17 20:59:10 -04:00
chroma	Implement NAG on all the models based on the Flux code. (#12500)	2026-02-16 23:30:34 -05:00
chroma_radiance	Radiance: support variant with nonzero txt_ids (#14206)	2026-06-01 22:07:48 -07:00
cogvideo	Cogvideox (#13402)	2026-04-29 19:30:08 -04:00
cosmos	Speed up ernie model by a bit on nvidia and use higher quality rope. (#14192)	2026-05-30 17:53:37 -07:00
depth_anything_3	Depth anything 3 (Core-135) (#13853)	2026-06-10 09:28:24 +08:00
ernie	Speed up ernie model by a bit on nvidia and use higher quality rope. (#14192)	2026-05-30 17:53:37 -07:00
flux	Remove old useless no comfy kitchen fallback. (#14245)	2026-06-02 17:52:41 -07:00
genmo	Enable Runtime Selection of Attention Functions (#9639)	2025-09-12 18:07:38 -04:00
hidream	Enable Runtime Selection of Attention Functions (#9639)	2025-09-12 18:07:38 -04:00
hidream_o1	feat: Support HiDream-O1-Image (CORE-187) (#13817)	2026-05-11 20:35:53 -07:00
hunyuan3d	Enable Runtime Selection of Attention Functions (#9639)	2025-09-12 18:07:38 -04:00
hunyuan3dv2_1	MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063)	2026-05-25 18:26:40 -07:00
hunyuan_video	Implement NAG on all the models based on the Flux code. (#12500)	2026-02-16 23:30:34 -05:00
hydit	Change cosmos and hydit models to use the native RMSNorm. (#7934)	2025-05-04 06:26:20 -04:00
ideogram4	Fix potential dtype issue with ideogram 4. (#14436)	2026-06-12 07:51:12 -07:00
joyimage	Add JoyImageEdit native model support	2026-06-17 18:53:36 +08:00
kandinsky5	Fix qwen scaled fp8 not working with kandinsky. Make basic t2i wf work. (#11162)	2025-12-06 17:50:10 -08:00
lens	Lens: some cleanup (#14112)	2026-05-26 10:32:53 +03:00
lightricks	fix: cross-attention AdaLN scale, shift, sigma parameters calculation (#14097)	2026-05-25 20:07:09 -07:00
lumina	Remove useless annotations imports. (#14105)	2026-05-25 19:23:29 -07:00
mmaudio/vae	Implement the mmaudio VAE. (#10300)	2025-10-11 22:57:23 -04:00
models	Add support for small flux.2 decoder (#13314)	2026-04-07 03:44:18 -04:00
modules	Revert "Add SeedVR2 support (CORE-6) (#14110)" (#14359)	2026-06-08 18:00:20 -04:00
moge	Remove useless annotations imports. (#14105)	2026-05-25 19:23:29 -07:00
omnigen	Use comfy kitchen apply rope in omnigen2 model. (#14442)	2026-06-13 09:38:39 +08:00
pixart	Remove windows line endings. (#8866)	2025-07-11 02:37:51 -04:00
pixeldit	Support context window for PiD and fix lq_latent rounding (#14136)	2026-05-27 12:08:06 -07:00
qwen_image	fix: Add back apply_rotary_emb for Qwen Image (#14364)	2026-06-09 11:55:49 +08:00
rt_detr	CORE-13 feat: Support RT-DETRv4 detection model (#12748)	2026-03-28 23:34:10 -04:00
sam3	Improve SAM3 large input handling (#13767)	2026-05-07 17:18:28 -07:00
supir	feat: SUPIR model support (CORE-17) (#13250)	2026-04-18 23:02:01 -04:00
triposplat	feat: Add TripoSplat support (#14210)	2026-06-01 07:01:50 -07:00
wan	feat: Add Bernini-R model support (Wan video) (CORE-279) (#14216)	2026-06-10 07:47:34 +08:00
colormap.py	Depth anything 3 (Core-135) (#13853)	2026-06-10 09:28:24 +08:00
common_dit.py	add RMSNorm to comfy.ops	2025-04-14 18:00:33 -04:00
util.py	New Year ruff cleanup. (#11595)	2026-01-01 22:06:14 -05:00