Upstream merged native Qwen3-VL support (#14298), adding
comfy/text_encoders/qwen3vl.py plus helpers in qwen_vl.py / llama.py /
qwen35.py. The JoyImage port previously shipped its own duplicate
Qwen3-VL implementation (comfy/text_encoders/qwen3_vl.py); that
duplication is now removed and the JoyImage text encoder rides on the
upstream stack.
- Delete comfy/text_encoders/qwen3_vl.py.
- Rewrite comfy/text_encoders/joyimage.py to subclass upstream
comfy.text_encoders.qwen3vl. The JoyImage checkpoint is a stock
qwen3vl_8b, so only JoyImage-specific behavior is overridden:
* Qwen3VL8B_JoyImage.forward builds the 3D MRoPE position ids and
injects deepstack visual features on the conditioning path. Upstream
Qwen3VL only does this inside generate() via build_image_inputs;
SDClipModel.forward never passes those kwargs. The JoyImage node
feeds an image through the encoder (clip.tokenize(prompt, images=[..])),
so the override reuses build_image_inputs to reproduce the multimodal
conditioning that Llama2_.forward already accepts kwargs for.
* preprocess_embed keeps JoyImage's bicubic+clamp image preprocessing
(process_qwen3vl_image) instead of upstream's bilinear path, to
preserve validated DiT numerics.
* JoyImageTokenizer keeps the JoyImage system-prompt templates,
suppresses the Qwen3 <think> block, and raises on image-placeholder
count mismatch.
* JoyImageTEModel keeps the drop_idx=34 system-prompt strip and the
pre-final-norm layer tap (layer="hidden", layer_idx=-1).
- sd.py QWEN3VL_8B_JOYIMAGE branch: apply the same state-dict prefix
remap the sibling QWEN3VL branch uses (model.language_model.->model.,
model.visual.->visual., lm_head.->model.lm_head.) so the checkpoint
loads into the upstream Qwen3VL namespace, then use the module-level
llama_detect. Detection ordering is preserved: the JoyImage
discriminator is checked before the generic Qwen3-VL deepstack key.
No changes to llama.py / qwen3vl.py / qwen_vl.py / qwen35.py.
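The prefix remap in the sd.py bullet above can be sketched as follows. This is a minimal illustration, not the code in sd.py: the helper name `remap_prefixes` and the example keys are hypothetical; only the three prefix pairs are taken from the description above.

```python
# Prefix pairs from the commit description; everything else is illustrative.
PREFIX_MAP = (
    ("model.language_model.", "model."),
    ("model.visual.", "visual."),
    ("lm_head.", "model.lm_head."),
)


def remap_prefixes(state_dict, prefix_map=PREFIX_MAP):
    """Return a new dict with the first matching prefix of each key rewritten."""
    out = {}
    for key, value in state_dict.items():
        for old, new in prefix_map:
            if key.startswith(old):
                key = new + key[len(old):]
                break  # apply at most one rule per key, so nothing is remapped twice
        out[key] = value
    return out
```

Stopping at the first match matters here: without it, a key rewritten to `model.lm_head.` could be picked up again by a later `model.`-based rule if one were added.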
JoyImageEdit is an image-edit diffusion transformer from JD (jd-opensource),
Apache 2.0. This adds native ComfyUI support so it loads and runs like other
edit models (load checkpoint -> TextEncode + ReferenceLatent -> KSampler ->
VAEDecode), with no diffusers dependency.
Architecture:
- Transformer (comfy/ldm/joyimage/model.py): dual-stream (img/txt) DiT with a
Conv3d patch embed (patch_size [1,2,2]), Wan-style learnable modulation,
and 3D RoPE (rope_dim_list [16,56,56]). All attention goes through
comfy.ldm.modules.attention.optimized_attention.
- Text encoder (comfy/text_encoders/{qwen3_vl,joyimage}.py): a reusable
Qwen3-VL multimodal stack (vision tower + LM) in qwen3_vl.py, plus a thin
JoyImage-specific layer (prompt templates, drop_idx, tokenizer, te() factory)
in joyimage.py that depends on it. text_dim 4096.
- VAE: reuses the existing Wan 2.1 latent format (AutoencoderKLWan), no new
latent format.
- Edit conditioning: reuses the reference_latents mechanism. Reference and
noise latents are stacked on a new n-slot dimension and rotated at the model
boundary (model_base.JoyImage), so the transformer stays 5D-in/5D-out.
Guidance-rescale is built into the CFG path.
Model wiring:
- model_base.JoyImage uses ModelType.FLOW with sampling_settings
multiplier=1000 (the time embedding is trained on t in [0,1000]) and
shift=1.5; FLOW's linear time_snr_shift matches the diffusers
FlowMatchEuler sigma schedule.
- model_detection sniffs the transformer state-dict (double_blocks.*,
condition_embedder.*, 5D img_in Conv3d) to route image_model="joyimage".
- supported_models.JoyImage and the CLIPLoader "joyimage" type register it.
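For illustration, the sampling settings above (shift=1.5, multiplier=1000) can be worked through numerically. The sketch below assumes the static-shift form sigma' = shift * sigma / (1 + (shift - 1) * sigma); the function names are illustrative and not copied from comfy.

```python
def time_snr_shift(alpha: float, t: float) -> float:
    """Shift a flow-matching sigma in [0, 1] by the factor alpha."""
    if alpha == 1.0:
        return t
    return alpha * t / (1.0 + (alpha - 1.0) * t)


def sigma_to_timestep(sigma: float, multiplier: float = 1000.0) -> float:
    """Scale sigma to the range the time embedding is assumed to be trained on."""
    return sigma * multiplier


SHIFT = 1.5  # value quoted in the commit message
for t in (0.25, 0.5, 1.0):
    sigma = time_snr_shift(SHIFT, t)
    print(t, round(sigma, 4), round(sigma_to_timestep(sigma), 1))
```

With shift greater than 1 the schedule spends more of its steps at high noise, while the endpoints 0 and 1 are left unchanged.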
User-facing node TextEncodeJoyImageEdit (comfy_extras/nodes_joyimage.py)
bucket-resizes the input image to the nearest 1024-base bucket, encodes the
prompt with the image, and emits both the conditioning and the bucketed image
so the same pixels feed VAEEncode and the negative encode (JoyImage requires
noise and reference latents to share spatial dims).
* main: implement --vram-headroom
Implement --vram-headroom for dynamic vram as a hybrid debug/diagnostic
option that can be used for people who still report shared VRAM spills.
They can trial and error the setting to maintain a bit more headroom to
avoid shared VRAM spills.
* main: implement --reserve-vram
Implement --reserve-vram as extra headroom on the simple method which
is semantically as close as possible to the stated functionality and
former behaviour of non-dynamic VRAM.
Add this option for users who know they have so much RAM they want
to pin everything or have a pagefile that outruns their disk speed.
This removes the RAM pressure caps completely and pins behind the
primary model load, forcing all models to be permanently committed
to RAM.
Some custom nodes call .to() on weights completely outside the load
context, which can wreak havoc if it's for a model that is not active.
Detect this condition and just let it fall through to the non-dynamic
loader straight up.
Some custom nodes try to set this true globally. It messes with dynamic
VRAM with one-off spikes that can OOM but this is also very high risk
for windows where such allocations might get serviced by shared memory
fallback.
Trump it.
cleanup_models_gc can be called once per load_models_gpu via
free_memory, which in turn can de-activate an active model via
this reset_cast_buffers.
cleanup_models_gc() could also come via obscure garbage collector
paths so limit reset_cast_buffers to the post-node callsite instead.
* mm: split off registration helper to doer and headroom calc
* pinned_memory: implement registration comfy side
Move away from Aimdo buffer registrations which seem fraught with
danger and do it comfy side. Just start with the basic move.
* pinned_memory: do registrations as portable memory
* pinned_memory: discard async errors on registration fail
Like the good ol days.
* pinned_memory: implement abs shortfall retry
If pinned registration happens to fail despite the previous budget
ensures, consider the allocation shortfall, ensure it again, and
try again. This allows comfy pins to interoperate with other software
that might be doing substantive pinning.
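A minimal sketch of the retry idea described above. All three callables are hypothetical stand-ins for whatever the real registration, shortfall calculation and budget-ensure steps are; none of them are names from the codebase.

```python
def register_with_retry(size, try_register, ensure_budget, shortfall_of):
    """Attempt a pinned registration, retrying once after covering the shortfall.

    Hypothetical callables:
      try_register(size) -> bool     attempts the registration
      shortfall_of(size) -> int      bytes still missing after a failure
      ensure_budget(nbytes) -> None  frees or reserves that many bytes
    """
    if try_register(size):
        return True
    shortfall = shortfall_of(size)
    if shortfall <= 0:
        return False  # failure was not caused by a measurable shortfall
    ensure_budget(shortfall)
    return try_register(size)
```

The single retry keeps the failure path bounded: if another process pins more memory between the ensure and the second attempt, the call simply reports failure.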
* fix (MultiGPU): prevent freeze on manual abort when using MultiGPU CFG Split
Problem:
Upon manual abort the application hangs indefinitely.
`InterruptProcessingException` inherits from `BaseException` and bypasses MultiGPU's worker error handling block, so the thread dies silently, leaving the main thread waiting forever on `result_q.get()`.
Fix:
Catch `comfy.model_management.InterruptProcessingException` instead of `Exception` so it's caught and passed back via `result_q` to unblock the main thread when manual abort signal fires.
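The failure mode and fix can be reproduced in isolation. In the sketch below, `InterruptProcessingException` is a local stand-in that derives from `BaseException` as described above, and `worker` / `run` are hypothetical names. Whether the real handler keeps a separate `except Exception` branch is not stated in the message; this sketch keeps one.

```python
import queue
import threading


class InterruptProcessingException(BaseException):
    """Local stand-in; assumed to derive from BaseException as described."""


def worker(task, result_q):
    try:
        result_q.put(("ok", task()))
    except InterruptProcessingException as e:
        # `except Exception` alone would not match a BaseException subclass,
        # so without this branch nothing is ever put on the queue.
        result_q.put(("error", e))
    except Exception as e:
        result_q.put(("error", e))


def run(task):
    result_q = queue.Queue()
    t = threading.Thread(target=worker, args=(task, result_q), daemon=True)
    t.start()
    status, payload = result_q.get(timeout=5)  # would block forever if the worker died silently
    t.join()
    if status == "error":
        raise payload  # re-raise in the main thread
    return payload
```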
* oops
* mm: reinstate smart memory for VRAM
* mm: restore non-dynamic smart memory
By popular demand. We aren't quite ready for the deprecation, as GPUs
without dynamic VRAM enabled and some high-VRAM custom model loader
setups prefer the old fully hands-on behaviour.
* memory_management: Add direct read to GPU mode
Make destination optional (or make it optionally GPU) and use aimdo
to file_read direct to GPU.
* ops: Remove stream pin buffers and use aimdo reads
This consumed too much RAM and it's better to just take the hit on
the CPU syncing back the stream on a short ring buffer. Aimdo
implements this, so just rip the stream pin buffer from comfy.
* model_management: allow active pin registration movement
It's better to just let the active model load past the pin limit as
pins and let the pins move around. This saves the HDD and SATA
people disk traffic while only costing a few GPU syncs.
* utils: use aimdo file handle
This opens on Windows with more favourable flags.
* mp: only count the model proper for loaded_ram and vram
Exclude live loras from the numbers to avoid the case where the reported
loaded memory exceeds the size of the model.
This caused me confusion in the Kijai visualizer when it looked fully
loaded but was hitting disk due to this accounting discrepancy.
* utils: add bit reverse utility
useful for max scattering something ordered.
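A minimal sketch of what such a utility might look like; the function names and signatures here are illustrative and not necessarily those added to utils.

```python
def bit_reverse(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)  # move the current low bit into the result
        value >>= 1
    return result


def scatter_order(n_bits: int) -> list:
    """Visit 0 .. 2**n_bits - 1 in bit-reversed order."""
    return [bit_reverse(i, n_bits) for i in range(1 << n_bits)]


print(scatter_order(3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Any prefix of the bit-reversed sequence is spread roughly evenly across the full range, which is what makes it useful for scattering a small number of picks over an ordered list.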
* pinned_memory: Implement offload balancing
Use a max scatter algorithm to prioritize pins of the same size such
that when doing a little bit of offloading it gets scattered, allowing
the prefetcher to more evenly swallow the offload.
* comfy-aimdo 0.4.7
Aimdo 0.4.7 implements VRAM buffer exhaustion prediction to avoid
early speculative load of weights that definitely won't fit once the
inference gets further in.
* model-prefetch: consolidate pin ensures on the sync point
This could happen mid prefetch block, cause a sync of the entire
block and lose overlap. Get ahead of the problem with a free down
at the natural compute stream sync point.
* mm: Put a 2GB min on the pin ceiling
This is reasonably bad if it starts causing swap pressure, more so than
during normal RAM-cache proceedings. Clamp it.
* add --fast-disk