Merge f1c07a72c4 into 783782d5d7

Implement block prefetch + LoRA async load and adopt in LTX (Speedup!) (CORE-111) (#13618)
* mm: Use Aimdo raw allocator for cast buffers. PyTorch manages the allocation of growing buffers on streams poorly. PyTorch has no Windows support for the expandable segments allocator (which is the right tool for this job), while also segmenting the memory by stream such that it can be generally re-used. So kick the problem to aimdo, which can just grow a virtual region that's freed per stream.
* plan
* ops: move cpu handler up to the caller
* ops: split up prefetch from weight prep (block prefetching API). Split up the casting and weight formatting/LoRA stuff in preparation for arbitrary prefetch support.
* ops: implement block prefetching API. Allow a model to construct a prefetch list and operate it for increased async offload.
* ltxv2: Implement block prefetching
* Implement LoRA async offload. Implement async offload of LoRAs.
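The allocator idea in the `mm:` item above (one growable virtual region per stream, released as a whole) can be pictured as a per-stream bump allocator. The pure-Python sketch below models only the bookkeeping: `StreamArena`, `get_arena` and `reset_arenas` are hypothetical names, not the `comfy_aimdo` API, and real memory would be reserved as virtual address space and backed on demand. The lazily created per-stream dictionary and the 16 GiB reservation mirror `get_aimdo_cast_buffer()` and `DEFAULT_AIMDO_CAST_BUFFER_RESERVATION_SIZE` in `comfy/model_management.py`.

```python
from dataclasses import dataclass

# Mirrors DEFAULT_AIMDO_CAST_BUFFER_RESERVATION_SIZE (16 GiB) from the diff.
RESERVATION_SIZE = 16 * 1024 ** 3


@dataclass
class StreamArena:
    """Hypothetical model of a per-stream growable buffer.

    `reserved` is the virtual size set aside up front; `committed` is the
    high-water mark that would actually be backed by memory. The arena never
    shrinks while in use; it is dropped as a whole on reset.
    """
    reserved: int
    committed: int = 0

    def get(self, size: int, offset: int) -> tuple[int, int]:
        end = offset + size
        if end > self.reserved:
            raise MemoryError("request exceeds the reserved region")
        # Grow the backed region in place instead of allocating a new block.
        self.committed = max(self.committed, end)
        return offset, end  # byte range standing in for a tensor view

    def size(self) -> int:
        return self.committed


ARENAS: dict = {}


def get_arena(stream) -> StreamArena:
    """Return the arena for `stream`, creating it on first use."""
    arena = ARENAS.get(stream)
    if arena is None:
        arena = StreamArena(RESERVATION_SIZE)
        ARENAS[stream] = arena
    return arena


def reset_arenas() -> None:
    """Drop every arena (real code synchronizes each stream first)."""
    ARENAS.clear()


# Usage: pack three transfers back to back on one stream.
arena = get_arena("offload-stream-0")
offset = 0
for nbytes in (1024, 4096, 512):
    start, offset = arena.get(nbytes, offset)
```

A caller casting several modules back to back keeps a running offset, as `cast_modules_with_vbar()` does with `cast_buffer_offset`, so a block's weights land in one contiguous region without a separate allocation per tensor.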
2026-05-24 07:57:29 +08:00 · 2026-05-03 08:28:16 +09:00 · 2026-05-02 19:23:24 -04:00 · 2026-05-01 20:19:46 -04:00 · 2026-05-01 20:19:32 -04:00 · 2026-05-02 06:37:18 +08:00
26 changed files with 991 additions and 602 deletions
--- a/.ci/windows_amd_base_files/run_amd_gpu_disable_smart_memory.bat
+++ b/.ci/windows_amd_base_files/run_amd_gpu_disable_smart_memory.bat
@@ -1,2 +1,2 @@
-.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --disable-smart-memory
+.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --enable-dynamic-vram
 pause
--- a/2
+++ b/2
@@ -1,2 +1,2 @@
 # Admins
-* @comfyanonymous @kosinkadink @guill @alexisrolland @rattus128
+* @comfyanonymous @kosinkadink @guill @alexisrolland @rattus128 @kijai
--- a/README.md
+++ b/README.md
@@ -193,13 +193,15 @@ If you have trouble extracting it, right click the file -> properties -> unblock

 The portable above currently comes with python 3.13 and pytorch cuda 13.0. Update your Nvidia drivers if it doesn't start.

-#### Alternative Downloads:
+#### All Official Portable Downloads:

 [Portable for AMD GPUs](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_amd.7z)

-[Experimental portable for Intel GPUs](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_intel.7z)
+[Portable for Intel GPUs](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_intel.7z)

-[Portable with pytorch cuda 12.6 and python 3.12](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia_cu126.7z) (Supports Nvidia 10 series and older GPUs).
+[Portable for Nvidia GPUs](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia.7z) (supports 20 series and above).
+
+[Portable for Nvidia GPUs with pytorch cuda 12.6 and python 3.12](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia_cu126.7z) (Supports Nvidia 10 series and older GPUs).

 #### How do I share models between another UI and ComfyUI?

--- a/comfy/cli_args.py
+++ b/comfy/cli_args.py
@@ -90,7 +90,6 @@ parser.add_argument("--force-channels-last", action="store_true", help="Force ch
 parser.add_argument("--directml", type=int, nargs="?", metavar="DIRECTML_DEVICE", const=-1, help="Use torch-directml.")

 parser.add_argument("--oneapi-device-selector", type=str, default=None, metavar="SELECTOR_STRING", help="Sets the oneAPI device(s) this instance will use.")
-parser.add_argument("--disable-ipex-optimize", action="store_true", help="Disables ipex.optimize default when loading models with Intel's Extension for Pytorch.")
 parser.add_argument("--supports-fp8-compute", action="store_true", help="ComfyUI will act like if the device supports fp8 compute.")

 class LatentPreviewMethod(enum.Enum):
--- a/comfy/ldm/lightricks/av_model.py
+++ b/comfy/ldm/lightricks/av_model.py
@@ -16,6 +16,7 @@ from comfy.ldm.lightricks.model import (
 from comfy.ldm.lightricks.symmetric_patchifier import AudioPatchifier
 from comfy.ldm.lightricks.embeddings_connector import Embeddings1DConnector
 import comfy.ldm.common_dit
+import comfy.model_prefetch

 class CompressedTimestep:
    """Store video timestep embeddings in compressed form using per-frame indexing."""
@@ -907,9 +908,11 @@ class LTXAVModel(LTXVModel):
        """Process transformer blocks for LTXAV."""
        patches_replace = transformer_options.get("patches_replace", {})
        blocks_replace = patches_replace.get("dit", {})
+        prefetch_queue = comfy.model_prefetch.make_prefetch_queue(list(self.transformer_blocks), vx.device, transformer_options)

        # Process transformer blocks
        for i, block in enumerate(self.transformer_blocks):
+            comfy.model_prefetch.prefetch_queue_pop(prefetch_queue, vx.device, block)
            if ("double_block", i) in blocks_replace:

                def block_wrap(args):
@@ -982,6 +985,8 @@ class LTXAVModel(LTXVModel):
                    a_prompt_timestep=a_prompt_timestep,
                )

+        comfy.model_prefetch.prefetch_queue_pop(prefetch_queue, vx.device, None)
+
        return [vx, ax]

    def _process_output(self, x, embedded_timestep, keyframe_idxs, **kwargs):
--- a/comfy/lora.py
+++ b/comfy/lora.py
@@ -17,6 +17,7 @@
 """

 from __future__ import annotations
+import comfy.memory_management
 import comfy.utils
 import comfy.model_management
 import comfy.model_base
@@ -473,3 +474,17 @@ def calculate_weight(patches, weight, key, intermediate_dtype=torch.float32, ori
            weight = old_weight

    return weight
+
+def prefetch_prepared_value(value, allocate_buffer, stream):
+    if isinstance(value, torch.Tensor):
+        dest = allocate_buffer(comfy.memory_management.vram_aligned_size(value))
+        comfy.model_management.cast_to_gathered([value], dest, non_blocking=True, stream=stream)
+        return comfy.memory_management.interpret_gathered_like([value], dest)[0]
+    elif isinstance(value, weight_adapter.WeightAdapterBase):
+        return type(value)(value.loaded_keys, prefetch_prepared_value(value.weights, allocate_buffer, stream))
+    elif isinstance(value, tuple):
+        return tuple(prefetch_prepared_value(item, allocate_buffer, stream) for item in value)
+    elif isinstance(value, list):
+        return [prefetch_prepared_value(item, allocate_buffer, stream) for item in value]
+
+    return value
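`prefetch_prepared_value()` walks a LoRA patch payload recursively: tensors are copied into the cast buffer on the offload stream, tuples and lists are rebuilt, and everything else is passed through untouched. The sketch below shows only that traversal shape in plain Python; `FakeTensor` and the `copy_to_device` callback are hypothetical stand-ins for `torch.Tensor` and the `allocate_buffer` + `cast_to_gathered` pair, and the `WeightAdapterBase` branch is omitted.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical tensor stand-in: a device tag and a byte size."""
    device: str
    nbytes: int


def prefetch_value(value: Any, copy_to_device: Callable[[FakeTensor], FakeTensor]) -> Any:
    """Rebuild `value`, replacing each tensor leaf with its device copy."""
    if isinstance(value, FakeTensor):
        return copy_to_device(value)
    if isinstance(value, tuple):
        return tuple(prefetch_value(v, copy_to_device) for v in value)
    if isinstance(value, list):
        return [prefetch_value(v, copy_to_device) for v in value]
    return value  # scalars, strings, None, ... pass through unchanged


# Usage: a nested payload with two tensors and two non-tensor fields.
payload = (FakeTensor("cpu", 64), [FakeTensor("cpu", 32), 0.5], "lora")
moved = prefetch_value(payload, lambda t: FakeTensor("cuda", t.nbytes))
```

The original payload is left untouched, which is what allows the prepared copy to be discarded after the block runs while the CPU-side patches remain the source of truth.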
--- a/comfy/model_base.py
+++ b/comfy/model_base.py
@@ -214,6 +214,11 @@ class BaseModel(torch.nn.Module):
        if "latent_shapes" in extra_conds:
            xc = utils.unpack_latents(xc, extra_conds.pop("latent_shapes"))

+        transformer_options = transformer_options.copy()
+        transformer_options["prefetch_dynamic_vbars"] = (
+            self.current_patcher is not None and self.current_patcher.is_dynamic()
+        )
+
        model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds)
        if len(model_output) > 1 and not torch.is_tensor(model_output):
            model_output, _ = utils.pack_latents(model_output)
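`BaseModel` now copies `transformer_options` before adding the `prefetch_dynamic_vbars` flag. A shallow copy is enough to keep the flag out of the caller's dictionary, while nested entries such as `patches_replace` are still shared. A minimal illustration, with `with_prefetch_flag` as a hypothetical helper name:

```python
def with_prefetch_flag(transformer_options: dict, patcher_is_dynamic: bool) -> dict:
    """Return a shallow copy of the options with the prefetch flag set."""
    options = transformer_options.copy()  # new top-level dict, shared values
    options["prefetch_dynamic_vbars"] = patcher_is_dynamic
    return options


caller_options = {"patches_replace": {"dit": {}}}
options = with_prefetch_flag(caller_options, True)
```

`make_prefetch_queue()` reads this flag and returns `None` (no prefetching) when it is false.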
--- a/comfy/model_management.py
+++ b/comfy/model_management.py
@@ -31,6 +31,7 @@ from contextlib import nullcontext
 import comfy.memory_management
 import comfy.utils
 import comfy.quant_ops
+import comfy_aimdo.vram_buffer

 class VRAMState(Enum):
    DISABLED = 0    #No vram present: no need to move models to vram
@@ -112,10 +113,6 @@ if args.directml is not None:
    # torch_directml.disable_tiled_resources(True)
    lowvram_available = False #TODO: need to find a way to get free memory in directml before this can be enabled by default.

-try:
-    import intel_extension_for_pytorch as ipex  # noqa: F401
-except:
-    pass

 try:
    _ = torch.xpu.device_count()
@@ -583,9 +580,6 @@ class LoadedModel:

        real_model = self.model.model

-        if is_intel_xpu() and not args.disable_ipex_optimize and 'ipex' in globals() and real_model is not None:
-            with torch.no_grad():
-                real_model = ipex.optimize(real_model.eval(), inplace=True, graph_mode=True, concat_linear=True)

        self.real_model = weakref.ref(real_model)
        self.model_finalizer = weakref.finalize(real_model, cleanup_models)
@@ -1182,6 +1176,10 @@ stream_counters = {}

 STREAM_CAST_BUFFERS = {}
 LARGEST_CASTED_WEIGHT = (None, 0)
+STREAM_AIMDO_CAST_BUFFERS = {}
+LARGEST_AIMDO_CASTED_WEIGHT = (None, 0)
+
+DEFAULT_AIMDO_CAST_BUFFER_RESERVATION_SIZE = 16 * 1024 ** 3

 def get_cast_buffer(offload_stream, device, size, ref):
    global LARGEST_CASTED_WEIGHT
@@ -1215,13 +1213,26 @@ def get_cast_buffer(offload_stream, device, size, ref):

    return cast_buffer

+def get_aimdo_cast_buffer(offload_stream, device):
+    cast_buffer = STREAM_AIMDO_CAST_BUFFERS.get(offload_stream, None)
+    if cast_buffer is None:
+        cast_buffer = comfy_aimdo.vram_buffer.VRAMBuffer(DEFAULT_AIMDO_CAST_BUFFER_RESERVATION_SIZE, device.index)
+        STREAM_AIMDO_CAST_BUFFERS[offload_stream] = cast_buffer
+
+    return cast_buffer
 def reset_cast_buffers():
    global LARGEST_CASTED_WEIGHT
+    global LARGEST_AIMDO_CASTED_WEIGHT
+
    LARGEST_CASTED_WEIGHT = (None, 0)
-    for offload_stream in STREAM_CAST_BUFFERS:
-        offload_stream.synchronize()
+    LARGEST_AIMDO_CASTED_WEIGHT = (None, 0)
+    for offload_stream in set(STREAM_CAST_BUFFERS) | set(STREAM_AIMDO_CAST_BUFFERS):
+        if offload_stream is not None:
+            offload_stream.synchronize()
    synchronize()
+
    STREAM_CAST_BUFFERS.clear()
+    STREAM_AIMDO_CAST_BUFFERS.clear()
    soft_empty_cache()

 def get_offload_stream(device):
@@ -1581,10 +1592,7 @@ def should_use_fp16(device=None, model_params=0, prioritize_performance=True, ma
        return False

    if is_intel_xpu():
-        if torch_version_numeric < (2, 3):
-            return True
-        else:
-            return torch.xpu.get_device_properties(device).has_fp16
+        return torch.xpu.get_device_properties(device).has_fp16

    if is_ascend_npu():
        return True
@@ -1650,10 +1658,7 @@ def should_use_bf16(device=None, model_params=0, prioritize_performance=True, ma
        return False

    if is_intel_xpu():
-        if torch_version_numeric < (2, 3):
-            return True
-        else:
-            return torch.xpu.is_bf16_supported()
+        return torch.xpu.is_bf16_supported()

    if is_ascend_npu():
        return True
@@ -1784,6 +1789,7 @@ def soft_empty_cache(force=False):
    if cpu_state == CPUState.MPS:
        torch.mps.empty_cache()
    elif is_intel_xpu():
+        torch.xpu.synchronize()
        torch.xpu.empty_cache()
    elif is_ascend_npu():
        torch.npu.empty_cache()
--- a/comfy/model_patcher.py
+++ b/comfy/model_patcher.py
@@ -121,9 +121,20 @@ class LowVramPatch:
        self.patches = patches
        self.convert_func = convert_func # TODO: remove
        self.set_func = set_func
+        self.prepared_patches = None
+
+    def prepare(self, allocate_buffer, stream):
+        self.prepared_patches = [
+            (patch[0], comfy.lora.prefetch_prepared_value(patch[1], allocate_buffer, stream), patch[2], patch[3], patch[4])
+            for patch in self.patches[self.key]
+        ]
+
+    def clear_prepared(self):
+        self.prepared_patches = None

    def __call__(self, weight):
-        return comfy.lora.calculate_weight(self.patches[self.key], weight, self.key, intermediate_dtype=weight.dtype)
+        patches = self.prepared_patches if self.prepared_patches is not None else self.patches[self.key]
+        return comfy.lora.calculate_weight(patches, weight, self.key, intermediate_dtype=weight.dtype)

 LOWVRAM_PATCH_ESTIMATE_MATH_FACTOR = 2

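The `LowVramPatch` change follows a prepare / use / clear cycle: `prepare()` stores device-side copies of each patch payload, `__call__` prefers those copies when present, and `clear_prepared()` drops them once the block has run. A minimal sketch of that cycle, with `PreparedPatch`, `move` and `apply` as hypothetical names (the real class delegates to `comfy.lora.calculate_weight`):

```python
class PreparedPatch:
    """Hypothetical sketch of LowVramPatch's prepared-patch fallback."""

    def __init__(self, patches: dict, key: str):
        self.patches = patches
        self.key = key
        self.prepared = None  # device copies; valid only for one prefetch

    def prepare(self, move):
        # Copy only the payload (index 1) of each patch tuple; keep the rest as-is.
        self.prepared = [(p[0], move(p[1]), *p[2:]) for p in self.patches[self.key]]

    def clear_prepared(self):
        self.prepared = None

    def __call__(self, weight, apply):
        # Prefer the prefetched copies, else fall back to the original patches.
        patches = self.prepared if self.prepared is not None else self.patches[self.key]
        return apply(patches, weight)


def apply(patches, weight):
    """Hypothetical stand-in for calculate_weight: report the payload seen."""
    return patches[0][1], weight


patch = PreparedPatch({"w": [(1.0, "cpu-data", 1.0, None, None)]}, "w")
```

Because the fallback keeps the original patch list, a layer that was never prefetched still goes through the same weight calculation; only the location the LoRA tensors are read from changes.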
--- a/comfy/model_prefetch.py
+++ b/comfy/model_prefetch.py
@@ -0,0 +1,65 @@
+import comfy_aimdo.model_vbar
+import comfy.model_management
+import comfy.ops
+
+PREFETCH_QUEUES = []
+
+def cleanup_prefetched_modules(comfy_modules):
+    for s in comfy_modules:
+        prefetch = getattr(s, "_prefetch", None)
+        if prefetch is None:
+            continue
+        for param_key in ("weight", "bias"):
+            lowvram_fn = getattr(s, param_key + "_lowvram_function", None)
+            if lowvram_fn is not None:
+                lowvram_fn.clear_prepared()
+        if prefetch["signature"] is not None:
+            comfy_aimdo.model_vbar.vbar_unpin(s._v)
+        delattr(s, "_prefetch")
+
+def cleanup_prefetch_queues():
+    global PREFETCH_QUEUES
+
+    for queue in PREFETCH_QUEUES:
+        for entry in queue:
+            if entry is None or not isinstance(entry, tuple):
+                continue
+            _, prefetch_state = entry
+            comfy_modules = prefetch_state[1]
+            if comfy_modules is not None:
+                cleanup_prefetched_modules(comfy_modules)
+    PREFETCH_QUEUES = []
+
+def prefetch_queue_pop(queue, device, module):
+    if queue is None:
+        return
+
+    consumed = queue.pop(0)
+    if consumed is not None:
+        offload_stream, prefetch_state = consumed
+        offload_stream.wait_stream(comfy.model_management.current_stream(device))
+        _, comfy_modules = prefetch_state
+        if comfy_modules is not None:
+            cleanup_prefetched_modules(comfy_modules)
+
+    prefetch = queue[0]
+    if prefetch is not None:
+        comfy_modules = []
+        for s in prefetch.modules():
+            if hasattr(s, "_v"):
+                comfy_modules.append(s)
+
+        offload_stream = comfy.ops.cast_modules_with_vbar(comfy_modules, None, device, None, True)
+        comfy.model_management.sync_stream(device, offload_stream)
+        queue[0] = (offload_stream, (prefetch, comfy_modules))
+
+def make_prefetch_queue(queue, device, transformer_options):
+    if (not transformer_options.get("prefetch_dynamic_vbars", False)
+        or comfy.model_management.NUM_STREAMS == 0
+        or comfy.model_management.is_device_cpu(device)
+        or not comfy.model_management.device_supports_non_blocking(device)):
+        return None
+
+    queue = [None] + queue + [None]
+    PREFETCH_QUEUES.append(queue)
+    return queue
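The queue built by `make_prefetch_queue()` is padded with a `None` sentinel at each end, and every `prefetch_queue_pop()` call first retires the entry prefetched on the previous call, then starts the prefetch for the block now at the head. The LTXAV loop calls it once per block and once more with `None` after the loop. This pure-Python sketch reproduces only that bookkeeping order (all names are hypothetical); in the real code the overlap comes from issuing the copies on an offload stream in `cast_modules_with_vbar()` and ordering them against the compute stream with `sync_stream()` and `wait_stream()`.

```python
def make_queue(blocks, enabled=True):
    """Pad with None sentinels, or return None when prefetching is disabled."""
    if not enabled:
        return None
    return [None] + list(blocks) + [None]


def queue_pop(queue, log):
    """Retire the previous prefetch, then start the one at the head."""
    if queue is None:
        return
    consumed = queue.pop(0)
    if consumed is not None:
        _stream, block = consumed
        log.append(f"release {block}")
    head = queue[0]
    if head is not None:
        log.append(f"prefetch {head}")
        queue[0] = ("offload-stream", head)  # mark the head as in flight


# Usage: mirror the LTXAV loop with two blocks.
log = []
blocks = ["block0", "block1"]
queue = make_queue(blocks)
for block in blocks:
    queue_pop(queue, log)
    log.append(f"run {block}")
queue_pop(queue, log)  # final pop releases the last block
```

Replacing `queue[0]` with a tuple as soon as the prefetch starts is also what lets `cleanup_prefetch_queues()` tell started entries (tuples) apart from blocks that were never prefetched.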
--- a/comfy/ops.py
+++ b/comfy/ops.py
@@ -86,38 +86,61 @@ def materialize_meta_param(s, param_keys):
            setattr(s, param_key, torch.nn.Parameter(torch.zeros(param.shape, dtype=param.dtype), requires_grad=param.requires_grad))


-def cast_bias_weight_with_vbar(s, dtype, device, bias_dtype, non_blocking, compute_dtype, want_requant):
-    #vbar doesn't support CPU weights, but some custom nodes have weird paths
-    #that might switch the layer to the CPU and expect it to work. We have to take
-    #a clone conservatively as we are mmapped and some SFT files are packed misaligned
-    #If you are a custom node author reading this, please move your layer to the GPU
-    #or declare your ModelPatcher as CPU in the first place.
-    if comfy.model_management.is_device_cpu(device):
-        materialize_meta_param(s, ["weight", "bias"])
-        weight = s.weight.to(dtype=dtype, copy=True)
-        if isinstance(weight, QuantizedTensor):
-            weight = weight.dequantize()
-        bias = None
-        if s.bias is not None:
-            bias = s.bias.to(dtype=bias_dtype, copy=True)
-        return weight, bias, (None, None, None)
-
+# FIXME: add n=1 cache hit fast path
+def cast_modules_with_vbar(comfy_modules, dtype, device, bias_dtype, non_blocking):
    offload_stream = None
-    xfer_dest = None
+    cast_buffer = None
+    cast_buffer_offset = 0
+
+    def ensure_offload_stream(module, required_size, check_largest):
+        nonlocal offload_stream
+        nonlocal cast_buffer
+
+        if offload_stream is None:
+            offload_stream = comfy.model_management.get_offload_stream(device)
+        if offload_stream is None or not check_largest or len(comfy_modules) != 1:
+            return
+
+        current_size = 0 if cast_buffer is None else cast_buffer.size()
+        if current_size < required_size and module is comfy.model_management.LARGEST_AIMDO_CASTED_WEIGHT[0]:
+            offload_stream = comfy.model_management.get_offload_stream(device)
+            cast_buffer = None
+        if required_size > comfy.model_management.LARGEST_AIMDO_CASTED_WEIGHT[1]:
+            comfy.model_management.LARGEST_AIMDO_CASTED_WEIGHT = (module, required_size)
+
+    def get_cast_buffer(buffer_size):
+        nonlocal offload_stream
+        nonlocal cast_buffer
+        nonlocal cast_buffer_offset
+
+        if buffer_size == 0:
+            return None
+
+        if offload_stream is None:
+            return torch.empty((buffer_size,), dtype=torch.uint8, device=device)
+
+        cast_buffer = comfy.model_management.get_aimdo_cast_buffer(offload_stream, device)
+        buffer = comfy_aimdo.torch.aimdo_to_tensor(cast_buffer.get(buffer_size, cast_buffer_offset), device)
+        cast_buffer_offset += buffer_size
+        return buffer
+
+    for s in comfy_modules:
+        signature = comfy_aimdo.model_vbar.vbar_fault(s._v)
+        resident = comfy_aimdo.model_vbar.vbar_signature_compare(signature, s._v_signature)
+        prefetch = {
+            "signature": signature,
+            "resident": resident,
+        }

-    signature = comfy_aimdo.model_vbar.vbar_fault(s._v)
-    resident = comfy_aimdo.model_vbar.vbar_signature_compare(signature, s._v_signature)
-    if signature is not None:
        if resident:
-            weight = s._v_weight
-            bias = s._v_bias
-        else:
-            xfer_dest = comfy_aimdo.torch.aimdo_to_tensor(s._v, device)
+            s._prefetch = prefetch
+            continue

-    if not resident:
        materialize_meta_param(s, ["weight", "bias"])
+        xfer_dest = comfy_aimdo.torch.aimdo_to_tensor(s._v, device) if signature is not None else None
        cast_geometry = comfy.memory_management.tensors_to_geometries([ s.weight, s.bias ])
        cast_dest = None
+        needs_cast = False

        xfer_source = [ s.weight, s.bias ]

@@ -129,22 +152,15 @@ def cast_bias_weight_with_vbar(s, dtype, device, bias_dtype, non_blocking, compu
            if data is None:
                continue
            if data.dtype != geometry.dtype:
+                needs_cast = True
                cast_dest = xfer_dest
-                if cast_dest is None:
-                    cast_dest = torch.empty((comfy.memory_management.vram_aligned_size(cast_geometry),), dtype=torch.uint8, device=device)
                xfer_dest = None
                break

        dest_size = comfy.memory_management.vram_aligned_size(xfer_source)
-        offload_stream = comfy.model_management.get_offload_stream(device)
-        if xfer_dest is None and offload_stream is not None:
-                xfer_dest = comfy.model_management.get_cast_buffer(offload_stream, device, dest_size, s)
-                if xfer_dest is None:
-                    offload_stream = comfy.model_management.get_offload_stream(device)
-                    xfer_dest = comfy.model_management.get_cast_buffer(offload_stream, device, dest_size, s)
+        ensure_offload_stream(s, dest_size if xfer_dest is None else 0, True)
        if xfer_dest is None:
-            xfer_dest = torch.empty((dest_size,), dtype=torch.uint8, device=device)
-            offload_stream = None
+            xfer_dest = get_cast_buffer(dest_size)

        if signature is None and pin is None:
            comfy.pinned_memory.pin_memory(s)
@@ -157,27 +173,54 @@ def cast_bias_weight_with_vbar(s, dtype, device, bias_dtype, non_blocking, compu
            xfer_source = [ pin ]
        #send it over
        comfy.model_management.cast_to_gathered(xfer_source, xfer_dest, non_blocking=non_blocking, stream=offload_stream)
-        comfy.model_management.sync_stream(device, offload_stream)

-        if cast_dest is not None:
+        for param_key in ("weight", "bias"):
+            lowvram_fn = getattr(s, param_key + "_lowvram_function", None)
+            if lowvram_fn is not None:
+                ensure_offload_stream(s, cast_buffer_offset, False)
+                lowvram_fn.prepare(lambda size: get_cast_buffer(size), offload_stream)
+
+        prefetch["xfer_dest"] = xfer_dest
+        prefetch["cast_dest"] = cast_dest
+        prefetch["cast_geometry"] = cast_geometry
+        prefetch["needs_cast"] = needs_cast
+        s._prefetch = prefetch
+
+    return offload_stream
+
+
+def resolve_cast_module_with_vbar(s, dtype, device, bias_dtype, compute_dtype, want_requant):
+
+    prefetch = getattr(s, "_prefetch", None)
+
+    if prefetch["resident"]:
+        weight = s._v_weight
+        bias = s._v_bias
+    else:
+        xfer_dest = prefetch["xfer_dest"]
+        if prefetch["needs_cast"]:
+            cast_dest = prefetch["cast_dest"] if prefetch["cast_dest"] is not None else torch.empty((comfy.memory_management.vram_aligned_size(prefetch["cast_geometry"]),), dtype=torch.uint8, device=device)
            for pre_cast, post_cast in zip(comfy.memory_management.interpret_gathered_like([s.weight, s.bias ], xfer_dest),
-                                           comfy.memory_management.interpret_gathered_like(cast_geometry, cast_dest)):
+                                           comfy.memory_management.interpret_gathered_like(prefetch["cast_geometry"], cast_dest)):
                if post_cast is not None:
                    post_cast.copy_(pre_cast)
            xfer_dest = cast_dest

-        params = comfy.memory_management.interpret_gathered_like(cast_geometry, xfer_dest)
+        params = comfy.memory_management.interpret_gathered_like(prefetch["cast_geometry"], xfer_dest)
        weight = params[0]
        bias = params[1]
-        if signature is not None:
+        if prefetch["signature"] is not None:
            s._v_weight = weight
            s._v_bias = bias
-        s._v_signature=signature
+        s._v_signature = prefetch["signature"]

    def post_cast(s, param_key, x, dtype, resident, update_weight):
        lowvram_fn = getattr(s, param_key + "_lowvram_function", None)
        fns = getattr(s, param_key + "_function", [])

+        if x is None:
+            return None
+
        orig = x

        def to_dequant(tensor, dtype):
@@ -205,14 +248,12 @@ def cast_bias_weight_with_vbar(s, dtype, device, bias_dtype, non_blocking, compu
            x = f(x)
        return x

-    update_weight = signature is not None
+    update_weight = prefetch["signature"] is not None
+    weight = post_cast(s, "weight", weight, dtype, prefetch["resident"], update_weight)
+    if bias is not None:
+        bias = post_cast(s, "bias", bias, bias_dtype, prefetch["resident"], update_weight)

-    weight = post_cast(s, "weight", weight, dtype, resident, update_weight)
-    if s.bias is not None:
-        bias = post_cast(s, "bias", bias, bias_dtype, resident, update_weight)
-
-    #FIXME: weird offload return protocol
-    return weight, bias, (offload_stream, device if signature is not None else None, None)
+    return weight, bias


 def cast_bias_weight(s, input=None, dtype=None, device=None, bias_dtype=None, offloadable=False, compute_dtype=None, want_requant=False):
@@ -230,10 +271,46 @@ def cast_bias_weight(s, input=None, dtype=None, device=None, bias_dtype=None, of
        if device is None:
            device = input.device

+    def format_return(result, offloadable):
+        weight, bias, offload_stream = result
+        return (weight, bias, offload_stream) if offloadable else (weight, bias)
+
    non_blocking = comfy.model_management.device_supports_non_blocking(device)

    if hasattr(s, "_v"):
-        return cast_bias_weight_with_vbar(s, dtype, device, bias_dtype, non_blocking, compute_dtype, want_requant)
+
+        #vbar doesn't support CPU weights, but some custom nodes have weird paths
+        #that might switch the layer to the CPU and expect it to work. We have to take
+        #a clone conservatively as we are mmapped and some SFT files are packed misaligned
+        #If you are a custom node author reading this, please move your layer to the GPU
+        #or declare your ModelPatcher as CPU in the first place.
+        if comfy.model_management.is_device_cpu(device):
+            materialize_meta_param(s, ["weight", "bias"])
+            weight = s.weight.to(dtype=dtype, copy=True)
+            if isinstance(weight, QuantizedTensor):
+                weight = weight.dequantize()
+            bias = s.bias.to(dtype=bias_dtype, copy=True) if s.bias is not None else None
+            return format_return((weight, bias, (None, None, None)), offloadable)
+
+        prefetched = hasattr(s, "_prefetch")
+        offload_stream = None
+        offload_device = None
+        if not prefetched:
+            offload_stream = cast_modules_with_vbar([s], dtype, device, bias_dtype, non_blocking)
+            comfy.model_management.sync_stream(device, offload_stream)
+
+        weight, bias = resolve_cast_module_with_vbar(s, dtype, device, bias_dtype, compute_dtype, want_requant)
+
+        if not prefetched:
+            if getattr(s, "_prefetch")["signature"] is not None:
+                offload_device = device
+            for param_key in ("weight", "bias"):
+                lowvram_fn = getattr(s, param_key + "_lowvram_function", None)
+                if lowvram_fn is not None:
+                    lowvram_fn.clear_prepared()
+            delattr(s, "_prefetch")
+        return format_return((weight, bias, (offload_stream, offload_device, None)), offloadable)
+

    if offloadable and (device != s.weight.device or
                        (s.bias is not None and device != s.bias.device)):
@@ -280,11 +357,7 @@ def cast_bias_weight(s, input=None, dtype=None, device=None, bias_dtype=None, of
        for f in s.weight_function:
            weight = f(weight)

-    if offloadable:
-        return weight, bias, (offload_stream, weight_a, bias_a)
-    else:
-        #Legacy function signature
-        return weight, bias
+    return format_return((weight, bias, (offload_stream, weight_a, bias_a)), offloadable)


 def uncast_bias_weight(s, weight, bias, offload_stream):
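After this change `cast_bias_weight()` handles two cases for a vbar-backed layer: if the block prefetcher already staged it (`_prefetch` is set), it only resolves the staged state; otherwise it stages the single module inline, resolves it, and tears the state down itself. A hypothetical pure-Python sketch of that ownership rule (`Layer`, `stage`, `resolve` and `cast` are illustrative names, not ComfyUI functions):

```python
class Layer:
    """Hypothetical layer; `_prefetch` exists only while staged state is live."""

    def __init__(self, name):
        self.name = name


def stage(layers, log):
    # Phase 1 (cf. cast_modules_with_vbar): issue transfers, record state.
    for layer in layers:
        log.append(f"stage {layer.name}")
        layer._prefetch = {"xfer_dest": f"{layer.name}@device"}


def resolve(layer):
    # Phase 2 (cf. resolve_cast_module_with_vbar): build usable weights.
    return layer._prefetch["xfer_dest"]


def cast(layer, log):
    prefetched = hasattr(layer, "_prefetch")
    if not prefetched:
        stage([layer], log)  # fallback: stage just this layer, inline
    weight = resolve(layer)
    if not prefetched:
        del layer._prefetch  # inline state belongs to this call
    return weight


# Usage: `a` was staged by a block prefetch, `b` was not.
log = []
a, b = Layer("a"), Layer("b")
stage([a], log)
```

State for prefetched layers is left in place here and released later by `cleanup_prefetched_modules()` when the prefetch queue advances.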
--- a/comfy_api_nodes/nodes_bytedance.py
+++ b/comfy_api_nodes/nodes_bytedance.py
@@ -1403,7 +1403,6 @@ class ByteDance2TextToVideoNode(IO.ComfyNode):
            status_extractor=lambda r: r.status,
            price_extractor=_seedance2_price_extractor(model_id, has_video_input=False),
            poll_interval=9,
-            max_poll_attempts=180,
        )
        return IO.NodeOutput(await download_url_to_video_output(response.content.video_url))

@@ -1585,7 +1584,6 @@ class ByteDance2FirstLastFrameNode(IO.ComfyNode):
            status_extractor=lambda r: r.status,
            price_extractor=_seedance2_price_extractor(model_id, has_video_input=False),
            poll_interval=9,
-            max_poll_attempts=180,
        )
        return IO.NodeOutput(await download_url_to_video_output(response.content.video_url))

@@ -1907,7 +1905,6 @@ class ByteDance2ReferenceNode(IO.ComfyNode):
            status_extractor=lambda r: r.status,
            price_extractor=_seedance2_price_extractor(model_id, has_video_input=has_video_input),
            poll_interval=9,
-            max_poll_attempts=180,
        )
        return IO.NodeOutput(await download_url_to_video_output(response.content.video_url))

--- a/comfy_api_nodes/nodes_hitpaw.py
+++ b/comfy_api_nodes/nodes_hitpaw.py
@@ -178,7 +178,6 @@ class HitPawGeneralImageEnhance(IO.ComfyNode):
            status_extractor=lambda x: x.data.status,
            price_extractor=lambda x: request_price,
            poll_interval=10.0,
-            max_poll_attempts=480,
        )
        return IO.NodeOutput(await download_url_to_image_tensor(final_response.data.res_url))

@@ -324,7 +323,6 @@ class HitPawVideoEnhance(IO.ComfyNode):
            status_extractor=lambda x: x.data.status,
            price_extractor=lambda x: request_price,
            poll_interval=10.0,
-            max_poll_attempts=320,
        )
        return IO.NodeOutput(await download_url_to_video_output(final_response.data.res_url))

--- a/comfy_api_nodes/nodes_kling.py
+++ b/comfy_api_nodes/nodes_kling.py
@@ -276,7 +276,6 @@ async def finish_omni_video_task(cls: type[IO.ComfyNode], response: TaskStatusRe
        cls,
        ApiEndpoint(path=f"/proxy/kling/v1/videos/omni-video/{response.data.task_id}"),
        response_model=TaskStatusResponse,
-        max_poll_attempts=280,
        status_extractor=lambda r: (r.data.task_status if r.data else None),
    )
    return IO.NodeOutput(await download_url_to_video_output(final_response.data.task_result.videos[0].url))
@@ -3062,7 +3061,6 @@ class KlingVideoNode(IO.ComfyNode):
            cls,
            ApiEndpoint(path=poll_path),
            response_model=TaskStatusResponse,
-            max_poll_attempts=280,
            status_extractor=lambda r: (r.data.task_status if r.data else None),
        )
        return IO.NodeOutput(await download_url_to_video_output(final_response.data.task_result.videos[0].url))
@@ -3188,7 +3186,6 @@ class KlingFirstLastFrameNode(IO.ComfyNode):
            cls,
            ApiEndpoint(path=f"/proxy/kling/v1/videos/image2video/{response.data.task_id}"),
            response_model=TaskStatusResponse,
-            max_poll_attempts=280,
            status_extractor=lambda r: (r.data.task_status if r.data else None),
        )
        return IO.NodeOutput(await download_url_to_video_output(final_response.data.task_result.videos[0].url))
--- a/comfy_api_nodes/nodes_magnific.py
+++ b/comfy_api_nodes/nodes_magnific.py
@@ -230,7 +230,6 @@ class MagnificImageUpscalerCreativeNode(IO.ComfyNode):
            status_extractor=lambda x: x.status,
            price_extractor=lambda _: price_usd,
            poll_interval=10.0,
-            max_poll_attempts=480,
        )
        return IO.NodeOutput(await download_url_to_image_tensor(final_response.generated[0]))

@@ -391,7 +390,6 @@ class MagnificImageUpscalerPreciseV2Node(IO.ComfyNode):
            status_extractor=lambda x: x.status,
            price_extractor=lambda _: price_usd,
            poll_interval=10.0,
-            max_poll_attempts=480,
        )
        return IO.NodeOutput(await download_url_to_image_tensor(final_response.generated[0]))

@@ -541,7 +539,6 @@ class MagnificImageStyleTransferNode(IO.ComfyNode):
            response_model=TaskResponse,
            status_extractor=lambda x: x.status,
            poll_interval=10.0,
-            max_poll_attempts=480,
        )
        return IO.NodeOutput(await download_url_to_image_tensor(final_response.generated[0]))

@@ -782,7 +779,6 @@ class MagnificImageRelightNode(IO.ComfyNode):
            response_model=TaskResponse,
            status_extractor=lambda x: x.status,
            poll_interval=10.0,
-            max_poll_attempts=480,
        )
        return IO.NodeOutput(await download_url_to_image_tensor(final_response.generated[0]))

@@ -924,7 +920,6 @@ class MagnificImageSkinEnhancerNode(IO.ComfyNode):
            response_model=TaskResponse,
            status_extractor=lambda x: x.status,
            poll_interval=10.0,
-            max_poll_attempts=480,
        )
        return IO.NodeOutput(await download_url_to_image_tensor(final_response.generated[0]))

--- a/comfy_api_nodes/nodes_topaz.py
+++ b/comfy_api_nodes/nodes_topaz.py
@@ -453,7 +453,6 @@ class TopazVideoEnhance(IO.ComfyNode):
            progress_extractor=lambda x: getattr(x, "progress", 0),
            price_extractor=lambda x: (x.estimates.cost[0] * 0.08 if x.estimates and x.estimates.cost[0] else None),
            poll_interval=10.0,
-            max_poll_attempts=320,
        )
        return IO.NodeOutput(await download_url_to_video_output(final_response.download.url))

--- a/comfy_api_nodes/nodes_vidu.py
+++ b/comfy_api_nodes/nodes_vidu.py
@@ -38,7 +38,7 @@ async def execute_task(
    cls: type[IO.ComfyNode],
    vidu_endpoint: str,
    payload: TaskCreationRequest | TaskExtendCreationRequest | TaskMultiFrameCreationRequest,
-    max_poll_attempts: int = 320,
+    max_poll_attempts: int = 480,
 ) -> list[TaskResult]:
    task_creation_response = await sync_op(
        cls,
@@ -1097,7 +1097,6 @@ class ViduExtendVideoNode(IO.ComfyNode):
                video_url=await upload_video_to_comfyapi(cls, video, wait_label="Uploading video"),
                images=[image_url] if image_url else None,
            ),
-            max_poll_attempts=480,
        )
        return IO.NodeOutput(await download_url_to_video_output(results[0].url))

--- a/comfy_api_nodes/nodes_wan.py
+++ b/comfy_api_nodes/nodes_wan.py
@@ -818,7 +818,6 @@ class WanReferenceVideoApi(IO.ComfyNode):
            response_model=VideoTaskStatusResponse,
            status_extractor=lambda x: x.output.task_status,
            poll_interval=6,
-            max_poll_attempts=280,
        )
        return IO.NodeOutput(await download_url_to_video_output(response.output.video_url))

--- a/comfy_api_nodes/nodes_wavespeed.py
+++ b/comfy_api_nodes/nodes_wavespeed.py
@@ -84,7 +84,6 @@ class WavespeedFlashVSRNode(IO.ComfyNode):
            response_model=TaskResultResponse,
            status_extractor=lambda x: "failed" if x.data is None else x.data.status,
            poll_interval=10.0,
-            max_poll_attempts=480,
        )
        if final_response.code != 200:
            raise ValueError(
@@ -156,7 +155,6 @@ class WavespeedImageUpscaleNode(IO.ComfyNode):
            response_model=TaskResultResponse,
            status_extractor=lambda x: "failed" if x.data is None else x.data.status,
            poll_interval=10.0,
-            max_poll_attempts=480,
        )
        if final_response.code != 200:
            raise ValueError(
--- a/comfy_api_nodes/util/client.py
+++ b/comfy_api_nodes/util/client.py
@@ -148,7 +148,7 @@ async def poll_op(
    queued_statuses: list[str | int] | None = None,
    data: BaseModel | None = None,
    poll_interval: float = 5.0,
-    max_poll_attempts: int = 160,
+    max_poll_attempts: int = 480,
    timeout_per_poll: float = 120.0,
    max_retries_per_poll: int = 10,
    retry_delay_per_poll: float = 1.0,
@@ -254,7 +254,7 @@ async def poll_op_raw(
    queued_statuses: list[str | int] | None = None,
    data: dict[str, Any] | BaseModel | None = None,
    poll_interval: float = 5.0,
-    max_poll_attempts: int = 160,
+    max_poll_attempts: int = 480,
    timeout_per_poll: float = 120.0,
    max_retries_per_poll: int = 10,
    retry_delay_per_poll: float = 1.0,
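With `max_poll_attempts` now defaulting to 480 in both `poll_op` and `poll_op_raw`, nodes that stop passing the argument fall back to that default. Assuming attempts are spaced by `poll_interval`, the time budget is roughly `poll_interval * max_poll_attempts`: the old default of 160 attempts at 5 s gave about 13 minutes, while 480 attempts gives 40 minutes at 5 s, 48 minutes at the 6 s interval used by Wan, and 80 minutes at the 10 s interval used by Magnific, Topaz and Wavespeed. The sketch below computes that budget and shows a generic bounded polling loop; `polling_budget_seconds` and `poll_until_done` are hypothetical helpers, not ComfyUI's `poll_op`, and they ignore `timeout_per_poll`, retries and request latency, so the real worst case is longer.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def polling_budget_seconds(poll_interval: float, max_poll_attempts: int = 480) -> float:
    """Approximate lower bound on how long a bounded poll waits before giving up.

    Hypothetical helper: ignores request latency, per-poll timeouts and retry
    delays, so the actual worst-case wait is longer.
    """
    return poll_interval * max_poll_attempts


async def poll_until_done(
    fetch: Callable[[], Awaitable[T]],
    is_done: Callable[[T], bool],
    poll_interval: float = 5.0,
    max_poll_attempts: int = 480,
) -> T:
    """Hypothetical bounded polling loop (a sketch, not ComfyUI's poll_op)."""
    for attempt in range(max_poll_attempts):
        result = await fetch()
        if is_done(result):
            return result
        # Sleep only when another attempt will follow.
        if attempt < max_poll_attempts - 1:
            await asyncio.sleep(poll_interval)
    raise TimeoutError(f"task not finished after {max_poll_attempts} polls")


if __name__ == "__main__":
    # Budgets implied by the new default of 480 attempts.
    for interval in (5.0, 6.0, 10.0):
        minutes = polling_budget_seconds(interval) / 60
        print(f"poll_interval={interval}s -> at least {minutes:.0f} min")
```

Raising the shared default, rather than repeating `max_poll_attempts=480` at each call site, keeps the per-node calls shorter and makes the timeout policy a single-point change.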
--- a/comfy_extras/nodes_model_merging.py
+++ b/comfy_extras/nodes_model_merging.py
@@ -10,146 +10,198 @@ import json
 import os

 from comfy.cli_args import args
+from comfy_api.latest import io, ComfyExtension
+from typing_extensions import override

-class ModelMergeSimple:
+
+class ModelMergeSimple(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "model1": ("MODEL",),
-                              "model2": ("MODEL",),
-                              "ratio": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01}),
-                              }}
-    RETURN_TYPES = ("MODEL",)
-    FUNCTION = "merge"
+    def define_schema(cls):
+        return io.Schema(
+            node_id="ModelMergeSimple",
+            category="advanced/model_merging",
+            inputs=[
+                io.Model.Input("model1"),
+                io.Model.Input("model2"),
+                io.Float.Input("ratio", default=1.0, min=0.0, max=1.0, step=0.01),
+            ],
+            outputs=[
+                io.Model.Output(),
+            ],
+        )

-    CATEGORY = "advanced/model_merging"
-
-    def merge(self, model1, model2, ratio):
+    @classmethod
+    def execute(cls, model1, model2, ratio) -> io.NodeOutput:
        m = model1.clone()
        kp = model2.get_key_patches("diffusion_model.")
        for k in kp:
            m.add_patches({k: kp[k]}, 1.0 - ratio, ratio)
-        return (m, )
+        return io.NodeOutput(m)

-class ModelSubtract:
+    merge = execute  # TODO: remove
+
+
+class ModelSubtract(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "model1": ("MODEL",),
-                              "model2": ("MODEL",),
-                              "multiplier": ("FLOAT", {"default": 1.0, "min": -10.0, "max": 10.0, "step": 0.01}),
-                              }}
-    RETURN_TYPES = ("MODEL",)
-    FUNCTION = "merge"
+    def define_schema(cls):
+        return io.Schema(
+            node_id="ModelMergeSubtract",
+            category="advanced/model_merging",
+            inputs=[
+                io.Model.Input("model1"),
+                io.Model.Input("model2"),
+                io.Float.Input("multiplier", default=1.0, min=-10.0, max=10.0, step=0.01),
+            ],
+            outputs=[
+                io.Model.Output(),
+            ],
+        )

-    CATEGORY = "advanced/model_merging"
-
-    def merge(self, model1, model2, multiplier):
+    @classmethod
+    def execute(cls, model1, model2, multiplier) -> io.NodeOutput:
        m = model1.clone()
        kp = model2.get_key_patches("diffusion_model.")
        for k in kp:
            m.add_patches({k: kp[k]}, - multiplier, multiplier)
-        return (m, )
+        return io.NodeOutput(m)

-class ModelAdd:
+    merge = execute  # TODO: remove
+
+
+class ModelAdd(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "model1": ("MODEL",),
-                              "model2": ("MODEL",),
-                              }}
-    RETURN_TYPES = ("MODEL",)
-    FUNCTION = "merge"
+    def define_schema(cls):
+        return io.Schema(
+            node_id="ModelMergeAdd",
+            category="advanced/model_merging",
+            inputs=[
+                io.Model.Input("model1"),
+                io.Model.Input("model2"),
+            ],
+            outputs=[
+                io.Model.Output(),
+            ],
+        )

-    CATEGORY = "advanced/model_merging"
-
-    def merge(self, model1, model2):
+    @classmethod
+    def execute(cls, model1, model2) -> io.NodeOutput:
        m = model1.clone()
        kp = model2.get_key_patches("diffusion_model.")
        for k in kp:
            m.add_patches({k: kp[k]}, 1.0, 1.0)
-        return (m, )
+        return io.NodeOutput(m)
+
+    merge = execute  # TODO: remove


-class CLIPMergeSimple:
+class CLIPMergeSimple(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "clip1": ("CLIP",),
-                              "clip2": ("CLIP",),
-                              "ratio": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01}),
-                              }}
-    RETURN_TYPES = ("CLIP",)
-    FUNCTION = "merge"
+    def define_schema(cls):
+        return io.Schema(
+            node_id="CLIPMergeSimple",
+            category="advanced/model_merging",
+            inputs=[
+                io.Clip.Input("clip1"),
+                io.Clip.Input("clip2"),
+                io.Float.Input("ratio", default=1.0, min=0.0, max=1.0, step=0.01),
+            ],
+            outputs=[
+                io.Clip.Output(),
+            ],
+        )

-    CATEGORY = "advanced/model_merging"
-
-    def merge(self, clip1, clip2, ratio):
+    @classmethod
+    def execute(cls, clip1, clip2, ratio) -> io.NodeOutput:
        m = clip1.clone()
        kp = clip2.get_key_patches()
        for k in kp:
            if k.endswith(".position_ids") or k.endswith(".logit_scale"):
                continue
            m.add_patches({k: kp[k]}, 1.0 - ratio, ratio)
-        return (m, )
+        return io.NodeOutput(m)
+
+    merge = execute  # TODO: remove


-class CLIPSubtract:
-    SEARCH_ALIASES = ["clip difference", "text encoder subtract"]
+class CLIPSubtract(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "clip1": ("CLIP",),
-                              "clip2": ("CLIP",),
-                              "multiplier": ("FLOAT", {"default": 1.0, "min": -10.0, "max": 10.0, "step": 0.01}),
-                              }}
-    RETURN_TYPES = ("CLIP",)
-    FUNCTION = "merge"
+    def define_schema(cls):
+        return io.Schema(
+            node_id="CLIPMergeSubtract",
+            search_aliases=["clip difference", "text encoder subtract"],
+            category="advanced/model_merging",
+            inputs=[
+                io.Clip.Input("clip1"),
+                io.Clip.Input("clip2"),
+                io.Float.Input("multiplier", default=1.0, min=-10.0, max=10.0, step=0.01),
+            ],
+            outputs=[
+                io.Clip.Output(),
+            ],
+        )

-    CATEGORY = "advanced/model_merging"
-
-    def merge(self, clip1, clip2, multiplier):
+    @classmethod
+    def execute(cls, clip1, clip2, multiplier) -> io.NodeOutput:
        m = clip1.clone()
        kp = clip2.get_key_patches()
        for k in kp:
            if k.endswith(".position_ids") or k.endswith(".logit_scale"):
                continue
            m.add_patches({k: kp[k]}, - multiplier, multiplier)
-        return (m, )
+        return io.NodeOutput(m)
+
+    merge = execute  # TODO: remove


-class CLIPAdd:
-    SEARCH_ALIASES = ["combine clip"]
+class CLIPAdd(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "clip1": ("CLIP",),
-                              "clip2": ("CLIP",),
-                              }}
-    RETURN_TYPES = ("CLIP",)
-    FUNCTION = "merge"
+    def define_schema(cls):
+        return io.Schema(
+            node_id="CLIPMergeAdd",
+            search_aliases=["combine clip"],
+            category="advanced/model_merging",
+            inputs=[
+                io.Clip.Input("clip1"),
+                io.Clip.Input("clip2"),
+            ],
+            outputs=[
+                io.Clip.Output(),
+            ],
+        )

-    CATEGORY = "advanced/model_merging"
-
-    def merge(self, clip1, clip2):
+    @classmethod
+    def execute(cls, clip1, clip2) -> io.NodeOutput:
        m = clip1.clone()
        kp = clip2.get_key_patches()
        for k in kp:
            if k.endswith(".position_ids") or k.endswith(".logit_scale"):
                continue
            m.add_patches({k: kp[k]}, 1.0, 1.0)
-        return (m, )
+        return io.NodeOutput(m)
+
+    merge = execute  # TODO: remove


-class ModelMergeBlocks:
+class ModelMergeBlocks(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "model1": ("MODEL",),
-                              "model2": ("MODEL",),
-                              "input": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01}),
-                              "middle": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01}),
-                              "out": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
-                              }}
-    RETURN_TYPES = ("MODEL",)
-    FUNCTION = "merge"
+    def define_schema(cls):
+        return io.Schema(
+            node_id="ModelMergeBlocks",
+            category="advanced/model_merging",
+            inputs=[
+                io.Model.Input("model1"),
+                io.Model.Input("model2"),
+                io.Float.Input("input", default=1.0, min=0.0, max=1.0, step=0.01),
+                io.Float.Input("middle", default=1.0, min=0.0, max=1.0, step=0.01),
+                io.Float.Input("out", default=1.0, min=0.0, max=1.0, step=0.01),
+            ],
+            outputs=[
+                io.Model.Output(),
+            ],
+        )

-    CATEGORY = "advanced/model_merging"
-
-    def merge(self, model1, model2, **kwargs):
+    @classmethod
+    def execute(cls, model1, model2, **kwargs) -> io.NodeOutput:
        m = model1.clone()
        kp = model2.get_key_patches("diffusion_model.")
        default_ratio = next(iter(kwargs.values()))
@@ -165,7 +217,10 @@ class ModelMergeBlocks:
                    last_arg_size = len(arg)

            m.add_patches({k: kp[k]}, 1.0 - ratio, ratio)
-        return (m, )
+        return io.NodeOutput(m)
+
+    merge = execute  # TODO: remove
+

 def save_checkpoint(model, clip=None, vae=None, clip_vision=None, filename_prefix=None, output_dir=None, prompt=None, extra_pnginfo=None):
    full_output_folder, filename, counter, subfolder, filename_prefix = folder_paths.get_save_image_path(filename_prefix, output_dir)
@@ -226,59 +281,65 @@ def save_checkpoint(model, clip=None, vae=None, clip_vision=None, filename_prefi

    comfy.sd.save_checkpoint(output_checkpoint, model, clip, vae, clip_vision, metadata=metadata, extra_keys=extra_keys)

-class CheckpointSave:
-    SEARCH_ALIASES = ["save model", "export checkpoint", "merge save"]
-    def __init__(self):
-        self.output_dir = folder_paths.get_output_directory()
+
+class CheckpointSave(io.ComfyNode):
+    @classmethod
+    def define_schema(cls):
+        return io.Schema(
+            node_id="CheckpointSave",
+            display_name="Save Checkpoint",
+            search_aliases=["save model", "export checkpoint", "merge save"],
+            category="advanced/model_merging",
+            inputs=[
+                io.Model.Input("model"),
+                io.Clip.Input("clip"),
+                io.Vae.Input("vae"),
+                io.String.Input("filename_prefix", default="checkpoints/ComfyUI"),
+            ],
+            hidden=[io.Hidden.prompt, io.Hidden.extra_pnginfo],
+            is_output_node=True,
+        )

    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "model": ("MODEL",),
-                              "clip": ("CLIP",),
-                              "vae": ("VAE",),
-                              "filename_prefix": ("STRING", {"default": "checkpoints/ComfyUI"}),},
-                "hidden": {"prompt": "PROMPT", "extra_pnginfo": "EXTRA_PNGINFO"},}
-    RETURN_TYPES = ()
-    FUNCTION = "save"
-    OUTPUT_NODE = True
+    def execute(cls, model, clip, vae, filename_prefix) -> io.NodeOutput:
+        save_checkpoint(model, clip=clip, vae=vae, filename_prefix=filename_prefix, output_dir=folder_paths.get_output_directory(), prompt=cls.hidden.prompt, extra_pnginfo=cls.hidden.extra_pnginfo)
+        return io.NodeOutput()

-    CATEGORY = "advanced/model_merging"
+    save = execute  # TODO: remove

-    def save(self, model, clip, vae, filename_prefix, prompt=None, extra_pnginfo=None):
-        save_checkpoint(model, clip=clip, vae=vae, filename_prefix=filename_prefix, output_dir=self.output_dir, prompt=prompt, extra_pnginfo=extra_pnginfo)
-        return {}

-class CLIPSave:
-    def __init__(self):
-        self.output_dir = folder_paths.get_output_directory()
+class CLIPSave(io.ComfyNode):
+    @classmethod
+    def define_schema(cls):
+        return io.Schema(
+            node_id="CLIPSave",
+            category="advanced/model_merging",
+            inputs=[
+                io.Clip.Input("clip"),
+                io.String.Input("filename_prefix", default="clip/ComfyUI"),
+            ],
+            hidden=[io.Hidden.prompt, io.Hidden.extra_pnginfo],
+            is_output_node=True,
+        )

    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "clip": ("CLIP",),
-                              "filename_prefix": ("STRING", {"default": "clip/ComfyUI"}),},
-                "hidden": {"prompt": "PROMPT", "extra_pnginfo": "EXTRA_PNGINFO"},}
-    RETURN_TYPES = ()
-    FUNCTION = "save"
-    OUTPUT_NODE = True
-
-    CATEGORY = "advanced/model_merging"
-
-    def save(self, clip, filename_prefix, prompt=None, extra_pnginfo=None):
+    def execute(cls, clip, filename_prefix) -> io.NodeOutput:
        prompt_info = ""
-        if prompt is not None:
-            prompt_info = json.dumps(prompt)
+        if cls.hidden.prompt is not None:
+            prompt_info = json.dumps(cls.hidden.prompt)

        metadata = {}
        if not args.disable_metadata:
            metadata["format"] = "pt"
            metadata["prompt"] = prompt_info
-            if extra_pnginfo is not None:
-                for x in extra_pnginfo:
-                    metadata[x] = json.dumps(extra_pnginfo[x])
+            if cls.hidden.extra_pnginfo is not None:
+                for x in cls.hidden.extra_pnginfo:
+                    metadata[x] = json.dumps(cls.hidden.extra_pnginfo[x])

        comfy.model_management.load_models_gpu([clip.load_model()], force_patch_weights=True)
        clip_sd = clip.get_sd()

+        output_dir = folder_paths.get_output_directory()
        for prefix in ["clip_l.", "clip_g.", "clip_h.", "t5xxl.", "pile_t5xl.", "mt5xl.", "umt5xxl.", "t5base.", "gemma2_2b.", "llama.", "hydit_clip.", ""]:
            k = list(filter(lambda a: a.startswith(prefix), clip_sd.keys()))
            current_clip_sd = {}
@@ -295,7 +356,7 @@ class CLIPSave:
                replace_prefix[prefix] = ""
            replace_prefix["transformer."] = ""

-            full_output_folder, filename, counter, subfolder, filename_prefix_ = folder_paths.get_save_image_path(filename_prefix_, self.output_dir)
+            full_output_folder, filename, counter, subfolder, filename_prefix_ = folder_paths.get_save_image_path(filename_prefix_, output_dir)

            output_checkpoint = f"{filename}_{counter:05}_.safetensors"
            output_checkpoint = os.path.join(full_output_folder, output_checkpoint)
@@ -303,76 +364,88 @@ class CLIPSave:
            current_clip_sd = comfy.utils.state_dict_prefix_replace(current_clip_sd, replace_prefix)

            comfy.utils.save_torch_file(current_clip_sd, output_checkpoint, metadata=metadata)
-        return {}
+        return io.NodeOutput()

-class VAESave:
-    def __init__(self):
-        self.output_dir = folder_paths.get_output_directory()
+    save = execute  # TODO: remove
+
+
+class VAESave(io.ComfyNode):
+    @classmethod
+    def define_schema(cls):
+        return io.Schema(
+            node_id="VAESave",
+            category="advanced/model_merging",
+            inputs=[
+                io.Vae.Input("vae"),
+                io.String.Input("filename_prefix", default="vae/ComfyUI_vae"),
+            ],
+            hidden=[io.Hidden.prompt, io.Hidden.extra_pnginfo],
+            is_output_node=True,
+        )

    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "vae": ("VAE",),
-                              "filename_prefix": ("STRING", {"default": "vae/ComfyUI_vae"}),},
-                "hidden": {"prompt": "PROMPT", "extra_pnginfo": "EXTRA_PNGINFO"},}
-    RETURN_TYPES = ()
-    FUNCTION = "save"
-    OUTPUT_NODE = True
-
-    CATEGORY = "advanced/model_merging"
-
-    def save(self, vae, filename_prefix, prompt=None, extra_pnginfo=None):
-        full_output_folder, filename, counter, subfolder, filename_prefix = folder_paths.get_save_image_path(filename_prefix, self.output_dir)
+    def execute(cls, vae, filename_prefix) -> io.NodeOutput:
+        full_output_folder, filename, counter, subfolder, filename_prefix = folder_paths.get_save_image_path(filename_prefix, folder_paths.get_output_directory())
        prompt_info = ""
-        if prompt is not None:
-            prompt_info = json.dumps(prompt)
+        if cls.hidden.prompt is not None:
+            prompt_info = json.dumps(cls.hidden.prompt)

        metadata = {}
        if not args.disable_metadata:
            metadata["prompt"] = prompt_info
-            if extra_pnginfo is not None:
-                for x in extra_pnginfo:
-                    metadata[x] = json.dumps(extra_pnginfo[x])
+            if cls.hidden.extra_pnginfo is not None:
+                for x in cls.hidden.extra_pnginfo:
+                    metadata[x] = json.dumps(cls.hidden.extra_pnginfo[x])

        output_checkpoint = f"{filename}_{counter:05}_.safetensors"
        output_checkpoint = os.path.join(full_output_folder, output_checkpoint)

        comfy.utils.save_torch_file(vae.get_sd(), output_checkpoint, metadata=metadata)
-        return {}
+        return io.NodeOutput()

-class ModelSave:
-    SEARCH_ALIASES = ["export model", "checkpoint save"]
-    def __init__(self):
-        self.output_dir = folder_paths.get_output_directory()
+    save = execute  # TODO: remove
+
+
+class ModelSave(io.ComfyNode):
+    @classmethod
+    def define_schema(cls):
+        return io.Schema(
+            node_id="ModelSave",
+            search_aliases=["export model", "checkpoint save"],
+            category="advanced/model_merging",
+            inputs=[
+                io.Model.Input("model"),
+                io.String.Input("filename_prefix", default="diffusion_models/ComfyUI"),
+            ],
+            hidden=[io.Hidden.prompt, io.Hidden.extra_pnginfo],
+            is_output_node=True,
+        )

    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "model": ("MODEL",),
-                              "filename_prefix": ("STRING", {"default": "diffusion_models/ComfyUI"}),},
-                "hidden": {"prompt": "PROMPT", "extra_pnginfo": "EXTRA_PNGINFO"},}
-    RETURN_TYPES = ()
-    FUNCTION = "save"
-    OUTPUT_NODE = True
+    def execute(cls, model, filename_prefix) -> io.NodeOutput:
+        save_checkpoint(model, filename_prefix=filename_prefix, output_dir=folder_paths.get_output_directory(), prompt=cls.hidden.prompt, extra_pnginfo=cls.hidden.extra_pnginfo)
+        return io.NodeOutput()

-    CATEGORY = "advanced/model_merging"
+    save = execute  # TODO: remove

-    def save(self, model, filename_prefix, prompt=None, extra_pnginfo=None):
-        save_checkpoint(model, filename_prefix=filename_prefix, output_dir=self.output_dir, prompt=prompt, extra_pnginfo=extra_pnginfo)
-        return {}

-NODE_CLASS_MAPPINGS = {
-    "ModelMergeSimple": ModelMergeSimple,
-    "ModelMergeBlocks": ModelMergeBlocks,
-    "ModelMergeSubtract": ModelSubtract,
-    "ModelMergeAdd": ModelAdd,
-    "CheckpointSave": CheckpointSave,
-    "CLIPMergeSimple": CLIPMergeSimple,
-    "CLIPMergeSubtract": CLIPSubtract,
-    "CLIPMergeAdd": CLIPAdd,
-    "CLIPSave": CLIPSave,
-    "VAESave": VAESave,
-    "ModelSave": ModelSave,
-}
+class ModelMergingExtension(ComfyExtension):
+    @override
+    async def get_node_list(self) -> list[type[io.ComfyNode]]:
+        return [
+            ModelMergeSimple,
+            ModelMergeBlocks,
+            ModelSubtract,
+            ModelAdd,
+            CheckpointSave,
+            CLIPMergeSimple,
+            CLIPSubtract,
+            CLIPAdd,
+            CLIPSave,
+            VAESave,
+            ModelSave,
+        ]

-NODE_DISPLAY_NAME_MAPPINGS = {
-    "CheckpointSave": "Save Checkpoint",
-}
+
+async def comfy_entrypoint() -> ModelMergingExtension:
+    return ModelMergingExtension()
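The migrated merge nodes keep their arithmetic unchanged: each one calls `m.add_patches({k: kp[k]}, strength_patch, strength_model)` with model2's weights as the patch. Assuming a plain-weight patch is applied as `strength_model * w1 + strength_patch * w2` (an assumption about ComfyUI's patch application, stated here rather than shown in this diff), `ModelMergeSimple` yields `ratio * w1 + (1 - ratio) * w2`, `ModelMergeSubtract` yields `multiplier * (w1 - w2)`, and `ModelMergeAdd` yields `w1 + w2`. For `ModelMergeBlocks`, the visible `default_ratio = next(iter(kwargs.values()))` and `last_arg_size` bookkeeping suggest each key takes the ratio of the longest input name that prefixes it, falling back to the first input's value. The sketch below models this with plain Python lists standing in for tensors; `merge_weight`, `merge_simple`, `merge_subtract`, `merge_add` and `pick_block_ratio` are hypothetical names.

```python
def merge_weight(w1, w2, strength_patch, strength_model):
    """Assumed patch rule: strength_model * w1 + strength_patch * w2 (elementwise)."""
    return [strength_model * a + strength_patch * b for a, b in zip(w1, w2)]


def merge_simple(w1, w2, ratio):
    # ModelMergeSimple / CLIPMergeSimple: add_patches(..., 1.0 - ratio, ratio)
    return merge_weight(w1, w2, 1.0 - ratio, ratio)


def merge_subtract(w1, w2, multiplier):
    # ModelMergeSubtract / CLIPMergeSubtract: add_patches(..., -multiplier, multiplier)
    return merge_weight(w1, w2, -multiplier, multiplier)


def merge_add(w1, w2):
    # ModelMergeAdd / CLIPMergeAdd: add_patches(..., 1.0, 1.0)
    return merge_weight(w1, w2, 1.0, 1.0)


def pick_block_ratio(key, ratios, prefix="diffusion_model."):
    """Pick the ratio of the longest input name that prefixes the key.

    Keys are assumed to start with `prefix`, as they do when they come from
    get_key_patches("diffusion_model."). The first input's value is the fallback.
    """
    k_unet = key[len(prefix):]
    ratio = next(iter(ratios.values()))
    last_arg_size = 0
    for arg, value in ratios.items():
        if k_unet.startswith(arg) and last_arg_size < len(arg):
            ratio = value
            last_arg_size = len(arg)
    return ratio


if __name__ == "__main__":
    print(merge_simple([1.0, 2.0], [3.0, 4.0], 0.25))
    print(pick_block_ratio("diffusion_model.label_emb.0.weight",
                           {"input": 0.3, "middle": 0.5, "out": 0.8}))
```

Note that with `ratio=1.0` (the default) `ModelMergeSimple` returns model1 unchanged, and that keys matching none of the block names, such as `label_emb.` in `ModelMergeBlocks`, inherit the first input's ratio.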
--- a/comfy_extras/nodes_model_merging_model_specific.py
+++ b/comfy_extras/nodes_model_merging_model_specific.py
@@ -1,356 +1,455 @@
 import comfy_extras.nodes_model_merging

+from comfy_api.latest import io, ComfyExtension
+from typing_extensions import override
+
+
 class ModelMergeSD1(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

-        arg_dict["time_embed."] = argument
-        arg_dict["label_emb."] = argument
+        inputs.append(io.Float.Input("time_embed.", **argument))
+        inputs.append(io.Float.Input("label_emb.", **argument))

        for i in range(12):
-            arg_dict["input_blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("input_blocks.{}.".format(i), **argument))

        for i in range(3):
-            arg_dict["middle_block.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("middle_block.{}.".format(i), **argument))

        for i in range(12):
-            arg_dict["output_blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("output_blocks.{}.".format(i), **argument))

-        arg_dict["out."] = argument
+        inputs.append(io.Float.Input("out.", **argument))

-        return {"required": arg_dict}
+        return io.Schema(
+            node_id="ModelMergeSD1",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )
+
+
+class ModelMergeSD2(ModelMergeSD1):
+    # SD1 and SD2 have the same blocks
+    @classmethod
+    def define_schema(cls):
+        schema = ModelMergeSD1.define_schema()
+        schema.node_id = "ModelMergeSD2"
+        return schema


 class ModelMergeSDXL(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

-        arg_dict["time_embed."] = argument
-        arg_dict["label_emb."] = argument
+        inputs.append(io.Float.Input("time_embed.", **argument))
+        inputs.append(io.Float.Input("label_emb.", **argument))

        for i in range(9):
-            arg_dict["input_blocks.{}".format(i)] = argument
+            inputs.append(io.Float.Input("input_blocks.{}".format(i), **argument))

        for i in range(3):
-            arg_dict["middle_block.{}".format(i)] = argument
+            inputs.append(io.Float.Input("middle_block.{}".format(i), **argument))

        for i in range(9):
-            arg_dict["output_blocks.{}".format(i)] = argument
+            inputs.append(io.Float.Input("output_blocks.{}".format(i), **argument))

-        arg_dict["out."] = argument
+        inputs.append(io.Float.Input("out.", **argument))
+
+        return io.Schema(
+            node_id="ModelMergeSDXL",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-        return {"required": arg_dict}

 class ModelMergeSD3_2B(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

-        arg_dict["pos_embed."] = argument
-        arg_dict["x_embedder."] = argument
-        arg_dict["context_embedder."] = argument
-        arg_dict["y_embedder."] = argument
-        arg_dict["t_embedder."] = argument
+        inputs.append(io.Float.Input("pos_embed.", **argument))
+        inputs.append(io.Float.Input("x_embedder.", **argument))
+        inputs.append(io.Float.Input("context_embedder.", **argument))
+        inputs.append(io.Float.Input("y_embedder.", **argument))
+        inputs.append(io.Float.Input("t_embedder.", **argument))

        for i in range(24):
-            arg_dict["joint_blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("joint_blocks.{}.".format(i), **argument))

-        arg_dict["final_layer."] = argument
+        inputs.append(io.Float.Input("final_layer.", **argument))

-        return {"required": arg_dict}
+        return io.Schema(
+            node_id="ModelMergeSD3_2B",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )


 class ModelMergeAuraflow(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

-        arg_dict["init_x_linear."] = argument
-        arg_dict["positional_encoding"] = argument
-        arg_dict["cond_seq_linear."] = argument
-        arg_dict["register_tokens"] = argument
-        arg_dict["t_embedder."] = argument
+        inputs.append(io.Float.Input("init_x_linear.", **argument))
+        inputs.append(io.Float.Input("positional_encoding", **argument))
+        inputs.append(io.Float.Input("cond_seq_linear.", **argument))
+        inputs.append(io.Float.Input("register_tokens", **argument))
+        inputs.append(io.Float.Input("t_embedder.", **argument))

        for i in range(4):
-            arg_dict["double_layers.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("double_layers.{}.".format(i), **argument))

        for i in range(32):
-            arg_dict["single_layers.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("single_layers.{}.".format(i), **argument))

-        arg_dict["modF."] = argument
-        arg_dict["final_linear."] = argument
+        inputs.append(io.Float.Input("modF.", **argument))
+        inputs.append(io.Float.Input("final_linear.", **argument))
+
+        return io.Schema(
+            node_id="ModelMergeAuraflow",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-        return {"required": arg_dict}

 class ModelMergeFlux1(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

-        arg_dict["img_in."] = argument
-        arg_dict["time_in."] = argument
-        arg_dict["guidance_in"] = argument
-        arg_dict["vector_in."] = argument
-        arg_dict["txt_in."] = argument
+        inputs.append(io.Float.Input("img_in.", **argument))
+        inputs.append(io.Float.Input("time_in.", **argument))
+        inputs.append(io.Float.Input("guidance_in", **argument))
+        inputs.append(io.Float.Input("vector_in.", **argument))
+        inputs.append(io.Float.Input("txt_in.", **argument))

        for i in range(19):
-            arg_dict["double_blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("double_blocks.{}.".format(i), **argument))

        for i in range(38):
-            arg_dict["single_blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("single_blocks.{}.".format(i), **argument))

-        arg_dict["final_layer."] = argument
+        inputs.append(io.Float.Input("final_layer.", **argument))
+
+        return io.Schema(
+            node_id="ModelMergeFlux1",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-        return {"required": arg_dict}

 class ModelMergeSD35_Large(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

-        arg_dict["pos_embed."] = argument
-        arg_dict["x_embedder."] = argument
-        arg_dict["context_embedder."] = argument
-        arg_dict["y_embedder."] = argument
-        arg_dict["t_embedder."] = argument
+        inputs.append(io.Float.Input("pos_embed.", **argument))
+        inputs.append(io.Float.Input("x_embedder.", **argument))
+        inputs.append(io.Float.Input("context_embedder.", **argument))
+        inputs.append(io.Float.Input("y_embedder.", **argument))
+        inputs.append(io.Float.Input("t_embedder.", **argument))

        for i in range(38):
-            arg_dict["joint_blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("joint_blocks.{}.".format(i), **argument))

-        arg_dict["final_layer."] = argument
+        inputs.append(io.Float.Input("final_layer.", **argument))
+
+        return io.Schema(
+            node_id="ModelMergeSD35_Large",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-        return {"required": arg_dict}

 class ModelMergeMochiPreview(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

-        arg_dict["pos_frequencies."] = argument
-        arg_dict["t_embedder."] = argument
-        arg_dict["t5_y_embedder."] = argument
-        arg_dict["t5_yproj."] = argument
+        inputs.append(io.Float.Input("pos_frequencies.", **argument))
+        inputs.append(io.Float.Input("t_embedder.", **argument))
+        inputs.append(io.Float.Input("t5_y_embedder.", **argument))
+        inputs.append(io.Float.Input("t5_yproj.", **argument))

        for i in range(48):
-            arg_dict["blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("blocks.{}.".format(i), **argument))

-        arg_dict["final_layer."] = argument
+        inputs.append(io.Float.Input("final_layer.", **argument))
+
+        return io.Schema(
+            node_id="ModelMergeMochiPreview",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-        return {"required": arg_dict}

 class ModelMergeLTXV(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

-        arg_dict["patchify_proj."] = argument
-        arg_dict["adaln_single."] = argument
-        arg_dict["caption_projection."] = argument
+        inputs.append(io.Float.Input("patchify_proj.", **argument))
+        inputs.append(io.Float.Input("adaln_single.", **argument))
+        inputs.append(io.Float.Input("caption_projection.", **argument))

        for i in range(28):
-            arg_dict["transformer_blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("transformer_blocks.{}.".format(i), **argument))

-        arg_dict["scale_shift_table"] = argument
-        arg_dict["proj_out."] = argument
+        inputs.append(io.Float.Input("scale_shift_table", **argument))
+        inputs.append(io.Float.Input("proj_out.", **argument))
+
+        return io.Schema(
+            node_id="ModelMergeLTXV",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-        return {"required": arg_dict}

 class ModelMergeCosmos7B(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
-
-        arg_dict["pos_embedder."] = argument
-        arg_dict["extra_pos_embedder."] = argument
-        arg_dict["x_embedder."] = argument
-        arg_dict["t_embedder."] = argument
-        arg_dict["affline_norm."] = argument
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

+        inputs.append(io.Float.Input("pos_embedder.", **argument))
+        inputs.append(io.Float.Input("extra_pos_embedder.", **argument))
+        inputs.append(io.Float.Input("x_embedder.", **argument))
+        inputs.append(io.Float.Input("t_embedder.", **argument))
+        inputs.append(io.Float.Input("affline_norm.", **argument))

        for i in range(28):
-            arg_dict["blocks.block{}.".format(i)] = argument
+            inputs.append(io.Float.Input("blocks.block{}.".format(i), **argument))

-        arg_dict["final_layer."] = argument
+        inputs.append(io.Float.Input("final_layer.", **argument))
+
+        return io.Schema(
+            node_id="ModelMergeCosmos7B",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-        return {"required": arg_dict}

 class ModelMergeCosmos14B(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
-
-        arg_dict["pos_embedder."] = argument
-        arg_dict["extra_pos_embedder."] = argument
-        arg_dict["x_embedder."] = argument
-        arg_dict["t_embedder."] = argument
-        arg_dict["affline_norm."] = argument
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

+        inputs.append(io.Float.Input("pos_embedder.", **argument))
+        inputs.append(io.Float.Input("extra_pos_embedder.", **argument))
+        inputs.append(io.Float.Input("x_embedder.", **argument))
+        inputs.append(io.Float.Input("t_embedder.", **argument))
+        inputs.append(io.Float.Input("affline_norm.", **argument))

        for i in range(36):
-            arg_dict["blocks.block{}.".format(i)] = argument
+            inputs.append(io.Float.Input("blocks.block{}.".format(i), **argument))

-        arg_dict["final_layer."] = argument
+        inputs.append(io.Float.Input("final_layer.", **argument))
+
+        return io.Schema(
+            node_id="ModelMergeCosmos14B",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-        return {"required": arg_dict}

 class ModelMergeWAN2_1(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-    DESCRIPTION = "1.3B model has 30 blocks, 14B model has 40 blocks. Image to video model has the extra img_emb."
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

-        arg_dict["patch_embedding."] = argument
-        arg_dict["time_embedding."] = argument
-        arg_dict["time_projection."] = argument
-        arg_dict["text_embedding."] = argument
-        arg_dict["img_emb."] = argument
+        inputs.append(io.Float.Input("patch_embedding.", **argument))
+        inputs.append(io.Float.Input("time_embedding.", **argument))
+        inputs.append(io.Float.Input("time_projection.", **argument))
+        inputs.append(io.Float.Input("text_embedding.", **argument))
+        inputs.append(io.Float.Input("img_emb.", **argument))

        for i in range(40):
-            arg_dict["blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("blocks.{}.".format(i), **argument))

-        arg_dict["head."] = argument
+        inputs.append(io.Float.Input("head.", **argument))
+
+        return io.Schema(
+            node_id="ModelMergeWAN2_1",
+            category="advanced/model_merging/model_specific",
+            description="The 1.3B model has 30 blocks and the 14B model has 40 blocks. The image-to-video model has the extra img_emb.",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-        return {"required": arg_dict}

 class ModelMergeCosmosPredict2_2B(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
-
-        arg_dict["pos_embedder."] = argument
-        arg_dict["x_embedder."] = argument
-        arg_dict["t_embedder."] = argument
-        arg_dict["t_embedding_norm."] = argument
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

+        inputs.append(io.Float.Input("pos_embedder.", **argument))
+        inputs.append(io.Float.Input("x_embedder.", **argument))
+        inputs.append(io.Float.Input("t_embedder.", **argument))
+        inputs.append(io.Float.Input("t_embedding_norm.", **argument))

        for i in range(28):
-            arg_dict["blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("blocks.{}.".format(i), **argument))

-        arg_dict["final_layer."] = argument
+        inputs.append(io.Float.Input("final_layer.", **argument))
+
+        return io.Schema(
+            node_id="ModelMergeCosmosPredict2_2B",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-        return {"required": arg_dict}

 class ModelMergeCosmosPredict2_14B(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
-
-        arg_dict["pos_embedder."] = argument
-        arg_dict["x_embedder."] = argument
-        arg_dict["t_embedder."] = argument
-        arg_dict["t_embedding_norm."] = argument
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

+        inputs.append(io.Float.Input("pos_embedder.", **argument))
+        inputs.append(io.Float.Input("x_embedder.", **argument))
+        inputs.append(io.Float.Input("t_embedder.", **argument))
+        inputs.append(io.Float.Input("t_embedding_norm.", **argument))

        for i in range(36):
-            arg_dict["blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("blocks.{}.".format(i), **argument))

-        arg_dict["final_layer."] = argument
+        inputs.append(io.Float.Input("final_layer.", **argument))
+
+        return io.Schema(
+            node_id="ModelMergeCosmosPredict2_14B",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-        return {"required": arg_dict}

 class ModelMergeQwenImage(comfy_extras.nodes_model_merging.ModelMergeBlocks):
-    CATEGORY = "advanced/model_merging/model_specific"
-
    @classmethod
-    def INPUT_TYPES(s):
-        arg_dict = { "model1": ("MODEL",),
-                              "model2": ("MODEL",)}
+    def define_schema(cls):
+        inputs = [
+            io.Model.Input("model1"),
+            io.Model.Input("model2"),
+        ]

-        argument = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
+        argument = dict(default=1.0, min=0.0, max=1.0, step=0.01)

-        arg_dict["pos_embeds."] = argument
-        arg_dict["img_in."] = argument
-        arg_dict["txt_norm."] = argument
-        arg_dict["txt_in."] = argument
-        arg_dict["time_text_embed."] = argument
+        inputs.append(io.Float.Input("pos_embeds.", **argument))
+        inputs.append(io.Float.Input("img_in.", **argument))
+        inputs.append(io.Float.Input("txt_norm.", **argument))
+        inputs.append(io.Float.Input("txt_in.", **argument))
+        inputs.append(io.Float.Input("time_text_embed.", **argument))

        for i in range(60):
-            arg_dict["transformer_blocks.{}.".format(i)] = argument
+            inputs.append(io.Float.Input("transformer_blocks.{}.".format(i), **argument))

-        arg_dict["proj_out."] = argument
+        inputs.append(io.Float.Input("proj_out.", **argument))

-        return {"required": arg_dict}
+        return io.Schema(
+            node_id="ModelMergeQwenImage",
+            category="advanced/model_merging/model_specific",
+            inputs=inputs,
+            outputs=[io.Model.Output()],
+        )

-NODE_CLASS_MAPPINGS = {
-    "ModelMergeSD1": ModelMergeSD1,
-    "ModelMergeSD2": ModelMergeSD1, #SD1 and SD2 have the same blocks
-    "ModelMergeSDXL": ModelMergeSDXL,
-    "ModelMergeSD3_2B": ModelMergeSD3_2B,
-    "ModelMergeAuraflow": ModelMergeAuraflow,
-    "ModelMergeFlux1": ModelMergeFlux1,
-    "ModelMergeSD35_Large": ModelMergeSD35_Large,
-    "ModelMergeMochiPreview": ModelMergeMochiPreview,
-    "ModelMergeLTXV": ModelMergeLTXV,
-    "ModelMergeCosmos7B": ModelMergeCosmos7B,
-    "ModelMergeCosmos14B": ModelMergeCosmos14B,
-    "ModelMergeWAN2_1": ModelMergeWAN2_1,
-    "ModelMergeCosmosPredict2_2B": ModelMergeCosmosPredict2_2B,
-    "ModelMergeCosmosPredict2_14B": ModelMergeCosmosPredict2_14B,
-    "ModelMergeQwenImage": ModelMergeQwenImage,
-}
+
+class ModelMergingModelSpecificExtension(ComfyExtension):
+    @override
+    async def get_node_list(self) -> list[type[io.ComfyNode]]:
+        return [
+            ModelMergeSD1,
+            ModelMergeSD2,
+            ModelMergeSDXL,
+            ModelMergeSD3_2B,
+            ModelMergeAuraflow,
+            ModelMergeFlux1,
+            ModelMergeSD35_Large,
+            ModelMergeMochiPreview,
+            ModelMergeLTXV,
+            ModelMergeCosmos7B,
+            ModelMergeCosmos14B,
+            ModelMergeWAN2_1,
+            ModelMergeCosmosPredict2_2B,
+            ModelMergeCosmosPredict2_14B,
+            ModelMergeQwenImage,
+        ]
+
+
+async def comfy_entrypoint() -> ModelMergingModelSpecificExtension:
+    return ModelMergingModelSpecificExtension()
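Each block input above is a weight-key prefix, and nearly all of them end in a dot. Below is a minimal sketch of how such prefixes can be turned into per-weight ratios. It assumes that the `ModelMergeBlocks` base class, which is not part of this diff, picks the longest prefix a key starts with and otherwise falls back to a default ratio. `flux1_prefixes` and `resolve_ratio` are hypothetical helpers, and the prefix list mirrors `ModelMergeFlux1` above.

```python
def flux1_prefixes() -> list[str]:
    """Rebuild the Flux1 block prefixes in the same order as ModelMergeFlux1's float inputs."""
    names = ["img_in.", "time_in.", "guidance_in", "vector_in.", "txt_in."]
    names += ["double_blocks.{}.".format(i) for i in range(19)]
    names += ["single_blocks.{}.".format(i) for i in range(38)]
    names.append("final_layer.")
    return names


def resolve_ratio(key: str, ratios: dict[str, float], default: float = 1.0) -> float:
    """Return the ratio of the longest prefix that `key` starts with (hypothetical helper)."""
    best_len = 0
    ratio = default
    for prefix, value in ratios.items():
        # A longer matching prefix is more specific, so it overrides shorter matches.
        if key.startswith(prefix) and len(prefix) > best_len:
            ratio = value
            best_len = len(prefix)
    return ratio


if __name__ == "__main__":
    ratios = {name: 1.0 for name in flux1_prefixes()}
    ratios["double_blocks.1."] = 0.25
    ratios["double_blocks.10."] = 0.75
    print(len(ratios))  # 63 float inputs besides model1/model2
    print(resolve_ratio("double_blocks.10.img_attn.qkv.weight", ratios))  # 0.75
    print(resolve_ratio("double_blocks.1.img_attn.qkv.weight", ratios))   # 0.25
```

The trailing dot matters. Without it, a prefix such as `double_blocks.1` would also match every key from `double_blocks.10.` through `double_blocks.18.`.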
--- a/comfy_extras/nodes_sdpose.py
+++ b/comfy_extras/nodes_sdpose.py
@ -459,27 +459,23 @@ class SDPoseKeypointExtractor(io.ComfyNode):
        total_images = image.shape[0]
        captured_feat = None

-        model_h = int(head.heatmap_size[0]) * 4   # e.g. 192 * 4 = 768
-        model_w = int(head.heatmap_size[1]) * 4   # e.g. 256 * 4 = 1024
+        model_w = int(head.heatmap_size[0]) * 4   # 192 * 4 = 768
+        model_h = int(head.heatmap_size[1]) * 4   # 256 * 4 = 1024

        def _resize_to_model(imgs):
-            """Aspect-preserving resize + zero-pad BHWC images to (model_h, model_w). Returns (resized_bhwc, scale, pad_top, pad_left)."""
+            """Stretch BHWC images to (model_h, model_w) without preserving aspect ratio, as the model expects. Returns (resized_bhwc, scale_x, scale_y)."""
            h, w = imgs.shape[-3], imgs.shape[-2]
-            scale = min(model_h / h, model_w / w)
-            sh, sw = int(round(h * scale)), int(round(w * scale))
-            pt, pl = (model_h - sh) // 2, (model_w - sw) // 2
+            method = "area" if (model_h <= h and model_w <= w) else "bilinear"
            chw = imgs.permute(0, 3, 1, 2).float()
-            scaled = comfy.utils.common_upscale(chw, sw, sh, upscale_method="bilinear", crop="disabled")
-            padded = torch.zeros(scaled.shape[0], scaled.shape[1], model_h, model_w, dtype=scaled.dtype, device=scaled.device)
-            padded[:, :, pt:pt + sh, pl:pl + sw] = scaled
-            return padded.permute(0, 2, 3, 1), scale, pt, pl
+            scaled = comfy.utils.common_upscale(chw, model_w, model_h, upscale_method=method, crop="disabled")
+            return scaled.permute(0, 2, 3, 1), model_w / w, model_h / h

-        def _remap_keypoints(kp, scale, pad_top, pad_left, offset_x=0, offset_y=0):
+        def _remap_keypoints(kp, scale_x, scale_y, offset_x=0, offset_y=0):
            """Remap keypoints from model space back to original image space."""
            kp = kp.copy() if isinstance(kp, np.ndarray) else np.array(kp, dtype=np.float32)
            invalid = kp[..., 0] < 0
-            kp[..., 0] = (kp[..., 0] - pad_left) / scale + offset_x
-            kp[..., 1] = (kp[..., 1] - pad_top)  / scale + offset_y
+            kp[..., 0] = kp[..., 0] / scale_x + offset_x
+            kp[..., 1] = kp[..., 1] / scale_y + offset_y
            kp[invalid] = -1
            return kp
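Switching from letterboxing to stretching simplifies the inverse mapping. Because the horizontal and vertical scales are independent and there is no padding to subtract, each coordinate is divided by its own scale before the crop offset is added. Below is a minimal NumPy sketch of that round trip. It assumes a 192x256 heatmap stored width-first, as the swapped `heatmap_size` indices above imply, along with the 4x factor from the hunk. `stretch_params` and `remap` are hypothetical names that mirror `_resize_to_model` and `_remap_keypoints`, and the resampling step itself is omitted.

```python
import numpy as np


def stretch_params(h, w, model_h, model_w):
    """Pick a resampling mode and per-axis scales for a non-aspect-preserving resize."""
    # Shrinking in both dimensions -> area averaging; any upscaling -> bilinear.
    method = "area" if (model_h <= h and model_w <= w) else "bilinear"
    return method, model_w / w, model_h / h


def remap(kp, scale_x, scale_y, offset_x=0, offset_y=0):
    """Map (x, y) keypoints from model space back to image space; x < 0 marks invalid."""
    kp = np.array(kp, dtype=np.float32)  # copy so the caller's array is untouched
    invalid = kp[..., 0] < 0
    kp[..., 0] = kp[..., 0] / scale_x + offset_x
    kp[..., 1] = kp[..., 1] / scale_y + offset_y
    kp[invalid] = -1  # keep invalid keypoints flagged after the transform
    return kp


heatmap_size = (192, 256)       # assumed (width, height) ordering
model_w = heatmap_size[0] * 4   # 768
model_h = heatmap_size[1] * 4   # 1024

# A 1920x1080 frame is squeezed horizontally far more than vertically.
method, sx, sy = stretch_params(h=1080, w=1920, model_h=model_h, model_w=model_w)
# An image keypoint at (300, 900) lands at (300 * sx, 900 * sy) in model space.
model_kp = np.array([[300 * sx, 900 * sy], [-1.0, -1.0]])
print(method, remap(model_kp, sx, sy))
```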

@ -529,18 +525,18 @@ class SDPoseKeypointExtractor(io.ComfyNode):
                            continue

                        crop = img[:, y1:y2, x1:x2, :]  # (1, crop_h, crop_w, C)
-                        crop_resized, scale, pad_top, pad_left = _resize_to_model(crop)
+                        crop_resized, sx, sy = _resize_to_model(crop)

                        latent_crop = vae.encode(crop_resized)
                        kp_batch, sc_batch = _run_on_latent(latent_crop)
-                        kp = _remap_keypoints(kp_batch[0], scale, pad_top, pad_left, x1, y1)
+                        kp = _remap_keypoints(kp_batch[0], sx, sy, x1, y1)
                        img_keypoints.append(kp)
                        img_scores.append(sc_batch[0])
                else:
-                    img_resized, scale, pad_top, pad_left = _resize_to_model(img)
+                    img_resized, sx, sy = _resize_to_model(img)
                    latent_img = vae.encode(img_resized)
                    kp_batch, sc_batch = _run_on_latent(latent_img)
-                    img_keypoints.append(_remap_keypoints(kp_batch[0], scale, pad_top, pad_left))
+                    img_keypoints.append(_remap_keypoints(kp_batch[0], sx, sy))
                    img_scores.append(sc_batch[0])

                all_keypoints.append(img_keypoints)
@ -549,12 +545,12 @@ class SDPoseKeypointExtractor(io.ComfyNode):

        else: # full-image mode, batched
            for batch_start in tqdm(range(0, total_images, batch_size), desc="Extracting keypoints"):
-                batch_resized, scale, pad_top, pad_left = _resize_to_model(image[batch_start:batch_start + batch_size])
+                batch_resized, sx, sy = _resize_to_model(image[batch_start:batch_start + batch_size])
                latent_batch = vae.encode(batch_resized)
                kp_batch, sc_batch = _run_on_latent(latent_batch)

                for kp, sc in zip(kp_batch, sc_batch):
-                    all_keypoints.append([_remap_keypoints(kp, scale, pad_top, pad_left)])
+                    all_keypoints.append([_remap_keypoints(kp, sx, sy)])
                    all_scores.append([sc])

                pbar.update(len(kp_batch))
@ -727,13 +723,13 @@ class CropByBBoxes(io.ComfyNode):
                scale = min(output_width / crop_w, output_height / crop_h)
                scaled_w = int(round(crop_w * scale))
                scaled_h = int(round(crop_h * scale))
-                scaled = comfy.utils.common_upscale(crop_chw, scaled_w, scaled_h, upscale_method="bilinear", crop="disabled")
+                scaled = comfy.utils.common_upscale(crop_chw, scaled_w, scaled_h, upscale_method="area", crop="disabled")
                pad_left = (output_width  - scaled_w) // 2
                pad_top  = (output_height - scaled_h) // 2
                resized = torch.zeros(1, num_ch, output_height, output_width, dtype=image.dtype, device=image.device)
                resized[:, :, pad_top:pad_top + scaled_h, pad_left:pad_left + scaled_w] = scaled
            else:  # "stretch"
-                resized = comfy.utils.common_upscale(crop_chw, output_width, output_height, upscale_method="bilinear", crop="disabled")
+                resized = comfy.utils.common_upscale(crop_chw, output_width, output_height, upscale_method="area", crop="disabled")
            crops.append(resized)

        if not crops:
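In the letterbox branch of `CropByBBoxes`, the crop is scaled by the smaller of the two axis ratios and centered on a zero canvas. The hunk itself only changes the resampler from "bilinear" to "area". The placement arithmetic can be checked in isolation. `letterbox_geometry` is a hypothetical helper that repeats the computation from the hunk without touching any tensors.

```python
def letterbox_geometry(crop_w, crop_h, output_width, output_height):
    """Return (scaled_w, scaled_h, pad_left, pad_top) for an aspect-preserving fit."""
    # The tighter axis decides the scale so the whole crop fits inside the output.
    scale = min(output_width / crop_w, output_height / crop_h)
    scaled_w = int(round(crop_w * scale))
    scaled_h = int(round(crop_h * scale))
    # Center the scaled crop; an odd leftover pixel ends up in the right/bottom padding.
    pad_left = (output_width - scaled_w) // 2
    pad_top = (output_height - scaled_h) // 2
    return scaled_w, scaled_h, pad_left, pad_top


if __name__ == "__main__":
    # A wide 200x100 crop in a 64x64 output is width-limited, leaving 16 px bars top and bottom.
    print(letterbox_geometry(200, 100, 64, 64))  # (64, 32, 0, 16)
```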
--- a/comfy_extras/nodes_video_model.py
+++ b/comfy_extras/nodes_video_model.py
@ -6,44 +6,62 @@ import folder_paths
 import comfy_extras.nodes_model_merging
 import node_helpers

+from comfy_api.latest import io, ComfyExtension
+from typing_extensions import override

-class ImageOnlyCheckpointLoader:
+
+class ImageOnlyCheckpointLoader(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "ckpt_name": (folder_paths.get_filename_list("checkpoints"), ),
-                             }}
-    RETURN_TYPES = ("MODEL", "CLIP_VISION", "VAE")
-    FUNCTION = "load_checkpoint"
+    def define_schema(cls):
+        return io.Schema(
+            node_id="ImageOnlyCheckpointLoader",
+            display_name="Image Only Checkpoint Loader (img2vid model)",
+            category="loaders/video_models",
+            inputs=[
+                io.Combo.Input("ckpt_name", options=folder_paths.get_filename_list("checkpoints")),
+            ],
+            outputs=[
+                io.Model.Output(),
+                io.ClipVision.Output(),
+                io.Vae.Output(),
+            ],
+        )

-    CATEGORY = "loaders/video_models"
-
-    def load_checkpoint(self, ckpt_name, output_vae=True, output_clip=True):
+    @classmethod
+    def execute(cls, ckpt_name, output_vae=True, output_clip=True) -> io.NodeOutput:
        ckpt_path = folder_paths.get_full_path_or_raise("checkpoints", ckpt_name)
        out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=False, output_clipvision=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
-        return (out[0], out[3], out[2])
+        return io.NodeOutput(out[0], out[3], out[2])
+
+    load_checkpoint = execute  # TODO: remove


-class SVD_img2vid_Conditioning:
+class SVD_img2vid_Conditioning(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "clip_vision": ("CLIP_VISION",),
-                              "init_image": ("IMAGE",),
-                              "vae": ("VAE",),
-                              "width": ("INT", {"default": 1024, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 8}),
-                              "height": ("INT", {"default": 576, "min": 16, "max": nodes.MAX_RESOLUTION, "step": 8}),
-                              "video_frames": ("INT", {"default": 14, "min": 1, "max": 4096}),
-                              "motion_bucket_id": ("INT", {"default": 127, "min": 1, "max": 1023, "advanced": True}),
-                              "fps": ("INT", {"default": 6, "min": 1, "max": 1024}),
-                              "augmentation_level": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 10.0, "step": 0.01, "advanced": True})
-                             }}
-    RETURN_TYPES = ("CONDITIONING", "CONDITIONING", "LATENT")
-    RETURN_NAMES = ("positive", "negative", "latent")
+    def define_schema(cls):
+        return io.Schema(
+            node_id="SVD_img2vid_Conditioning",
+            category="conditioning/video_models",
+            inputs=[
+                io.ClipVision.Input("clip_vision"),
+                io.Image.Input("init_image"),
+                io.Vae.Input("vae"),
+                io.Int.Input("width", default=1024, min=16, max=nodes.MAX_RESOLUTION, step=8),
+                io.Int.Input("height", default=576, min=16, max=nodes.MAX_RESOLUTION, step=8),
+                io.Int.Input("video_frames", default=14, min=1, max=4096),
+                io.Int.Input("motion_bucket_id", default=127, min=1, max=1023, advanced=True),
+                io.Int.Input("fps", default=6, min=1, max=1024),
+                io.Float.Input("augmentation_level", default=0.0, min=0.0, max=10.0, step=0.01, advanced=True),
+            ],
+            outputs=[
+                io.Conditioning.Output(display_name="positive"),
+                io.Conditioning.Output(display_name="negative"),
+                io.Latent.Output(display_name="latent"),
+            ],
+        )

-    FUNCTION = "encode"
-
-    CATEGORY = "conditioning/video_models"
-
-    def encode(self, clip_vision, init_image, vae, width, height, video_frames, motion_bucket_id, fps, augmentation_level):
+    @classmethod
+    def execute(cls, clip_vision, init_image, vae, width, height, video_frames, motion_bucket_id, fps, augmentation_level) -> io.NodeOutput:
        output = clip_vision.encode_image(init_image)
        pooled = output.image_embeds.unsqueeze(0)
        pixels = comfy.utils.common_upscale(init_image.movedim(-1,1), width, height, "bilinear", "center").movedim(1,-1)
@ -54,20 +72,28 @@ class SVD_img2vid_Conditioning:
        positive = [[pooled, {"motion_bucket_id": motion_bucket_id, "fps": fps, "augmentation_level": augmentation_level, "concat_latent_image": t}]]
        negative = [[torch.zeros_like(pooled), {"motion_bucket_id": motion_bucket_id, "fps": fps, "augmentation_level": augmentation_level, "concat_latent_image": torch.zeros_like(t)}]]
        latent = torch.zeros([video_frames, 4, height // 8, width // 8])
-        return (positive, negative, {"samples":latent})
+        return io.NodeOutput(positive, negative, {"samples":latent})

-class VideoLinearCFGGuidance:
+    encode = execute  # TODO: remove
+
+
+class VideoLinearCFGGuidance(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "model": ("MODEL",),
-                              "min_cfg": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 100.0, "step":0.5, "round": 0.01, "advanced": True}),
-                              }}
-    RETURN_TYPES = ("MODEL",)
-    FUNCTION = "patch"
+    def define_schema(cls):
+        return io.Schema(
+            node_id="VideoLinearCFGGuidance",
+            category="sampling/video_models",
+            inputs=[
+                io.Model.Input("model"),
+                io.Float.Input("min_cfg", default=1.0, min=0.0, max=100.0, step=0.5, round=0.01, advanced=True),
+            ],
+            outputs=[
+                io.Model.Output(),
+            ],
+        )

-    CATEGORY = "sampling/video_models"
-
-    def patch(self, model, min_cfg):
+    @classmethod
+    def execute(cls, model, min_cfg) -> io.NodeOutput:
        def linear_cfg(args):
            cond = args["cond"]
            uncond = args["uncond"]
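`SVD_img2vid_Conditioning` allocates an empty latent of shape `[video_frames, 4, height // 8, width // 8]`, which is one 4-channel latent per frame at 1/8 of the pixel resolution. The snippet below works through that arithmetic using the node defaults (1024x576, 14 frames). `svd_latent_shape` is a hypothetical helper. Because the width and height inputs start at 16 and step by 8, the floor division is exact for any value the widgets allow.

```python
def svd_latent_shape(width=1024, height=576, video_frames=14):
    """Shape of the empty latent built by SVD_img2vid_Conditioning (4 channels, 1/8 spatial)."""
    return (video_frames, 4, height // 8, width // 8)


print(svd_latent_shape())             # (14, 4, 72, 128)
print(svd_latent_shape(512, 512, 25))  # (25, 4, 64, 64)
```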
@ -78,20 +104,28 @@ class VideoLinearCFGGuidance:

        m = model.clone()
        m.set_model_sampler_cfg_function(linear_cfg)
-        return (m, )
+        return io.NodeOutput(m)

-class VideoTriangleCFGGuidance:
+    patch = execute  # TODO: remove
+
+
+class VideoTriangleCFGGuidance(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "model": ("MODEL",),
-                              "min_cfg": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 100.0, "step":0.5, "round": 0.01, "advanced": True}),
-                              }}
-    RETURN_TYPES = ("MODEL",)
-    FUNCTION = "patch"
+    def define_schema(cls):
+        return io.Schema(
+            node_id="VideoTriangleCFGGuidance",
+            category="sampling/video_models",
+            inputs=[
+                io.Model.Input("model"),
+                io.Float.Input("min_cfg", default=1.0, min=0.0, max=100.0, step=0.5, round=0.01, advanced=True),
+            ],
+            outputs=[
+                io.Model.Output(),
+            ],
+        )

-    CATEGORY = "sampling/video_models"
-
-    def patch(self, model, min_cfg):
+    @classmethod
+    def execute(cls, model, min_cfg) -> io.NodeOutput:
        def linear_cfg(args):
            cond = args["cond"]
            uncond = args["uncond"]
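The bodies of both `linear_cfg` functions are elided from these hunks. However, the node names and the `min_cfg` input suggest a per-frame CFG scale that varies across the video batch. Below is a pure-Python sketch of two such schedules, under stated assumptions. The linear schedule is assumed to ramp from `min_cfg` on the first frame to the sampler's cfg on the last. The triangle schedule is assumed to start and end at `min_cfg` and peak at the middle frame. Each scale is then presumably applied in the usual `uncond + scale * (cond - uncond)` form. All helper names are hypothetical.

```python
def linear_schedule(num_frames, min_cfg, cfg):
    """Evenly spaced scales from min_cfg (first frame) to cfg (last frame)."""
    if num_frames == 1:
        return [min_cfg]  # a single frame is treated as the first frame
    step = (cfg - min_cfg) / (num_frames - 1)
    return [min_cfg + i * step for i in range(num_frames)]


def triangle_schedule(num_frames, min_cfg, cfg):
    """min_cfg at both ends, rising linearly to cfg at the middle frame."""
    if num_frames == 1:
        return [min_cfg]
    scales = []
    for i in range(num_frames):
        t = i / (num_frames - 1)         # position in [0, 1]
        peak = 1.0 - abs(2.0 * t - 1.0)  # 0 at the ends, 1 in the middle
        scales.append(min_cfg + peak * (cfg - min_cfg))
    return scales


def guided(cond, uncond, scale):
    """Classifier-free guidance for one scalar value."""
    return uncond + scale * (cond - uncond)


print(linear_schedule(5, 1.0, 3.0))    # [1.0, 1.5, 2.0, 2.5, 3.0]
print(triangle_schedule(5, 1.0, 3.0))  # [1.0, 2.0, 3.0, 2.0, 1.0]
print(guided(2.0, 1.0, 1.5))           # 2.5
```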
@ -105,57 +139,79 @@ class VideoTriangleCFGGuidance:

        m = model.clone()
        m.set_model_sampler_cfg_function(linear_cfg)
-        return (m, )
+        return io.NodeOutput(m)

-class ImageOnlyCheckpointSave(comfy_extras.nodes_model_merging.CheckpointSave):
-    CATEGORY = "advanced/model_merging"
+    patch = execute  # TODO: remove
+
+
+class ImageOnlyCheckpointSave(io.ComfyNode):
+    @classmethod
+    def define_schema(cls):
+        return io.Schema(
+            node_id="ImageOnlyCheckpointSave",
+            search_aliases=["save model", "export checkpoint", "merge save"],
+            category="advanced/model_merging",
+            inputs=[
+                io.Model.Input("model"),
+                io.ClipVision.Input("clip_vision"),
+                io.Vae.Input("vae"),
+                io.String.Input("filename_prefix", default="checkpoints/ComfyUI"),
+            ],
+            hidden=[io.Hidden.prompt, io.Hidden.extra_pnginfo],
+            is_output_node=True,
+        )

    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": { "model": ("MODEL",),
-                              "clip_vision": ("CLIP_VISION",),
-                              "vae": ("VAE",),
-                              "filename_prefix": ("STRING", {"default": "checkpoints/ComfyUI"}),},
-                "hidden": {"prompt": "PROMPT", "extra_pnginfo": "EXTRA_PNGINFO"},}
+    def execute(cls, model, clip_vision, vae, filename_prefix) -> io.NodeOutput:
+        comfy_extras.nodes_model_merging.save_checkpoint(model, clip_vision=clip_vision, vae=vae, filename_prefix=filename_prefix, output_dir=folder_paths.get_output_directory(), prompt=cls.hidden.prompt, extra_pnginfo=cls.hidden.extra_pnginfo)
+        return io.NodeOutput()

-    def save(self, model, clip_vision, vae, filename_prefix, prompt=None, extra_pnginfo=None):
-        comfy_extras.nodes_model_merging.save_checkpoint(model, clip_vision=clip_vision, vae=vae, filename_prefix=filename_prefix, output_dir=self.output_dir, prompt=prompt, extra_pnginfo=extra_pnginfo)
-        return {}
+    save = execute  # TODO: remove


-class ConditioningSetAreaPercentageVideo:
+class ConditioningSetAreaPercentageVideo(io.ComfyNode):
    @classmethod
-    def INPUT_TYPES(s):
-        return {"required": {"conditioning": ("CONDITIONING", ),
-                             "width": ("FLOAT", {"default": 1.0, "min": 0, "max": 1.0, "step": 0.01}),
-                             "height": ("FLOAT", {"default": 1.0, "min": 0, "max": 1.0, "step": 0.01}),
-                             "temporal": ("FLOAT", {"default": 1.0, "min": 0, "max": 1.0, "step": 0.01}),
-                             "x": ("FLOAT", {"default": 0, "min": 0, "max": 1.0, "step": 0.01}),
-                             "y": ("FLOAT", {"default": 0, "min": 0, "max": 1.0, "step": 0.01}),
-                             "z": ("FLOAT", {"default": 0, "min": 0, "max": 1.0, "step": 0.01}),
-                             "strength": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 10.0, "step": 0.01}),
-                             }}
-    RETURN_TYPES = ("CONDITIONING",)
-    FUNCTION = "append"
+    def define_schema(cls):
+        return io.Schema(
+            node_id="ConditioningSetAreaPercentageVideo",
+            category="conditioning",
+            inputs=[
+                io.Conditioning.Input("conditioning"),
+                io.Float.Input("width", default=1.0, min=0.0, max=1.0, step=0.01),
+                io.Float.Input("height", default=1.0, min=0.0, max=1.0, step=0.01),
+                io.Float.Input("temporal", default=1.0, min=0.0, max=1.0, step=0.01),
+                io.Float.Input("x", default=0.0, min=0.0, max=1.0, step=0.01),
+                io.Float.Input("y", default=0.0, min=0.0, max=1.0, step=0.01),
+                io.Float.Input("z", default=0.0, min=0.0, max=1.0, step=0.01),
+                io.Float.Input("strength", default=1.0, min=0.0, max=10.0, step=0.01),
+            ],
+            outputs=[
+                io.Conditioning.Output(),
+            ],
+        )

-    CATEGORY = "conditioning"
-
-    def append(self, conditioning, width, height, temporal, x, y, z, strength):
+    @classmethod
+    def execute(cls, conditioning, width, height, temporal, x, y, z, strength) -> io.NodeOutput:
        c = node_helpers.conditioning_set_values(conditioning, {"area": ("percentage", temporal, height, width, z, y, x),
                                                                "strength": strength,
                                                                "set_area_to_bounds": False})
-        return (c, )
+        return io.NodeOutput(c)
+
+    append = execute  # TODO: remove


-NODE_CLASS_MAPPINGS = {
-    "ImageOnlyCheckpointLoader": ImageOnlyCheckpointLoader,
-    "SVD_img2vid_Conditioning": SVD_img2vid_Conditioning,
-    "VideoLinearCFGGuidance": VideoLinearCFGGuidance,
-    "VideoTriangleCFGGuidance": VideoTriangleCFGGuidance,
-    "ImageOnlyCheckpointSave": ImageOnlyCheckpointSave,
-    "ConditioningSetAreaPercentageVideo": ConditioningSetAreaPercentageVideo,
-}
+class VideoModelExtension(ComfyExtension):
+    @override
+    async def get_node_list(self) -> list[type[io.ComfyNode]]:
+        return [
+            ImageOnlyCheckpointLoader,
+            SVD_img2vid_Conditioning,
+            VideoLinearCFGGuidance,
+            VideoTriangleCFGGuidance,
+            ImageOnlyCheckpointSave,
+            ConditioningSetAreaPercentageVideo,
+        ]

-NODE_DISPLAY_NAME_MAPPINGS = {
-    "ImageOnlyCheckpointLoader": "Image Only Checkpoint Loader (img2vid model)",
-}
+
+async def comfy_entrypoint() -> VideoModelExtension:
+    return VideoModelExtension()
--- a/execution.py
+++ b/execution.py
@ -15,6 +15,7 @@ import torch
 from comfy.cli_args import args
 import comfy.memory_management
 import comfy.model_management
+import comfy.model_prefetch
 import comfy_aimdo.model_vbar

 from latent_preview import set_preview_method
@ -537,6 +538,7 @@ async def execute(server, dynprompt, caches, current_item, extra_data, executed,
                    if args.verbose == "DEBUG":
                        comfy_aimdo.control.analyze()
                    comfy.model_management.reset_cast_buffers()
+                    comfy.model_prefetch.cleanup_prefetch_queues()
                    comfy_aimdo.model_vbar.vbars_reset_watermark_limits()

            if has_pending_tasks:
--- a/requirements.txt
+++ b/requirements.txt
@ -1,5 +1,5 @@
 comfyui-frontend-package==1.42.15
-comfyui-workflow-templates==0.9.65
+comfyui-workflow-templates==0.9.66
 comfyui-embedded-docs==0.4.4
 torch
 torchsde
Author	SHA1	Message	Date
Alexander Piskun	fcdf4f2b4f	Merge `f1c07a72c4` into `783782d5d7`	2026-05-03 08:28:16 +09:00
rattus	783782d5d7	Implement block prefetch + Lora Async load + and adopt in LTX (Speedup!) (CORE-111) (#13618 ) * mm: Use Aimdo raw allocator for cast buffers pytorch manages allocation of growing buffers on streams poorly. Pyt has no windows support for the expandable segments allocator (which is the right tool for this job), while also segmenting the memory by stream such that it can be generally re-used. So kick the problem to aimdo which can just grow a virtual region thats freed per stream. * plan * ops: move cpu handler up to the caller * ops: split up prefetch from weight prep block prefetching API Split up the casting and weight formating/lora stuff in prep for arbitrary prefetch support. * ops: implement block prefetching API allow a model to construct a prefetch list and operate it for increased async offload. * ltxv2: Implement block prefetching * Implement lora async offload Implement async offload of loras.	2026-05-02 19:23:24 -04:00
comfyanonymous	3e3ed8cc2a	Add script in AMD portable to launch with dynamic vram. (#13667 )	2026-05-01 20:19:46 -04:00
comfyanonymous	67f6cb3527	List all the portable downloads in the README section. (#13666 )	2026-05-01 20:19:32 -04:00
Alexis Rolland	0230e0e7cc	Adding kijai (#13664 ) Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>	2026-05-02 06:37:18 +08:00
Jukka Seppänen	b5921c8ac2	SDPose: resize fix (#13656 )	2026-05-01 14:17:25 -07:00
Simon Lui	63103d519e	Remove IPEX and clean up checks and add missing synchronize during empty cache. (#13653 )	2026-05-01 14:16:41 -07:00
Alexander Piskun	cf758bd256	chore(api-nodes): increase default timeout for partner API node tasks (#13663 ) Signed-off-by: bigcat88 <bigcat88@icloud.com> Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>	2026-05-01 12:48:41 -07:00
Daxiong (Lin)	10b45a71cd	chore: update workflow templates to v0.9.66 (#13662 ) Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>	2026-05-01 12:11:30 -07:00
bigcat88	f1c07a72c4	convert model_merging and video_model nodes to V3 schema	2026-03-11 12:25:59 +02:00
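Commit 783782d5d7 describes letting a model "construct a prefetch list and operate it for increased async offload", so that weight transfers for upcoming blocks overlap with the compute of the current block. The following is a minimal plain-Python sketch of that scheduling pattern. It assumes a single-worker thread pool as a stand-in for a dedicated copy stream. `run_with_prefetch`, `load`, `compute` and `depth` are hypothetical names. They are not the `comfy.model_prefetch` API, whose only entry point visible in this diff is `cleanup_prefetch_queues()`.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Sequence


def run_with_prefetch(
    blocks: Sequence[str],
    load: Callable[[str], object],
    compute: Callable[[str, object, object], object],
    x: object,
    depth: int = 1,
) -> object:
    """Run blocks in order while loading up to `depth` blocks ahead (hypothetical sketch).

    `load(block)` stands in for an async host-to-device weight copy;
    `compute(block, weights, x)` stands in for the block's forward pass.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    # One worker models one copy stream: loads complete in FIFO order.
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending: dict[int, Future] = {}
        # Prime the prefetch queue with the first `depth` blocks.
        for i in range(min(depth, len(blocks))):
            pending[i] = copier.submit(load, blocks[i])
        for i, block in enumerate(blocks):
            # Queue the copy for a later block before running the current one,
            # so the transfer overlaps with this block's compute.
            nxt = i + depth
            if nxt < len(blocks):
                pending[nxt] = copier.submit(load, blocks[nxt])
            # Blocks only if this block's copy has not finished yet.
            weights = pending.pop(i).result()
            x = compute(block, weights, x)
    return x
```

For example, `run_with_prefetch(["a", "bb", "ccc"], len, lambda name, w, x: x + w, 0)` returns `6`. With `depth=1`, the copy for block i+1 is queued before block i runs, so a load roughly stalls execution only when it takes longer than the preceding block's compute. A larger `depth` buys more slack at the cost of holding more prefetched weights at once.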