Merge branch 'master' into worksplit-multigpu

SelectXDevice: address code-review follow-ups
True reset semantics for "default":
- On first selector application, cache the loader's original
  load_device / offload_device on the underlying model object (which is
  shared across patcher clones) and restore those base values when the
  user picks "default". Previously "default" meant "passthrough", so
  SelectXDevice(gpu:1) -> SelectXDevice(default) silently kept the gpu:1
  routing.

CPU + dynamic VRAM:
- When SelectModelDevice / SelectCLIPDevice resolves to CPU on a
  ModelPatcherDynamic, also call clone(disable_dynamic=True) so the
  result is a plain ModelPatcher, matching ModelPatcherDynamic.__new__'s
  intent that CPU loads never run through the dynamic path. Fall back to
  the regular dynamic clone if disable_dynamic is unsupported on that
  patcher.

MultiGPU collision pruning:
- After SelectModelDevice retargets the primary patcher, drop any
  multigpu clone (from a prior MultiGPU CFG Split) whose load_device now
  matches the primary; otherwise two patchers would be bound to the same
  device. Logs the prune at info level.

SelectVAEDevice: reject CPU at runtime:
- The UI uses get_gpu_device_options_no_cpu(), but a workflow opened
  from another machine could still pass "cpu" through validate_inputs.
  Detect that case explicitly, log a "CPU is not a supported choice"
  passthrough message, and leave the VAE unchanged.

Cosmetic:
- Update VAE node docstring to accurately reflect the runtime CPU
  rejection rather than the older "intentionally not offered" claim.
- Remove the fallback warnings inside resolve_gpu_device_option
  entirely; the Select*Device nodes now own a single context-rich
  info-level message per failed lookup, so there is no double logging.

Amp-Thread-ID: https://ampcode.com/threads/T-019e52b4-31ee-72cd-996b-64ecd9420e13
Co-authored-by: Amp <amp@ampcode.com>
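
A minimal sketch of the "default" reset described in the commit message above, under simplified assumptions. `Model`, `Patcher`, `select_device`, and the `_base_devices` attribute are hypothetical stand-ins for illustration, not the identifiers used by the actual Select*Device nodes. The point it illustrates is why the base devices are cached on the shared model object rather than on the patcher: a clone produced by an earlier selector can still find the loader's original values.

```python
from dataclasses import dataclass
from typing import Optional


class Model:
    """Stand-in for the underlying model object shared by all patcher clones."""


@dataclass
class Patcher:
    """Hypothetical, simplified patcher with only the fields this sketch needs."""
    model: Model
    load_device: str
    offload_device: str

    def clone(self) -> "Patcher":
        # Clones share the same underlying model object.
        return Patcher(self.model, self.load_device, self.offload_device)


def select_device(patcher: Patcher, choice: Optional[str]) -> Patcher:
    """Return a clone routed to `choice`, or reset to the loader's devices for "default"."""
    model = patcher.model
    # Cache the loader's original devices once, on the shared model object,
    # so that every later clone can find them again.
    if not hasattr(model, "_base_devices"):
        model._base_devices = (patcher.load_device, patcher.offload_device)

    out = patcher.clone()
    if choice is None or choice == "default":
        # True reset: restore the cached base values instead of passing through.
        out.load_device, out.offload_device = model._base_devices
    else:
        out.load_device = choice
    return out
```

With this shape, chaining a selector set to `gpu:1` into a selector set to `default` yields a patcher back on the loader's original device, which is the behaviour the old passthrough lacked.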
2026-06-19 14:29:33 +08:00 · 2026-05-22 23:05:58 -07:00 · 2026-05-22 22:29:45 -07:00 · 2026-05-22 21:50:29 -07:00 · 2026-05-22 21:46:07 -07:00 · 2026-05-22 21:39:18 -07:00
25 changed files with 4708 additions and 1003 deletions
--- a/.github/workflows/backport_release.yaml
+++ b/.github/workflows/backport_release.yaml
@@ -0,0 +1,519 @@
+name: Backport Release
+
+on:
+  workflow_dispatch:
+    inputs:
+      commit:
+        description: 'Full 40-char SHA of the tip commit of the backport source branch (the PR head commit that passed tests). The branch is resolved from this SHA and must be unique.'
+        required: true
+        type: string
+
+permissions:
+  contents: read
+  pull-requests: read
+  checks: read
+
+jobs:
+  backport-release:
+    name: Create backport release
+    runs-on: ubuntu-latest
+    environment: backport release
+
+    steps:
+      - name: Generate GitHub App token
+        id: app-token
+        uses: actions/create-github-app-token@bcd2ba49218906704ab6c1aa796996da409d3eb1
+        with:
+          app-id: ${{ secrets.FEN_RELEASE_APP_ID }}
+          private-key: ${{ secrets.FEN_RELEASE_PRIVATE_KEY }}
+
+      - name: Checkout repository
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd
+        with:
+          token: ${{ steps.app-token.outputs.token }}
+          fetch-depth: 0
+          fetch-tags: true
+
+      - name: Configure git
+        run: |
+          git config user.name  "fen-release[bot]"
+          git config user.email "fen-release[bot]@users.noreply.github.com"
+
+      - name: Resolve source branch from commit SHA
+        id: resolve
+        env:
+          SOURCE_COMMIT:  ${{ inputs.commit }}
+          DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
+        run: |
+          set -euo pipefail
+
+          # Require a full 40-char lowercase-hex SHA. Short SHAs are ambiguous
+          # and we will be comparing this value against API responses (PR head
+          # SHA, ref tips) that always return the full form.
+          if [[ ! "${SOURCE_COMMIT}" =~ ^[0-9a-f]{40}$ ]]; then
+            echo "::error::Input commit '${SOURCE_COMMIT}' is not a full 40-char lowercase hex SHA."
+            exit 1
+          fi
+
+          # Fetch all remote branches so we can search for which one(s) point
+          # at this SHA. `actions/checkout` with fetch-depth: 0 fetches full
+          # history of the checked-out ref but does not necessarily populate
+          # every refs/remotes/origin/*, so do it explicitly.
+          git fetch --prune origin '+refs/heads/*:refs/remotes/origin/*'
+
+          # Verify the commit actually exists in this repo's object DB.
+          if ! git cat-file -e "${SOURCE_COMMIT}^{commit}" 2>/dev/null; then
+            echo "::error::Commit ${SOURCE_COMMIT} was not found in the repository."
+            exit 1
+          fi
+
+          # Find every remote branch whose tip == SOURCE_COMMIT. Exactly one
+          # branch must point at it. If zero, the commit isn't anyone's tip
+          # (likely stale, force-pushed past, or never the PR head). If more
+          # than one, the (branch -> SHA) mapping is ambiguous and we refuse
+          # to guess — the operator must give us a unique branch to release.
+          mapfile -t matching_branches < <(
+            git for-each-ref \
+              --format='%(refname:strip=3)' \
+              --points-at="${SOURCE_COMMIT}" \
+              refs/remotes/origin/ \
+              | grep -vx 'HEAD' || true
+          )
+
+          if [[ "${#matching_branches[@]}" -eq 0 ]]; then
+            echo "::error::No branch on origin has ${SOURCE_COMMIT} as its tip."
+            echo "::error::Either the branch was updated after you copied this SHA, or this commit was never the head of a branch."
+            exit 1
+          fi
+
+          if [[ "${#matching_branches[@]}" -gt 1 ]]; then
+            echo "::error::More than one branch on origin has ${SOURCE_COMMIT} as its tip; cannot pick one:"
+            for b in "${matching_branches[@]}"; do
+              echo "::error::  - ${b}"
+            done
+            echo "::error::Refusing to proceed with an ambiguous source branch."
+            exit 1
+          fi
+
+          source_branch="${matching_branches[0]}"
+
+          if [[ "${source_branch}" == "${DEFAULT_BRANCH}" ]]; then
+            echo "::error::Source branch must not be the default branch ('${DEFAULT_BRANCH}')."
+            exit 1
+          fi
+
+          echo "Resolved commit ${SOURCE_COMMIT} to branch '${source_branch}'."
+          echo "source_branch=${source_branch}" >> "$GITHUB_OUTPUT"
+
+      - name: Determine latest stable release
+        id: latest
+        env:
+          GH_TOKEN: ${{ steps.app-token.outputs.token }}
+        run: |
+          set -euo pipefail
+
+          # List all tags matching vMAJOR.MINOR.PATCH and pick the highest by numeric
+          # comparison of each component, so that v0.20.1 ranks above v0.19.99. We
+          # sort the components explicitly instead of relying on `sort -V`.
+          latest_tag="$(
+            git tag --list 'v[0-9]*.[0-9]*.[0-9]*' \
+              | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' \
+              | awk -F'[v.]' '{ printf "%010d %010d %010d %s\n", $2, $3, $4, $0 }' \
+              | sort -k1,1n -k2,2n -k3,3n \
+              | tail -n1 \
+              | awk '{print $4}'
+          )"
+
+          if [[ -z "${latest_tag}" ]]; then
+            echo "::error::No stable release tags (vMAJOR.MINOR.PATCH) were found."
+            exit 1
+          fi
+
+          # Parse components
+          ver="${latest_tag#v}"
+          major="${ver%%.*}"
+          rest="${ver#*.}"
+          minor="${rest%%.*}"
+          patch="${rest#*.}"
+
+          new_patch=$((patch + 1))
+          new_version="v${major}.${minor}.${new_patch}"
+          release_branch="release/v${major}.${minor}"
+
+          latest_sha="$(git rev-list -n 1 "refs/tags/${latest_tag}")"
+
+          echo "latest_tag=${latest_tag}"             >> "$GITHUB_OUTPUT"
+          echo "latest_sha=${latest_sha}"             >> "$GITHUB_OUTPUT"
+          echo "major=${major}"                       >> "$GITHUB_OUTPUT"
+          echo "minor=${minor}"                       >> "$GITHUB_OUTPUT"
+          echo "patch=${patch}"                       >> "$GITHUB_OUTPUT"
+          echo "new_version=${new_version}"           >> "$GITHUB_OUTPUT"
+          echo "new_version_no_v=${major}.${minor}.${new_patch}" >> "$GITHUB_OUTPUT"
+          echo "release_branch=${release_branch}"     >> "$GITHUB_OUTPUT"
+
+          echo "Latest stable release: ${latest_tag} (${latest_sha})"
+          echo "New version will be:   ${new_version}"
+          echo "Release branch:        ${release_branch}"
+
+      - name: Validate source branch is cut directly from the latest stable release
+        env:
+          SOURCE_BRANCH:   ${{ steps.resolve.outputs.source_branch }}
+          SOURCE_COMMIT:   ${{ inputs.commit }}
+          LATEST_TAG_SHA:  ${{ steps.latest.outputs.latest_sha }}
+          LATEST_TAG:      ${{ steps.latest.outputs.latest_tag }}
+        run: |
+          set -euo pipefail
+
+          # Use the user-provided SHA directly rather than re-resolving the branch
+          # tip — the resolve step already proved the branch tip equals SOURCE_COMMIT,
+          # and pinning to the SHA here makes the rest of the job TOCTOU-safe against
+          # someone pushing to the branch mid-run.
+          source_sha="${SOURCE_COMMIT}"
+
+          # Walking first-parent from the source tip must reach LATEST_TAG_SHA.
+          # We capture rev-list into a variable and grep against a here-string
+          # rather than piping `rev-list | grep -q`: under `set -o pipefail`,
+          # `grep -q` would exit on first match and SIGPIPE the still-streaming
+          # `rev-list`, propagating exit 141 as a spurious "not found".
+          first_parent_chain="$(git rev-list --first-parent "${source_sha}")"
+          if ! grep -Fxq "${LATEST_TAG_SHA}" <<< "${first_parent_chain}"; then
+            echo "::error::Source branch '${SOURCE_BRANCH}' is not cut from '${LATEST_TAG}'."
+            echo "::error::Its first-parent history does not include ${LATEST_TAG_SHA}."
+            exit 1
+          fi
+
+          # Additionally, every commit added on top of the tag (the set we are
+          # about to publish) must itself lie on the first-parent chain from
+          # the source tip down to the tag, i.e. no sibling commits from
+          # master may sneak in via a non-first-parent path (such as a merge
+          # from master).
+          # Enforce this by comparing two sets of commits:
+          #   A = commits reachable from source but not from tag (full DAG)
+          #   B = commits on the first-parent chain from source down to tag
+          # and requiring A == B. Commits in A but not in B are listed in the
+          # error output below.
+          all_added="$(git rev-list "${LATEST_TAG_SHA}..${source_sha}" | sort)"
+          first_parent_added="$(
+            git rev-list --first-parent "${LATEST_TAG_SHA}..${source_sha}" | sort
+          )"
+
+          if [[ "${all_added}" != "${first_parent_added}" ]]; then
+            echo "::error::Source branch '${SOURCE_BRANCH}' contains commits not on its first-parent chain from '${LATEST_TAG}'."
+            echo "::error::This usually means the branch was cut from master (not from the tag) or contains a merge from master."
+            echo "Commits reachable but not on first-parent chain:"
+            comm -23 <(printf '%s\n' "${all_added}") <(printf '%s\n' "${first_parent_added}") \
+              | while read -r sha; do
+                  echo "  $(git log -1 --format='%h %s' "${sha}")"
+                done
+            exit 1
+          fi
+
+          added_count="$(printf '%s\n' "${all_added}" | grep -c . || true)"
+          echo "Source branch is cut directly from ${LATEST_TAG} with ${added_count} commit(s) on top."
+
+      - name: Validate PR exists, is open, named correctly, has latest commit, and checks pass
+        env:
+          GH_TOKEN:      ${{ steps.app-token.outputs.token }}
+          SOURCE_BRANCH: ${{ steps.resolve.outputs.source_branch }}
+          SOURCE_COMMIT: ${{ inputs.commit }}
+          NEW_VERSION:   ${{ steps.latest.outputs.new_version }}
+          REPO:          ${{ github.repository }}
+        run: |
+          set -euo pipefail
+
+          expected_title="ComfyUI backport release ${NEW_VERSION}"
+
+          # Find open PRs from this branch into master. The --state open filter
+          # is load-bearing: a closed/merged PR with passing checks must not be
+          # accepted as authorization for a new release.
+          pr_json="$(
+            gh pr list \
+              --repo "${REPO}" \
+              --state open \
+              --head "${SOURCE_BRANCH}" \
+              --base master \
+              --json number,title,headRefOid,state \
+              --limit 10
+          )"
+
+          pr_count="$(echo "${pr_json}" | jq 'length')"
+          if [[ "${pr_count}" -eq 0 ]]; then
+            echo "::error::No open PR found from '${SOURCE_BRANCH}' into 'master'. The PR must exist and be open."
+            exit 1
+          fi
+
+          # Pick the PR matching the expected title
+          pr_number="$(echo "${pr_json}" | jq -r --arg t "${expected_title}" '
+            map(select(.title == $t)) | .[0].number // empty
+          ')"
+          pr_head_sha="$(echo "${pr_json}" | jq -r --arg t "${expected_title}" '
+            map(select(.title == $t)) | .[0].headRefOid // empty
+          ')"
+
+          if [[ -z "${pr_number}" ]]; then
+            echo "::error::No open PR from '${SOURCE_BRANCH}' into 'master' is titled '${expected_title}'."
+            echo "Found PRs:"
+            echo "${pr_json}" | jq -r '.[] | "  #\(.number): \(.title)"'
+            exit 1
+          fi
+
+          # The PR's current head commit must equal the SHA the operator gave us.
+          # This is what closes the door on releasing stale code: if anyone has
+          # pushed to the branch since the operator validated tests passed, the
+          # PR head will have advanced past SOURCE_COMMIT and we abort. (The
+          # resolve step already proved the branch tip == SOURCE_COMMIT; this
+          # ties that same SHA to the PR that authorizes the release.)
+          if [[ "${pr_head_sha}" != "${SOURCE_COMMIT}" ]]; then
+            echo "::error::PR #${pr_number} head commit is ${pr_head_sha}, but the operator-provided commit is ${SOURCE_COMMIT}."
+            echo "::error::The PR has new commits since this release was authorized. Re-run with the new head SHA after verifying its checks."
+            exit 1
+          fi
+
+          echo "Found open PR #${pr_number} titled '${expected_title}' at head ${pr_head_sha} (matches operator-provided commit)."
+
+          # Verify all check runs on the head commit have completed successfully.
+          # A check is considered passing if conclusion is success, neutral, or skipped.
+          checks_json="$(
+            gh api \
+              --paginate \
+              "repos/${REPO}/commits/${pr_head_sha}/check-runs" \
+              --jq '.check_runs[] | {name: .name, status: .status, conclusion: .conclusion}'
+          )"
+
+          if [[ -z "${checks_json}" ]]; then
+            echo "::error::No check runs found on PR head commit ${pr_head_sha}."
+            exit 1
+          fi
+
+          echo "Check runs on ${pr_head_sha}:"
+          echo "${checks_json}" | jq -s '.'
+
+          failing="$(echo "${checks_json}" | jq -s '
+            map(select(
+              .status != "completed"
+              or (.conclusion as $c
+                  | ["success","neutral","skipped"]
+                  | index($c) | not)
+            ))
+          ')"
+
+          failing_count="$(echo "${failing}" | jq 'length')"
+          if [[ "${failing_count}" -gt 0 ]]; then
+            echo "::error::One or more checks have not passed on PR head commit ${pr_head_sha}:"
+            echo "${failing}" | jq -r '.[] | "  - \(.name): status=\(.status) conclusion=\(.conclusion)"'
+            exit 1
+          fi
+
+          echo "All checks have passed on ${pr_head_sha}."
+
+      - name: Prepare release branch
+        id: prepare
+        env:
+          GH_TOKEN:        ${{ steps.app-token.outputs.token }}
+          REPO:            ${{ github.repository }}
+          RELEASE_BRANCH:  ${{ steps.latest.outputs.release_branch }}
+          LATEST_TAG:      ${{ steps.latest.outputs.latest_tag }}
+          LATEST_TAG_SHA:  ${{ steps.latest.outputs.latest_sha }}
+          PATCH:           ${{ steps.latest.outputs.patch }}
+        run: |
+          set -euo pipefail
+
+          # Try to fetch the release branch. If patch == 0, it shouldn't exist yet
+          # and we'll create it from the latest stable tag. If patch > 0, it must
+          # already exist and its tip must equal the latest stable tag commit (i.e.
+          # the previous patch release).
+          if git ls-remote --exit-code --heads origin "${RELEASE_BRANCH}" >/dev/null 2>&1; then
+            echo "Release branch '${RELEASE_BRANCH}' already exists on origin."
+            git fetch origin "refs/heads/${RELEASE_BRANCH}:refs/remotes/origin/${RELEASE_BRANCH}"
+            git checkout -B "${RELEASE_BRANCH}" "refs/remotes/origin/${RELEASE_BRANCH}"
+
+            current_tip="$(git rev-parse HEAD)"
+            if [[ "${current_tip}" != "${LATEST_TAG_SHA}" ]]; then
+              echo "::error::Release branch '${RELEASE_BRANCH}' tip (${current_tip}) is not at the latest stable release '${LATEST_TAG}' (${LATEST_TAG_SHA})."
+              echo "::error::Refusing to release on top of a divergent branch."
+              exit 1
+            fi
+            echo "branch_existed=true" >> "$GITHUB_OUTPUT"
+          else
+            if [[ "${PATCH}" != "0" ]]; then
+              echo "::error::Release branch '${RELEASE_BRANCH}' does not exist on origin, but the latest stable release '${LATEST_TAG}' has patch=${PATCH} (>0). This is inconsistent."
+              exit 1
+            fi
+            echo "Release branch '${RELEASE_BRANCH}' does not exist. Creating from ${LATEST_TAG}."
+            git checkout -B "${RELEASE_BRANCH}" "refs/tags/${LATEST_TAG}"
+            echo "branch_existed=false" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Fast-forward merge source branch into release branch
+        env:
+          SOURCE_BRANCH:  ${{ steps.resolve.outputs.source_branch }}
+          SOURCE_COMMIT:  ${{ inputs.commit }}
+          RELEASE_BRANCH: ${{ steps.latest.outputs.release_branch }}
+        run: |
+          set -euo pipefail
+
+          # --ff-only guarantees no merge commit is created. If a fast-forward is
+          # not possible (i.e. the release branch has commits the source branch
+          # doesn't), the merge will fail and we abort. Because we already validated
+          # that the source branch is rooted on the latest stable tag, and the
+          # release branch tip equals that same tag, this fast-forward should
+          # always succeed for a well-formed backport branch.
+          #
+          # We merge the operator-provided SHA, not the branch ref, so a push to
+          # the branch in the window between resolve and now cannot smuggle new
+          # commits into the release.
+          if ! git merge --ff-only "${SOURCE_COMMIT}"; then
+            echo "::error::Cannot fast-forward '${RELEASE_BRANCH}' to ${SOURCE_COMMIT} (tip of '${SOURCE_BRANCH}'). A merge commit would be required. Aborting."
+            exit 1
+          fi
+
+          echo "Fast-forwarded '${RELEASE_BRANCH}' to ${SOURCE_COMMIT} (tip of '${SOURCE_BRANCH}')."
+
+      - name: Bump version files
+        env:
+          NEW_VERSION_NO_V: ${{ steps.latest.outputs.new_version_no_v }}
+        run: |
+          set -euo pipefail
+
+          if [[ ! -f comfyui_version.py ]]; then
+            echo "::error::comfyui_version.py not found in repo root."
+            exit 1
+          fi
+          if [[ ! -f pyproject.toml ]]; then
+            echo "::error::pyproject.toml not found in repo root."
+            exit 1
+          fi
+
+          # Replace the version string in comfyui_version.py.
+          # Expected format:  __version__ = "X.Y.Z"
+          python3 - "$NEW_VERSION_NO_V" <<'PY'
+          import re, sys, pathlib
+          new = sys.argv[1]
+
+          p = pathlib.Path("comfyui_version.py")
+          src = p.read_text()
+          new_src, n = re.subn(
+              r'(__version__\s*=\s*[\'"])[^\'"]+([\'"])',
+              lambda m: f'{m.group(1)}{new}{m.group(2)}',
+              src,
+              count=1,
+          )
+          if n != 1:
+              sys.exit("Could not find __version__ assignment in comfyui_version.py")
+          p.write_text(new_src)
+
+          p = pathlib.Path("pyproject.toml")
+          src = p.read_text()
+          # Replace the first line-anchored `version = "..."` (expected under [project] or [tool.poetry]).
+          new_src, n = re.subn(
+              r'(?m)^(version\s*=\s*")[^"]+(")',
+              lambda m: f'{m.group(1)}{new}{m.group(2)}',
+              src,
+              count=1,
+          )
+          if n != 1:
+              sys.exit("Could not find version assignment in pyproject.toml")
+          p.write_text(new_src)
+          PY
+
+          echo "Updated version to ${NEW_VERSION_NO_V} in comfyui_version.py and pyproject.toml."
+          git --no-pager diff -- comfyui_version.py pyproject.toml
+
+      - name: Commit version bump and tag release
+        env:
+          NEW_VERSION: ${{ steps.latest.outputs.new_version }}
+        run: |
+          set -euo pipefail
+
+          git add comfyui_version.py pyproject.toml
+          git commit -m "ComfyUI ${NEW_VERSION}"
+
+          if git rev-parse -q --verify "refs/tags/${NEW_VERSION}" >/dev/null; then
+            echo "::error::Tag ${NEW_VERSION} already exists locally."
+            exit 1
+          fi
+          git tag "${NEW_VERSION}"
+
+      - name: Verify tag does not already exist on origin
+        env:
+          NEW_VERSION: ${{ steps.latest.outputs.new_version }}
+        run: |
+          set -euo pipefail
+          if git ls-remote --exit-code --tags origin "refs/tags/${NEW_VERSION}" >/dev/null 2>&1; then
+            echo "::error::Tag ${NEW_VERSION} already exists on origin. Aborting."
+            exit 1
+          fi
+
+      - name: Push release branch and tag
+        env:
+          RELEASE_BRANCH: ${{ steps.latest.outputs.release_branch }}
+          NEW_VERSION:    ${{ steps.latest.outputs.new_version }}
+        run: |
+          set -euo pipefail
+
+          # Push the branch first, then the tag. Atomic-ish: if the branch push
+          # fails we never publish the tag.
+          git push origin "refs/heads/${RELEASE_BRANCH}:refs/heads/${RELEASE_BRANCH}"
+          git push origin "refs/tags/${NEW_VERSION}"
+
+          echo "Released ${NEW_VERSION} on ${RELEASE_BRANCH}."
+
+      - name: Delete remote source branch
+        env:
+          GH_TOKEN:        ${{ steps.app-token.outputs.token }}
+          REPO:            ${{ github.repository }}
+          SOURCE_BRANCH:   ${{ steps.resolve.outputs.source_branch }}
+          SOURCE_COMMIT:   ${{ inputs.commit }}
+          RELEASE_BRANCH:  ${{ steps.latest.outputs.release_branch }}
+          DEFAULT_BRANCH:  ${{ github.event.repository.default_branch }}
+        run: |
+          set -euo pipefail
+
+          # Belt-and-braces: the resolve step already refuses the default branch,
+          # but never delete the default or the release branch under any
+          # circumstances.
+          if [[ "${SOURCE_BRANCH}" == "${DEFAULT_BRANCH}" || "${SOURCE_BRANCH}" == "${RELEASE_BRANCH}" ]]; then
+            echo "::error::Refusing to delete '${SOURCE_BRANCH}' (matches default or release branch)."
+            exit 1
+          fi
+
+          # Delete the source branch on origin, but only if its tip is still the
+          # SHA we released from. If someone pushed new commits to it after we
+          # resolved it, leave it alone — those commits would be silently lost.
+          current_tip="$(git ls-remote origin "refs/heads/${SOURCE_BRANCH}" | awk '{print $1}')"
+          if [[ -z "${current_tip}" ]]; then
+            echo "Source branch '${SOURCE_BRANCH}' no longer exists on origin; nothing to delete."
+            exit 0
+          fi
+          if [[ "${current_tip}" != "${SOURCE_COMMIT}" ]]; then
+            echo "::warning::Source branch '${SOURCE_BRANCH}' tip (${current_tip}) no longer matches released commit (${SOURCE_COMMIT}). Leaving it in place."
+            exit 0
+          fi
+
+          git push origin --delete "refs/heads/${SOURCE_BRANCH}"
+          echo "Deleted remote branch '${SOURCE_BRANCH}'."
+
+      - name: Summary
+        if: always()
+        env:
+          NEW_VERSION:    ${{ steps.latest.outputs.new_version }}
+          RELEASE_BRANCH: ${{ steps.latest.outputs.release_branch }}
+          LATEST_TAG:     ${{ steps.latest.outputs.latest_tag }}
+          SOURCE_BRANCH:  ${{ steps.resolve.outputs.source_branch }}
+          SOURCE_COMMIT:  ${{ inputs.commit }}
+        run: |
+          # SOURCE_BRANCH is empty if the resolve step never produced an output
+          # (e.g. the workflow failed in or before that step). Show a placeholder
+          # in that case so the summary table still renders cleanly.
+          source_branch_display="${SOURCE_BRANCH:-(unresolved)}"
+          {
+            echo "## Backport release"
+            echo ""
+            echo "| Field | Value |"
+            echo "|---|---|"
+            echo "| Source commit | \`${SOURCE_COMMIT}\` |"
+            echo "| Source branch | \`${source_branch_display}\` |"
+            echo "| Previous stable | \`${LATEST_TAG}\` |"
+            echo "| New version | \`${NEW_VERSION}\` |"
+            echo "| Release branch | \`${RELEASE_BRANCH}\` |"
+          } >> "$GITHUB_STEP_SUMMARY"
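
The "cut directly from the latest stable release" validation in the workflow above can be sketched on a toy commit graph. This is an illustration under stated assumptions, not the workflow's implementation: the graph is a plain dict mapping each commit to its parents (first parent first), and `reachable`, `first_parent_chain`, and `is_cut_directly_from` are hypothetical helper names. It mirrors the two checks: the tag must be on the source's first-parent chain, and the set A (everything reachable from source but not from the tag) must equal the set B (the first-parent chain above the tag).

```python
def reachable(parents, tip):
    """Return every commit reachable from `tip` through any parent edge."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def first_parent_chain(parents, tip):
    """Return the commits from `tip` downwards, following only first parents."""
    chain, commit = [], tip
    while commit is not None:
        chain.append(commit)
        ps = parents.get(commit, [])
        commit = ps[0] if ps else None
    return chain


def is_cut_directly_from(parents, tag, source):
    """True if `source` sits on top of `tag` with no commits from side branches."""
    chain = first_parent_chain(parents, source)
    if tag not in chain:
        return False  # the first-parent history never reaches the tag
    added_all = reachable(parents, source) - reachable(parents, tag)  # set A
    added_first_parent = set(chain[:chain.index(tag)])                # set B
    return added_all == added_first_parent


# A linear backport branch passes; a branch that merged in a sibling commit fails.
linear = {"tag": [], "a": ["tag"], "b": ["a"]}
merged = {"tag": [], "m1": ["tag"], "a": ["tag"], "b": ["a", "m1"]}
print(is_cut_directly_from(linear, "tag", "b"))
print(is_cut_directly_from(merged, "tag", "b"))
```

In the `merged` graph the commit `m1` is reachable from `b` only through its second parent, so it appears in A but not in B and the check rejects the branch.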
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@
 [website-url]: https://www.comfy.org/
 <!-- Workaround to display total user from https://github.com/badges/shields/issues/4500#issuecomment-2060079995 -->
 [discord-shield]: https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fdiscord.com%2Fapi%2Finvites%2Fcomfyorg%3Fwith_counts%3Dtrue&query=%24.approximate_member_count&logo=discord&logoColor=white&label=Discord&color=green&suffix=%20total
-[discord-url]: https://www.comfy.org/discord
+[discord-url]: https://discord.com/invite/comfyorg
 [twitter-shield]: https://img.shields.io/twitter/follow/ComfyUI
 [twitter-url]: https://x.com/ComfyUI

--- a/app/frontend_management.py
+++ b/app/frontend_management.py
@@ -62,6 +62,8 @@ def get_comfy_package_versions():
 def check_comfy_packages_versions():
    """Warn for every comfy* package whose installed version is below requirements.txt."""
    from packaging.version import InvalidVersion, parse as parse_pep440
+    outdated_packages = []
+
    for pkg in get_comfy_package_versions():
        installed_str = pkg["installed"]
        required_str = pkg["required"]
@@ -73,19 +75,26 @@ def check_comfy_packages_versions():
            logging.error(f"Failed to check {pkg['name']} version: {e}")
            continue
        if outdated:
-            app.logger.log_startup_warning(
-                f"""
+            outdated_packages.append((pkg["name"], installed_str, required_str))
+        else:
+            logging.info("{} version: {}".format(pkg["name"], installed_str))
+
+    if outdated_packages:
+        package_warnings = "\n".join(
+            f"Installed {name} version {installed} is lower than the recommended version {required}."
+            for name, installed, required in outdated_packages
+        )
+        app.logger.log_startup_warning(
+            f"""
 ________________________________________________________________________
 WARNING WARNING WARNING WARNING WARNING

-Installed {pkg["name"]} version {installed_str} is lower than the recommended version {required_str}.
+{package_warnings}

 {get_missing_requirements_message()}
 ________________________________________________________________________
 """.strip()
-            )
-        else:
-            logging.info("{} version: {}".format(pkg["name"], installed_str))
+        )


 REQUEST_TIMEOUT = 10  # seconds
--- a/comfy/memory_management.py
+++ b/comfy/memory_management.py
@@ -1,6 +1,5 @@
 import math
 import ctypes
-import threading
 import dataclasses
 import torch
 from typing import NamedTuple
@@ -10,7 +9,7 @@ from comfy.quant_ops import QuantizedTensor

 class TensorFileSlice(NamedTuple):
    file_ref: object
-    thread_id: int
+    lock: object
    offset: int
    size: int

@@ -43,7 +42,6 @@ def read_tensor_file_slice_into(tensor, destination, stream=None, destination2=N
    file_obj = info.file_ref
    if (destination.device.type != "cpu"
            or file_obj is None
-            or threading.get_ident() != info.thread_id
            or destination.numel() * destination.element_size() < info.size
            or tensor.numel() * tensor.element_size() != info.size
            or tensor.storage_offset() != 0
@@ -57,27 +55,29 @@ def read_tensor_file_slice_into(tensor, destination, stream=None, destination2=N
    if hostbuf is not None:
        stream_ptr = getattr(stream, "cuda_stream", 0) if stream is not None else 0
        device_ptr = destination2.data_ptr() if destination2 is not None else 0
-        hostbuf.read_file_slice(file_obj, info.offset, info.size,
-                                offset=destination.data_ptr() - hostbuf.get_raw_address(),
-                                stream=stream_ptr,
-                                device_ptr=device_ptr,
-                                device=None if destination2 is None else destination2.device.index)
+        with info.lock:
+            hostbuf.read_file_slice(file_obj, info.offset, info.size,
+                                    offset=destination.data_ptr() - hostbuf.get_raw_address(),
+                                    stream=stream_ptr,
+                                    device_ptr=device_ptr,
+                                    device=None if destination2 is None else destination2.device.index)
        return True

    buf_type = ctypes.c_ubyte * info.size
    view = memoryview(buf_type.from_address(destination.data_ptr()))

    try:
-        file_obj.seek(info.offset)
-        done = 0
-        while done < info.size:
-            try:
-                n = file_obj.readinto(view[done:])
-            except OSError:
-                return False
-            if n <= 0:
-                return False
-            done += n
+        with info.lock:
+            file_obj.seek(info.offset)
+            done = 0
+            while done < info.size:
+                try:
+                    n = file_obj.readinto(view[done:])
+                except OSError:
+                    return False
+                if n <= 0:
+                    return False
+                done += n
        return True
    finally:
        view.release()
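
The change above replaces the "same thread only" guard with a per-file lock held around each seek-and-read pair. A minimal sketch of that idea follows, assuming simplified stand-ins: `FileSlice`, `make_slices`, and `read_slice_into` are hypothetical names, the destination is a plain `bytearray` rather than a tensor, and the host-buffer path is left out. Because `seek` and `readinto` share the file object's position, two threads reading different slices of one file must not interleave between the two calls.

```python
import threading
from typing import BinaryIO, NamedTuple


class FileSlice(NamedTuple):
    file_ref: BinaryIO
    lock: object  # any context-manager lock, e.g. threading.Lock()
    offset: int
    size: int


def make_slices(file_obj, spans):
    """Build slices for one file; all of them share a single lock."""
    lock = threading.Lock()
    return [FileSlice(file_obj, lock, offset, size) for offset, size in spans]


def read_slice_into(info: FileSlice, dest: bytearray) -> bool:
    """Fill dest[:info.size] from the file; return False on a short read."""
    if len(dest) < info.size:
        return False
    view = memoryview(dest)[:info.size]
    # Hold the lock across seek + read so another thread cannot move the
    # shared file position in between.
    with info.lock:
        info.file_ref.seek(info.offset)
        done = 0
        while done < info.size:
            n = info.file_ref.readinto(view[done:])
            if not n:
                return False  # EOF before the slice was complete
            done += n
    return True
```

Sharing one lock per file, rather than one per slice, is the design choice that matters here: the resource being protected is the file position, which all slices of the same file have in common.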
--- a/comfy/model_management.py
+++ b/comfy/model_management.py
@@ -214,7 +214,10 @@ def get_all_torch_devices(exclude_current=False):
    global cpu_state
    devices = []
    if cpu_state == CPUState.GPU:
-        if is_nvidia():
+        # NVIDIA + AMD/ROCm both expose their GPUs through torch.cuda.*;
+        # without the AMD arm, single-GPU ROCm users get an empty list
+        # which silently turns unload_all_models() into a no-op.
+        if is_nvidia() or is_amd():
            for i in range(torch.cuda.device_count()):
                devices.append(torch.device("cuda", i))
        elif is_intel_xpu():
@@ -223,6 +226,14 @@ def get_all_torch_devices(exclude_current=False):
        elif is_ascend_npu():
            for i in range(torch.npu.device_count()):
                devices.append(torch.device("npu", i))
+        elif is_mlu():
+            for i in range(torch.mlu.device_count()):
+                devices.append(torch.device("mlu", i))
+        else:
+            # Fallback for unhandled GPU backends (e.g. DirectML): at least
+            # report the current device so callers like unload_all_models()
+            # do not silently no-op.
+            devices.append(get_torch_device())
    else:
        devices.append(get_torch_device())
    if exclude_current:
@@ -244,13 +255,23 @@ def get_gpu_device_options():
            options.append(f"gpu:{i}")
    return options

+def get_gpu_device_options_no_cpu():
+    """Variant of get_gpu_device_options that omits "cpu".
+
+    Intended for components like the VAE selector where running on CPU
+    is impractical and should not be offered as a choice.
+    """
+    return [o for o in get_gpu_device_options() if o != "cpu"]
+
 def resolve_gpu_device_option(option: str):
    """Resolve a device option string to a torch.device.

    Returns None for "default" (let the caller use its normal default).
    Returns torch.device("cpu") for "cpu".
-    For "gpu:N", returns the Nth torch device. Falls back to None if
-    the index is out of range (caller should use default).
+    For "gpu:N", returns the Nth torch device. Returns None if the
+    index is out of range or the option string is malformed or
+    unrecognized (callers are expected to log their own context-rich
+    message before falling back to the default device).
    """
    if option is None or option == "default":
        return None
@@ -259,16 +280,11 @@ def resolve_gpu_device_option(option: str):
    if option.startswith("gpu:"):
        try:
            idx = int(option[4:])
-            devices = get_all_torch_devices()
-            if 0 <= idx < len(devices):
-                return devices[idx]
-            else:
-                logging.warning(f"Device '{option}' not available (only {len(devices)} GPU(s)), using default.")
-                return None
-        except (ValueError, IndexError):
-            logging.warning(f"Invalid device option '{option}', using default.")
+        except ValueError:
            return None
-    logging.warning(f"Unrecognized device option '{option}', using default.")
+        devices = get_all_torch_devices()
+        if 0 <= idx < len(devices):
+            return devices[idx]
    return None

@contextmanager
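
Review note on `resolve_gpu_device_option`: after this change the resolver is silent and returns `None` for every fallback case, and the calling node is expected to log one message with its own context. The sketch below mirrors that control flow without torch, as an illustration only. `resolve_option` takes a plain list in place of `get_all_torch_devices()`, and `select_device` with its `node_name` parameter is a hypothetical caller, not one of the real Select*Device nodes.

```python
import logging


def resolve_option(option, devices):
    """Resolve an option string against a list of devices.

    Returns None for "default" and for every failure case (malformed index,
    index out of range, unrecognized option); it never logs.
    """
    if option is None or option == "default":
        return None
    if option == "cpu":
        return "cpu"
    if option.startswith("gpu:"):
        try:
            idx = int(option[4:])
        except ValueError:
            return None
        if 0 <= idx < len(devices):
            return devices[idx]
    return None


def select_device(option, devices, node_name="SelectModelDevice"):
    """Hypothetical caller: owns the single info-level message per failure."""
    resolved = resolve_option(option, devices)
    if resolved is None and option not in (None, "default"):
        logging.info("%s: device option %r could not be resolved, using default",
                     node_name, option)
    return resolved
```

Keeping the resolver quiet means a failed lookup produces exactly one log line, written by the code that knows which node and which input triggered it.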
--- a/comfy/model_patcher.py
+++ b/comfy/model_patcher.py
@@ -1692,16 +1692,27 @@ class ModelPatcherDynamic(ModelPatcher):
            self.model.dynamic_vbars = {}
        if not hasattr(self.model, "dynamic_pins"):
            self.model.dynamic_pins = {}
-        if self.load_device not in self.model.dynamic_pins:
-            self.model.dynamic_pins[self.load_device] = {
+        self.register_load_device(self.load_device)
+        self.non_dynamic_delegate_model = None
+        assert load_device is not None
+
+    def register_load_device(self, device):
+        """Ensure dynamic_pins has an entry for *device*.
+
+        Called from __init__ and also from any code that retargets an
+        already-constructed patcher to a new load_device (e.g. the
+        Select{Model,CLIP,VAE}Device selector nodes); without this entry
+        partially_unload_ram() raises KeyError when it tries to read the
+        per-device pin state.
+        """
+        if device not in self.model.dynamic_pins:
+            self.model.dynamic_pins[device] = {
                "weights": (comfy_aimdo.host_buffer.HostBuffer(0, 0, 0), [], [-1], [0]),
                "patches": (comfy_aimdo.host_buffer.HostBuffer(0, 0, 0), [], [-1], [0]),
                "hostbufs_initialized": False,
                "failed": False,
                "active": False,
            }
-        self.non_dynamic_delegate_model = None
-        assert load_device is not None

    def is_dynamic(self):
        return True
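
Review note on `register_load_device`: the per-device pin state lives on the shared model object, so a patcher that is retargeted after construction needs an entry for its new device before anything reads it. The sketch below shows the idempotent registration pattern in isolation. `SharedModel`, `Patcher`, `retarget` and `pin_state` are hypothetical, heavily simplified stand-ins, and the pin entry is reduced to two flags rather than the real host-buffer tuples.

```python
class SharedModel:
    """Hypothetical stand-in for the model object shared across clones."""

    def __init__(self):
        self.dynamic_pins = {}


class Patcher:
    """Hypothetical, simplified patcher; not the real ModelPatcherDynamic."""

    def __init__(self, model, load_device):
        self.model = model
        self.load_device = load_device
        self.register_load_device(load_device)

    def register_load_device(self, device):
        # Create the entry only when it is missing, so pin state that
        # already exists for this device is left untouched.
        if device not in self.model.dynamic_pins:
            self.model.dynamic_pins[device] = {"failed": False, "active": False}

    def retarget(self, device):
        # Changing load_device without registering it would make the
        # lookup in pin_state() raise KeyError.
        self.load_device = device
        self.register_load_device(device)

    def pin_state(self):
        return self.model.dynamic_pins[self.load_device]
```

Because registration is a no-op for known devices, it is safe to call from both `__init__` and any later retargeting path.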
--- a/comfy/multigpu.py
+++ b/comfy/multigpu.py
@@ -1,5 +1,4 @@
 from __future__ import annotations
-import copy
 import queue
 import threading
 import torch
@@ -176,87 +175,6 @@ def create_multigpu_deepclones(model: ModelPatcher, max_gpus: int, gpu_options:
    return model


-def create_upscale_model_multigpu_deepclones(upscale_model, max_gpus: int):
-    """Return a shallow copy of ``upscale_model`` with a ``multigpu_clones`` dict of CPU-resident
-    descriptor deepclones, one per extra CUDA device up to ``max_gpus``.
-    """
-    full_extra_devices = comfy.model_management.get_all_torch_devices(exclude_current=True)
-    limit_extra_devices = full_extra_devices[:max_gpus - 1]
-    cloned = copy.copy(upscale_model)
-    existing = getattr(upscale_model, 'multigpu_clones', None)
-    limit_extra_device_set = set(limit_extra_devices)
-    clones: dict[torch.device, object] = {d: c for d, c in dict(existing).items() if d in limit_extra_device_set} if existing else {}
-    if len(limit_extra_devices) == 0:
-        logging.info("No extra torch devices need initialization, skipping initializing MultiGPU upscale clones.")
-        if hasattr(cloned, 'multigpu_clones'):
-            del cloned.multigpu_clones
-        return cloned
-
-    for device in limit_extra_devices:
-        if device in clones:
-            continue
-        clone_source = copy.copy(upscale_model)
-        if hasattr(clone_source, 'multigpu_clones'):
-            del clone_source.multigpu_clones
-        clone_desc = copy.deepcopy(clone_source)
-        clone_desc.model.eval()
-        for p in clone_desc.model.parameters():
-            p.requires_grad_(False)
-        clone_desc.to("cpu")
-        clones[device] = clone_desc
-        logging.info(f"Created CPU upscale_model descriptor deepclone for {device}")
-
-    cloned.multigpu_clones = clones
-    return cloned
-
-
-def create_vae_multigpu_deepclones(vae, max_gpus: int):
-    """Return a shallow copy of ``vae`` with a ``multigpu_clones`` dict of CPU-resident VAE
-    deepclones, one per extra CUDA device up to ``max_gpus``.
-    """
-    vae.throw_exception_if_invalid()
-    vae_device = torch.device(vae.device)
-    cloned = copy.copy(vae)
-    if hasattr(cloned, 'multigpu_clones'):
-        del cloned.multigpu_clones
-    if vae_device.type == "cpu":
-        logging.info("CPU VAE selected, skipping initializing MultiGPU VAE clones.")
-        return cloned
-
-    full_extra_devices = comfy.model_management.get_all_torch_devices()
-
-    def is_vae_device(device):
-        return device.type == vae_device.type and device.index == vae_device.index
-
-    limit_extra_devices = [d for d in full_extra_devices if not is_vae_device(d)][:max_gpus - 1]
-    if len(limit_extra_devices) == 0:
-        logging.info("No extra torch devices need initialization, skipping initializing MultiGPU VAE clones.")
-        return cloned
-
-    existing = getattr(vae, 'multigpu_clones', None)
-    limit_extra_device_set = set(limit_extra_devices)
-    clones: dict[torch.device, object] = {d: c for d, c in dict(existing).items() if d in limit_extra_device_set} if existing else {}
-
-    for device in limit_extra_devices:
-        if device in clones:
-            continue
-        cloned_patcher = vae.patcher.deepclone_multigpu(new_load_device=device)
-        clone_vae = copy.copy(vae)
-        if hasattr(clone_vae, 'multigpu_clones'):
-            del clone_vae.multigpu_clones
-        clone_vae.first_stage_model = cloned_patcher.model
-        clone_vae.patcher = cloned_patcher
-        clone_vae.first_stage_model.eval()
-        for p in clone_vae.first_stage_model.parameters():
-            p.requires_grad_(False)
-        clone_vae.first_stage_model.to("cpu")
-        clones[device] = clone_vae
-        logging.info(f"Created CPU VAE deepclone for {device}")
-
-    cloned.multigpu_clones = clones
-    return cloned
-
-
 LoadBalance = namedtuple('LoadBalance', ['work_per_device', 'idle_time'])
 def load_balance_devices(model_options: dict[str], total_work: int, return_idle_time=False, work_normalized: int=None):
    'Optimize work assigned to different devices, accounting for their relative speeds and splittable work.'
--- a/comfy/samplers.py
+++ b/comfy/samplers.py
@@ -275,7 +275,6 @@ def _calc_cond_batch(model: BaseModel, conds: list[list[dict]], x_in: torch.Tens
                input_shape = [len(batch_amount) * first_shape[0]] + list(first_shape)[1:]
                cond_shapes = collections.defaultdict(list)
                for tt in batch_amount:
-                    cond = {k: v.size() for k, v in to_run[tt][0].conditioning.items()}
                    for k, v in to_run[tt][0].conditioning.items():
                        cond_shapes[k].append(v.size())

--- a/comfy/sd.py
+++ b/comfy/sd.py
@@ -972,26 +972,6 @@ class VAE:
        pbar = comfy.utils.ProgressBar(steps)

        decode_fn = lambda a: self.first_stage_model.decode(a.to(self.vae_dtype).to(self.device)).to(dtype=self.vae_output_dtype())
-
-        multigpu_clones = getattr(self, 'multigpu_clones', None)
-        if multigpu_clones:
-            functions = {self.device: decode_fn}
-            try:
-                for dev, c in multigpu_clones.items():
-                    model_management.free_memory(c.model_size() + c.memory_used_decode(samples.shape, c.vae_dtype), dev)
-                    c.first_stage_model.to(dev)
-                for dev, c in multigpu_clones.items():
-                    functions[dev] = lambda a, _c=c, _dev=dev: _c.first_stage_model.decode(a.to(_c.vae_dtype).to(_dev)).to(dtype=_c.vae_output_dtype())
-                output = self.process_output(
-                    (comfy.utils.tiled_scale_multidim_multigpu(samples, functions, tile=(tile_y * 2, tile_x // 2), overlap=overlap, upscale_amount=self.upscale_ratio, output_device=self.output_device, pbar=pbar) +
-                     comfy.utils.tiled_scale_multidim_multigpu(samples, functions, tile=(tile_y // 2, tile_x * 2), overlap=overlap, upscale_amount=self.upscale_ratio, output_device=self.output_device, pbar=pbar) +
-                     comfy.utils.tiled_scale_multidim_multigpu(samples, functions, tile=(tile_y, tile_x), overlap=overlap, upscale_amount=self.upscale_ratio, output_device=self.output_device, pbar=pbar))
-                    / 3.0)
-                return output
-            finally:
-                for c in multigpu_clones.values():
-                    c.first_stage_model.to("cpu")
-
        output = self.process_output(
            (comfy.utils.tiled_scale(samples, decode_fn, tile_x // 2, tile_y * 2, overlap, upscale_amount = self.upscale_ratio, output_device=self.output_device, pbar = pbar) +
            comfy.utils.tiled_scale(samples, decode_fn, tile_x * 2, tile_y // 2, overlap, upscale_amount = self.upscale_ratio, output_device=self.output_device, pbar = pbar) +
@@ -1001,49 +981,16 @@ class VAE:

    def decode_tiled_1d(self, samples, tile_x=256, overlap=32):
        if samples.ndim == 3:
-            memory_shape = samples.shape
            decode_fn = lambda a: self.first_stage_model.decode(a.to(self.vae_dtype).to(self.device)).to(dtype=self.vae_output_dtype())
-            clone_decode_fn_factory = lambda c, dev: (lambda a: c.first_stage_model.decode(a.to(c.vae_dtype).to(dev)).to(dtype=c.vae_output_dtype()))
        else:
            og_shape = samples.shape
-            memory_shape = og_shape
            samples = samples.reshape((og_shape[0], og_shape[1] * og_shape[2], -1))
            decode_fn = lambda a: self.first_stage_model.decode(a.reshape((-1, og_shape[1], og_shape[2], a.shape[-1])).to(self.vae_dtype).to(self.device)).to(dtype=self.vae_output_dtype())
-            clone_decode_fn_factory = lambda c, dev: (lambda a: c.first_stage_model.decode(a.reshape((-1, og_shape[1], og_shape[2], a.shape[-1])).to(c.vae_dtype).to(dev)).to(dtype=c.vae_output_dtype()))
-
-        multigpu_clones = getattr(self, 'multigpu_clones', None)
-        if multigpu_clones:
-            functions = {self.device: decode_fn}
-            try:
-                for dev, c in multigpu_clones.items():
-                    model_management.free_memory(c.model_size() + c.memory_used_decode(memory_shape, c.vae_dtype), dev)
-                    c.first_stage_model.to(dev)
-                for dev, c in multigpu_clones.items():
-                    functions[dev] = clone_decode_fn_factory(c, dev)
-                return self.process_output(comfy.utils.tiled_scale_multidim_multigpu(samples, functions, tile=(tile_x,), overlap=overlap, upscale_amount=self.upscale_ratio, out_channels=self.output_channels, output_device=self.output_device))
-            finally:
-                for c in multigpu_clones.values():
-                    c.first_stage_model.to("cpu")

        return self.process_output(comfy.utils.tiled_scale_multidim(samples, decode_fn, tile=(tile_x,), overlap=overlap, upscale_amount=self.upscale_ratio, out_channels=self.output_channels, output_device=self.output_device))

    def decode_tiled_3d(self, samples, tile_t=999, tile_x=32, tile_y=32, overlap=(1, 8, 8)):
        decode_fn = lambda a: self.first_stage_model.decode(a.to(self.vae_dtype).to(self.device)).to(dtype=self.vae_output_dtype())
-
-        multigpu_clones = getattr(self, 'multigpu_clones', None)
-        if multigpu_clones:
-            functions = {self.device: decode_fn}
-            try:
-                for dev, c in multigpu_clones.items():
-                    model_management.free_memory(c.model_size() + c.memory_used_decode(samples.shape, c.vae_dtype), dev)
-                    c.first_stage_model.to(dev)
-                for dev, c in multigpu_clones.items():
-                    functions[dev] = lambda a, _c=c, _dev=dev: _c.first_stage_model.decode(a.to(_c.vae_dtype).to(_dev)).to(dtype=_c.vae_output_dtype())
-                return self.process_output(comfy.utils.tiled_scale_multidim_multigpu(samples, functions, tile=(tile_t, tile_x, tile_y), overlap=overlap, upscale_amount=self.upscale_ratio, out_channels=self.output_channels, index_formulas=self.upscale_index_formula, output_device=self.output_device))
-            finally:
-                for c in multigpu_clones.values():
-                    c.first_stage_model.to("cpu")
-
        return self.process_output(comfy.utils.tiled_scale_multidim(samples, decode_fn, tile=(tile_t, tile_x, tile_y), overlap=overlap, upscale_amount=self.upscale_ratio, out_channels=self.output_channels, index_formulas=self.upscale_index_formula, output_device=self.output_device))

    def encode_tiled_(self, pixel_samples, tile_x=512, tile_y=512, overlap = 64):
@@ -1053,25 +1000,6 @@ class VAE:
        pbar = comfy.utils.ProgressBar(steps)

        encode_fn = lambda a: self.first_stage_model.encode((self.process_input(a)).to(self.vae_dtype).to(self.device)).to(dtype=self.vae_output_dtype())
-
-        multigpu_clones = getattr(self, 'multigpu_clones', None)
-        if multigpu_clones:
-            functions = {self.device: encode_fn}
-            try:
-                for dev, c in multigpu_clones.items():
-                    model_management.free_memory(c.model_size() + c.memory_used_encode(pixel_samples.shape, c.vae_dtype), dev)
-                    c.first_stage_model.to(dev)
-                for dev, c in multigpu_clones.items():
-                    functions[dev] = lambda a, _c=c, _dev=dev: _c.first_stage_model.encode((_c.process_input(a)).to(_c.vae_dtype).to(_dev)).to(dtype=_c.vae_output_dtype())
-                samples = comfy.utils.tiled_scale_multidim_multigpu(pixel_samples, functions, tile=(tile_y, tile_x), overlap=overlap, upscale_amount=(1/self.downscale_ratio), out_channels=self.latent_channels, output_device=self.output_device, pbar=pbar)
-                samples += comfy.utils.tiled_scale_multidim_multigpu(pixel_samples, functions, tile=(tile_y // 2, tile_x * 2), overlap=overlap, upscale_amount=(1/self.downscale_ratio), out_channels=self.latent_channels, output_device=self.output_device, pbar=pbar)
-                samples += comfy.utils.tiled_scale_multidim_multigpu(pixel_samples, functions, tile=(tile_y * 2, tile_x // 2), overlap=overlap, upscale_amount=(1/self.downscale_ratio), out_channels=self.latent_channels, output_device=self.output_device, pbar=pbar)
-                samples /= 3.0
-                return samples
-            finally:
-                for c in multigpu_clones.values():
-                    c.first_stage_model.to("cpu")
-
        samples = comfy.utils.tiled_scale(pixel_samples, encode_fn, tile_x, tile_y, overlap, upscale_amount = (1/self.downscale_ratio), out_channels=self.latent_channels, output_device=self.output_device, pbar=pbar)
        samples += comfy.utils.tiled_scale(pixel_samples, encode_fn, tile_x * 2, tile_y // 2, overlap, upscale_amount = (1/self.downscale_ratio), out_channels=self.latent_channels, output_device=self.output_device, pbar=pbar)
        samples += comfy.utils.tiled_scale(pixel_samples, encode_fn, tile_x // 2, tile_y * 2, overlap, upscale_amount = (1/self.downscale_ratio), out_channels=self.latent_channels, output_device=self.output_device, pbar=pbar)
@@ -1081,7 +1009,6 @@ class VAE:
    def encode_tiled_1d(self, samples, tile_x=256 * 2048, overlap=64 * 2048):
        if self.latent_dim == 1:
            encode_fn = lambda a: self.first_stage_model.encode((self.process_input(a)).to(self.vae_dtype).to(self.device)).to(dtype=self.vae_output_dtype())
-            clone_encode_fn_factory = lambda c, dev: (lambda a: c.first_stage_model.encode((c.process_input(a)).to(c.vae_dtype).to(dev)).to(dtype=c.vae_output_dtype()))
            out_channels = self.latent_channels
            upscale_amount = 1 / self.downscale_ratio
        else:
@@ -1091,24 +1018,8 @@ class VAE:
            overlap = overlap // extra_channel_size
            upscale_amount = 1 / self.downscale_ratio
            encode_fn = lambda a: self.first_stage_model.encode((self.process_input(a)).to(self.vae_dtype).to(self.device)).reshape(1, out_channels, -1).to(dtype=self.vae_output_dtype())
-            clone_encode_fn_factory = lambda c, dev: (lambda a: c.first_stage_model.encode((c.process_input(a)).to(c.vae_dtype).to(dev)).reshape(1, out_channels, -1).to(dtype=c.vae_output_dtype()))
-
-        multigpu_clones = getattr(self, 'multigpu_clones', None)
-        if multigpu_clones:
-            functions = {self.device: encode_fn}
-            try:
-                for dev, c in multigpu_clones.items():
-                    model_management.free_memory(c.model_size() + c.memory_used_encode(samples.shape, c.vae_dtype), dev)
-                    c.first_stage_model.to(dev)
-                for dev, c in multigpu_clones.items():
-                    functions[dev] = clone_encode_fn_factory(c, dev)
-                out = comfy.utils.tiled_scale_multidim_multigpu(samples, functions, tile=(tile_x,), overlap=overlap, upscale_amount=upscale_amount, out_channels=out_channels, output_device=self.output_device)
-            finally:
-                for c in multigpu_clones.values():
-                    c.first_stage_model.to("cpu")
-        else:
-            out = comfy.utils.tiled_scale_multidim(samples, encode_fn, tile=(tile_x,), overlap=overlap, upscale_amount=upscale_amount, out_channels=out_channels, output_device=self.output_device)

+        out = comfy.utils.tiled_scale_multidim(samples, encode_fn, tile=(tile_x,), overlap=overlap, upscale_amount=upscale_amount, out_channels=out_channels, output_device=self.output_device)
        if self.latent_dim == 1:
            return out
        else:
@@ -1116,21 +1027,6 @@ class VAE:

    def encode_tiled_3d(self, samples, tile_t=9999, tile_x=512, tile_y=512, overlap=(1, 64, 64)):
        encode_fn = lambda a: self.first_stage_model.encode((self.process_input(a)).to(self.vae_dtype).to(self.device)).to(dtype=self.vae_output_dtype())
-
-        multigpu_clones = getattr(self, 'multigpu_clones', None)
-        if multigpu_clones:
-            functions = {self.device: encode_fn}
-            try:
-                for dev, c in multigpu_clones.items():
-                    model_management.free_memory(c.model_size() + c.memory_used_encode(samples.shape, c.vae_dtype), dev)
-                    c.first_stage_model.to(dev)
-                for dev, c in multigpu_clones.items():
-                    functions[dev] = lambda a, _c=c, _dev=dev: _c.first_stage_model.encode((_c.process_input(a)).to(_c.vae_dtype).to(_dev)).to(dtype=_c.vae_output_dtype())
-                return comfy.utils.tiled_scale_multidim_multigpu(samples, functions, tile=(tile_t, tile_x, tile_y), overlap=overlap, upscale_amount=self.downscale_ratio, out_channels=self.latent_channels, downscale=True, index_formulas=self.downscale_index_formula, output_device=self.output_device)
-            finally:
-                for c in multigpu_clones.values():
-                    c.first_stage_model.to("cpu")
-
        return comfy.utils.tiled_scale_multidim(samples, encode_fn, tile=(tile_t, tile_x, tile_y), overlap=overlap, upscale_amount=self.downscale_ratio, out_channels=self.latent_channels, downscale=True, index_formulas=self.downscale_index_formula, output_device=self.output_device)

    def decode(self, samples_in, vae_options={}):
@@ -1831,14 +1727,8 @@ def load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, o
        raise RuntimeError("ERROR: Could not detect model type of: {}\n{}".format(ckpt_path, model_detection_error_hint(ckpt_path, sd)))
    if out[0] is not None:
        out[0].cached_patcher_init = (load_checkpoint_guess_config, (ckpt_path, False, False, False, embedding_directory, output_model, model_options, te_model_options), 0)
-    if output_vae and out[2] is not None and hasattr(out[2], "patcher"):
-        out[2].patcher.cached_patcher_init = (load_checkpoint_vae_patcher, (ckpt_path, embedding_directory, model_options, te_model_options, disable_dynamic))
    return out

-def load_checkpoint_vae_patcher(ckpt_path, embedding_directory=None, model_options={}, te_model_options={}, disable_dynamic=False):
-    _, _, vae, _ = load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=False, output_clipvision=False, embedding_directory=embedding_directory, output_model=False, model_options=model_options, te_model_options=te_model_options, disable_dynamic=disable_dynamic)
-    return vae.patcher
-
 def load_checkpoint_guess_config_model_only(ckpt_path, embedding_directory=None, model_options={}, te_model_options={}, disable_dynamic=False):
    model, *_ = load_checkpoint_guess_config(ckpt_path, False, False, False,
            embedding_directory=embedding_directory,
@@ -2064,26 +1954,6 @@ def load_diffusion_model(unet_path, model_options={}, disable_dynamic=False):
    model.cached_patcher_init = (load_diffusion_model, (unet_path, model_options))
    return model

-def load_vae_patcher(vae_path, metadata=None, device=None):
-    """Reload a VAE from disk and return its patcher.
-
-    Used as the ``cached_patcher_init`` factory on ``VAE.patcher`` so that
-    :meth:`comfy.model_patcher.ModelPatcher.deepclone_multigpu` can produce a
-    fresh VAE patcher with no inherited source-device storage tracking. The
-    optional device matches the source loader's VAE initialization path; the
-    cloned patcher's load_device still controls the device targeted by the
-    multigpu clone. Without this, bare ``copy.deepcopy`` of the VAE wrapper
-    carries dynamic-VRAM allocator state forward to the clone, which causes
-    per-device worker threads in tiled encode/decode dispatch to access weights
-    through the source-device buffer."""
-    if metadata is None:
-        sd, metadata = comfy.utils.load_torch_file(vae_path, return_metadata=True)
-    else:
-        sd = comfy.utils.load_torch_file(vae_path)
-    vae = VAE(sd=sd, metadata=metadata, device=device)
-    vae.throw_exception_if_invalid()
-    return vae.patcher
-
 def load_unet(unet_path, dtype=None):
    logging.warning("The load_unet function has been deprecated and will be removed please switch to: load_diffusion_model")
    return load_diffusion_model(unet_path, model_options={"dtype": dtype})
--- a/comfy/utils.py
+++ b/comfy/utils.py
@@ -28,13 +28,13 @@ import numpy as np
 from PIL import Image
 import logging
 import itertools
-import threading
 from torch.nn.functional import interpolate
 from tqdm.auto import trange
 from einops import rearrange
 from comfy.cli_args import args
 import json
 import time
+import threading
 import warnings

 MMAP_TORCH_FILES = args.mmap_torch_files
@@ -86,6 +86,7 @@ def load_safetensors(ckpt):
    import comfy_aimdo.model_mmap

    f = open(ckpt, "rb", buffering=0)
+    file_lock = threading.Lock()
    model_mmap = comfy_aimdo.model_mmap.ModelMMAP(ckpt)
    file_size = os.path.getsize(ckpt)
    mv = memoryview((ctypes.c_uint8 * file_size).from_address(model_mmap.get()))
@@ -111,7 +112,7 @@ def load_safetensors(ckpt):
                storage = tensor.untyped_storage()
                setattr(storage,
                        "_comfy_tensor_file_slice",
-                        comfy.memory_management.TensorFileSlice(f, threading.get_ident(), data_base_offset + start, end - start))
+                        comfy.memory_management.TensorFileSlice(f, file_lock, data_base_offset + start, end - start))
                setattr(storage, "_comfy_tensor_mmap_refs", (model_mmap, mv))
                sd[name] = tensor

@@ -1186,161 +1187,6 @@ def tiled_scale_multidim(samples, function, tile=(64, 64), overlap=8, upscale_am
 def tiled_scale(samples, function, tile_x=64, tile_y=64, overlap = 8, upscale_amount = 4, out_channels = 3, output_device="cpu", pbar = None):
    return tiled_scale_multidim(samples, function, (tile_y, tile_x), overlap=overlap, upscale_amount=upscale_amount, out_channels=out_channels, output_device=output_device, pbar=pbar)

-
-def tiled_scale_multidim_multigpu(samples, functions, tile=(64, 64), overlap=8, upscale_amount=4, out_channels=3, output_device="cpu", downscale=False, index_formulas=None, pbar=None):
-    """Multigpu variant of tiled_scale_multidim. ``functions`` is a dict[torch.device, callable].
-
-    Round-robin dispatches tile positions across devices via threading. Each thread maintains
-    its own per-device CPU output and divisor buffer, applying the same feathered overlap mask
-    formula as the single-device path. Buffers are summed at the end, producing output that is
-    bit-equivalent to ``tiled_scale_multidim`` within fp32 add-order noise.
-
-    Falls back to ``tiled_scale_multidim`` with the only function when ``len(functions) < 2``.
-    Falls back to single-device on the "whole input fits in one tile" branch (no parallelism
-    available at that granularity).
-    """
-    devices = list(functions.keys())
-    if len(devices) < 2:
-        only_fn = next(iter(functions.values())) if functions else None
-        return tiled_scale_multidim(samples, only_fn, tile=tile, overlap=overlap,
-                                    upscale_amount=upscale_amount, out_channels=out_channels,
-                                    output_device=output_device, downscale=downscale,
-                                    index_formulas=index_formulas, pbar=pbar)
-
-    dims = len(tile)
-
-    if not (isinstance(upscale_amount, (tuple, list))):
-        upscale_amount = [upscale_amount] * dims
-    if not (isinstance(overlap, (tuple, list))):
-        overlap = [overlap] * dims
-    if index_formulas is None:
-        index_formulas = upscale_amount
-    if not (isinstance(index_formulas, (tuple, list))):
-        index_formulas = [index_formulas] * dims
-
-    def get_upscale(dim, val):
-        up = upscale_amount[dim]
-        return up(val) if callable(up) else up * val
-
-    def get_downscale(dim, val):
-        up = upscale_amount[dim]
-        return up(val) if callable(up) else val / up
-
-    def get_upscale_pos(dim, val):
-        up = index_formulas[dim]
-        return up(val) if callable(up) else up * val
-
-    def get_downscale_pos(dim, val):
-        up = index_formulas[dim]
-        return up(val) if callable(up) else val / up
-
-    if downscale:
-        get_scale = get_downscale
-        get_pos = get_downscale_pos
-    else:
-        get_scale = get_upscale
-        get_pos = get_upscale_pos
-
-    def mult_list_upscale(a):
-        return [round(get_scale(i, a[i])) for i in range(len(a))]
-
-    output = torch.empty([samples.shape[0], out_channels] + mult_list_upscale(samples.shape[2:]), device=output_device)
-    merge_device = torch.device("cpu")
-
-    pbar_lock = threading.Lock() if pbar is not None else None
-    primary_device = devices[0]
-
-    samples_staged = samples if samples.device.type == "cpu" else samples.to("cpu", non_blocking=False)
-
-    for b in range(samples_staged.shape[0]):
-        s = samples_staged[b:b+1]
-
-        if all(s.shape[d+2] <= tile[d] for d in range(dims)):
-            with torch.inference_mode():
-                output[b:b+1] = functions[primary_device](s.to(primary_device, non_blocking=True)).to(output_device)
-            if pbar is not None:
-                pbar.update(1)
-            continue
-
-        positions = [range(0, s.shape[d+2] - overlap[d], tile[d] - overlap[d]) if s.shape[d+2] > tile[d] else [0] for d in range(dims)]
-        split = {devices[i]: itertools.islice(itertools.product(*positions), i, None, len(devices)) for i in range(len(devices))}
-
-        out_shape = [s.shape[0], out_channels] + mult_list_upscale(s.shape[2:])
-        div_shape = [s.shape[0], 1] + mult_list_upscale(s.shape[2:])
-        bufs = {d: torch.zeros(out_shape, device=merge_device) for d in devices}
-        divs = {d: torch.zeros(div_shape, device=merge_device) for d in devices}
-
-        worker_errors: list[BaseException] = []
-        worker_lock = threading.Lock()
-
-        def worker(device, my_positions):
-            try:
-                if device.type == "cuda":
-                    torch.cuda.set_device(device)
-                fn = functions[device]
-                local_buf = bufs[device]
-                local_div = divs[device]
-                with torch.inference_mode():
-                    for it in my_positions:
-                        s_in = s
-                        upscaled = []
-                        for d in range(dims):
-                            pos = max(0, min(s.shape[d + 2] - overlap[d], it[d]))
-                            l = min(tile[d], s.shape[d + 2] - pos)
-                            s_in = s_in.narrow(d + 2, pos, l)
-                            upscaled.append(round(get_pos(d, pos)))
-
-                        s_in_dev = s_in.to(device, non_blocking=True)
-                        ps = fn(s_in_dev).to(merge_device)
-                        mask = torch.ones([1, 1] + list(ps.shape[2:]), device=merge_device)
-
-                        for d in range(2, dims + 2):
-                            feather = round(get_scale(d - 2, overlap[d - 2]))
-                            if feather >= mask.shape[d]:
-                                continue
-                            for t in range(feather):
-                                a = (t + 1) / feather
-                                mask.narrow(d, t, 1).mul_(a)
-                                mask.narrow(d, mask.shape[d] - 1 - t, 1).mul_(a)
-
-                        o = local_buf
-                        o_d = local_div
-                        ps_view = ps
-                        mask_view = mask
-                        for d in range(dims):
-                            l = min(ps_view.shape[d + 2], o.shape[d + 2] - upscaled[d])
-                            o = o.narrow(d + 2, upscaled[d], l)
-                            o_d = o_d.narrow(d + 2, upscaled[d], l)
-                            if l < ps_view.shape[d + 2]:
-                                ps_view = ps_view.narrow(d + 2, 0, l)
-                                mask_view = mask_view.narrow(d + 2, 0, l)
-
-                        o.add_(ps_view * mask_view)
-                        o_d.add_(mask_view)
-
-                        if pbar is not None:
-                            with pbar_lock:
-                                pbar.update(1)
-                if device.type == "cuda":
-                    torch.cuda.synchronize(device)
-            except BaseException as e:
-                with worker_lock:
-                    worker_errors.append(e)
-
-        threads = [threading.Thread(target=worker, args=(d, split[d])) for d in devices]
-        for t in threads:
-            t.start()
-        for t in threads:
-            t.join()
-        if worker_errors:
-            raise worker_errors[0]
-
-        combined_buf = sum(bufs.values())
-        combined_div = sum(divs.values())
-        output[b:b+1] = combined_buf / combined_div
-
-    return output
-
 def model_trange(*args, **kwargs):
    if not comfy.memory_management.aimdo_enabled:
        return trange(*args, **kwargs)
--- a/comfy_api_nodes/apis/rodin.py
+++ b/comfy_api_nodes/apis/rodin.py
@@ -1,7 +1,5 @@
-from __future__ import annotations
-
 from enum import Enum
-from typing import Optional, List
+
 from pydantic import BaseModel, Field


@@ -11,44 +9,76 @@ class Rodin3DGenerateRequest(BaseModel):
    material: str = Field(..., description="The material type.")
    quality_override: int = Field(..., description="The poly count of the mesh.")
    mesh_mode: str = Field(..., description="It controls the type of faces of generated models.")
-    TAPose: Optional[bool] = Field(None, description="")
+    TAPose: bool | None = Field(None, description="T/A pose for human-like models.")
+
+
+class Rodin3DGen25Request(BaseModel):
+
+    tier: str = Field(..., description="Gen-2.5 tier (e.g. Gen-2.5-High).")
+    prompt: str | None = Field(None, description="Required for Text-to-3D; ignored otherwise.")
+    seed: int | None = Field(None, description="0-65535.")
+    material: str | None = Field(None, description="PBR | Shaded | All | None.")
+    geometry_file_format: str | None = Field(None, description="glb | usdz | fbx | obj | stl.")
+    texture_mode: str | None = Field(None, description="legacy | extreme-low | low | medium | high.")
+    mesh_mode: str | None = Field(None, description="Raw (triangular) | Quad.")
+    quality_override: int | None = Field(None, description="Mesh face count override.")
+    geometry_instruct_mode: str | None = Field(None, description="faithful | creative.")
+    bbox_condition: list[int] | None = Field(None, description="Bounding box [Width(Y), Height(Z), Length(X)] in cm.")
+    height: int | None = Field(None, description="Approximate model height in cm.")
+    TAPose: bool | None = Field(None, description="T/A pose for human-like models.")
+    hd_texture: bool | None = Field(None, description="Enhanced texture quality.")
+    texture_delight: bool | None = Field(None, description="Remove baked lighting from textures.")
+    is_micro: bool | None = Field(None, description="Micro detail (Extreme-High only).")
+    use_original_alpha: bool | None = Field(None, description="Preserve image transparency.")
+    preview_render: bool | None = Field(None, description="Generate high-quality preview render.")
+    addons: list[str] | None = Field(None, description='Optional addons, e.g. ["HighPack"].')
+

 class GenerateJobsData(BaseModel):
-    uuids: List[str] = Field(..., description="str LIST")
+    uuids: list[str] = Field(..., description="List of job UUIDs")
    subscription_key: str = Field(..., description="subscription key")

+
 class Rodin3DGenerateResponse(BaseModel):
-    message: Optional[str] = Field(None, description="Return message.")
-    prompt: Optional[str] = Field(None, description="Generated Prompt from image.")
-    submit_time: Optional[str] = Field(None, description="Submit Time")
-    uuid: Optional[str] = Field(None, description="Task str")
-    jobs: Optional[GenerateJobsData] = Field(None, description="Details of jobs")
+    message: str | None = Field(None, description="Return message.")
+    prompt: str | None = Field(None, description="Generated Prompt from image.")
+    submit_time: str | None = Field(None, description="Submit Time")
+    uuid: str | None = Field(None, description="Task str")
+    jobs: GenerateJobsData | None = Field(None, description="Details of jobs")
+

 class JobStatus(str, Enum):
    """
    Status for jobs
    """
+
    Done = "Done"
    Failed = "Failed"
    Generating = "Generating"
    Waiting = "Waiting"

+
 class Rodin3DCheckStatusRequest(BaseModel):
    subscription_key: str = Field(..., description="subscription from generate endpoint")

+
 class JobItem(BaseModel):
    uuid: str = Field(..., description="uuid")
-    status: JobStatus = Field(...,description="Status Currently")
+    status: JobStatus = Field(..., description="Current job status")
+

 class Rodin3DCheckStatusResponse(BaseModel):
-    jobs: List[JobItem] = Field(..., description="Job status List")
+    jobs: list[JobItem] = Field(..., description="Job status List")
+

 class Rodin3DDownloadRequest(BaseModel):
    task_uuid: str = Field(..., description="Task str")

+
 class RodinResourceItem(BaseModel):
    url: str = Field(..., description="Download Url")
    name: str = Field(..., description="File name with ext")

+
 class Rodin3DDownloadResponse(BaseModel):
-    list: List[RodinResourceItem] = Field(..., description="Source List")
+    items: list[RodinResourceItem] = Field(..., alias="list", description="Source List")
--- a/comfy_api_nodes/nodes_kling.py
+++ b/comfy_api_nodes/nodes_kling.py
@@ -276,7 +276,6 @@ async def finish_omni_video_task(cls: type[IO.ComfyNode], response: TaskStatusRe
        cls,
        ApiEndpoint(path=f"/proxy/kling/v1/videos/omni-video/{response.data.task_id}"),
        response_model=TaskStatusResponse,
-        max_poll_attempts=280,
        status_extractor=lambda r: (r.data.task_status if r.data else None),
    )
    return IO.NodeOutput(await download_url_to_video_output(final_response.data.task_result.videos[0].url))
@@ -3066,7 +3065,6 @@ class KlingVideoNode(IO.ComfyNode):
            cls,
            ApiEndpoint(path=poll_path),
            response_model=TaskStatusResponse,
-            max_poll_attempts=280,
            status_extractor=lambda r: (r.data.task_status if r.data else None),
        )
        return IO.NodeOutput(await download_url_to_video_output(final_response.data.task_result.videos[0].url))
@@ -3192,7 +3190,6 @@ class KlingFirstLastFrameNode(IO.ComfyNode):
            cls,
            ApiEndpoint(path=f"/proxy/kling/v1/videos/image2video/{response.data.task_id}"),
            response_model=TaskStatusResponse,
-            max_poll_attempts=280,
            status_extractor=lambda r: (r.data.task_status if r.data else None),
        )
        return IO.NodeOutput(await download_url_to_video_output(final_response.data.task_result.videos[0].url))
--- a/comfy_api_nodes/nodes_rodin.py
+++ b/comfy_api_nodes/nodes_rodin.py
@@ -5,32 +5,37 @@ Rodin API docs: https://developer.hyper3d.ai/

 """

-from inspect import cleandoc
-import folder_paths as comfy_paths
-import os
 import logging
 import math
+import os
+from inspect import cleandoc
 from io import BytesIO
-from typing_extensions import override
+from typing import Any
+
+import aiohttp
 from PIL import Image
+from typing_extensions import override
+
+import folder_paths as comfy_paths
+from comfy_api.latest import IO, ComfyExtension, Types
 from comfy_api_nodes.apis.rodin import (
-    Rodin3DGenerateRequest,
-    Rodin3DGenerateResponse,
+    JobStatus,
    Rodin3DCheckStatusRequest,
    Rodin3DCheckStatusResponse,
    Rodin3DDownloadRequest,
    Rodin3DDownloadResponse,
-    JobStatus,
+    Rodin3DGen25Request,
+    Rodin3DGenerateRequest,
+    Rodin3DGenerateResponse,
 )
 from comfy_api_nodes.util import (
-    sync_op,
-    poll_op,
    ApiEndpoint,
    download_url_to_bytesio,
    download_url_to_file_3d,
+    poll_op,
+    sync_op,
+    validate_string,
 )
-from comfy_api.latest import ComfyExtension, IO, Types
-

 COMMON_PARAMETERS = [
    IO.Int.Input(
@@ -51,40 +56,30 @@ COMMON_PARAMETERS = [
 ]


-def get_quality_mode(poly_count):
-    polycount = poly_count.split("-")
-    poly = polycount[1]
-    count = polycount[0]
-    if poly == "Triangle":
-        mesh_mode = "Raw"
-    elif poly == "Quad":
-        mesh_mode = "Quad"
-    else:
-        mesh_mode = "Quad"
-
-    if count == "4K":
-        quality_override = 4000
-    elif count == "8K":
-        quality_override = 8000
-    elif count == "18K":
-        quality_override = 18000
-    elif count == "50K":
-        quality_override = 50000
-    elif count == "2K":
-        quality_override = 2000
-    elif count == "20K":
-        quality_override = 20000
-    elif count == "150K":
-        quality_override = 150000
-    elif count == "500K":
-        quality_override = 500000
-    else:
-        quality_override = 18000
-
-    return mesh_mode, quality_override
+_QUALITY_MESH_OPTIONS: dict[str, tuple[str, int]] = {
+    "4K-Quad":       ("Quad", 4000),
+    "8K-Quad":       ("Quad", 8000),
+    "18K-Quad":      ("Quad", 18000),
+    "50K-Quad":      ("Quad", 50000),
+    "200K-Quad":     ("Quad", 200000),
+    "2K-Triangle":   ("Raw", 2000),
+    "20K-Triangle":  ("Raw", 20000),
+    "150K-Triangle": ("Raw", 150000),
+    "200K-Triangle": ("Raw", 200000),
+    "500K-Triangle": ("Raw", 500000),
+    "1M-Triangle":   ("Raw", 1000000),
+}


-def tensor_to_filelike(tensor, max_pixels: int = 2048*2048):
+def get_quality_mode(poly_count: str) -> tuple[str, int]:
+    """Map a polygon-count preset like '18K-Quad' to (mesh_mode, quality_override).
+
+    Falls back to ('Quad', 18000) for unknown labels; legacy parity.
+    """
+    return _QUALITY_MESH_OPTIONS.get(poly_count, ("Quad", 18000))
+
+
+def tensor_to_filelike(tensor, max_pixels: int = 2048 * 2048):
    """
    Converts a PyTorch tensor to a file-like object.

@@ -96,8 +91,8 @@ def tensor_to_filelike(tensor, max_pixels: int = 2048*2048):
    - io.BytesIO: A file-like object containing the image data.
    """
    array = tensor.cpu().numpy()
-    array = (array * 255).astype('uint8')
-    image = Image.fromarray(array, 'RGB')
+    array = (array * 255).astype("uint8")
+    image = Image.fromarray(array, "RGB")

    original_width, original_height = image.size
    original_pixels = original_width * original_height
@@ -112,7 +107,7 @@ def tensor_to_filelike(tensor, max_pixels: int = 2048*2048):
        image = image.resize((new_width, new_height), Image.Resampling.LANCZOS)

    img_byte_arr = BytesIO()
-    image.save(img_byte_arr, format='PNG')  # PNG is used for lossless compression
+    image.save(img_byte_arr, format="PNG")  # PNG is used for lossless compression
    img_byte_arr.seek(0)
    return img_byte_arr

@@ -145,11 +140,9 @@ async def create_generate_task(
            TAPose=ta_pose,
        ),
        files=[
-            (
-                "images",
-                open(image, "rb") if isinstance(image, str) else tensor_to_filelike(image)
-            )
-            for image in images if image is not None
+            ("images", open(image, "rb") if isinstance(image, str) else tensor_to_filelike(image))
+            for image in images
+            if image is not None
        ],
        content_type="multipart/form-data",
    )
@@ -177,6 +170,7 @@ def check_rodin_status(response: Rodin3DCheckStatusResponse) -> str:
        return "DONE"
    return "Generating"

+
 def extract_progress(response: Rodin3DCheckStatusResponse) -> int | None:
    if not response.jobs:
        return None
@@ -214,7 +208,7 @@ async def download_files(url_list, task_uuid: str) -> tuple[str | None, Types.Fi
    model_file_path = None
    file_3d = None

-    for i in url_list.list:
+    for i in url_list.items:
        file_path = os.path.join(save_path, i.name)
        if i.name.lower().endswith(".glb"):
            model_file_path = os.path.join(result_folder_name, i.name)
@@ -489,7 +483,16 @@ class Rodin3D_Gen2(IO.ComfyNode):
                IO.Combo.Input("Material_Type", options=["PBR", "Shaded"], default="PBR", optional=True),
                IO.Combo.Input(
                    "Polygon_count",
-                    options=["4K-Quad", "8K-Quad", "18K-Quad", "50K-Quad", "2K-Triangle", "20K-Triangle", "150K-Triangle", "500K-Triangle"],
+                    options=[
+                        "4K-Quad",
+                        "8K-Quad",
+                        "18K-Quad",
+                        "50K-Quad",
+                        "2K-Triangle",
+                        "20K-Triangle",
+                        "150K-Triangle",
+                        "500K-Triangle",
+                    ],
                    default="500K-Triangle",
                    optional=True,
                ),
@@ -542,6 +545,566 @@ class Rodin3D_Gen2(IO.ComfyNode):
        return IO.NodeOutput(model_path, file_3d)


+def _rodin_multipart_parser(data: dict[str, Any]) -> aiohttp.FormData:
+    """Convert a Rodin request dict to an aiohttp form, fixing bool/list serialization.
+
+    Booleans --> "true"/"false". Lists --> one field per element.
+    """
+    form = aiohttp.FormData(default_to_multipart=True)
+    for key, value in data.items():
+        if value is None:
+            continue
+        if isinstance(value, bool):
+            form.add_field(key, "true" if value else "false")
+        elif isinstance(value, list):
+            for item in value:
+                form.add_field(key, str(item))
+        elif isinstance(value, (bytes, bytearray)):
+            form.add_field(key, value)
+        else:
+            form.add_field(key, str(value))
+    return form
+
+
+async def _create_gen25_task(
+    cls: type[IO.ComfyNode],
+    request: Rodin3DGen25Request,
+    images: list | None,
+) -> tuple[str, str]:
+    """Submit a Gen-2.5 generate job; returns (task_uuid, subscription_key)."""
+
+    if images is not None and len(images) > 5:
+        raise ValueError("Rodin Gen-2.5 supports at most 5 input images.")
+
+    files = None
+    if images:
+        files = [
+            (
+                "images",
+                open(image, "rb") if isinstance(image, str) else tensor_to_filelike(image),
+            )
+            for image in images
+            if image is not None
+        ]
+
+    response = await sync_op(
+        cls,
+        ApiEndpoint(path="/proxy/rodin/api/v2/rodin", method="POST"),
+        response_model=Rodin3DGenerateResponse,
+        data=request,
+        files=files,
+        content_type="multipart/form-data",
+        multipart_parser=_rodin_multipart_parser,
+    )
+
+    if not response.uuid or not response.jobs or not response.jobs.subscription_key:
+        raise RuntimeError(f"Rodin Gen-2.5 submit failed: message={response.message!r}")
+    return response.uuid, response.jobs.subscription_key
+
+
+_PREVIEWABLE_3D_EXTS = {".glb", ".obj", ".fbx", ".stl", ".gltf"}
+
+
+async def _download_gen25_files(
+    download_list: Rodin3DDownloadResponse,
+    task_uuid: str,
+    geometry_file_format: str,
+) -> Types.File3D | None:
+    """Download every file in the list; return the File3D matching the chosen format."""
+
+    folder_name = f"Rodin3D_Gen25_{task_uuid}"
+    save_dir = os.path.join(comfy_paths.get_output_directory(), folder_name)
+    os.makedirs(save_dir, exist_ok=True)
+
+    target_ext = f".{geometry_file_format.lower().lstrip('.')}"
+    file_3d: Types.File3D | None = None
+
+    for item in download_list.items:
+        file_path = os.path.join(save_dir, item.name)
+        ext = os.path.splitext(item.name.lower())[1]
+        # Prefer the file matching the user's chosen format; fall back below.
+        if file_3d is None and ext == target_ext and ext in _PREVIEWABLE_3D_EXTS:
+            file_3d = await download_url_to_file_3d(item.url, target_ext.lstrip("."))
+            with open(file_path, "wb") as f:
+                f.write(file_3d.get_bytes())
+            continue
+        await download_url_to_bytesio(item.url, file_path)
+
+    # If the chosen format wasn't found, surface any model file we did get.
+    if file_3d is None:
+        for item in download_list.items:
+            ext = os.path.splitext(item.name.lower())[1]
+            if ext in _PREVIEWABLE_3D_EXTS:
+                file_3d = await download_url_to_file_3d(item.url, ext.lstrip("."))
+                break
+    return file_3d
+
+
+_MODE_REGULAR = "Regular"
+_MODE_FAST = "Fast"
+_MODE_EXTREME_HIGH = "Extreme-High"
+
+_REGULAR_POLY_OPTIONS = [
+    "Default",
+    "4K-Quad",
+    "8K-Quad",
+    "18K-Quad",
+    "50K-Quad",
+    "2K-Triangle",
+    "20K-Triangle",
+    "150K-Triangle",
+    "500K-Triangle",
+    "1M-Triangle",
+]
+
+_TEXTURE_MODE_OPTIONS = ["Default", "legacy", "extreme-low", "low", "medium", "high"]
+_GEOMETRY_FORMAT_OPTIONS = ["glb", "fbx", "obj", "stl"]
+_MATERIAL_OPTIONS = ["PBR", "Shaded", "All", "None"]
+
+
+def _build_mode_input(name: str = "mode") -> IO.DynamicCombo.Input:
+    return IO.DynamicCombo.Input(
+        name,
+        options=[
+            IO.DynamicCombo.Option(
+                _MODE_REGULAR,
+                [
+                    IO.Combo.Input(
+                        "tier",
+                        options=["Gen-2.5-Low", "Gen-2.5-Medium", "Gen-2.5-High"],
+                        default="Gen-2.5-High",
+                        tooltip="Quality tier. Higher tiers produce higher-fidelity geometry.",
+                    ),
+                    IO.Combo.Input(
+                        "polygon_count",
+                        options=_REGULAR_POLY_OPTIONS,
+                        default="Default",
+                        tooltip="Preset face count. 'Default' uses the server's default for the selected tier.",
+                    ),
+                    IO.Boolean.Input(
+                        "creative",
+                        default=False,
+                        tooltip="Creative mode (Medium/High only). Enhances generative robustness.",
+                    ),
+                ],
+            ),
+            IO.DynamicCombo.Option(
+                _MODE_FAST,
+                [
+                    IO.Combo.Input(
+                        "tier",
+                        options=[
+                            "Gen-2.5-Extreme-Low",
+                            "Gen-2.5-Low",
+                            "Gen-2.5-Medium",
+                            "Gen-2.5-High",
+                        ],
+                        default="Gen-2.5-Low",
+                    ),
+                    IO.Int.Input(
+                        "mesh_faces",
+                        default=20000,
+                        min=1000,
+                        max=20000,
+                        display_mode=IO.NumberDisplay.number,
+                        tooltip="Mesh face count (1K-20K in Fast mode).",
+                    ),
+                ],
+            ),
+            IO.DynamicCombo.Option(
+                _MODE_EXTREME_HIGH,
+                [
+                    IO.Combo.Input("mesh_mode", options=["Raw", "Quad"], default="Raw"),
+                    IO.Int.Input(
+                        "mesh_faces",
+                        default=1000000,
+                        min=20000,
+                        max=2000000,
+                        display_mode=IO.NumberDisplay.number,
+                        tooltip=(
+                            "Mesh face count. Raw mode: 20K-2M. "
+                            "Quad mode: keep under 200K (upstream may reject higher values)."
+                        ),
+                    ),
+                    IO.Boolean.Input(
+                        "is_micro",
+                        default=False,
+                        tooltip="Enable micro detail (Extreme-High only).",
+                    ),
+                    IO.Boolean.Input(
+                        "creative",
+                        default=False,
+                        tooltip="Creative mode. Enhances generative robustness.",
+                    ),
+                ],
+            ),
+        ],
+        tooltip=(
+            "Generation mode. Regular = balanced. Fast = 1K-20K faces for rapid prototyping. "
+            "Extreme-High = 20K-2M faces with optional micro details."
+        ),
+    )
+
+
+def _build_common_inputs(*, include_image_only: bool) -> list:
+    inputs: list = [
+        IO.Combo.Input("material", options=_MATERIAL_OPTIONS, default="Shaded"),
+        IO.Combo.Input("geometry_file_format", options=_GEOMETRY_FORMAT_OPTIONS, default="glb"),
+        IO.Combo.Input(
+            "texture_mode",
+            options=_TEXTURE_MODE_OPTIONS,
+            default="Default",
+            optional=True,
+            tooltip="Texture quality preset. 'Default' uses the server's default for the selected tier.",
+        ),
+        IO.Int.Input(
+            "seed",
+            default=0,
+            min=0,
+            max=65535,
+            display_mode=IO.NumberDisplay.number,
+            control_after_generate=True,
+            optional=True,
+        ),
+        IO.Boolean.Input(
+            "TAPose", default=False, optional=True, advanced=True, tooltip="T/A pose for human-like models."
+        ),
+        IO.Boolean.Input(
+            "hd_texture", default=False, optional=True, advanced=True, tooltip="High-quality texture enhancement."
+        ),
+        IO.Boolean.Input(
+            "texture_delight",
+            default=False,
+            optional=True,
+            advanced=True,
+            tooltip="Remove baked lighting from textures.",
+        ),
+    ]
+    if include_image_only:
+        inputs.append(
+            IO.Boolean.Input(
+                "use_original_alpha",
+                default=False,
+                optional=True,
+                advanced=True,
+                tooltip="Preserve image transparency.",
+            )
+        )
+    inputs.extend(
+        [
+            IO.Boolean.Input(
+                "addon_highpack",
+                default=False,
+                optional=True,
+                advanced=True,
+                tooltip="HighPack addon: 4K textures and ~16x faces in Quad mode.",
+            ),
+            IO.Int.Input(
+                "bbox_width",
+                default=0,
+                min=0,
+                max=300,
+                display_mode=IO.NumberDisplay.number,
+                optional=True,
+                advanced=True,
+                tooltip="Bounding-box width (Y axis). The bbox is sent only when width, height, and length are all non-zero.",
+            ),
+            IO.Int.Input(
+                "bbox_height",
+                default=0,
+                min=0,
+                max=300,
+                display_mode=IO.NumberDisplay.number,
+                optional=True,
+                advanced=True,
+                tooltip="Bounding-box height (Z axis).",
+            ),
+            IO.Int.Input(
+                "bbox_length",
+                default=0,
+                min=0,
+                max=300,
+                display_mode=IO.NumberDisplay.number,
+                optional=True,
+                advanced=True,
+                tooltip="Bounding-box length (X axis).",
+            ),
+            IO.Int.Input(
+                "height_cm",
+                default=0,
+                min=0,
+                max=10000,
+                display_mode=IO.NumberDisplay.number,
+                optional=True,
+                advanced=True,
+                tooltip="Approximate model height in centimeters (0 to skip).",
+            ),
+        ]
+    )
+    return inputs
+
+
+_PRICE_EXPR = """
+(
+  $baseCredits := widgets.mode = "extreme-high" ? 1.0 : 0.5;
+  $addonCredits := widgets.addon_highpack ? 1.0 : 0.0;
+  $total := ($baseCredits * 1.5) + ($addonCredits * 0.8);
+  {"type":"usd","usd": $total}
+)
+"""
+
+
+def _resolve_mode_params(mode_input: dict) -> dict:
+    """Translate the DynamicCombo `mode` payload into Gen-2.5 request fields.
+
+    Returns a dict with: tier, quality_override, mesh_mode, geometry_instruct_mode, is_micro.
+    Missing keys mean "do not send" (so we don't override server defaults).
+    """
+    selected = mode_input["mode"]
+    out: dict = {}
+
+    if selected == _MODE_REGULAR:
+        out["tier"] = mode_input["tier"]
+        polygon = mode_input.get("polygon_count", "Default")
+        if polygon != "Default":
+            mesh_mode, faces = get_quality_mode(polygon)
+            out["mesh_mode"] = mesh_mode
+            out["quality_override"] = faces
+        if mode_input.get("creative"):
+            out["geometry_instruct_mode"] = "creative"
+
+    elif selected == _MODE_FAST:
+        out["tier"] = mode_input["tier"]
+        out["mesh_mode"] = "Raw"
+        out["quality_override"] = int(mode_input["mesh_faces"])
+
+    elif selected == _MODE_EXTREME_HIGH:
+        out["tier"] = "Gen-2.5-Extreme-High"
+        out["mesh_mode"] = mode_input["mesh_mode"]
+        out["quality_override"] = int(mode_input["mesh_faces"])
+        if mode_input.get("is_micro"):
+            out["is_micro"] = True
+        if mode_input.get("creative"):
+            out["geometry_instruct_mode"] = "creative"
+    return out
+
+
+def _build_request(
+    *,
+    mode_input: dict,
+    material: str,
+    geometry_file_format: str,
+    texture_mode: str,
+    seed: int,
+    TAPose: bool,
+    hd_texture: bool,
+    texture_delight: bool,
+    addon_highpack: bool,
+    bbox_width: int,
+    bbox_height: int,
+    bbox_length: int,
+    height_cm: int,
+    prompt: str | None = None,
+    use_original_alpha: bool = False,
+) -> Rodin3DGen25Request:
+    mode_params = _resolve_mode_params(mode_input)
+
+    bbox = None
+    if bbox_width and bbox_height and bbox_length:
+        bbox = [bbox_width, bbox_height, bbox_length]
+
+    return Rodin3DGen25Request(
+        tier=mode_params["tier"],
+        prompt=prompt or None,
+        seed=seed,
+        material=material,
+        geometry_file_format=geometry_file_format,
+        texture_mode=None if texture_mode == "Default" else texture_mode,
+        mesh_mode=mode_params.get("mesh_mode"),
+        quality_override=mode_params.get("quality_override"),
+        geometry_instruct_mode=mode_params.get("geometry_instruct_mode"),
+        bbox_condition=bbox,
+        height=height_cm or None,
+        TAPose=TAPose or None,
+        hd_texture=hd_texture or None,
+        texture_delight=texture_delight or None,
+        is_micro=mode_params.get("is_micro"),
+        use_original_alpha=use_original_alpha or None,
+        addons=["HighPack"] if addon_highpack else None,
+    )
+
+
+class Rodin3D_Gen25_Image(IO.ComfyNode):
+
+    @classmethod
+    def define_schema(cls) -> IO.Schema:
+        return IO.Schema(
+            node_id="Rodin3D_Gen25_Image",
+            display_name="Rodin 3D Gen-2.5 - Image to 3D",
+            category="api node/3d/Rodin",
+            description=(
+                "Generate a 3D model from 1-5 reference images via Rodin Gen-2.5. "
+                "Pick a mode (Fast / Regular / Extreme-High) to tune quality vs. cost."
+            ),
+            inputs=[
+                IO.Autogrow.Input(
+                    "images",
+                    template=IO.Autogrow.TemplatePrefix(IO.Image.Input("image"), prefix="image", min=1, max=5),
+                    tooltip="1-5 images. The first image is used for materials when multi-view.",
+                ),
+                _build_mode_input(),
+                *_build_common_inputs(include_image_only=True),
+            ],
+            outputs=[IO.File3DAny.Output(display_name="model_file")],
+            hidden=[
+                IO.Hidden.auth_token_comfy_org,
+                IO.Hidden.api_key_comfy_org,
+                IO.Hidden.unique_id,
+            ],
+            is_api_node=True,
+            price_badge=IO.PriceBadge(
+                depends_on=IO.PriceBadgeDepends(widgets=["mode", "addon_highpack"]),
+                expr=_PRICE_EXPR,
+            ),
+        )
+
+    @classmethod
+    async def execute(
+        cls,
+        images: IO.Autogrow.Type,
+        mode: dict,
+        material: str,
+        geometry_file_format: str,
+        texture_mode: str,
+        seed: int,
+        TAPose: bool,
+        hd_texture: bool,
+        texture_delight: bool,
+        use_original_alpha: bool,
+        addon_highpack: bool,
+        bbox_width: int,
+        bbox_height: int,
+        bbox_length: int,
+        height_cm: int,
+    ) -> IO.NodeOutput:
+        image_tensors = [img for img in images.values() if img is not None]
+        if not image_tensors:
+            raise ValueError("Rodin Gen-2.5 Image-to-3D requires at least one image.")
+
+        # Flatten multi-image tensors into individual frames; the API accepts each as a separate part.
+        flat_images: list = []
+        for tensor in image_tensors:
+            if hasattr(tensor, "shape") and len(tensor.shape) == 4:
+                for i in range(tensor.shape[0]):
+                    flat_images.append(tensor[i])
+            else:
+                flat_images.append(tensor)
+
+        if len(flat_images) > 5:
+            raise ValueError(f"Rodin Gen-2.5 accepts at most 5 images; received {len(flat_images)}.")
+
+        request = _build_request(
+            mode_input=mode,
+            material=material,
+            geometry_file_format=geometry_file_format,
+            texture_mode=texture_mode,
+            seed=seed,
+            TAPose=TAPose,
+            hd_texture=hd_texture,
+            texture_delight=texture_delight,
+            addon_highpack=addon_highpack,
+            bbox_width=bbox_width,
+            bbox_height=bbox_height,
+            bbox_length=bbox_length,
+            height_cm=height_cm,
+            prompt=None,
+            use_original_alpha=use_original_alpha,
+        )
+
+        task_uuid, subscription_key = await _create_gen25_task(cls, request, flat_images)
+        await poll_for_task_status(subscription_key, cls)
+        download_list = await get_rodin_download_list(task_uuid, cls)
+        file_3d = await _download_gen25_files(download_list, task_uuid, geometry_file_format)
+        return IO.NodeOutput(file_3d)
+
+
+class Rodin3D_Gen25_Text(IO.ComfyNode):
+
+    @classmethod
+    def define_schema(cls) -> IO.Schema:
+        return IO.Schema(
+            node_id="Rodin3D_Gen25_Text",
+            display_name="Rodin 3D Gen-2.5 - Text to 3D",
+            category="api node/3d/Rodin",
+            description=(
+                "Generate a 3D model from a text prompt via Rodin Gen-2.5. "
+                "Pick a mode (Fast / Regular / Extreme-High) to tune quality vs. cost."
+            ),
+            inputs=[
+                IO.String.Input(
+                    "prompt",
+                    multiline=True,
+                    default="",
+                    tooltip="Text prompt for the 3D model.",
+                ),
+                _build_mode_input(),
+                *_build_common_inputs(include_image_only=False),
+            ],
+            outputs=[IO.File3DAny.Output(display_name="model_file")],
+            hidden=[
+                IO.Hidden.auth_token_comfy_org,
+                IO.Hidden.api_key_comfy_org,
+                IO.Hidden.unique_id,
+            ],
+            is_api_node=True,
+            price_badge=IO.PriceBadge(
+                depends_on=IO.PriceBadgeDepends(widgets=["mode", "addon_highpack"]),
+                expr=_PRICE_EXPR,
+            ),
+        )
+
+    @classmethod
+    async def execute(
+        cls,
+        prompt: str,
+        mode: dict,
+        material: str,
+        geometry_file_format: str,
+        texture_mode: str,
+        seed: int,
+        TAPose: bool,
+        hd_texture: bool,
+        texture_delight: bool,
+        addon_highpack: bool,
+        bbox_width: int,
+        bbox_height: int,
+        bbox_length: int,
+        height_cm: int,
+    ) -> IO.NodeOutput:
+        validate_string(prompt, field_name="prompt", min_length=1, max_length=2500)
+        request = _build_request(
+            mode_input=mode,
+            material=material,
+            geometry_file_format=geometry_file_format,
+            texture_mode=texture_mode,
+            seed=seed,
+            TAPose=TAPose,
+            hd_texture=hd_texture,
+            texture_delight=texture_delight,
+            addon_highpack=addon_highpack,
+            bbox_width=bbox_width,
+            bbox_height=bbox_height,
+            bbox_length=bbox_length,
+            height_cm=height_cm,
+            prompt=prompt,
+        )
+        task_uuid, subscription_key = await _create_gen25_task(cls, request, images=None)
+        await poll_for_task_status(subscription_key, cls)
+        download_list = await get_rodin_download_list(task_uuid, cls)
+        file_3d = await _download_gen25_files(download_list, task_uuid, geometry_file_format)
+        return IO.NodeOutput(file_3d)
+
+
 class Rodin3DExtension(ComfyExtension):
    @override
    async def get_node_list(self) -> list[type[IO.ComfyNode]]:
@ -551,6 +1114,8 @@ class Rodin3DExtension(ComfyExtension):
            Rodin3D_Smooth,
            Rodin3D_Sketch,
            Rodin3D_Gen2,
+            Rodin3D_Gen25_Image,
+            Rodin3D_Gen25_Text,
        ]


--- a/comfy_extras/nodes_logic.py
+++ b/comfy_extras/nodes_logic.py
@ -8,6 +8,82 @@ from comfy_api.latest import _io
 MISSING = object()


+class NotNode(io.ComfyNode):
+    @classmethod
+    def define_schema(cls):
+        return io.Schema(
+            node_id="ComfyNotNode",
+            display_name="Not",
+            category="utils/logic",
+            description="Logical NOT operation. Returns true if the value is falsy. Uses Python's rules for truthiness.",
+            search_aliases=["invert", "toggle", "negate", "flip boolean"],
+            inputs=[
+                io.AnyType.Input("value"),
+            ],
+            outputs=[
+                io.Boolean.Output(),
+            ],
+        )
+
+    @classmethod
+    def execute(cls, value) -> io.NodeOutput:
+        return io.NodeOutput(not value)
+
+
+class AndNode(io.ComfyNode):
+    @classmethod
+    def define_schema(cls):
+        template = io.Autogrow.TemplatePrefix(
+            input=io.AnyType.Input("value"),
+            prefix="value",
+            min=1,
+        )
+        return io.Schema(
+            node_id="ComfyAndNode",
+            display_name="And",
+            category="utils/logic",
+            description="Logical AND operation. Returns true if all of the values are truthy. Uses Python's rules for truthiness.",
+            search_aliases=["all", "every"],
+            inputs=[
+                io.Autogrow.Input("values", template=template),
+            ],
+            outputs=[
+                io.Boolean.Output(),
+            ],
+        )
+
+    @classmethod
+    def execute(cls, values: io.Autogrow.Type) -> io.NodeOutput:
+        return io.NodeOutput(all(values.values()))
+
+
+class OrNode(io.ComfyNode):
+    @classmethod
+    def define_schema(cls):
+        template = io.Autogrow.TemplatePrefix(
+            input=io.AnyType.Input("value"),
+            prefix="value",
+            min=1,
+        )
+        return io.Schema(
+            node_id="ComfyOrNode",
+            display_name="Or",
+            category="utils/logic",
+            description="Logical OR operation. Returns true if any of the values are truthy. Uses Python's rules for truthiness.",
+            search_aliases=["any", "some"],
+            inputs=[
+                io.Autogrow.Input("values", template=template),
+            ],
+            outputs=[
+                io.Boolean.Output(),
+            ],
+        )
+
+    @classmethod
+    def execute(cls, values: io.Autogrow.Type) -> io.NodeOutput:
+        return io.NodeOutput(any(values.values()))
+
+
 class SwitchNode(io.ComfyNode):
    @classmethod
    def define_schema(cls):
@ -15,7 +91,7 @@ class SwitchNode(io.ComfyNode):
        return io.Schema(
            node_id="ComfySwitchNode",
            display_name="Switch",
-            category="logic",
+            category="utils/logic",
            is_experimental=True,
            inputs=[
                io.Boolean.Input("switch"),
@ -46,7 +122,7 @@ class SoftSwitchNode(io.ComfyNode):
        return io.Schema(
            node_id="ComfySoftSwitchNode",
            display_name="Soft Switch",
-            category="logic",
+            category="utils/logic",
            is_experimental=True,
            inputs=[
                io.Boolean.Input("switch"),
@ -136,7 +212,7 @@ class DCTestNode(io.ComfyNode):
        return io.Schema(
            node_id="DCTestNode",
            display_name="DCTest",
-            category="logic",
+            category="utils/logic",
            is_output_node=True,
            inputs=[io.DynamicCombo.Input("combo", options=[
                io.DynamicCombo.Option("option1", [io.String.Input("string")]),
@ -174,7 +250,7 @@ class AutogrowNamesTestNode(io.ComfyNode):
        return io.Schema(
            node_id="AutogrowNamesTestNode",
            display_name="AutogrowNamesTest",
-            category="logic",
+            category="utils/logic",
            inputs=[
                _io.Autogrow.Input("autogrow", template=template)
            ],
@ -194,7 +270,7 @@ class AutogrowPrefixTestNode(io.ComfyNode):
        return io.Schema(
            node_id="AutogrowPrefixTestNode",
            display_name="AutogrowPrefixTest",
-            category="logic",
+            category="utils/logic",
            inputs=[
                _io.Autogrow.Input("autogrow", template=template)
            ],
@ -213,7 +289,7 @@ class ComboOutputTestNode(io.ComfyNode):
        return io.Schema(
            node_id="ComboOptionTestNode",
            display_name="ComboOptionTest",
-            category="logic",
+            category="utils/logic",
            inputs=[io.Combo.Input("combo", options=["option1", "option2", "option3"]),
                    io.Combo.Input("combo2", options=["option4", "option5", "option6"])],
            outputs=[io.Combo.Output(), io.Combo.Output()],
@ -230,7 +306,7 @@ class ConvertStringToComboNode(io.ComfyNode):
            node_id="ConvertStringToComboNode",
            search_aliases=["string to dropdown", "text to combo"],
            display_name="Convert String to Combo",
-            category="logic",
+            category="utils/logic",
            inputs=[io.String.Input("string")],
            outputs=[io.Combo.Output()],
        )
@ -246,7 +322,7 @@ class InvertBooleanNode(io.ComfyNode):
            node_id="InvertBooleanNode",
            search_aliases=["not", "toggle", "negate", "flip boolean"],
            display_name="Invert Boolean",
-            category="logic",
+            category="utils/logic",
            inputs=[io.Boolean.Input("boolean")],
            outputs=[io.Boolean.Output()],
        )
@ -261,6 +337,9 @@ class LogicExtension(ComfyExtension):
        return [
            SwitchNode,
            CustomComboNode,
+            NotNode,
+            AndNode,
+            OrNode,
            # SoftSwitchNode,
            # ConvertStringToComboNode,
            # DCTestNode,
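
The Not, And, and Or nodes added in this file defer to Python's truthiness rules. A minimal sketch of what that implies for common inputs follows. It uses a plain dict as a stand-in for the Autogrow value mapping; the helper names and the `value0`/`value1` keys are illustrative assumptions, not part of the patch.

```python
# Illustrative stand-ins for the node bodies; these are not the ComfyUI classes.
def logical_not(value):
    # Mirrors `not value` in NotNode.execute.
    return not value


def logical_and(values: dict) -> bool:
    # Mirrors all(values.values()) in AndNode.execute.
    return all(values.values())


def logical_or(values: dict) -> bool:
    # Mirrors any(values.values()) in OrNode.execute.
    return any(values.values())


# Empty containers, 0, "" and None are falsy; most other objects are truthy.
assert logical_not("") is True
assert logical_not([0]) is False  # a non-empty list is truthy even if its item is falsy
assert logical_and({"value0": 1, "value1": "text"}) is True
assert logical_and({"value0": 1, "value1": 0}) is False
assert logical_or({"value0": None, "value1": []}) is False
assert logical_or({"value0": None, "value1": "x"}) is True
```

Because the inputs are `AnyType`, objects without a simple truth value (for example multi-element tensors or arrays, which raise on `bool()`) would presumably raise inside these nodes rather than return a boolean.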
--- a/comfy_extras/nodes_lt_audio.py
+++ b/comfy_extras/nodes_lt_audio.py
@ -182,7 +182,7 @@ class LTXAVTextEncoderLoader(io.ComfyNode):
                ),
                io.Combo.Input(
                    "device",
-                    options=comfy.model_management.get_gpu_device_options(),
+                    options=["default", "cpu"],
                    advanced=True,
                )
            ],
@ -197,12 +197,8 @@ class LTXAVTextEncoderLoader(io.ComfyNode):
        clip_path2 = folder_paths.get_full_path_or_raise("checkpoints", ckpt_name)

        model_options = {}
-        resolved = comfy.model_management.resolve_gpu_device_option(device)
-        if resolved is not None:
-            if resolved.type == "cpu":
-                model_options["load_device"] = model_options["offload_device"] = resolved
-            else:
-                model_options["load_device"] = resolved
+        if device == "cpu":
+            model_options["load_device"] = model_options["offload_device"] = torch.device("cpu")

        clip = comfy.sd.load_clip(ckpt_paths=[clip_path1, clip_path2], embedding_directory=folder_paths.get_folder_paths("embeddings"), clip_type=clip_type, model_options=model_options)
        return io.NodeOutput(clip)
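
After this change the loader only distinguishes "default" from "cpu". A minimal sketch of that mapping is below. A plain string stands in for `torch.device("cpu")` so the snippet runs without PyTorch, and `build_model_options` is a hypothetical helper name, not a function in the patch.

```python
def build_model_options(device: str) -> dict:
    """Return the model_options dict the loader would hand to load_clip."""
    model_options = {}
    if device == "cpu":
        # Pin both load and offload to CPU.
        # Assumption: the string "cpu" stands in for torch.device("cpu").
        cpu = "cpu"
        model_options["load_device"] = model_options["offload_device"] = cpu
    # Any other value leaves the dict empty, so the loader picks the devices.
    return model_options
```

In this sketch any value other than "cpu" behaves like "default", which matches the single `if device == "cpu"` check in the hunk above.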
--- a/comfy_extras/nodes_math.py
+++ b/comfy_extras/nodes_math.py
@ -70,7 +70,7 @@ class MathExpressionNode(io.ComfyNode):
        return io.Schema(
            node_id="ComfyMathExpression",
            display_name="Math Expression",
-            category="logic",
+            category="utils",
            search_aliases=[
                "expression", "formula", "calculate", "calculator",
                "eval", "math",
--- a/comfy_extras/nodes_multigpu.py
+++ b/comfy_extras/nodes_multigpu.py
@ -1,5 +1,7 @@
 from __future__ import annotations

+import copy
+import logging
 from inspect import cleandoc
 from typing import TYPE_CHECKING
 from typing_extensions import override
@ -8,47 +10,268 @@ from comfy_api.latest import ComfyExtension, io

 if TYPE_CHECKING:
    from comfy.model_patcher import ModelPatcher
+    from comfy.sd import CLIP, VAE
+import comfy.model_management
 import comfy.multigpu


 class MultiGPUCFGSplitNode(io.ComfyNode):
    """
-    Attaches per-device deepclones to any connected MODEL, UPSCALE_MODEL, and/or VAE so
-    downstream nodes that recognize the attached state dispatch their work across multiple GPUs.
+    Prepares a model so that sampling is accelerated by splitting work units across GPUs.

-    Place after nodes that modify the model object itself (compile, attention-switch, etc.).
-    Otherwise position is not order-sensitive.
+    Place this node after nodes that modify the model object itself, such as compile or attention-switch nodes.
+
+    Apart from that constraint, the node's position in the workflow does not matter.
    """

    @classmethod
    def define_schema(cls):
        return io.Schema(
            node_id="MultiGPU_WorkUnits",
-            display_name="MultiGPU Work Units",
+            display_name="MultiGPU CFG Split",
            category="advanced/multigpu",
            description=cleandoc(cls.__doc__),
            inputs=[
-                io.Model.Input("model", optional=True),
-                io.UpscaleModel.Input("upscale_model", optional=True),
-                io.Vae.Input("vae", optional=True),
+                io.Model.Input("model"),
                io.Int.Input("max_gpus", default=2, min=1, step=1),
            ],
            outputs=[
                io.Model.Output(),
-                io.UpscaleModel.Output(),
+            ],
+        )
+
+    @classmethod
+    def execute(cls, model: ModelPatcher, max_gpus: int) -> io.NodeOutput:
+        model = comfy.multigpu.create_multigpu_deepclones(model, max_gpus, reuse_loaded=True)
+        return io.NodeOutput(model)
+
+
+def _remember_base_devices(patcher: ModelPatcher):
+    """Stash the original load/offload device on the underlying model.
+
+    Stored on patcher.model (which is shared across patcher clones), so
+    repeated selector applications can recover the loader's original
+    routing when the user picks "default".
+    """
+    if not hasattr(patcher.model, "_select_base_load_device"):
+        patcher.model._select_base_load_device = patcher.load_device
+        patcher.model._select_base_offload_device = patcher.offload_device
+
+
+def _apply_patcher_device(patcher: ModelPatcher, resolved, base_offload_override=None):
+    """Apply *resolved* to a freshly-cloned patcher; respect base devices on default.
+
+    Returns the (possibly newly-replaced) patcher. For CPU on a dynamic
+    patcher, also tries to downgrade to a plain ModelPatcher so the
+    dynamic-only code paths are bypassed (best-effort: silently keeps
+    the dynamic patcher if downgrade is not supported).
+    """
+    _remember_base_devices(patcher)
+    base_load = patcher.model._select_base_load_device
+    base_offload = base_offload_override if base_offload_override is not None else patcher.model._select_base_offload_device
+
+    if resolved is None:
+        # "default" -> reset routing to whatever the loader produced
+        patcher.load_device = base_load
+        patcher.offload_device = base_offload
+    elif resolved.type == "cpu":
+        if patcher.is_dynamic():
+            try:
+                patcher = patcher.clone(disable_dynamic=True)
+            except Exception:
+                # Downgrade unavailable (no cached_patcher_init); fall
+                # back to the existing dynamic patcher.
+                pass
+        patcher.load_device = resolved
+        patcher.offload_device = resolved
+    else:
+        patcher.load_device = resolved
+        patcher.offload_device = base_offload
+
+    if hasattr(patcher, "register_load_device"):
+        patcher.register_load_device(patcher.load_device)
+    return patcher
+
+
+def _prune_multigpu_collision(model: ModelPatcher, primary_device):
+    """Drop any multigpu clone whose load_device matches *primary_device*.
+
+    This matters when the workflow places MultiGPU CFG Split before
+    Select Model Device: without pruning, a clone created by the split
+    would share a device with the retargeted primary. Pruning keeps the
+    clone set consistent with the new primary placement.
+    """
+    multigpu_models = model.get_additional_models_with_key("multigpu")
+    if not multigpu_models:
+        return
+    filtered = [m for m in multigpu_models if m.load_device != primary_device]
+    if len(filtered) != len(multigpu_models):
+        logging.info(f"Select Model Device: pruning MultiGPU clone on {primary_device} that now collides with the primary model.")
+        model.set_additional_models("multigpu", filtered)
+        if hasattr(model, "match_multigpu_clones"):
+            model.match_multigpu_clones()
+
+
+class SelectModelDeviceNode(io.ComfyNode):
+    """
+    Place the diffusion model on a specific device (default / cpu / gpu:N).
+
+    - "default" restores the device assigned by the loader (even after a
+      prior Select Model Device call).
+    - "cpu" pins both the load and offload device to CPU.
+    - "gpu:N" pins the load device to the Nth available GPU; the offload
+      device is restored to the loader's original choice.
+
+    If the workflow already has MultiGPU CFG Split applied and the chosen
+    GPU collides with one of the existing multigpu clones, that clone is
+    dropped so two patchers don't end up bound to the same device.
+
+    When the selected device does not exist on the current machine
+    (e.g. a workflow built on a 2-GPU box opened on a 1-GPU box),
+    the node passes the model through unchanged and logs a message
+    instead of failing.
+    """
+
+    @classmethod
+    def define_schema(cls):
+        return io.Schema(
+            node_id="SelectModelDevice",
+            display_name="Select Model Device",
+            category="advanced/multigpu",
+            description=cleandoc(cls.__doc__),
+            inputs=[
+                io.Model.Input("model"),
+                io.Combo.Input("device", options=comfy.model_management.get_gpu_device_options()),
+            ],
+            outputs=[
+                io.Model.Output(),
+            ],
+        )
+
+    @classmethod
+    def validate_inputs(cls, device="default"):
+        # Allow unknown gpu:N values so portable workflows do not error
+        # at validation time; runtime fallback will handle them.
+        return True
+
+    @classmethod
+    def execute(cls, model: ModelPatcher, device: str = "default") -> io.NodeOutput:
+        model = model.clone()
+        resolved = comfy.model_management.resolve_gpu_device_option(device)
+        if resolved is None and device not in (None, "default"):
+            logging.info(f"Select Model Device: requested device '{device}' not available, passing through unchanged.")
+            return io.NodeOutput(model)
+        model = _apply_patcher_device(model, resolved)
+        if resolved is not None:
+            _prune_multigpu_collision(model, model.load_device)
+        return io.NodeOutput(model)
+
+
+class SelectCLIPDeviceNode(io.ComfyNode):
+    """
+    Place the CLIP text encoder on a specific device (default / cpu / gpu:N).
+
+    - "default" restores the device assigned by the loader.
+    - "cpu" pins both the load and offload device to CPU.
+    - "gpu:N" pins the load device to the Nth available GPU.
+
+    When the selected device does not exist on the current machine
+    (e.g. a workflow built on a 2-GPU box opened on a 1-GPU box),
+    the node passes the CLIP through unchanged and logs a message
+    instead of failing.
+    """
+
+    @classmethod
+    def define_schema(cls):
+        return io.Schema(
+            node_id="SelectCLIPDevice",
+            display_name="Select CLIP Device",
+            category="advanced/multigpu",
+            description=cleandoc(cls.__doc__),
+            inputs=[
+                io.Clip.Input("clip"),
+                io.Combo.Input("device", options=comfy.model_management.get_gpu_device_options()),
+            ],
+            outputs=[
+                io.Clip.Output(),
+            ],
+        )
+
+    @classmethod
+    def validate_inputs(cls, device="default"):
+        return True
+
+    @classmethod
+    def execute(cls, clip: CLIP, device: str = "default") -> io.NodeOutput:
+        clip = clip.clone()
+        resolved = comfy.model_management.resolve_gpu_device_option(device)
+        if resolved is None and device not in (None, "default"):
+            logging.info(f"Select CLIP Device: requested device '{device}' not available, passing through unchanged.")
+            return io.NodeOutput(clip)
+        clip.patcher = _apply_patcher_device(clip.patcher, resolved)
+        return io.NodeOutput(clip)
+
+
+class SelectVAEDeviceNode(io.ComfyNode):
+    """
+    Place the VAE on a specific device (default / gpu:N).
+
+    - "default" restores the device assigned by the loader.
+    - "gpu:N" pins the load device to the Nth available GPU; the offload
+      device is set to the standard VAE offload device.
+
+    CPU is intentionally not exposed in the UI for the VAE; if a workflow
+    supplies "cpu" anyway (e.g. opened from another machine), the request
+    is dropped with a log message and the VAE is passed through unchanged.
+
+    When the selected device does not exist on the current machine
+    (e.g. a workflow built on a 2-GPU box opened on a 1-GPU box),
+    the node passes the VAE through unchanged and logs a message
+    instead of failing.
+    """
+
+    @classmethod
+    def define_schema(cls):
+        return io.Schema(
+            node_id="SelectVAEDevice",
+            display_name="Select VAE Device",
+            category="advanced/multigpu",
+            description=cleandoc(cls.__doc__),
+            inputs=[
+                io.Vae.Input("vae"),
+                io.Combo.Input("device", options=comfy.model_management.get_gpu_device_options_no_cpu()),
+            ],
+            outputs=[
                io.Vae.Output(),
            ],
        )

    @classmethod
-    def execute(cls, max_gpus: int, model: ModelPatcher = None, upscale_model=None, vae=None) -> io.NodeOutput:
-        if model is not None:
-            model = comfy.multigpu.create_multigpu_deepclones(model, max_gpus, reuse_loaded=True)
-        if upscale_model is not None:
-            upscale_model = comfy.multigpu.create_upscale_model_multigpu_deepclones(upscale_model, max_gpus)
-        if vae is not None:
-            vae = comfy.multigpu.create_vae_multigpu_deepclones(vae, max_gpus)
-        return io.NodeOutput(model, upscale_model, vae)
+    def validate_inputs(cls, device="default"):
+        return True
+
+    @classmethod
+    def execute(cls, vae: VAE, device: str = "default") -> io.NodeOutput:
+        # VAE has no .clone(); shallow-copy the wrapper and clone the patcher
+        # so we can retarget load/offload device without affecting the input VAE.
+        vae = copy.copy(vae)
+        vae.patcher = vae.patcher.clone()
+        resolved = comfy.model_management.resolve_gpu_device_option(device)
+        if resolved is None and device not in (None, "default"):
+            logging.info(f"Select VAE Device: requested device '{device}' not available, passing through unchanged.")
+            return io.NodeOutput(vae)
+        if resolved is not None and resolved.type == "cpu":
+            logging.info("Select VAE Device: CPU is not a supported choice, passing through unchanged.")
+            return io.NodeOutput(vae)
+        vae.patcher = _apply_patcher_device(
+            vae.patcher, resolved,
+            base_offload_override=comfy.model_management.vae_offload_device(),
+        )
+        # VAE caches the working device separately from its patcher.
+        if not hasattr(vae, "_select_base_device"):
+            vae._select_base_device = vae.device
+        vae.device = vae._select_base_device if resolved is None else resolved
+        return io.NodeOutput(vae)


 class MultiGPUOptionsNode(io.ComfyNode):
@ -101,6 +324,9 @@ class MultiGPUExtension(ComfyExtension):
    async def get_node_list(self) -> list[type[io.ComfyNode]]:
        return [
            MultiGPUCFGSplitNode,
+            SelectModelDeviceNode,
+            SelectCLIPDeviceNode,
+            SelectVAEDeviceNode,
            # MultiGPUOptionsNode,
        ]

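
The reset semantics of "default" rest on caching the loader's devices on the model object that every patcher clone shares. The sketch below illustrates that idea with stand-in classes and device-name strings. `FakeModel`, `FakePatcher`, and `apply_device` are assumptions made for illustration; the sketch omits the dynamic-patcher downgrade, the VAE offload override, and `register_load_device`.

```python
class FakeModel:
    """Stand-in for the underlying model object shared by patcher clones."""


class FakePatcher:
    """Stand-in for a patcher; clone() keeps a reference to the same model."""

    def __init__(self, model, load_device, offload_device):
        self.model = model
        self.load_device = load_device
        self.offload_device = offload_device

    def clone(self):
        return FakePatcher(self.model, self.load_device, self.offload_device)


def remember_base_devices(patcher):
    # Cache the loader's routing once, on the shared model object.
    if not hasattr(patcher.model, "_select_base_load_device"):
        patcher.model._select_base_load_device = patcher.load_device
        patcher.model._select_base_offload_device = patcher.offload_device


def apply_device(patcher, resolved):
    """resolved is None for "default", otherwise a device-name string."""
    remember_base_devices(patcher)
    base_load = patcher.model._select_base_load_device
    base_offload = patcher.model._select_base_offload_device
    if resolved is None:
        # "default": restore what the loader originally chose.
        patcher.load_device, patcher.offload_device = base_load, base_offload
    elif resolved == "cpu":
        patcher.load_device = patcher.offload_device = resolved
    else:
        patcher.load_device, patcher.offload_device = resolved, base_offload
    return patcher


loaded = FakePatcher(FakeModel(), load_device="cuda:0", offload_device="cpu")
on_gpu1 = apply_device(loaded.clone(), "cuda:1")
back_to_default = apply_device(on_gpu1.clone(), None)
```

Because the cached values live on the shared model rather than on a clone, the second selector still sees the loader's original `cuda:0` even though its input patcher was already retargeted to `cuda:1`.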
--- a/comfy_extras/nodes_toolkit.py
+++ b/comfy_extras/nodes_toolkit.py
@ -14,7 +14,7 @@ class CreateList(io.ComfyNode):
        return io.Schema(
            node_id="CreateList",
            display_name="Create List",
-            category="logic",
+            category="utils",
            is_input_list=True,
            search_aliases=["Image Iterator", "Text Iterator", "Iterator"],
            inputs=[io.Autogrow.Input("inputs", template=template_autogrow)],
--- a/comfy_extras/nodes_upscale_model.py
+++ b/comfy_extras/nodes_upscale_model.py
@ -81,33 +81,13 @@ class ImageUpscaleWithModel(io.ComfyNode):

        output_device = comfy.model_management.intermediate_device()

-        multigpu_clones = getattr(upscale_model, 'multigpu_clones', None)
-        if multigpu_clones:
-            for dev, desc in multigpu_clones.items():
-                model_management.free_memory(memory_required, dev)
-                desc.to(dev)
-
        oom = True
        try:
            while oom:
                try:
                    steps = in_img.shape[0] * comfy.utils.get_tiled_scale_steps(in_img.shape[3], in_img.shape[2], tile_x=tile, tile_y=tile, overlap=overlap)
                    pbar = comfy.utils.ProgressBar(steps)
-                    if multigpu_clones:
-                        functions = {device: lambda a: upscale_model(a.float())}
-                        for dev, desc in multigpu_clones.items():
-                            functions[dev] = lambda a, d=desc: d(a.float())
-                        s = comfy.utils.tiled_scale_multidim_multigpu(
-                            in_img,
-                            functions,
-                            tile=(tile, tile),
-                            overlap=overlap,
-                            upscale_amount=upscale_model.scale,
-                            pbar=pbar,
-                            output_device=output_device,
-                        )
-                    else:
-                        s = comfy.utils.tiled_scale(in_img, lambda a: upscale_model(a.float()), tile_x=tile, tile_y=tile, overlap=overlap, upscale_amount=upscale_model.scale, pbar=pbar, output_device=output_device)
+                    s = comfy.utils.tiled_scale(in_img, lambda a: upscale_model(a.float()), tile_x=tile, tile_y=tile, overlap=overlap, upscale_amount=upscale_model.scale, pbar=pbar, output_device=output_device)
                    oom = False
                except Exception as e:
                    model_management.raise_non_oom(e)
@ -116,9 +96,6 @@ class ImageUpscaleWithModel(io.ComfyNode):
                        raise e
        finally:
            upscale_model.to("cpu")
-            if multigpu_clones:
-                for desc in multigpu_clones.values():
-                    desc.to("cpu")

        s = torch.clamp(s.movedim(-3,-1), min=0, max=1.0).to(comfy.model_management.intermediate_dtype())
        return io.NodeOutput(s)
--- a/comfy_extras/nodes_video_model.py
+++ b/comfy_extras/nodes_video_model.py
@ -23,69 +23,6 @@ class ImageOnlyCheckpointLoader:
        return (out[0], out[3], out[2])


-class ImageOnlyCheckpointLoaderDevice:
-    @classmethod
-    def INPUT_TYPES(s):
-        device_options = comfy.model_management.get_gpu_device_options()
-        return {
-            "required": {
-                "ckpt_name": (folder_paths.get_filename_list("checkpoints"), ),
-            },
-            "optional": {
-                "model_device": (device_options, {"advanced": True, "tooltip": "Device for the diffusion model (UNET)."}),
-                "clip_vision_device": (device_options, {"advanced": True, "tooltip": "Device for the CLIP vision encoder."}),
-                "vae_device": (device_options, {"advanced": True, "tooltip": "Device for the VAE."}),
-            }
-        }
-    RETURN_TYPES = ("MODEL", "CLIP_VISION", "VAE")
-    FUNCTION = "load_checkpoint"
-
-    CATEGORY = "loaders/video_models"
-
-    @classmethod
-    def VALIDATE_INPUTS(cls, model_device="default", clip_vision_device="default", vae_device="default"):
-        return True
-
-    def load_checkpoint(self, ckpt_name, output_vae=True, output_clip=True, model_device="default", clip_vision_device="default", vae_device="default"):
-        ckpt_path = folder_paths.get_full_path_or_raise("checkpoints", ckpt_name)
-
-        model_options = {}
-        resolved_model = comfy.model_management.resolve_gpu_device_option(model_device)
-        if resolved_model is not None:
-            if resolved_model.type == "cpu":
-                model_options["load_device"] = model_options["offload_device"] = resolved_model
-            else:
-                model_options["load_device"] = resolved_model
-
-        cv_model_options = {}
-        resolved_clip = comfy.model_management.resolve_gpu_device_option(clip_vision_device)
-        if resolved_clip is not None:
-            if resolved_clip.type == "cpu":
-                cv_model_options["load_device"] = cv_model_options["offload_device"] = resolved_clip
-            else:
-                cv_model_options["load_device"] = resolved_clip
-
-        # VAE device is passed via model_options["load_device"] which
-        # load_state_dict_guess_config forwards to the VAE constructor.
-        # If vae_device differs from model_device, we override after loading.
-        resolved_vae = comfy.model_management.resolve_gpu_device_option(vae_device)
-
-        out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=False, output_clipvision=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
-        model_patcher, clip, vae, clip_vision = out[:4]
-
-        # Apply VAE device override if it differs from the model device
-        if resolved_vae is not None and vae is not None:
-            vae.device = resolved_vae
-            if resolved_vae.type == "cpu":
-                offload = resolved_vae
-            else:
-                offload = comfy.model_management.vae_offload_device()
-            vae.patcher.load_device = resolved_vae
-            vae.patcher.offload_device = offload
-
-        return (model_patcher, clip_vision, vae)
-
-
 class SVD_img2vid_Conditioning:
    @classmethod
    def INPUT_TYPES(s):
@ -212,7 +149,6 @@ class ConditioningSetAreaPercentageVideo:

 NODE_CLASS_MAPPINGS = {
    "ImageOnlyCheckpointLoader": ImageOnlyCheckpointLoader,
-    "ImageOnlyCheckpointLoaderDevice": ImageOnlyCheckpointLoaderDevice,
    "SVD_img2vid_Conditioning": SVD_img2vid_Conditioning,
    "VideoLinearCFGGuidance": VideoLinearCFGGuidance,
    "VideoTriangleCFGGuidance": VideoTriangleCFGGuidance,
@ -222,7 +158,6 @@ NODE_CLASS_MAPPINGS = {

 NODE_DISPLAY_NAME_MAPPINGS = {
    "ImageOnlyCheckpointLoader": "Load Checkpoint Image Only (img2vid model)",
-    "ImageOnlyCheckpointLoaderDevice": "Image Only Checkpoint Loader (Device)",
    "VideoLinearCFGGuidance": "Video Linear CFG Guidance",
    "VideoTriangleCFGGuidance": "Video Triangle CFG Guidance",
 }
--- a/main.py
+++ b/main.py
@ -200,7 +200,7 @@ import gc
 if 'torch' in sys.modules:
    logging.warning("WARNING: Potential Error in code: Torch already imported, torch should never be imported before this point.")

-import torch
+
 import comfy.utils

 import execution
@ -218,7 +218,7 @@ import comfy.model_patcher
 if args.enable_dynamic_vram or (enables_dynamic_vram() and comfy.model_management.is_nvidia() and not comfy.model_management.is_wsl()):
    if (not args.enable_dynamic_vram) and (comfy.model_management.torch_version_numeric < (2, 8)):
        logging.warning("Unsupported Pytorch detected. DynamicVRAM support requires Pytorch version 2.8 or later. Falling back to legacy ModelPatcher. VRAM estimates may be unreliable especially on Windows")
-    elif comfy_aimdo.control.init_devices(range(torch.cuda.device_count())):
+    elif comfy_aimdo.control.init_devices(d.index for d in comfy.model_management.get_all_torch_devices()):
        if args.verbose == 'DEBUG':
            comfy_aimdo.control.set_log_debug()
        elif args.verbose == 'CRITICAL':
--- a/nodes.py
+++ b/nodes.py
@ -608,73 +608,6 @@ class CheckpointLoaderSimple:
        out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
        return out[:3]

-
-class CheckpointLoaderDevice:
-    @classmethod
-    def INPUT_TYPES(s):
-        device_options = comfy.model_management.get_gpu_device_options()
-        return {
-            "required": {
-                "ckpt_name": (folder_paths.get_filename_list("checkpoints"), {"tooltip": "The name of the checkpoint (model) to load."}),
-            },
-            "optional": {
-                "model_device": (device_options, {"advanced": True, "tooltip": "Device for the diffusion model (UNET)."}),
-                "clip_device": (device_options, {"advanced": True, "tooltip": "Device for the CLIP text encoder."}),
-                "vae_device": (device_options, {"advanced": True, "tooltip": "Device for the VAE."}),
-            }
-        }
-    RETURN_TYPES = ("MODEL", "CLIP", "VAE")
-    OUTPUT_TOOLTIPS = ("The model used for denoising latents.",
-                       "The CLIP model used for encoding text prompts.",
-                       "The VAE model used for encoding and decoding images to and from latent space.")
-    FUNCTION = "load_checkpoint"
-
-    CATEGORY = "advanced/loaders"
-    DESCRIPTION = "Loads a diffusion model checkpoint with per-component device selection for multi-GPU setups."
-
-    @classmethod
-    def VALIDATE_INPUTS(cls, model_device="default", clip_device="default", vae_device="default"):
-        return True
-
-    def load_checkpoint(self, ckpt_name, model_device="default", clip_device="default", vae_device="default"):
-        ckpt_path = folder_paths.get_full_path_or_raise("checkpoints", ckpt_name)
-
-        model_options = {}
-        resolved_model = comfy.model_management.resolve_gpu_device_option(model_device)
-        if resolved_model is not None:
-            if resolved_model.type == "cpu":
-                model_options["load_device"] = model_options["offload_device"] = resolved_model
-            else:
-                model_options["load_device"] = resolved_model
-
-        te_model_options = {}
-        resolved_clip = comfy.model_management.resolve_gpu_device_option(clip_device)
-        if resolved_clip is not None:
-            if resolved_clip.type == "cpu":
-                te_model_options["load_device"] = te_model_options["offload_device"] = resolved_clip
-            else:
-                te_model_options["load_device"] = resolved_clip
-
-        # VAE device is passed via model_options["load_device"] which
-        # load_state_dict_guess_config forwards to the VAE constructor.
-        # If vae_device differs from model_device, we override after loading.
-        resolved_vae = comfy.model_management.resolve_gpu_device_option(vae_device)
-
-        out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, embedding_directory=folder_paths.get_folder_paths("embeddings"), model_options=model_options, te_model_options=te_model_options)
-        model_patcher, clip, vae = out[:3]
-
-        # Apply VAE device override if it differs from the model device
-        if resolved_vae is not None and vae is not None:
-            vae.device = resolved_vae
-            if resolved_vae.type == "cpu":
-                offload = resolved_vae
-            else:
-                offload = comfy.model_management.vae_offload_device()
-            vae.patcher.load_device = resolved_vae
-            vae.patcher.offload_device = offload
-
-        return (model_patcher, clip, vae)
-
 class DiffusersLoader:
    SEARCH_ALIASES = ["load diffusers model"]

@@ -853,23 +786,15 @@ class VAELoader:

    @classmethod
    def INPUT_TYPES(s):
-        return {"required": { "vae_name": (s.vae_list(s), )},
-                "optional": {
-                              "device": (comfy.model_management.get_gpu_device_options(), {"advanced": True}),
-                             }}
+        return {"required": { "vae_name": (s.vae_list(s), )}}
    RETURN_TYPES = ("VAE",)
    FUNCTION = "load_vae"

    CATEGORY = "loaders"

-    @classmethod
-    def VALIDATE_INPUTS(cls, device="default"):
-        return True
-
    #TODO: scale factor?
-    def load_vae(self, vae_name, device="default"):
+    def load_vae(self, vae_name):
        metadata = None
-        vae_path = None
        if vae_name == "pixel_space":
            sd = {}
            sd["pixel_space_vae"] = torch.tensor(1.0)
@@ -886,14 +811,8 @@ class VAELoader:
                metadata = {"tae_latent_channels": 128}
            else:
                metadata["tae_latent_channels"] = 128
-        resolved = comfy.model_management.resolve_gpu_device_option(device)
-        vae = comfy.sd.VAE(sd=sd, metadata=metadata, device=resolved)
+        vae = comfy.sd.VAE(sd=sd, metadata=metadata)
        vae.throw_exception_if_invalid()
-        # Register a reload factory on the patcher so MultiGPU work-units can use
-        # ModelPatcher.deepclone_multigpu to produce per-device clones from the
-        # same loader context (mirrors UNETLoader / CLIPLoader / checkpoint loader).
-        if vae_path is not None:
-            vae.patcher.cached_patcher_init = (comfy.sd.load_vae_patcher, (vae_path, metadata, resolved))
        return (vae,)

 class ControlNetLoader:
@@ -1018,20 +937,13 @@ class UNETLoader:
    def INPUT_TYPES(s):
        return {"required": { "unet_name": (folder_paths.get_filename_list("diffusion_models"), ),
                              "weight_dtype": (["default", "fp8_e4m3fn", "fp8_e4m3fn_fast", "fp8_e5m2"], {"advanced": True})
-                             },
-                "optional": {
-                              "device": (comfy.model_management.get_gpu_device_options(), {"advanced": True}),
                             }}
    RETURN_TYPES = ("MODEL",)
    FUNCTION = "load_unet"

    CATEGORY = "advanced/loaders"

-    @classmethod
-    def VALIDATE_INPUTS(cls, device="default"):
-        return True
-
-    def load_unet(self, unet_name, weight_dtype, device="default"):
+    def load_unet(self, unet_name, weight_dtype):
        model_options = {}
        if weight_dtype == "fp8_e4m3fn":
            model_options["dtype"] = torch.float8_e4m3fn
@@ -1041,13 +953,6 @@ class UNETLoader:
        elif weight_dtype == "fp8_e5m2":
            model_options["dtype"] = torch.float8_e5m2

-        resolved = comfy.model_management.resolve_gpu_device_option(device)
-        if resolved is not None:
-            if resolved.type == "cpu":
-                model_options["load_device"] = model_options["offload_device"] = resolved
-            else:
-                model_options["load_device"] = resolved
-
        unet_path = folder_paths.get_full_path_or_raise("diffusion_models", unet_name)
        model = comfy.sd.load_diffusion_model(unet_path, model_options=model_options)
        return (model,)
@@ -1059,7 +964,7 @@ class CLIPLoader:
                              "type": (["stable_diffusion", "stable_cascade", "sd3", "stable_audio", "mochi", "ltxv", "pixart", "cosmos", "lumina2", "wan", "hidream", "chroma", "ace", "omnigen2", "qwen_image", "hunyuan_image", "flux2", "ovis", "longcat_image", "cogvideox"], ),
                              },
                "optional": {
-                              "device": (comfy.model_management.get_gpu_device_options(), {"advanced": True}),
+                              "device": (["default", "cpu"], {"advanced": True}),
                             }}
    RETURN_TYPES = ("CLIP",)
    FUNCTION = "load_clip"
@@ -1068,20 +973,12 @@ class CLIPLoader:

    DESCRIPTION = "[Recipes]\n\nstable_diffusion: clip-l\nstable_cascade: clip-g\nsd3: t5 xxl/ clip-g / clip-l\nstable_audio: t5 base\nmochi: t5 xxl\ncogvideox: t5 xxl (226-token padding)\ncosmos: old t5 xxl\nlumina2: gemma 2 2B\nwan: umt5 xxl\n hidream: llama-3.1 (Recommend) or t5\nomnigen2: qwen vl 2.5 3B"

-    @classmethod
-    def VALIDATE_INPUTS(cls, device="default"):
-        return True
-
    def load_clip(self, clip_name, type="stable_diffusion", device="default"):
        clip_type = getattr(comfy.sd.CLIPType, type.upper(), comfy.sd.CLIPType.STABLE_DIFFUSION)

        model_options = {}
-        resolved = comfy.model_management.resolve_gpu_device_option(device)
-        if resolved is not None:
-            if resolved.type == "cpu":
-                model_options["load_device"] = model_options["offload_device"] = resolved
-            else:
-                model_options["load_device"] = resolved
+        if device == "cpu":
+            model_options["load_device"] = model_options["offload_device"] = torch.device("cpu")

        clip_path = folder_paths.get_full_path_or_raise("text_encoders", clip_name)
        clip = comfy.sd.load_clip(ckpt_paths=[clip_path], embedding_directory=folder_paths.get_folder_paths("embeddings"), clip_type=clip_type, model_options=model_options)
@@ -1095,7 +992,7 @@ class DualCLIPLoader:
                              "type": (["sdxl", "sd3", "flux", "hunyuan_video", "hidream", "hunyuan_image", "hunyuan_video_15", "kandinsky5", "kandinsky5_image", "ltxv", "newbie", "ace"], ),
                              },
                "optional": {
-                              "device": (comfy.model_management.get_gpu_device_options(), {"advanced": True}),
+                              "device": (["default", "cpu"], {"advanced": True}),
                             }}
    RETURN_TYPES = ("CLIP",)
    FUNCTION = "load_clip"
@@ -1104,10 +1001,6 @@ class DualCLIPLoader:

    DESCRIPTION = "[Recipes]\n\nsdxl: clip-l, clip-g\nsd3: clip-l, clip-g / clip-l, t5 / clip-g, t5\nflux: clip-l, t5\nhidream: at least one of t5 or llama, recommended t5 and llama\nhunyuan_image: qwen2.5vl 7b and byt5 small\nnewbie: gemma-3-4b-it, jina clip v2"

-    @classmethod
-    def VALIDATE_INPUTS(cls, device="default"):
-        return True
-
    def load_clip(self, clip_name1, clip_name2, type, device="default"):
        clip_type = getattr(comfy.sd.CLIPType, type.upper(), comfy.sd.CLIPType.STABLE_DIFFUSION)

@@ -1115,12 +1008,8 @@ class DualCLIPLoader:
        clip_path2 = folder_paths.get_full_path_or_raise("text_encoders", clip_name2)

        model_options = {}
-        resolved = comfy.model_management.resolve_gpu_device_option(device)
-        if resolved is not None:
-            if resolved.type == "cpu":
-                model_options["load_device"] = model_options["offload_device"] = resolved
-            else:
-                model_options["load_device"] = resolved
+        if device == "cpu":
+            model_options["load_device"] = model_options["offload_device"] = torch.device("cpu")

        clip = comfy.sd.load_clip(ckpt_paths=[clip_path1, clip_path2], embedding_directory=folder_paths.get_folder_paths("embeddings"), clip_type=clip_type, model_options=model_options)
        return (clip,)
@@ -2183,7 +2072,6 @@ NODE_CLASS_MAPPINGS = {
    "InpaintModelConditioning": InpaintModelConditioning,

    "CheckpointLoader": CheckpointLoader,
-    "CheckpointLoaderDevice": CheckpointLoaderDevice,
    "DiffusersLoader": DiffusersLoader,

    "LoadLatent": LoadLatent,
@@ -2201,7 +2089,6 @@ NODE_DISPLAY_NAME_MAPPINGS = {
    # Loaders
    "CheckpointLoader": "Load Checkpoint With Config (DEPRECATED)",
    "CheckpointLoaderSimple": "Load Checkpoint",
-    "CheckpointLoaderDevice": "Load Checkpoint (Device)",
    "VAELoader": "Load VAE",
    "LoraLoader": "Load LoRA (Model and CLIP)",
    "LoraLoaderModelOnly": "Load LoRA",
--- a/openapi.yaml
+++ b/openapi.yaml
--- a/requirements.txt
+++ b/requirements.txt
@@ -23,7 +23,7 @@ SQLAlchemy>=2.0.0
 filelock
 av>=14.2.0
 comfy-kitchen>=0.2.8
-comfy-aimdo==0.4.3
+comfy-aimdo==0.4.4
 requests
 simpleeval>=1.0.0
 blake3
--- a/tests-unit/comfy_test/multigpu_test.py
+++ b/tests-unit/comfy_test/multigpu_test.py
@@ -1,147 +0,0 @@
-import importlib
-import sys
-import types
-
-import torch
-
-import comfy.utils
-
-
-def install_fake_comfy_aimdo(monkeypatch):
-    package = types.ModuleType("comfy_aimdo")
-    package.__path__ = []
-    monkeypatch.setitem(sys.modules, "comfy_aimdo", package)
-    for name in ("vram_buffer", "host_buffer", "torch", "model_vbar", "model_mmap", "control"):
-        module = types.ModuleType(f"comfy_aimdo.{name}")
-        monkeypatch.setitem(sys.modules, f"comfy_aimdo.{name}", module)
-        setattr(package, name, module)
-
-
-def test_tiled_scale_multidim_multigpu_clips_edge_tiles(monkeypatch):
-    monkeypatch.setattr(torch.cuda, "set_device", lambda device: None)
-    monkeypatch.setattr(torch.cuda, "synchronize", lambda device: None)
-
-    scale = 1.1
-
-    def upscale(a):
-        return torch.ones((a.shape[0], 1, round(a.shape[-1] * scale)), dtype=a.dtype, device=a.device)
-
-    samples = torch.ones((1, 1, 11))
-    devices = [torch.device("cpu:0"), torch.device("cpu:1")]
-
-    actual = comfy.utils.tiled_scale_multidim_multigpu(
-        samples,
-        {device: upscale for device in devices},
-        tile=(5,),
-        overlap=2,
-        upscale_amount=scale,
-        out_channels=1,
-        output_device="cpu",
-    )
-    expected = comfy.utils.tiled_scale_multidim(
-        samples,
-        upscale,
-        tile=(5,),
-        overlap=2,
-        upscale_amount=scale,
-        out_channels=1,
-        output_device="cpu",
-    )
-
-    assert actual.shape == expected.shape == (1, 1, 12)
-    torch.testing.assert_close(actual, expected)
-
-
-def test_upscale_model_deepclone_does_not_copy_existing_clone_graph(monkeypatch):
-    class FakeModel:
-        def __init__(self):
-            self.param = torch.nn.Parameter(torch.ones(1))
-
-        def eval(self):
-            return self
-
-        def parameters(self):
-            return [self.param]
-
-    class FakeDescriptor:
-        def __init__(self):
-            self.model = FakeModel()
-            self.device = None
-
-        def to(self, device):
-            self.device = device
-            return self
-
-    first_device = torch.device("cpu:0")
-    second_device = torch.device("cpu:1")
-    stale_device = torch.device("cpu:2")
-    existing_clone = FakeDescriptor()
-    stale_clone = FakeDescriptor()
-    source = FakeDescriptor()
-    source.multigpu_clones = {first_device: existing_clone, stale_device: stale_clone}
-    fake_model_management = types.ModuleType("comfy.model_management")
-    fake_model_management.get_all_torch_devices = lambda exclude_current=True: [first_device, second_device]
-    monkeypatch.setitem(sys.modules, "comfy.model_management", fake_model_management)
-    import comfy
-    monkeypatch.setattr(comfy, "model_management", fake_model_management, raising=False)
-    import comfy.multigpu
-    importlib.reload(comfy.multigpu)
-
-    cloned = comfy.multigpu.create_upscale_model_multigpu_deepclones(source, max_gpus=3)
-
-    assert cloned is not source
-    assert cloned.multigpu_clones[first_device] is existing_clone
-    assert stale_device not in cloned.multigpu_clones
-    assert second_device in cloned.multigpu_clones
-    assert not hasattr(cloned.multigpu_clones[second_device], "multigpu_clones")
-    assert cloned.multigpu_clones[second_device].device == "cpu"
-    assert not cloned.multigpu_clones[second_device].model.param.requires_grad
-
-    single_gpu_clone = comfy.multigpu.create_upscale_model_multigpu_deepclones(source, max_gpus=1)
-    assert single_gpu_clone is not source
-    assert not hasattr(single_gpu_clone, "multigpu_clones")
-
-
-def test_checkpoint_loader_registers_vae_cached_patcher(monkeypatch):
-    install_fake_comfy_aimdo(monkeypatch)
-    import comfy.sd
-    importlib.reload(comfy.sd)
-
-    class FakeVAE:
-        def __init__(self):
-            self.patcher = types.SimpleNamespace(cached_patcher_init=None)
-
-    model_patcher = types.SimpleNamespace(cached_patcher_init=None)
-    vae = FakeVAE()
-    metadata = {"format": "checkpoint"}
-    monkeypatch.setattr(comfy.utils, "load_torch_file", lambda path, return_metadata=False: ({}, metadata))
-    monkeypatch.setattr(
-        comfy.sd,
-        "load_state_dict_guess_config",
-        lambda *args, **kwargs: (model_patcher, None, vae, None),
-    )
-
-    comfy.sd.load_checkpoint_guess_config("checkpoint.safetensors", output_vae=True)
-
-    assert model_patcher.cached_patcher_init[0] is comfy.sd.load_checkpoint_guess_config
-    assert vae.patcher.cached_patcher_init[0] is comfy.sd.load_checkpoint_vae_patcher
-    assert vae.patcher.cached_patcher_init[1][0] == "checkpoint.safetensors"
-
-
-def test_checkpoint_loader_skips_cached_patcher_for_placeholder_vae(monkeypatch):
-    install_fake_comfy_aimdo(monkeypatch)
-    import comfy.sd
-    importlib.reload(comfy.sd)
-
-    model_patcher = types.SimpleNamespace(cached_patcher_init=None)
-    placeholder_vae = types.SimpleNamespace()
-    metadata = {"format": "checkpoint"}
-    monkeypatch.setattr(comfy.utils, "load_torch_file", lambda path, return_metadata=False: ({}, metadata))
-    monkeypatch.setattr(
-        comfy.sd,
-        "load_state_dict_guess_config",
-        lambda *args, **kwargs: (model_patcher, None, placeholder_vae, None),
-    )
-
-    assert comfy.sd.load_checkpoint_guess_config("diffusion_only.safetensors", output_vae=True)[2] is placeholder_vae
-    assert model_patcher.cached_patcher_init[0] is comfy.sd.load_checkpoint_guess_config