Add some warnings for pin and unpin errors. (#11561)

mm: discard async errors from pinning failures (#10738)
Nearly every error that cudaHostRegister can return is also queued as the same error on the asynchronous GPU queue. This was already fixed for the repinning error case, but the bad-mmap and plain ENOMEM cases are harder to detect. Run some dummy GPU work to clear the error state.
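The reasoning in the message above can be illustrated with a toy model that needs no GPU. This is only a sketch: `ToyDevice`, `host_register`, and `discard_async_error` are hypothetical stand-ins, not CUDA or PyTorch APIs, and the model assumes that surfacing a queued error also clears it.

```python
class ToyDevice:
    """Toy stand-in for a GPU runtime; not a real CUDA or PyTorch API."""

    def __init__(self):
        self.pending_error = None  # error waiting on the async queue

    def host_register(self, ok):
        """Return success synchronously; on failure also queue the same error."""
        if not ok:
            self.pending_error = RuntimeError("host register failed")
        return ok

    def synchronize(self):
        """Surface (and clear) whatever error is waiting on the queue."""
        err, self.pending_error = self.pending_error, None
        if err is not None:
            raise err


def discard_async_error(device):
    """Force the queued error out and drop it; the caller already saw the failure."""
    try:
        device.synchronize()
    except RuntimeError:
        pass


device = ToyDevice()
pinned = device.host_register(ok=False)  # synchronous failure is reported here
discard_async_error(device)              # drain the queued duplicate
device.synchronize()                     # later, unrelated work no longer raises
```

Without the `discard_async_error` call, the final `synchronize()` would raise an error that belongs to the earlier, already-handled pinning failure.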
2026-05-26 17:07:25 +08:00 · 2025-12-29 18:26:42 -05:00 · 2025-12-29 18:19:34 -05:00
1 changed file with 16 additions and 0 deletions
--- a/comfy/model_management.py
+++ b/comfy/model_management.py
@@ -1126,6 +1126,16 @@ if not args.disable_pinned_memory:

 PINNING_ALLOWED_TYPES = set(["Parameter", "QuantizedTensor"])

+def discard_cuda_async_error():
+    try:
+        a = torch.tensor([1], dtype=torch.uint8, device=get_torch_device())
+        b = torch.tensor([1], dtype=torch.uint8, device=get_torch_device())
+        _ = a + b
+        torch.cuda.synchronize()
+    except torch.AcceleratorError:
+        #Dump it! We already know about it from the synchronous return
+        pass
+
 def pin_memory(tensor):
    global TOTAL_PINNED_MEMORY
    if MAX_PINNED_MEMORY <= 0:
@@ -1158,6 +1168,9 @@ def pin_memory(tensor):
        PINNED_MEMORY[ptr] = size
        TOTAL_PINNED_MEMORY += size
        return True
+    else:
+        logging.warning("Pin error.")
+        discard_cuda_async_error()

    return False

@@ -1186,6 +1199,9 @@ def unpin_memory(tensor):
        if len(PINNED_MEMORY) == 0:
            TOTAL_PINNED_MEMORY = 0
        return True
+    else:
+        logging.warning("Unpin error.")
+        discard_cuda_async_error()

    return False
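The failure branch added to both pin_memory and unpin_memory follows one pattern: warn, clear leftover error state, return False. A minimal sketch of that pattern in isolation is shown here, assuming the `else:` pairs with a success check that lies outside the hunks. `run_with_cleanup` and its parameters are hypothetical names for illustration, not part of comfy/model_management.py.

```python
import logging


def run_with_cleanup(operation, cleanup, label):
    """Hypothetical helper: mirror the success/else branches added in the diff."""
    if operation():
        return True
    # Failure branch: warn, clear leftover error state, then report False.
    logging.warning("%s error.", label)
    cleanup()
    return False


cleanups = []  # records which failure paths ran their cleanup hook
ok = run_with_cleanup(lambda: True, lambda: cleanups.append("pin"), "Pin")
failed = run_with_cleanup(lambda: False, lambda: cleanups.append("unpin"), "Unpin")
```

The cleanup hook runs only on the failure path, so successful pin and unpin calls pay no extra synchronization cost.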
Author	SHA1	Message	Date
comfyanonymous	0e6221cc79	Add some warnings for pin and unpin errors. (#11561)	2025-12-29 18:26:42 -05:00
rattus	9ca7e143af	mm: discard async errors from pinning failures (#10738) Nearly every error that cudaHostRegister can return is also queued as the same error on the asynchronous GPU queue. This was already fixed for the repinning error case, but the bad-mmap and plain ENOMEM cases are harder to detect. Run some dummy GPU work to clear the error state.	2025-12-29 18:19:34 -05:00