mirror of
https://github.com/comfyanonymous/ComfyUI.git
synced 2026-06-26 09:49:26 +08:00
Use torch.neg instead of unary minus in RoPE freqs computation
On an RTX 5090 (Blackwell) with PyTorch cu130, applying unary minus to a CUDA tensor slice in precompute_freqs_cis crashes during RoPE computation for the Gemma3 text encoder. The failure surfaces as "CUDA error: unknown error" or as an access violation, depending on driver version. It then triggers cascading DynamicVRAM/RAM exhaustion that is easy to misdiagnose as a memory issue.

torch.neg(x) is mathematically and numerically identical to -x (verified bit-for-bit equal on CPU), but it apparently avoids whatever code path in the unary-minus operator dispatch trips up Blackwell/cu130.

Fixes #13977
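The CPU equivalence claimed in the message can be checked with a short script. This is a minimal sketch, not the check the author ran: the helper name `neg_matches_unary_minus`, the tensor shape, and the seed are assumptions chosen for illustration. It compares the raw bit patterns of both results on the same kind of slice the patch touches.

```python
import torch


def neg_matches_unary_minus(x: torch.Tensor) -> bool:
    """Return True if -x and torch.neg(x) agree bit for bit on the upper half slice."""
    half = x.shape[-1] // 2
    via_operator = -x[..., half:]
    via_function = torch.neg(x[..., half:])
    # Reinterpret float32 as int32 so the comparison covers raw bits,
    # including signed zeros, rather than floating-point equality.
    return torch.equal(via_operator.view(torch.int32), via_function.view(torch.int32))


torch.manual_seed(0)  # hypothetical seed, only for reproducibility
# Hypothetical shape standing in for the unsqueezed sin tensor.
sin = torch.randn(1, 1, 16, 64, dtype=torch.float32).sin()
result = neg_matches_unary_minus(sin)
print(result)
```

This runs on CPU only, so it confirms the numerical equivalence but says nothing about the CUDA crash itself, which needs the affected hardware and driver to reproduce.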
parent 38f750d80e
commit 39f2960581
@@ -437,7 +437,7 @@ def precompute_freqs_cis(head_dim, position_ids, theta, rope_scale=None, rope_di
         cos = cos.unsqueeze(1)
         sin = sin.unsqueeze(1)
         sin_split = sin.shape[-1] // 2
-        out.append((cos, sin[..., : sin_split], -sin[..., sin_split :]))
+        out.append((cos, sin[..., : sin_split], torch.neg(sin[..., sin_split :])))
 
     if len(out) == 1:
         return out[0]