Use torch.neg instead of unary minus in RoPE freqs computation

On RTX 5090 (Blackwell) with PyTorch cu130, applying unary minus to a
CUDA tensor slice in precompute_freqs_cis crashes during RoPE
computation for the Gemma3 text encoder. The crash surfaces as "CUDA
error: unknown error" or an access violation, depending on driver
version, and then triggers cascading DynamicVRAM/RAM exhaustion that is
easy to misdiagnose as a memory issue.

torch.neg(x) is mathematically and numerically identical to -x (verified
bit-for-bit equal on CPU) but apparently avoids whatever code path in
the unary-minus operator dispatch trips up Blackwell/cu130.
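
The CPU equivalence claim can be checked with a sketch along these
lines. The tensor shape and seed are illustrative assumptions, not
values taken from precompute_freqs_cis, and the check runs on CPU only,
so it does not reproduce or rule out the Blackwell/cu130 crash.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the `sin` tensor; the shape is an assumption.
sin = torch.randn(1, 1, 8, 64).sin()
sin_split = sin.shape[-1] // 2

minus = -sin[..., sin_split:]            # unary-minus form (old code)
neg = torch.neg(sin[..., sin_split:])    # torch.neg form (new code)

# Value equality: same shape and same elements.
values_equal = torch.equal(minus, neg)

# Bit-level equality: reinterpret the float32 payload as int32 so that
# a sign difference such as -0.0 vs 0.0 would also be caught.
bits_equal = torch.equal(minus.view(torch.int32), neg.view(torch.int32))

print(values_equal, bits_equal)
```

Comparing the int32 views is stricter than comparing values, since
floating-point == treats -0.0 and 0.0 as equal.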

Fixes #13977
nahcmon 2026-06-08 18:31:48 +02:00
parent 38f750d80e
commit 39f2960581

@@ -437,7 +437,7 @@ def precompute_freqs_cis(head_dim, position_ids, theta, rope_scale=None, rope_di
 cos = cos.unsqueeze(1)
 sin = sin.unsqueeze(1)
 sin_split = sin.shape[-1] // 2
-out.append((cos, sin[..., : sin_split], -sin[..., sin_split :]))
+out.append((cos, sin[..., : sin_split], torch.neg(sin[..., sin_split :])))
 if len(out) == 1:
     return out[0]