fix: reshape dora_scale before broadcasting in weight_decompose

In weight_decompose(), the 1D dora_scale tensor [N] divided by the
multi-dimensional weight_norm [N, 1, ...] would incorrectly broadcast
to [N, N, ...] (outer-product shape) instead of element-wise [N, 1, ...].
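
The mis-broadcast is easy to reproduce in isolation (shapes here are
illustrative, not taken from any specific model):

    import torch

    N = 4
    dora_scale = torch.ones(N)         # [N]
    weight_norm = torch.ones(N, 1)     # [N, 1]
    # trailing-dim alignment treats dora_scale as [1, N], so the division
    # yields an outer-product-shaped [N, N] instead of element-wise [N, 1]
    print((dora_scale / weight_norm).shape)  # torch.Size([4, 4])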

This caused runtime shape mismatches when applying DoRA to non-square
weight matrices (e.g. MLP layers where d_ff != d_model), and silently
produced incorrect values for square weights (most attention Q/K/V/O
projections), where the mis-broadcast [N, N] scale still multiplies
without raising an error.
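
A sketch of both failure modes, with hypothetical sizes (N=4 rows, M=8
columns):

    import torch

    N, M = 4, 8
    dora_scale = torch.arange(1.0, N + 1)              # [N]
    weight_norm = torch.arange(1.0, N + 1).view(N, 1)  # [N, 1]
    scale = dora_scale / weight_norm  # [N, N]: scale[i, j] = dora_scale[j] / weight_norm[i]

    weight_calc = torch.ones(N, M)    # non-square, e.g. an MLP projection
    # weight_calc *= scale            # raises RuntimeError (8 vs 4 at dim 1)

    square = torch.ones(N, N)         # square, e.g. a Q/K/V/O projection
    square *= scale                   # runs without error, but row i is scaled
                                      # column-wise by dora_scale[j] instead of
                                      # uniformly by dora_scale[i]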

Fix: explicitly reshape dora_scale to match weight_norm's dimensionality
before the division.
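
With the reshape in place, the division stays element-wise (a standalone
sketch of the output-axis case; names mirror the diff below):

    import torch

    N, M = 4, 8
    weight = torch.randn(N, M)
    dora_scale = torch.ones(N)                      # [N]
    weight_norm = weight.norm(dim=1, keepdim=True)  # [N, 1]

    dora_scale = dora_scale.reshape(weight.shape[0], *[1] * (weight.dim() - 1))  # [N, 1]
    scale = dora_scale / weight_norm                # [N, 1], element-wise
    scaled = weight * scale                         # broadcasts across columns as intended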

Fixes #12938

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
commit e5f6c1ff68 (parent 8cc746a864)
Author: easonysliu
Date:   2026-03-17 11:17:57 +08:00

@@ -298,6 +298,16 @@ def weight_decompose(
     )
     weight_norm = weight_norm + torch.finfo(weight.dtype).eps
+    # Reshape dora_scale to match weight_norm dimensionality to avoid
+    # incorrect broadcasting. Without this, a 1D dora_scale [N] divided by
+    # a multi-dim weight_norm [N, 1] would broadcast to [N, N] instead of
+    # the intended element-wise [N, 1]. This caused shape mismatches for
+    # non-square weights (e.g. MLP layers where d_ff != d_model).
+    if wd_on_output_axis:
+        dora_scale = dora_scale.reshape(weight.shape[0], *[1] * (weight.dim() - 1))
+    else:
+        dora_scale = dora_scale.reshape(*[1] * (weight.dim() - 1), weight.shape[-1])
     weight_calc *= (dora_scale / weight_norm).type(weight.dtype)
     if strength != 1.0:
         weight_calc -= weight
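
A minimal shape check of the new branch logic (hypothetical standalone
helper mirroring the reshape added above):

    import torch

    def reshaped(dora_scale, weight, wd_on_output_axis):
        # same reshape as in weight_decompose, extracted for testing
        if wd_on_output_axis:
            return dora_scale.reshape(weight.shape[0], *[1] * (weight.dim() - 1))
        return dora_scale.reshape(*[1] * (weight.dim() - 1), weight.shape[-1])

    w = torch.randn(4, 8)                    # non-square linear weight
    assert reshaped(torch.ones(4), w, True).shape == (4, 1)
    assert reshaped(torch.ones(8), w, False).shape == (1, 8)

    conv = torch.randn(4, 3, 3, 3)           # 4D conv kernel
    assert reshaped(torch.ones(4), conv, True).shape == (4, 1, 1, 1)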