In weight_decompose(), the 1D dora_scale tensor [N] divided by the
multi-dimensional weight_norm [N, 1, ...] would incorrectly broadcast
to [N, N, ...] (outer-product shape) instead of element-wise [N, 1, ...].
This caused shape mismatches when applying DoRA to non-square weight
matrices (e.g. MLP layers where d_ff != d_model), while silently
producing correct results for square weights (most attention Q/K/V/O).
Fix: explicitly reshape dora_scale to match weight_norm's dimensionality
before the division.
Fixes #12938
Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
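The broadcasting pitfall described above can be reproduced in a few lines. A minimal numpy sketch (tensor names follow the description; shapes and values are illustrative, not the project's actual code):

```python
import numpy as np

N = 4
dora_scale = np.ones(N)          # 1D per-output-channel scale, shape [N]
weight_norm = np.ones((N, 1))    # per-row weight norm, shape [N, 1]

# Buggy: broadcasting aligns trailing dimensions, so [N] is treated as
# [1, N] against [N, 1], producing an outer-product shape [N, N].
buggy = dora_scale / weight_norm
assert buggy.shape == (N, N)

# Fix: explicitly reshape dora_scale to match weight_norm's
# dimensionality before dividing, so the result stays [N, 1].
target_shape = dora_scale.shape + (1,) * (weight_norm.ndim - 1)
fixed = dora_scale.reshape(target_shape) / weight_norm
assert fixed.shape == (N, 1)
```

For square weights the [N, N] result happens to be shape-compatible downstream, which is why the bug only surfaced on non-square matrices such as MLP layers.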
* lora: add weight shape calculations.
This lets the loader know if a lora will change the shape of a weight
so it can take appropriate action.
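The shape check might look something like the following. This is a hypothetical sketch, not the repository's actual helper; the function name and shape conventions are assumptions:

```python
# Hypothetical helper: a LoRA pair merges as delta = up @ down, so the
# merged delta's shape is determined by the factors, and it may differ
# from the base weight's shape (as with flux's img_in weight).
def lora_changes_shape(base_shape, up_shape, down_shape):
    """Return True if merging up @ down would alter the base weight's shape."""
    # up: [out_dim, rank], down: [rank, in_dim]
    merged_shape = (up_shape[0], down_shape[1])
    return tuple(base_shape[:2]) != merged_shape

# A lora that grows the output dimension changes the weight's geometry:
assert lora_changes_shape((64, 64), (128, 8), (8, 64))
# An ordinary lora leaves the shape unchanged:
assert not lora_changes_shape((64, 64), (64, 8), (8, 64))
```

Precomputing this at load time is what lets the loader decide whether a weight needs special handling before any lora is applied.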
* MPDynamic: force load flux img_in weight
This weight is a bit special, in that the lora changes its geometry.
This case is rather unique: it is not handled by the existing memory estimate and works for neither offloading nor dynamic_vram.
Fix dynamic_vram as a special case for now. Ideally these lora geometry changes would be fully precalculated at load time, but first get these models working.
* Fix dtype/device movement in bypass mode
* Force offloading mode for training
* training context var
* offloading implementation in training node
* fix wrong input type
* Support loading lora models in bypass mode; correct adapter/offloading handling
* Add factorization utils for lokr
* Add lokr train impl
* Add loha train impl
* Add adapter map for algo selection
* Add optional grad ckpt and algo selection
* Update __init__.py
* correct key name for loha
* Use custom fwd/bwd func and better init for loha
* Support gradient accumulation
* Fix bugs of loha
* use more stable init
* Add OFT training
* linting