Update quant doc so it's not completely wrong. (#13381)

There is still more that needs to be fixed.
2026-07-12 01:17:46 +08:00 · 2026-04-12 20:27:38 -07:00 · 2026-04-12 20:27:38 -07:00 · 971932346a
commit 971932346a
parent 31283d2892
1 changed files with 4 additions and 4 deletions
--- a/QUANTIZATION.md
+++ b/QUANTIZATION.md
@ -139,9 +139,9 @@ Example:
  "_quantization_metadata": {
    "format_version": "1.0",
    "layers": {
-      "model.layers.0.mlp.up_proj": "float8_e4m3fn",
-      "model.layers.0.mlp.down_proj": "float8_e4m3fn",
-      "model.layers.1.mlp.up_proj": "float8_e4m3fn"
+      "model.layers.0.mlp.up_proj": {"format": "float8_e4m3fn"},
+      "model.layers.0.mlp.down_proj": {"format": "float8_e4m3fn"},
+      "model.layers.1.mlp.up_proj": {"format": "float8_e4m3fn"}
    }
  }
 }
@ -165,4 +165,4 @@ Activation quantization (e.g., for FP8 Tensor Core operations) requires `input_s
 3. **Compute scales**: Derive `input_scale` from collected statistics
 4. **Store in checkpoint**: Save `input_scale` parameters alongside weights

-The calibration dataset should be representative of your target use case. For diffusion models, this typically means a diverse set of prompts and generation parameters.
+The calibration dataset should be representative of your target use case. For diffusion models, this typically means a diverse set of prompts and generation parameters.