Yu Li | 5ba2d28b7f | add block-wise scaled int8 quantization based on QuantizedLayout mechanism; add more tests by comparing with manual torch implementation; add perf benchmarks; fix errors caused by merging; default no output quant; fix unittest | 2025-12-10 12:23:05 -06:00
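The commit above names block-wise scaled int8 quantization; a minimal sketch of that general idea follows, assuming a simple per-block absmax scale and a block size of 128. Both choices are illustrative assumptions and this is not the QuantizedLayout code itself.

```python
import torch

def quantize_int8_blockwise(w: torch.Tensor, block_size: int = 128):
    # Quantize each contiguous block of `block_size` values with its own scale.
    orig_shape = w.shape
    blocks = w.reshape(-1, block_size).float()            # assumes numel is divisible by block_size
    scale = blocks.abs().amax(dim=1, keepdim=True) / 127.0
    scale = scale.clamp(min=1e-12)                        # avoid division by zero on all-zero blocks
    q = torch.round(blocks / scale).clamp(-127, 127).to(torch.int8)
    return q.reshape(orig_shape), scale

def dequantize_int8_blockwise(q: torch.Tensor, scale: torch.Tensor, block_size: int = 128):
    # Reverse step: multiply each block by its stored scale.
    blocks = q.reshape(-1, block_size).float() * scale
    return blocks.reshape(q.shape)
```

A round-trip of these two functions against the original tensor is the kind of comparison with a manual torch implementation that the commit message mentions.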
comfyanonymous | 73e3a9e676 | Clamp output when rounding weight to prevent Nan. | 2024-10-19 19:07:10 -04:00
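Several of the fp8 rounding commits in this log revolve around stochastic rounding: clamping to the format maximum so the cast cannot overflow to NaN, seeding so the rounding is reproducible, and treating subnormals with their fixed spacing. The sketch below is a self-contained illustration of those themes under the assumption of PyTorch with float8 dtypes; it is not the ComfyUI implementation.

```python
import torch

def stochastic_round_fp8_e4m3(x: torch.Tensor, seed: int = 0) -> torch.Tensor:
    gen = torch.Generator(device=x.device)
    gen.manual_seed(seed)                           # fixed seed -> reproducible rounding
    x = x.float()
    sign = torch.sign(x)
    mag = x.abs().clamp(max=448.0)                  # 448 is the e4m3fn max; clamping prevents overflow to NaN
    # Spacing (ulp) at each magnitude: normals use floor(log2(mag)) minus the 3 mantissa bits;
    # subnormals (mag < 2**-6) share the fixed minimum exponent, so their ulp is 2**-9.
    exp = torch.floor(torch.log2(mag.clamp(min=2.0 ** -6)))
    ulp = torch.exp2(exp - 3)
    noise = torch.rand(mag.shape, generator=gen, device=x.device)
    rounded = torch.floor(mag / ulp + noise) * ulp  # round down or up with probability given by the remainder
    return (sign * rounded).to(torch.float8_e4m3fn)
```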
comfyanonymous | 7d2467e830 | Some minor cleanups. | 2024-10-05 13:22:39 -04:00
comfyanonymous | 00a5d08103 | Lower fp8 lora memory usage. | 2024-09-03 01:25:05 -04:00
comfyanonymous | 2ca8f6e23d | Make the stochastic fp8 rounding reproducible. | 2024-08-26 15:12:06 -04:00
comfyanonymous | 7985ff88b9 | Use less memory in float8 lora patching by doing calculations in fp16. | 2024-08-26 14:45:58 -04:00
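The commit above describes doing the lora math in fp16 rather than fp32 to cut temporary memory. A rough sketch of that idea follows; the function name, the `up`/`down` lora factors, and `alpha` are illustrative assumptions, not ComfyUI's API, and the final cast here uses plain round-to-nearest rather than the stochastic rounding discussed elsewhere in this log.

```python
import torch

def patch_fp8_weight_with_lora(w_fp8: torch.Tensor, up: torch.Tensor, down: torch.Tensor,
                               alpha: float = 1.0) -> torch.Tensor:
    w = w_fp8.to(torch.float16)                  # fp16 instead of fp32 halves the temporary buffers
    delta = alpha * (up.to(torch.float16) @ down.to(torch.float16))
    w += delta                                   # in-place add keeps peak memory low
    return w.to(torch.float8_e4m3fn)             # store the patched weight back in fp8
```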
comfyanonymous | 4506ddc86a | Better subnormal fp8 stochastic rounding. Thanks Ashen. | 2024-08-19 13:38:03 -04:00
comfyanonymous | 22ec02afc0 | Handle subnormal numbers in float8 rounding. | 2024-08-19 05:51:08 -04:00
comfyanonymous | bb222ceddb | Fix loras having a weak effect when applied on fp8. | 2024-08-17 15:20:17 -04:00