tashiscool
fde8da88e8
Merge 45a2363e6a into ec1896aceb
2026-05-30 01:18:59 +08:00
rattus
684296148e
float: use CK stochastic rounding CUDA kernel (#13971)
2026-05-28 19:23:42 -07:00
Jukka Seppänen
1c5db7397d
feat: Support mxfp8 (#12907)
2026-03-14 18:36:29 -04:00
Tashdid Khan
edd44a6874
fix: add CPU fallback for FP8 quantization on MPS (Apple Silicon)
...
MPS does not support float8_e4m3fn/float8_e5m2 dtypes. When FP8-quantized
models (FLUX, SD3.5, Wan 2.2, LTX-Video) are loaded on Apple Silicon, the
quantization step crashes with:
    TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does
    not have support for that dtype.
This adds device-aware fallbacks that move tensors to CPU for the FP8
quantization step only. The rest of inference remains on MPS.
Three code paths are patched:
- comfy/float.py: stochastic_rounding(). This also fixes the secondary
  "Placeholder storage has not been allocated on MPS device!" error
  caused by torch.Generator being bound to MPS.
- comfy/float.py: stochastic_round_quantize_nvfp4*(). These create
  float8_e4m3fn block scales internally.
- comfy/quant_ops.py: _TensorCoreFP8LayoutBase.quantize(). The
  ck.quantize_per_tensor_fp8 path also fails on MPS.
Fixes: #6995, #9255, #11626, #11817
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 21:21:31 -05:00
comfyanonymous
6165c38cb5
Optimize nvfp4 lora applying. (#11866)
...
This changes the results slightly but also speeds things up a lot.
2026-01-14 00:49:38 -05:00
comfyanonymous
eff2b9d412
Optimize nvfp4 lora applying. (#11856)
2026-01-13 19:37:19 -05:00
comfyanonymous
15b312de7a
Optimize nvfp4 lora applying. (#11854)
2026-01-13 19:23:58 -05:00
comfyanonymous
117e7a5853
Refactor to try to lower mem usage. (#11840)
2026-01-12 21:01:52 -08:00
comfyanonymous
b3c0e4de57
Make loras work on nvfp4 models. (#11837)
...
The initial application is a bit slow but will probably be sped up in the
future.
2026-01-12 22:33:54 -05:00
comfyanonymous
73e3a9e676
Clamp output when rounding weight to prevent NaN.
2024-10-19 19:07:10 -04:00
comfyanonymous
7d2467e830
Some minor cleanups.
2024-10-05 13:22:39 -04:00
comfyanonymous
00a5d08103
Lower fp8 lora memory usage.
2024-09-03 01:25:05 -04:00
comfyanonymous
2ca8f6e23d
Make the stochastic fp8 rounding reproducible.
2024-08-26 15:12:06 -04:00
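As a rough illustration of what "reproducible stochastic rounding" means, here is a simplified sketch in pure Python. It rounds to a uniform grid of width `step` rather than to real FP8 values, and the function name and `seed` parameter are assumptions for this example, not the repository's API. The idea is that each value rounds up with probability equal to its fractional distance from the lower grid point, and a seeded generator makes the random choices repeatable.

```python
import math
import random


def stochastic_round(values, step, seed=0):
    """Round each value to a multiple of `step`, choosing up or down at random.

    The chance of rounding up equals the fractional position between the two
    neighbouring grid points, so the rounding is unbiased on average.
    A fixed seed makes the result reproducible.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    rounded = []
    for v in values:
        scaled = v / step
        lower = math.floor(scaled)
        frac = scaled - lower  # in [0, 1); 0 means v is already on the grid
        bump = 1 if rng.random() < frac else 0
        rounded.append((lower + bump) * step)
    return rounded


if __name__ == "__main__":
    data = [0.1, 0.3, 0.7]
    # Same seed, same output: the rounding can be repeated exactly.
    print(stochastic_round(data, 0.25, seed=42))
    print(stochastic_round(data, 0.25, seed=42))
```

Values that already sit on the grid are returned unchanged, because their fractional part is zero and the random draw can never fall below it.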
comfyanonymous
7985ff88b9
Use less memory in float8 lora patching by doing calculations in fp16.
2024-08-26 14:45:58 -04:00
comfyanonymous
4506ddc86a
Better subnormal fp8 stochastic rounding. Thanks Ashen.
2024-08-19 13:38:03 -04:00
comfyanonymous
22ec02afc0
Handle subnormal numbers in float8 rounding.
2024-08-19 05:51:08 -04:00
comfyanonymous
bb222ceddb
Fix loras having a weak effect when applied on fp8.
2024-08-17 15:20:17 -04:00