MPS does not support float8_e4m3fn/float8_e5m2 dtypes. When FP8-quantized
models (FLUX, SD3.5, Wan 2.2, LTX-Video) are loaded on Apple Silicon, the
quantization step crashes with:

    TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does
    not have support for that dtype.
This adds device-aware fallbacks that move tensors to CPU for the FP8
quantization step only. The rest of inference remains on MPS.
Three code paths are patched:
- comfy/float.py: stochastic_rounding() — also fixes the secondary
"Placeholder storage has not been allocated on MPS device!" error
caused by torch.Generator being bound to MPS.
- comfy/float.py: stochastic_round_quantize_nvfp4*() — these create
float8_e4m3fn block scales internally.
- comfy/quant_ops.py: _TensorCoreFP8LayoutBase.quantize() — the
ck.quantize_per_tensor_fp8 path also fails on MPS.
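The torch.Generator part of the first fix can be sketched like this (hypothetical helper, simplified from the actual patch): a generator bound to MPS raises the "Placeholder storage" error once the surrounding tensors have been moved to CPU, so the generator is created on the same device the fp8 math runs on.

```python
import torch

def make_rounding_noise(shape, device, seed=0):
    # Hypothetical sketch: keep the RNG on the device where the fp8
    # step actually executes (CPU when the input lives on MPS), so the
    # generator and the tensors it feeds never mismatch.
    gen_device = "cpu" if str(device) == "mps" else device
    gen = torch.Generator(device=gen_device).manual_seed(seed)
    return torch.rand(shape, generator=gen, device=gen_device)
```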
Fixes: #6995, #9255, #11626, #11817
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>