comfyanonymous
43071e3de3
Make old scaled fp8 format use the new mixed quant ops system. ( #11000 )
2025-12-05 14:35:42 -05:00
Urle Sistiana
6484ac89dc
fix QuantizedTensor.is_contiguous ( #10956 ) ( #10959 )
2025-11-28 16:33:07 -05:00
rattus
3f382a4f98
quant ops: Dequantize weight in-place ( #10935 )
...
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.10, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.11, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.12, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-unix-nightly (12.1, , linux, 3.11, [self-hosted Linux], nightly) (push) Waiting to run
Execution Tests / test (macos-latest) (push) Waiting to run
Execution Tests / test (ubuntu-latest) (push) Waiting to run
Execution Tests / test (windows-latest) (push) Waiting to run
Test server launches without errors / test (push) Waiting to run
Unit Tests / test (macos-latest) (push) Waiting to run
Unit Tests / test (ubuntu-latest) (push) Waiting to run
Unit Tests / test (windows-2022) (push) Waiting to run
In flux2 these weights are huge (200MB). As plain_tensor is a throw-away
deep copy, do this multiplication in-place to save VRAM.
2025-11-27 08:06:30 -08:00
comfyanonymous
bdb10a583f
Fix loras not working on mixed fp8. ( #10899 )
2025-11-26 00:07:58 -05:00
comfyanonymous
015a0599d0
I found a case where this is needed ( #10875 )
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.10, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.11, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.12, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-unix-nightly (12.1, , linux, 3.11, [self-hosted Linux], nightly) (push) Waiting to run
Execution Tests / test (macos-latest) (push) Waiting to run
Execution Tests / test (ubuntu-latest) (push) Waiting to run
Execution Tests / test (windows-latest) (push) Waiting to run
Test server launches without errors / test (push) Waiting to run
Unit Tests / test (macos-latest) (push) Waiting to run
Unit Tests / test (ubuntu-latest) (push) Waiting to run
Unit Tests / test (windows-2022) (push) Waiting to run
2025-11-25 03:23:19 -05:00
comfyanonymous
b6805429b9
Allow pinning quantized tensors. ( #10873 )
2025-11-25 02:48:20 -05:00
comfyanonymous
25022e0b09
Cleanup and fix issues with text encoder quants. ( #10872 )
Python Linting / Run Ruff (push) Waiting to run
Python Linting / Run Pylint (push) Waiting to run
Build package / Build Test (3.10) (push) Waiting to run
Build package / Build Test (3.11) (push) Waiting to run
Build package / Build Test (3.12) (push) Waiting to run
Build package / Build Test (3.13) (push) Waiting to run
Build package / Build Test (3.9) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.10, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.11, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-stable (12.1, , linux, 3.12, [self-hosted Linux], stable) (push) Waiting to run
Full Comfy CI Workflow Runs / test-unix-nightly (12.1, , linux, 3.11, [self-hosted Linux], nightly) (push) Waiting to run
Execution Tests / test (macos-latest) (push) Waiting to run
Execution Tests / test (ubuntu-latest) (push) Waiting to run
Execution Tests / test (windows-latest) (push) Waiting to run
Test server launches without errors / test (push) Waiting to run
Unit Tests / test (macos-latest) (push) Waiting to run
Unit Tests / test (ubuntu-latest) (push) Waiting to run
Unit Tests / test (windows-2022) (push) Waiting to run
2025-11-25 01:48:53 -05:00
contentis
3b3ef9a77a
Quantized Ops fixes ( #10715 )
...
* offload support, bug fixes, remove mixins
* add readme
2025-11-12 18:26:52 -05:00
comfyanonymous
af4b7b5edb
More fp8 torch.compile regressions fixed. ( #10625 )
2025-11-03 22:14:20 -05:00
comfyanonymous
6b88478f9f
Bring back fp8 torch compile performance to what it should be. ( #10622 )
2025-11-03 19:22:10 -05:00
comfyanonymous
e199c8cc67
Fixes ( #10621 )
2025-11-03 17:58:24 -05:00
comfyanonymous
958a17199a
People should update their pytorch versions. ( #10618 )
2025-11-03 17:08:30 -05:00
comfyanonymous
c58c13b2ba
Fix torch compile regression on fp8 ops. ( #10580 )
2025-11-01 00:25:17 -04:00
comfyanonymous
906c089957
Fix small performance regression with fp8 fast and scaled fp8. ( #10537 )
2025-10-29 19:29:01 -04:00
comfyanonymous
1a58087ac2
Reduce memory usage for fp8 scaled op. ( #10531 )
2025-10-29 15:43:51 -04:00
contentis
8817f8fc14
Mixed Precision Quantization System ( #10498 )
...
* Implement mixed precision operations with a registry design and metadate for quant spec in checkpoint.
* Updated design using Tensor Subclasses
* Fix FP8 MM
* An actually functional POC
* Remove CK reference and ensure correct compute dtype
* Update unit tests
* ruff lint
* Implement mixed precision operations with a registry design and metadate for quant spec in checkpoint.
* Updated design using Tensor Subclasses
* Fix FP8 MM
* An actually functional POC
* Remove CK reference and ensure correct compute dtype
* Update unit tests
* ruff lint
* Fix missing keys
* Rename quant dtype parameter
* Rename quant dtype parameter
* Fix unittests for CPU build
2025-10-28 16:20:53 -04:00