patientx
d8528ac31e
Merge branch 'comfyanonymous:master' into master
2025-10-29 12:42:07 +03:00
comfyanonymous
e525673f72
Fix issue. ( #10527 )
2025-10-29 00:37:00 -04:00
comfyanonymous
3fa7a5c04a
Speed up offloading using pinned memory. ( #10526 )
...
To enable this feature use: --fast pinned_memory
2025-10-29 00:21:01 -04:00
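A minimal sketch of what pinned-memory offloading looks like in plain PyTorch; the function names are illustrative, not ComfyUI's internals behind --fast pinned_memory:

```python
import torch

def offload_to_cpu(t: torch.Tensor) -> torch.Tensor:
    # Pinned (page-locked) host memory lets the device-to-host copy run
    # asynchronously and at full PCIe bandwidth.
    host = torch.empty(t.shape, dtype=t.dtype, device="cpu", pin_memory=True)
    host.copy_(t, non_blocking=True)  # overlaps the copy with other GPU work
    # (synchronize the stream before touching `host` on the CPU)
    return host

def load_to_gpu(t: torch.Tensor, device: str = "cuda") -> torch.Tensor:
    # Host-to-device copies from pinned memory can also be non-blocking.
    return t.to(device, non_blocking=True)
```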
contentis
8817f8fc14
Mixed Precision Quantization System ( #10498 )
...
* Implement mixed precision operations with a registry design and metadata for the quant spec in the checkpoint.
* Updated design using Tensor Subclasses
* Fix FP8 MM
* An actually functional POC
* Remove CK reference and ensure correct compute dtype
* Update unit tests
* ruff lint
* Fix missing keys
* Rename quant dtype parameter
* Fix unittests for CPU build
2025-10-28 16:20:53 -04:00
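The registry-plus-tensor-subclass design named above, reduced to a minimal sketch via the __torch_function__ protocol; QuantTensor, QUANT_OPS, and the scale handling are illustrative, not ComfyUI's actual API:

```python
import torch

QUANT_OPS = {}  # registry: torch op -> mixed-precision handler

def register(op):
    def deco(fn):
        QUANT_OPS[op] = fn
        return fn
    return deco

class QuantTensor:
    """Quantized storage plus the scale metadata read from the checkpoint."""
    def __init__(self, data: torch.Tensor, scale: float):
        self.data, self.scale = data, scale

    # Defining __torch_function__ lets torch ops dispatch through the registry.
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        handler = QUANT_OPS.get(func)
        if handler is None:
            raise NotImplementedError(f"no mixed-precision handler for {func}")
        return handler(*args, **(kwargs or {}))

@register(torch.nn.functional.linear)
def quant_linear(x, w, bias=None):
    # Dequantize just in time so the matmul runs in the activation's dtype.
    return torch.nn.functional.linear(x, w.data.to(x.dtype) * w.scale, bias)

# usage: y = torch.nn.functional.linear(act, QuantTensor(fp8_weight, scale=0.02))
```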
patientx
8590e1f713
Merge branch 'comfyanonymous:master' into master
2025-10-26 14:29:29 +03:00
comfyanonymous
f6bbc1ac84
Fix mistake. ( #10484 )
2025-10-25 23:07:29 -04:00
comfyanonymous
098a352f13
Add warning for torch-directml usage ( #10482 )
...
Added a warning message about the state of torch-directml.
2025-10-25 20:05:22 -04:00
comfyanonymous
426cde37f1
Remove useless function ( #10472 )
2025-10-24 19:56:51 -04:00
patientx
0b5b11787f
Merge branch 'comfyanonymous:master' into master
2025-10-24 13:17:09 +03:00
comfyanonymous
1bcda6df98
WIP way to support multi-dimensional latents. ( #10456 )
2025-10-23 21:21:14 -04:00
patientx
043932cb67
Merge pull request #347 from nota-rudveld/patch-2
...
fix: check for version compatibility before calling PyTorch method
2025-10-23 02:28:35 +03:00
patientx
d4bcb93575
Merge branch 'comfyanonymous:master' into master
2025-10-22 11:34:33 +03:00
comfyanonymous
9cdc64998f
Only disable cudnn on newer AMD GPUs. ( #10437 )
2025-10-21 19:15:23 -04:00
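A hedged sketch of gating cuDNN (MIOpen on ROCm builds) off per GPU; the gfx prefixes checked here are an assumption, not the exact list upstream uses:

```python
import torch

def maybe_disable_cudnn():
    if not torch.cuda.is_available():
        return
    props = torch.cuda.get_device_properties(0)
    # ROCm builds expose the gfx target, e.g. "gfx1100", as gcnArchName.
    arch = getattr(props, "gcnArchName", "")
    if arch.startswith(("gfx11", "gfx12")):  # assumption: which parts count as "newer"
        torch.backends.cudnn.enabled = False  # turn the MIOpen backend off here
```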
patientx
5bf1c8be44
Merge branch 'comfyanonymous:master' into master
2025-10-21 03:49:14 +03:00
nota-rudveld
c1c044c8cb
fix: check for version compatibility before calling PyTorch method
2025-10-20 17:28:14 -04:00
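A minimal sketch of version-gating a newer PyTorch call, in the spirit of the fix above; the threshold and call sites are hypothetical:

```python
import torch
from packaging import version  # common third-party dependency

def torch_at_least(min_ver: str) -> bool:
    # Strip any local suffix like "+cu124" or "+rocm6.2" before comparing.
    return version.parse(torch.__version__.split("+")[0]) >= version.parse(min_ver)

if torch_at_least("2.4.0"):  # hypothetical threshold
    pass  # call the newer PyTorch method
else:
    pass  # fall back on older releases
```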
comfyanonymous
2c2aa409b0
Log message for cudnn disable on AMD. ( #10418 )
2025-10-20 15:43:24 -04:00
patientx
657a7872ab
Merge branch 'comfyanonymous:master' into master
2025-10-19 15:20:17 +03:00
comfyanonymous
b4f30bd408
Pytorch is stupid. ( #10398 )
2025-10-19 01:25:35 -04:00
comfyanonymous
dad076aee6
Speed up chroma radiance. ( #10395 )
2025-10-18 23:19:52 -04:00
comfyanonymous
0cf33953a7
Fix batch size above 1 giving bad output in chroma radiance. ( #10394 )
2025-10-18 23:15:34 -04:00
comfyanonymous
5b80addafd
Turn off cuda malloc by default when --fast autotune is turned on. ( #10393 )
2025-10-18 22:35:46 -04:00
comfyanonymous
9da397ea2f
Disable torch compiler for cast_bias_weight function ( #10384 )
...
* Disable torch compiler for cast_bias_weight function
* Fix torch compile.
2025-10-17 20:03:28 -04:00
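A minimal sketch of excluding one function from torch.compile tracing; cast_bias_weight here is a stand-in with a plausible shape, the real function in comfy/ops.py has a different signature:

```python
import torch

@torch.compiler.disable()
def cast_bias_weight(module, dtype, device):
    # Excluded from torch.compile tracing: the compiler inserts a graph break
    # and runs this dtype/device juggling eagerly instead of recompiling on it.
    weight = module.weight.to(device=device, dtype=dtype)
    bias = None
    if module.bias is not None:
        bias = module.bias.to(device=device, dtype=dtype)
    return weight, bias
```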
patientx
76dde47dbb
Merge branch 'comfyanonymous:master' into master
2025-10-18 00:05:02 +03:00
comfyanonymous
b1293d50ef
workaround also works on cudnn 91200 ( #10375 )
2025-10-16 19:59:56 -04:00
comfyanonymous
19b466160c
Workaround for nvidia issue where VAE uses 3x more memory on torch 2.9 ( #10373 )
2025-10-16 18:16:03 -04:00
patientx
7b0643ada1
Merge branch 'comfyanonymous:master' into master
2025-10-16 16:40:50 +03:00
Faych
afa8a24fe1
refactor: Replace manual patches merging with merge_nested_dicts ( #10360 )
2025-10-15 17:16:09 -07:00
Jedrzej Kosinski
493b81e48f
Fix order of inputs in nested merge_nested_dicts ( #10362 )
2025-10-15 16:47:26 -07:00
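The two entries above concern merging nested patch dicts; a minimal sketch of the general technique, not ComfyUI's exact helper. Argument order matters, since values from the second dict win, which is what the order fix above corrects:

```python
def merge_nested_dicts(base: dict, overrides: dict) -> dict:
    # Values in `overrides` win; nested dicts are merged key by key.
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_nested_dicts(merged[key], value)
        else:
            merged[key] = value
    return merged
```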
patientx
26589a3a0b
Merge branch 'comfyanonymous:master' into master
2025-10-15 12:18:21 +03:00
comfyanonymous
1c10b33f9b
gfx942 doesn't support fp8 operations. ( #10348 )
2025-10-15 00:21:11 -04:00
comfyanonymous
3374e900d0
Faster workflow cancelling. ( #10301 )
2025-10-13 23:43:53 -04:00
comfyanonymous
dfff7e5332
Better memory estimation for the SD/Flux VAE on AMD. ( #10334 )
2025-10-13 22:37:19 -04:00
comfyanonymous
e4ea393666
Fix loading old stable diffusion ckpt files on newer numpy. ( #10333 )
2025-10-13 22:18:58 -04:00
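A hedged sketch of one common compatibility trick for old pickled checkpoints on newer numpy, remapping moved module paths during unpickling; the names remapped and the hook point (torch.load's pickle machinery) in the actual fix may differ:

```python
import pickle

class NumpyCompatUnpickler(pickle.Unpickler):
    # Assumption: old pickles reference module paths that newer numpy moved,
    # e.g. numpy.core -> numpy._core in numpy 2.x.
    RENAMES = {"numpy.core": "numpy._core"}

    def find_class(self, module, name):
        for old, new in self.RENAMES.items():
            if module == old or module.startswith(old + "."):
                module = new + module[len(old):]
        return super().find_class(module, name)
```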
comfyanonymous
c8674bc6e9
Enable RDNA4 pytorch attention on ROCm 7.0 and up. ( #10332 )
2025-10-13 21:19:03 -04:00
patientx
eae7a58e60
Merge branch 'comfyanonymous:master' into master
2025-10-14 02:07:30 +03:00
rattus128
95ca2e56c8
WAN2.2: Fix cache VRAM leak on error ( #10308 )
...
Same change pattern as 7e8dd275c2, applied to WAN2.2.
If this suffers an exception (such as a VRAM OOM), execution leaves the
encode() and decode() methods without cleaning up the WAN feature cache.
The comfy node cache then keeps a reference to this object, which in turn
holds references to large tensors from the failed execution.
The feature cache is currently set up as a class variable on the
encoder/decoder, yet the encode and decode functions always clear it on
both entry and exit during normal execution.
The likely design intent is for this to be usable as a streaming encoder
where the input arrives in batches, but the functions as they stand today
don't support that.
So simplify by making the cache a local variable again, so that on a VRAM
OOM the cache is properly garbage collected once the encode()/decode()
frames disappear from the stack.
2025-10-13 15:23:11 -04:00
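A minimal sketch of the pattern described above, with illustrative names: once the cache is a local, an exception anywhere in decode() drops the last reference to it automatically.

```python
class WanDecoder:
    def decode(self, chunks):
        feat_cache = {}  # local: reclaimed with the stack frame, even on a VRAM OOM
        out = []
        for chunk in chunks:
            out.append(self._decode_chunk(chunk, feat_cache))
        return out

    def _decode_chunk(self, chunk, feat_cache):
        ...  # reads and updates feat_cache between chunks
```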
comfyanonymous
e693e4db6a
Always set diffusion model to eval() mode. ( #10331 )
2025-10-13 14:57:27 -04:00
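For context, eval() flips training-only behavior off before inference; a minimal standalone illustration:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Dropout(p=0.5))
model.eval()  # Dropout becomes a no-op; BatchNorm would use running stats
with torch.no_grad():
    y = model(torch.randn(1, 8))  # deterministic regardless of prior mode
```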
patientx
fa7942933b
Merge branch 'comfyanonymous:master' into master
2025-10-12 13:56:39 +03:00
comfyanonymous
a125cd84b0
Improve AMD performance. ( #10302 )
...
I honestly have no idea why this improves things but it does.
2025-10-12 00:28:01 -04:00
comfyanonymous
84e9ce32c6
Implement the mmaudio VAE. ( #10300 )
2025-10-11 22:57:23 -04:00
patientx
aa6afacc01
Merge branch 'comfyanonymous:master' into master
2025-10-10 02:25:35 +03:00
comfyanonymous
f1dd6e50f8
Fix bug with applying loras on fp8 scaled without fp8 ops. ( #10279 )
2025-10-09 19:02:40 -04:00
patientx
3553ce45e5
Merge branch 'comfyanonymous:master' into master
2025-10-09 23:40:21 +03:00
comfyanonymous
139addd53c
More surgical fix for #10267 ( #10276 )
2025-10-09 16:37:35 -04:00
patientx
77fc639ed2
Merge branch 'comfyanonymous:master' into master
2025-10-09 15:55:12 +03:00
comfyanonymous
6e59934089
Refactor model sampling sigmas code. ( #10250 )
2025-10-08 17:49:02 -04:00
patientx
2502069447
Merge branch 'comfyanonymous:master' into master
2025-10-07 14:02:12 +03:00
comfyanonymous
8aea746212
Implement gemma 3 as a text encoder. ( #10241 )
...
Not useful yet.
2025-10-06 22:08:08 -04:00
patientx
e5c08bb5c2
Merge branch 'comfyanonymous:master' into master
2025-10-06 00:08:39 +03:00
comfyanonymous
195e0b0639
Remove useless code. ( #10223 )
2025-10-05 15:41:19 -04:00