patientx
f49d26848b
Merge branch 'comfyanonymous:master' into master
2025-04-30 13:46:06 +03:00
comfyanonymous
0a66d4b0af
Per device stream counters for async offload. ( #7873 )
2025-04-29 20:28:52 -04:00
patientx
0aeb958ea5
Merge branch 'comfyanonymous:master' into master
2025-04-29 01:49:37 +03:00
comfyanonymous
5a50c3c7e5
Fix stream priority to support older pytorch. ( #7856 )
2025-04-28 13:07:21 -04:00
patientx
6244dfa1e1
Merge branch 'comfyanonymous:master' into master
2025-04-28 01:13:53 +03:00
comfyanonymous
c8cd7ad795
Use stream for casting if enabled. ( #7833 )
2025-04-27 05:38:11 -04:00
patientx
9cc8e2e1d0
Merge branch 'comfyanonymous:master' into master
2025-04-26 23:32:14 +03:00
comfyanonymous
0dcc75ca54
Add experimental --async-offload lowvram weight offloading. ( #7820 )
...
This should speed up the lowvram mode a bit. It currently is only enabled when --async-offload is used but it will be enabled by default in the future if there are no problems.
2025-04-26 16:11:21 -04:00
patientx
a397c3aeb3
Merge branch 'comfyanonymous:master' into master
2025-04-22 13:28:29 +03:00
comfyanonymous
2d6805ce57
Add option for using fp8_e8m0fnu for model weights. ( #7733 )
...
Seems to break every model I have tried but worth testing?
2025-04-22 06:17:38 -04:00
patientx
4541842b9a
Merge branch 'comfyanonymous:master' into master
2025-04-03 03:15:32 +03:00
BiologicalExplosion
2222cf67fd
MLU memory optimization ( #7470 )
...
Co-authored-by: huzhan <huzhan@cambricon.com>
2025-04-02 19:24:04 -04:00
patientx
1040220970
Merge branch 'comfyanonymous:master' into master
2025-04-01 22:56:01 +03:00
BVH
301e26b131
Add option to store TE in bf16 ( #7461 )
2025-04-01 13:48:53 -04:00
patientx
8115bdf68a
Merge branch 'comfyanonymous:master' into master
2025-03-25 22:35:14 +03:00
comfyanonymous
8edc1f44c1
Support more float8 types.
2025-03-25 05:23:49 -04:00
patientx
eaf40b802d
Merge branch 'comfyanonymous:master' into master
2025-03-14 12:00:25 +03:00
FeepingCreature
7aceb9f91c
Add --use-flash-attention flag. ( #7223 )
...
* Add --use-flash-attention flag.
This is useful on AMD systems, as FA builds are still 10% faster than Pytorch cross-attention.
2025-03-14 03:22:41 -04:00
comfyanonymous
35504e2f93
Fix.
2025-03-13 15:03:18 -04:00
comfyanonymous
299436cfed
Print mac version.
2025-03-13 10:05:40 -04:00
patientx
c469113159
Merge branch 'comfyanonymous:master' into master
2025-03-09 14:09:50 +03:00
comfyanonymous
0952569493
Fix stable cascade VAE on some lowvram machines.
2025-03-08 20:24:04 -05:00
patientx
0c4bebf5fb
Merge branch 'comfyanonymous:master' into master
2025-03-01 14:59:20 +03:00
Chenlei Hu
4d55f16ae8
Use enum list for --fast options ( #7024 )
2025-03-01 02:37:35 -05:00
patientx
af43425ab5
Update model_management.py
2025-02-28 16:37:55 +03:00
patientx
1871a594ba
Merge branch 'comfyanonymous:master' into master
2025-02-28 11:47:19 +03:00
comfyanonymous
cf0b549d48
--fast now takes a number as argument to indicate how fast you want it.
...
The idea is that you can indicate how much quality vs speed you want.
At the moment:
--fast 2 enables fp16 accumulation if your pytorch supports it.
--fast 5 enables fp8 matrix mult on fp8 models and the optimization above.
--fast without a number enables all optimizations.
2025-02-28 02:48:20 -05:00
comfyanonymous
eb4543474b
Use fp16 for intermediate for fp8 weights with --fast if supported.
2025-02-28 02:17:50 -05:00
comfyanonymous
1804397952
Use fp16 if checkpoint weights are fp16 and the model supports it.
2025-02-27 16:39:57 -05:00
patientx
c4fb9f2a63
Merge branch 'comfyanonymous:master' into master
2025-02-27 13:06:17 +03:00
BiologicalExplosion
89253e9fe5
Support Cambricon MLU ( #6964 )
...
Co-authored-by: huzhan <huzhan@cambricon.com>
2025-02-26 20:45:13 -05:00
patientx
d705fe2e0b
Merge branch 'comfyanonymous:master' into master
2025-02-24 13:42:27 +03:00
comfyanonymous
96d891cb94
Speedup on some models by not upcasting bfloat16 to float32 on mac.
2025-02-24 05:41:32 -05:00
patientx
8142770e5f
Merge branch 'comfyanonymous:master' into master
2025-02-23 14:51:43 +03:00
comfyanonymous
ace899e71a
Prioritize fp16 compute when using allow_fp16_accumulation
2025-02-23 04:45:54 -05:00
patientx
26eb98b96f
Merge branch 'comfyanonymous:master' into master
2025-02-22 14:42:22 +03:00
comfyanonymous
072db3bea6
Assume the mac black image bug won't be fixed before v16.
2025-02-21 20:24:07 -05:00
comfyanonymous
a6deca6d9a
Latest mac still has the black image bug.
2025-02-21 20:14:30 -05:00
patientx
059397437b
Merge branch 'comfyanonymous:master' into master
2025-02-21 23:25:43 +03:00
comfyanonymous
41c30e92e7
Let all model memory be offloaded on nvidia.
2025-02-21 06:32:21 -05:00
patientx
603cacb14a
Merge branch 'comfyanonymous:master' into master
2025-02-20 23:06:56 +03:00
comfyanonymous
12da6ef581
Apparently directml supports fp16.
2025-02-20 09:30:24 -05:00
patientx
3bde94efbb
Merge branch 'comfyanonymous:master' into master
2025-02-18 16:27:10 +03:00
comfyanonymous
b07258cef2
Fix typo.
...
Let me know if this slows things down on 2000 series and below.
2025-02-18 07:28:33 -05:00
patientx
ef2e97356e
Merge branch 'comfyanonymous:master' into master
2025-02-17 14:04:15 +03:00
comfyanonymous
31e54b7052
Improve AMD arch detection.
2025-02-17 04:53:40 -05:00
comfyanonymous
8c0bae50c3
bf16 manual cast works on old AMD.
2025-02-17 04:42:40 -05:00
patientx
e8bf0ca27f
Merge branch 'comfyanonymous:master' into master
2025-02-17 12:42:02 +03:00
comfyanonymous
530412cb9d
Refactor torch version checks to be more future proof.
2025-02-17 04:36:45 -05:00
patientx
45c55f6cd0
Merge branch 'comfyanonymous:master' into master
2025-02-16 14:37:41 +03:00