Jedrzej Kosinski
efcd8280d6
Merge branch 'master' into worksplit-multigpu
2025-09-11 20:59:47 -07:00
comfyanonymous
fb763d4333
Fix amd_min_version crash when using the cpu device. ( #9754 )
2025-09-07 21:16:29 -04:00
comfyanonymous
bcbd7884e3
Don't enable pytorch attention on AMD if triton isn't available. ( #9747 )
2025-09-07 00:29:38 -04:00
comfyanonymous
27a0fcccc3
Enable bf16 VAE on RDNA4. ( #9746 )
2025-09-06 23:25:22 -04:00
Jedrzej Kosinski
9e9c129cd0
Merge remote-tracking branch 'origin/master' into worksplit-multigpu
2025-08-29 23:36:19 -07:00
comfyanonymous
0963493a9c
Support for Qwen Diffsynth Controlnets canny and depth. ( #9465 )
These are not real controlnets but actually a patch on the model, so they will be treated as such.
Put them in the models/model_patches/ folder.
Use the new ModelPatchLoader and QwenImageDiffsynthControlnet nodes.
2025-08-20 22:26:37 -04:00
Jedrzej Kosinski
1489399cb5
Merge branch 'master' into worksplit-multigpu
2025-08-13 19:47:08 -07:00
Simon Lui
c991a5da65
Fix XPU iGPU regressions ( #9322 )
* Change the bf16 check, switch non-blocking off by default (with an option to force it on) to regain speed on certain classes of iGPUs, and refactor the xpu check.
* Turn non_blocking off by default for xpu.
* Update README.md for Intel GPUs.
2025-08-13 19:13:35 -04:00
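A minimal sketch of the transfer behavior this commit changes (the function and flag names here are illustrative assumptions, not the repo's API): non-blocking host-to-device copies reportedly regress on some XPU iGPU classes, so blocking becomes the default with an opt-in override.

    import torch

    def to_xpu(t: torch.Tensor, force_non_blocking: bool = False) -> torch.Tensor:
        # Non-blocking copies let discrete GPUs overlap transfer and compute,
        # but regress on certain iGPU classes, so default to a blocking copy.
        return t.to("xpu", non_blocking=force_non_blocking)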
Jedrzej Kosinski
962c3c832c
Merge branch 'master' into worksplit-multigpu
2025-08-11 14:09:41 -07:00
comfyanonymous
5828607ccf
Not sure if AMD actually supports fp16 accumulation, but it doesn't crash. ( #9258 )
2025-08-09 12:49:25 -04:00
comfyanonymous
735bb4bdb1
Users report gfx1201 is buggy on flux with pytorch attention. ( #9244 )
2025-08-08 04:21:00 -04:00
Jedrzej Kosinski
9cca36fa2b
Merge branch 'master' into worksplit-multigpu
2025-07-29 12:47:36 -07:00
comfyanonymous
7d593baf91
Reserve extra VRAM on large cards on Windows. ( #9093 )
2025-07-29 04:07:45 -04:00
Jedrzej Kosinski
3b90a30178
Merge branch 'master' into worksplit-multigpu-wip
2025-07-27 01:03:25 -07:00
comfyanonymous
69cb57b342
Print xpu device name. ( #9035 )
2025-07-24 15:06:25 -04:00
honglyua
0ccc88b03f
Support Iluvatar CoreX ( #8585 )
* Support Iluvatar CoreX
Co-authored-by: mingjiang.li <mingjiang.li@iluvatar.com>
2025-07-24 13:57:36 -04:00
comfyanonymous
d3504e1778
Enable pytorch attention by default for gfx1201 on torch 2.8 ( #9029 )
2025-07-23 19:21:29 -04:00
comfyanonymous
a86a58c308
Fix xpu function not implemented p2. ( #9027 )
2025-07-23 18:18:20 -04:00
comfyanonymous
39dda1d40d
Fix xpu function not implemented. ( #9026 )
2025-07-23 18:10:59 -04:00
comfyanonymous
5ad33787de
Add default device argument. ( #9023 )
2025-07-23 14:20:49 -04:00
Simon Lui
255f139863
Add xpu version for async offload and some other things. ( #9004 )
2025-07-22 15:20:09 -04:00
Jedrzej Kosinski
d53479a197
Merge branch 'master' into worksplit-multigpu
2025-07-01 17:33:05 -05:00
comfyanonymous
a96e65df18
Disable omnigen2 fp16 on older pytorch versions. ( #8672 )
2025-06-26 03:39:09 -04:00
Jedrzej Kosinski
1ae98932f1
Merge branch 'master' into worksplit-multigpu
2025-06-17 04:58:56 -05:00
comfyanonymous
6e28a46454
Apple is most likely never going to fix the fp16 attention bug. ( #8485 )
2025-06-10 13:06:24 -04:00
comfyanonymous
7f800d04fa
Enable AMD fp8 and pytorch attention on some GPUs. ( #8474 )
Information is from the pytorch source code.
2025-06-09 12:50:39 -04:00
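A sketch of arch-based gating in the spirit of this commit (the arch list below is a placeholder assumption; the real lists live in comfy/model_management.py):

    import torch

    def amd_arch(device: int = 0) -> str:
        # ROCm builds of pytorch expose the gfx arch name (e.g. "gfx1201"),
        # sometimes with suffixes like ":sramecc+:xnack-", hence the split.
        props = torch.cuda.get_device_properties(device)
        return getattr(props, "gcnArchName", "").split(":")[0]

    if amd_arch() in ("gfx942", "gfx1200", "gfx1201"):  # placeholder list
        print("enable fp8 ops and pytorch attention")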
comfyanonymous
97755eed46
Enable fp8 ops by default on gfx1201 ( #8464 )
2025-06-08 14:15:34 -04:00
comfyanonymous
daf9d25ee2
Cleaner torch version comparisons. ( #8453 )
2025-06-07 10:01:15 -04:00
kosinkadink1@gmail.com
0336b0ace8
Merge branch 'master' into worksplit-multigpu
2025-06-01 02:39:26 -07:00
comfyanonymous
704fc78854
Put the ROCm version in a tuple to make it easier to enable features based on it. ( #8348 )
2025-05-30 15:41:02 -04:00
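The technique in a nutshell (a sketch, not the repo's exact code): parse torch.version.hip into a tuple of ints, and version gates become ordinary element-wise tuple comparisons.

    import torch

    # torch.version.hip is a string like "6.4.43482-..." on ROCm builds, else None.
    rocm_version = (0, 0)
    if torch.version.hip is not None:
        parts = torch.version.hip.split(".")[:2]
        rocm_version = tuple(int(p) for p in parts if p.isdigit())

    if rocm_version >= (6, 4):  # tuples compare element-wise
        print("enable a ROCm 6.4+ code path")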
comfyanonymous
89a84e32d2
Disable initial GPU load when novram is used. ( #8294 )
2025-05-26 16:39:27 -04:00
comfyanonymous
e5799c4899
Enable pytorch attention by default on AMD gfx1151 ( #8282 )
2025-05-26 04:29:25 -04:00
comfyanonymous
0b50d4c0db
Add argument to explicitly enable fp8 compute support. ( #8257 )
This can be used to test whether your current GPU/pytorch version supports fp8 matrix multiplication in combination with --fast or the fp8_e4m3fn_fast dtype.
2025-05-23 17:43:50 -04:00
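A rough probe for fp8 matmul support along these lines (an assumption-heavy sketch: torch._scaled_mm is a private API whose signature has shifted between releases):

    import torch

    def fp8_matmul_works(device: str = "cuda") -> bool:
        try:
            a = torch.randn(16, 16, device=device).to(torch.float8_e4m3fn)
            # _scaled_mm expects the second operand in column-major layout.
            b = torch.randn(16, 16, device=device).to(torch.float8_e4m3fn).t()
            one = torch.ones((), device=device)
            torch._scaled_mm(a, b, scale_a=one, scale_b=one, out_dtype=torch.bfloat16)
            return True
        except Exception:
            return False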
Jedrzej Kosinski
9726eac475
Merge branch 'master' into worksplit-multigpu
2025-05-12 19:29:13 -05:00
comfyanonymous
0a66d4b0af
Per device stream counters for async offload. ( #7873 )
2025-04-29 20:28:52 -04:00
comfyanonymous
5a50c3c7e5
Fix stream priority to support older pytorch. ( #7856 )
2025-04-28 13:07:21 -04:00
comfyanonymous
c8cd7ad795
Use stream for casting if enabled. ( #7833 )
2025-04-27 05:38:11 -04:00
comfyanonymous
0dcc75ca54
Add experimental --async-offload lowvram weight offloading. ( #7820 )
This should speed up lowvram mode a bit. It is currently only enabled when --async-offload is used, but it will be enabled by default in the future if there are no problems.
2025-04-26 16:11:21 -04:00
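A minimal sketch of the side-stream idea behind --async-offload (illustrative, not ComfyUI's implementation): stage the weight in pinned memory and copy it on a second CUDA stream so the transfer overlaps compute on the default stream.

    import torch

    offload_stream = torch.cuda.Stream()

    def load_weight_async(w_cpu: torch.Tensor, device: str = "cuda") -> torch.Tensor:
        pinned = w_cpu.pin_memory()  # pinned memory makes the copy truly async
        with torch.cuda.stream(offload_stream):
            w_gpu = pinned.to(device, non_blocking=True)
        # Queue a dependency so kernels on the default stream wait for the
        # copy to finish before touching w_gpu (no host-side sync needed).
        torch.cuda.current_stream().wait_stream(offload_stream)
        return w_gpu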
Jedrzej Kosinski
272e8d42c1
Merge branch 'master' into worksplit-multigpu
2025-04-22 22:40:00 -05:00
comfyanonymous
2d6805ce57
Add option for using fp8_e8m0fnu for model weights. ( #7733 )
Seems to break every model I have tried, but worth testing?
2025-04-22 06:17:38 -04:00
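Why breakage is plausible: float8_e8m0fnu has 8 exponent bits and no mantissa or sign bit, so it can only represent positive powers of two. A toy look at the dtype (assuming pytorch 2.7+, where it exists):

    import torch

    w = torch.rand(4) + 0.5          # keep values positive: e8m0 has no sign bit
    w8 = w.to(torch.float8_e8m0fnu)  # each value rounds to a power of two
    print(w, w8.float(), sep="\n")   # upcast to inspect the quantization error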
Jedrzej Kosinski
8be711715c
Make unload_all_models account for all devices
2025-04-19 17:35:54 -05:00
Jedrzej Kosinski
2fa9affcc1
Merge branch 'master' into worksplit-multigpu
2025-04-08 22:52:17 -05:00
BiologicalExplosion
2222cf67fd
MLU memory optimization ( #7470 )
Co-authored-by: huzhan <huzhan@cambricon.com>
2025-04-02 19:24:04 -04:00
BVH
301e26b131
Add option to store TE in bf16 ( #7461 )
2025-04-01 13:48:53 -04:00
Jedrzej Kosinski
a786ce5ead
Merge branch 'master' into worksplit-multigpu
2025-03-26 22:26:26 -05:00
comfyanonymous
8edc1f44c1
Support more float8 types.
2025-03-25 05:23:49 -04:00
Jedrzej Kosinski
219d3cd0d0
Merge branch 'master' into worksplit-multigpu
2025-03-17 14:26:35 -05:00
FeepingCreature
7aceb9f91c
Add --use-flash-attention flag. ( #7223 )
* Add --use-flash-attention flag.
This is useful on AMD systems, as FlashAttention builds are still 10% faster than pytorch cross-attention.
2025-03-14 03:22:41 -04:00
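For reference, how the flash-attn package is typically called (a sketch of the library API; ComfyUI wires it up behind the new flag):

    import torch
    from flash_attn import flash_attn_func  # pip install flash-attn

    # flash_attn_func takes (batch, seqlen, heads, head_dim) in fp16/bf16.
    q = torch.randn(1, 1024, 8, 64, dtype=torch.float16, device="cuda")
    k, v = torch.randn_like(q), torch.randn_like(q)
    out = flash_attn_func(q, k, v, causal=False)  # output matches q's shape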
Jedrzej Kosinski
cc928a786d
Merge branch 'master' into worksplit-multigpu
2025-03-13 20:59:11 -05:00
comfyanonymous
35504e2f93
Fix.
2025-03-13 15:03:18 -04:00