Compare commits

...

2 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Alex Butler | 1dc76f39eb | Merge cb213aee66 into 4a8cf359fe | 2026-03-13 06:57:41 +00:00 |
| Alex Butler | cb213aee66 | Expand AMD ROCm Tips readme section<br>Add suggestion to disable online tuning<br>Add miopen info<br>Add flash attention info<br>Add vram oom suggestion | 2026-02-07 18:03:16 +00:00 |


@@ -367,7 +367,11 @@ You can enable experimental memory efficient attention on recent pytorch in Comf
```TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py --use-pytorch-cross-attention```
You can also try setting this env variable `PYTORCH_TUNABLEOP_ENABLED=1` which might speed things up at the cost of a very slow initial run.
You can also try the following (a combined launch sketch follows this list):
* Tunable ops: Setting `PYTORCH_TUNABLEOP_ENABLED=1` may speed things up at the cost of very slow initial runs. After online tuning has run for a while, consider setting `PYTORCH_TUNABLEOP_TUNING=0` to only use the already-tuned settings and avoid further tuning slowdowns.
* MIOpen: Currently disabled by default. Enable it with `COMFYUI_ENABLE_MIOPEN=1`. Be aware that MIOpen autotunes by default; consider setting `MIOPEN_FIND_MODE=FAST` to avoid tuning slowdowns.
* Flash attention: Install it from [flash-attention](https://github.com/Dao-AILab/flash-attention), then enable it with `FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE` and the `--use-flash-attention` argument. See also the notes on Triton autotuning in that repo.
* If you are encountering VRAM OOMs, `PYTORCH_NO_HIP_MEMORY_CACHING=1` may help.
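
For reference, a minimal sketch combining the variables above into a single launch. This assumes an online tuning pass has already been completed and that flash-attention is installed; if it is not, drop the last variable and use `--use-pytorch-cross-attention` instead. Treat the exact combination as a starting point, not a recommendation:
```
# Sketch: MIOpen enabled with fast find mode, tunable ops enabled with
# further tuning frozen, flash attention enabled via the Triton AMD backend.
COMFYUI_ENABLE_MIOPEN=1 \
MIOPEN_FIND_MODE=FAST \
PYTORCH_TUNABLEOP_ENABLED=1 \
PYTORCH_TUNABLEOP_TUNING=0 \
FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE \
python main.py --use-flash-attention
```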
# Notes