Commit Graph

118 Commits

Author  SHA1  Message  Date
comfyanonymous  260b25aef8  Disable non blocking on mps.  2023-12-10 01:30:35 -05:00
comfyanonymous  63349484b8  Make --gpu-only put intermediate values in GPU memory instead of cpu.  2023-12-08 02:35:45 -05:00
comfyanonymous  1f1ef695bb  Slightly faster lora applying.  2023-12-06 05:13:14 -05:00
comfyanonymous  dfa7737afb  Use .itemsize to get dtype size for fp8.  2023-12-04 11:52:06 -05:00
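
Commit dfa7737afb leans on the fact that every torch.dtype reports its element size in bytes, so the fp8 formats need no special-casing in memory estimates. A minimal illustration, assuming PyTorch 2.1+ (which ships both the fp8 dtypes and dtype.itemsize):

    import torch

    # Element size in bytes comes straight from the dtype, fp8 included.
    for dtype in (torch.float32, torch.float16, torch.float8_e4m3fn, torch.float8_e5m2):
        print(dtype, dtype.itemsize)  # 4, 2, 1, 1
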
comfyanonymous  080f5f4e84  UNET weights can now be stored in fp8.  2023-12-04 11:10:00 -05:00
    --fp8_e4m3fn-unet and --fp8_e5m2-unet are the two different formats supported by pytorch.
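
The fp8 UNET options keep the resident copy of the weights at one byte per element and cast back up for the actual math. A rough sketch of that idea rather than ComfyUI's exact code, assuming PyTorch 2.1+ and a CUDA device:

    import torch

    device = "cuda"                      # assumption: a CUDA device is present
    compute_dtype = torch.float16

    w = torch.randn(4096, 4096, dtype=compute_dtype, device=device)
    w_fp8 = w.to(torch.float8_e4m3fn)    # stored copy: 1 byte per weight instead of 2

    # Upcast only at the point of use; the stored copy stays fp8.
    x = torch.randn(1, 4096, dtype=compute_dtype, device=device)
    y = x @ w_fp8.to(compute_dtype).t()
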
comfyanonymous  c013b8e94c  Add some command line arguments to store text encoder weights in fp8.  2023-11-17 02:56:59 -05:00
    Pytorch supports two variants of fp8:
    --fp8_e4m3fn-text-enc (the one that seems to give better results)
    --fp8_e5m2-text-enc
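
The two variants split their 8 bits differently between exponent and mantissa: e4m3fn trades range for precision, e5m2 the reverse, which is the likely reason the first tends to behave better for weights. torch.finfo shows the trade-off directly (PyTorch 2.1+):

    import torch

    for dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
        info = torch.finfo(dtype)
        # e4m3fn: max around 448 but finer spacing between representable values;
        # e5m2: max around 57344 at the cost of precision.
        print(dtype, "max:", info.max, "eps:", info.eps)
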
comfyanonymous  9f546f0cb3  Disable xformers when it can't load properly.  2023-11-13 12:31:10 -05:00
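
Commit 9f546f0cb3 amounts to the usual guarded-import pattern: if importing xformers raises for any reason, treat it as unavailable and fall back to the other attention implementations. A minimal sketch of that pattern, not the project's exact code:

    XFORMERS_IS_AVAILABLE = True
    try:
        import xformers
        import xformers.ops
    except Exception:
        # A missing wheel, an ABI mismatch, or a broken CUDA setup all land here;
        # the flag simply routes attention to a different implementation.
        XFORMERS_IS_AVAILABLE = False
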
comfyanonymous  21ea9c3263  Allow different models to estimate memory usage differently.  2023-11-12 04:03:52 -05:00
comfyanonymous  7f861d49fd  Empty the cache when the torch cache exceeds 25% of free memory.  2023-10-22 13:58:12 -04:00
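
Commit 7f861d49fd gates the flush behind a cheap check, since torch.cuda.empty_cache() is comparatively expensive and only pays off when the allocator's idle cache is large relative to free memory. A rough sketch of that kind of heuristic, not the project's exact code (assumes a CUDA device):

    import torch

    def maybe_empty_cache(device=None):
        free_cuda, _total = torch.cuda.mem_get_info(device)   # free memory as the driver sees it
        free_torch = torch.cuda.memory_reserved(device) - torch.cuda.memory_allocated(device)
        # Flush only when the cached-but-unused portion exceeds 25% of the
        # memory that is effectively free (driver-free plus cached-but-unused).
        if free_torch > (free_cuda + free_torch) * 0.25:
            torch.cuda.empty_cache()
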
comfyanonymous  daabf7fd3a  Add some Quadro cards to the list of cards with broken fp16.  2023-10-16 16:48:46 -04:00
comfyanonymous  db653f4908  Add a --bf16-unet to test running the unet in bf16.  2023-10-13 14:51:10 -04:00
comfyanonymous  728139a5b9  Refactor code so model can be a dtype other than fp32 or fp16.  2023-10-13 14:41:17 -04:00
comfyanonymous  494ddf7717  pytorch_attention_enabled can now return True when xformers is enabled.  2023-10-11 21:30:57 -04:00
comfyanonymous  18e4504de7  Pull some small changes from the other repo.  2023-10-11 20:38:48 -04:00
Simon Lui  47164eb065  Allow Intel GPUs to LoRA cast on GPU since they support BF16 natively.  2023-09-22 21:11:27 -07:00
comfyanonymous  795f5b3163  Only do the cast on the device if the device supports it.  2023-09-20 17:52:41 -04:00
comfyanonymous  cdbbeb584d  Enable pytorch attention by default on xpu.  2023-09-17 04:09:19 -04:00
comfyanonymous  cac135d12f  Don't run text encoders on xpu because there are issues.  2023-09-14 12:16:07 -04:00
comfyanonymous  ef0c0892f6  Add a force argument to soft_empty_cache to force a cache empty.  2023-09-04 00:58:18 -04:00
Simon Lui  1148c2dec7  Some fixes to generalize CUDA specific functionality to Intel or other GPUs.  2023-09-02 18:22:10 -07:00
comfyanonymous  ae3f7060d8  Enable bf16-vae by default on ampere and up.  2023-08-27 23:06:19 -04:00
comfyanonymous  90bfcef833  Fix lowvram model merging.  2023-08-26 11:52:07 -04:00
comfyanonymous  30d39b387d  The new smart memory management makes this unnecessary.  2023-08-25 18:02:15 -04:00
comfyanonymous  4731c0b618  Code cleanups.  2023-08-24 19:39:18 -04:00
comfyanonymous  74d1dfb0ad  Try to free enough vram for control lora inference.  2023-08-24 17:20:54 -04:00
comfyanonymous  e340ef7852  Always shift text encoder to GPU when the device supports fp16.  2023-08-23 21:45:00 -04:00
comfyanonymous  5ef57a983b  Even with forced fp16 the cpu device should never use it.  2023-08-23 21:38:28 -04:00
comfyanonymous  e7fc7fb557  Save memory by storing text encoder weights in fp16 in most situations.  2023-08-23 01:08:51 -04:00
    Do inference in fp32 to make sure quality stays the exact same.
comfyanonymous  37a6cb2649  Small cleanups.  2023-08-20 14:56:47 -04:00
Simon Lui  a670a3f848  Further tuning and fix mem_free_total.  2023-08-20 14:19:53 -04:00
Simon Lui  af8959c8a9  Add ipex optimize and other enhancements for Intel GPUs based on recent memory changes.  2023-08-20 14:19:51 -04:00
comfyanonymous  56901bd7c6  --disable-smart-memory now disables loading model directly to vram.  2023-08-20 04:00:53 -04:00
comfyanonymous  21e07337ed  Add --disable-smart-memory for those that want the old behaviour.  2023-08-17 03:12:37 -04:00
comfyanonymous  197ab43811  Fix issue with regular torch version.  2023-08-17 01:58:54 -04:00
comfyanonymous  a216b56591  Smarter memory management.  2023-08-17 01:06:34 -04:00
    Try to keep models on the vram when possible.
    Better lowvram mode for controlnets.
comfyanonymous  ef16077917  Add CMP 30HX card to the nvidia_16_series list.  2023-08-04 12:08:45 -04:00
comfyanonymous  28401d83c5  Only shift text encoder to vram when CPU cores are under 8.  2023-07-31 00:08:54 -04:00
comfyanonymous  2ee42215be  Lower CPU thread check for running the text encoder on the CPU vs GPU.  2023-07-30 17:18:24 -04:00
comfyanonymous  aa8fde7d6b  Try to fix memory issue with lora.  2023-07-22 21:38:56 -04:00
comfyanonymous  b2879e0168  Merge branch 'fix-AttributeError-module-'torch'-has-no-attribute-'mps'' of https://github.com/KarryCharon/ComfyUI  2023-07-20 00:34:54 -04:00
comfyanonymous  3aad28d483  Add MX450 and MX550 to list of cards with broken fp16.  2023-07-19 03:08:30 -04:00
comfyanonymous  22abe3af9f  Fix device print on old torch version.  2023-07-17 15:18:58 -04:00
comfyanonymous  5ddb2ca26f  Add a command line argument to enable backend:cudaMallocAsync  2023-07-17 11:00:14 -04:00
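
Commit 5ddb2ca26f exposes PyTorch's alternative CUDA allocator backend through a launcher flag. PyTorch itself selects the backend through the PYTORCH_CUDA_ALLOC_CONF environment variable, which has to be set before CUDA is initialized; a minimal illustration of flipping it from Python (assumes a CUDA build of PyTorch, and is not ComfyUI's own argument handling):

    import os

    # Must be set before torch initializes CUDA for the backend choice to take effect.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:cudaMallocAsync"

    import torch

    print(torch.cuda.get_allocator_backend())  # "cudaMallocAsync" if the switch took effect
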
comfyanonymous  ba6e888eb9  Lower lora ram usage when in normal vram mode.  2023-07-16 02:59:04 -04:00
comfyanonymous  73c2afbe44  Speed up lora loading a bit.  2023-07-15 13:25:22 -04:00
KarryCharon  3ee78c064b  Fix missing mps import.  2023-07-12 10:06:34 +08:00
comfyanonymous  42805fd416  Empty cache after model unloading for normal vram and lower.  2023-07-09 09:56:03 -04:00
comfyanonymous  9caaa09c71  Add arguments to run the VAE in fp16 or bf16 for testing.  2023-07-06 23:23:46 -04:00
comfyanonymous  fa8010f038  Disable autocast in unet for increased speed.  2023-07-05 21:58:29 -04:00
comfyanonymous  06ce99e525  Fix issue with OSX.  2023-07-04 02:09:02 -04:00