doctorpangloss
9d5a5dd533
Merge branch 'master' of github.com:comfyanonymous/ComfyUI
2024-12-28 14:24:27 -08:00
comfyanonymous
99a1fb6027
Make fast fp8 take a bit less peak memory.
2024-12-24 18:05:19 -05:00
doctorpangloss
2d1676c717
Merge branch 'master' of github.com:comfyanonymous/ComfyUI
2024-12-09 15:54:37 -08:00
Haoming
fbf68c4e52
clamp input ( #5928 )
2024-12-07 14:00:31 -05:00
doctorpangloss
31eacb6ac9
Improve compilation of models, adding support for triton
2024-11-01 10:40:58 -07:00
doctorpangloss
a8d8bff548
Improve support for torch compilation and sage attention
2024-10-29 19:22:26 -07:00
doctorpangloss
76a80a65ea
Merge branch 'master' of github.com:comfyanonymous/ComfyUI
2024-10-29 15:35:39 -07:00
comfyanonymous
915fdb5745
Fix lowvram edge case.
2024-10-22 16:34:50 -04:00
comfyanonymous
8ce2a1052c
Optimizations to --fast and scaled fp8.
2024-10-22 02:12:28 -04:00
comfyanonymous
0075c6d096
Mixed precision diffusion models with scaled fp8.
...
This change allows supports for diffusion models where all the linears are
scaled fp8 while the other weights are the original precision.
2024-10-21 18:12:51 -04:00
comfyanonymous
83ca891118
Support scaled fp8 t5xxl model.
2024-10-20 22:27:00 -04:00
comfyanonymous
f9f9faface
Fixed model merging issue with scaled fp8.
2024-10-20 06:24:31 -04:00
comfyanonymous
a68bbafddb
Support diffusion models with scaled fp8 weights.
2024-10-19 23:47:42 -04:00
comfyanonymous
67158994a4
Use the lowvram cast_to function for everything.
2024-10-17 17:25:56 -04:00
doctorpangloss
8512f361fe
Merge branch 'master' of github.com:comfyanonymous/ComfyUI
2024-10-14 15:26:27 -07:00
doctorpangloss
f3da381869
Fix inference mode execution issues
2024-10-10 21:00:09 -07:00
doctorpangloss
a38968f098
Improvements to execution
...
- Validation errors that occur early in the lifecycle of prompt
execution now get propagated to their callers in the
EmbeddedComfyClient. This includes error messages about missing node
classes.
- The execution context now includes the node_id and the prompt_id
- Latent previews are now sent with a node_id. This is not backwards
compatible with old frontends.
- Dependency execution errors are now modeled correctly.
- Distributed progress encodes image previews with node and prompt IDs.
- Typing for models
- The frontend was updated to use node IDs with previews
- Improvements to torch.compile experiments
- Some controlnet_aux nodes were upstreamed
2024-10-10 19:30:18 -07:00
doctorpangloss
69e523b89d
Experimental quantization support. Only Linux is meaningfully supported
2024-10-10 13:43:06 -07:00
comfyanonymous
e38c94228b
Add a weight_dtype fp8_e4m3fn_fast to the Diffusion Model Loader node.
...
This is used to load weights in fp8 and use fp8 matrix multiplication.
2024-10-09 19:43:17 -04:00
doctorpangloss
fa3176f96f
Merge branch 'master' of github.com:comfyanonymous/ComfyUI
2024-09-23 12:50:31 -07:00
comfyanonymous
9c41bc8d10
Remove useless line.
2024-09-23 02:32:29 -04:00
comfyanonymous
dc96a1ae19
Load controlnet in fp8 if weights are in fp8.
2024-09-21 04:50:12 -04:00
doctorpangloss
5155a3e248
Merge WIP
2024-08-25 18:52:29 -07:00
comfyanonymous
8ae23d8e80
Fix onnx export.
2024-08-23 17:52:47 -04:00
comfyanonymous
c7ee4b37a1
Try to fix some lora issues.
2024-08-22 15:32:18 -04:00
comfyanonymous
904bf58e7d
Make --fast work on pytorch nightly.
2024-08-21 14:01:41 -04:00
Svein Ove Aas
5f50263088
Replace use of .view with .reshape ( #4522 )
...
When generating images with fp8_e4_m3 Flux and batch size >1, using --fast, ComfyUI throws a "view size is not compatible with input tensor's size and stride" error pointing at the first of these two calls to view.
As reshape is semantically equivalent to view except for working on a broader set of inputs, there should be no downside to changing this. The only difference is that it clones the underlying data in cases where .view would error out. I have confirmed that the output still looks as expected, but cannot confirm that no mutable use is made of the tensors anywhere.
Note that --fast is only marginally faster than the default.
2024-08-21 11:21:48 -04:00
comfyanonymous
03ec517afb
Remove useless line, adjust windows default reserved vram.
2024-08-21 00:47:19 -04:00
comfyanonymous
510f3438c1
Speed up fp8 matrix mult by using better code.
2024-08-20 22:53:26 -04:00
comfyanonymous
9953f22fce
Add --fast argument to enable experimental optimizations.
...
Optimizations that might break things/lower quality will be put behind
this flag first and might be enabled by default in the future.
Currently the only optimization is float8_e4m3fn matrix multiplication on
4000/ADA series Nvidia cards or later. If you have one of these cards you
will see a speed boost when using fp8_e4m3fn flux for example.
2024-08-20 11:55:51 -04:00
comfyanonymous
538cb068bc
Make cast_to a nop if weight is already good.
2024-08-20 10:46:36 -04:00
comfyanonymous
39f114c44b
Less broken non blocking?
2024-08-18 16:53:17 -04:00
comfyanonymous
6730f3e1a3
Disable non blocking.
...
It fixed some perf issues but caused other issues that need to be debugged.
2024-08-18 14:38:09 -04:00
comfyanonymous
73332160c8
Enable non blocking transfers in lowvram mode.
2024-08-18 10:29:33 -04:00
doctorpangloss
0a1ae64b0b
Merge branch 'master' of github.com:comfyanonymous/ComfyUI
2024-08-01 16:19:11 -07:00
comfyanonymous
b85216a3c0
Lower T5 memory usage by a few hundred MB.
2024-07-31 00:52:34 -04:00
comfyanonymous
25853d0be8
Use common function for casting weights to input.
2024-07-30 10:49:14 -04:00
doctorpangloss
8cdc246450
Merge branch 'master' of github.com:comfyanonymous/ComfyUI
2024-06-17 16:19:48 -07:00
comfyanonymous
bb1969cab7
Initial support for the stable audio open model.
2024-06-15 12:14:56 -04:00
doctorpangloss
cb557c960b
Merge branch 'master' of github.com:comfyanonymous/ComfyUI
2024-05-31 07:42:11 -07:00
comfyanonymous
6c23854f54
Fix OSX latent2rgb previews.
2024-05-22 13:56:28 -04:00
doctorpangloss
005e370254
Merge upstream
2024-03-21 13:15:36 -07:00
comfyanonymous
448d9263a2
Fix control loras breaking.
2024-03-14 09:30:21 -04:00
comfyanonymous
db8b59ecff
Lower memory usage for loras in lowvram mode at the cost of perf.
2024-03-13 20:07:27 -04:00
doctorpangloss
7520691021
Merge with master
2024-02-19 10:55:22 -08:00
comfyanonymous
667c92814e
Stable Cascade Stage B.
2024-02-16 13:02:03 -05:00
doctorpangloss
82edb2ff0e
Merge with latest upstream.
2024-01-29 15:06:31 -08:00
comfyanonymous
78a70fda87
Remove useless import.
2024-01-19 15:38:05 -05:00
comfyanonymous
36a7953142
Greatly improve lowvram sampling speed by getting rid of accelerate.
...
Let me know if this breaks anything.
2023-12-22 14:38:45 -05:00
comfyanonymous
77755ab8db
Refactor comfy.ops
...
comfy.ops -> comfy.ops.disable_weight_init
This should make it more clear what they actually do.
Some unused code has also been removed.
2023-12-11 23:27:13 -05:00