Torch has alignment enforcement when viewing with data type changes,
but only relative to itself. Do all tensor constructions straight
off the memoryview individually so pytorch doesn't see an alignment
problem.
This is needed for handling misaligned safetensors weights, which are
reasonably common in third-party models.
This limits usage of this safetensors loader to GPU compute only,
as CPU kernels are very likely to bus error. But it works for
dynamic_vram, where we really don't want to take a deep copy and we
always use a GPU copy_, which disentangles the misalignment.
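To make "straight off the memoryview individually" concrete, here is a minimal stdlib-only sketch, not the loader's actual code. It parses the safetensors layout (an 8-byte little-endian header length, a JSON header, then raw data whose `data_offsets` are relative to the end of the header), gives every tensor its own slice of the original buffer, and flags tensors whose start offset is not a multiple of their element size. `DTYPE_SIZES`, `split_safetensors` and `build_safetensors` are illustrative names; in a torch loader each slice would then be handed to a tensor constructor such as `torch.frombuffer`.

```python
import json
import struct

# Byte width of a few safetensors dtype strings (subset, for illustration).
DTYPE_SIZES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2, "I64": 8, "I32": 4, "I8": 1, "U8": 1}


def split_safetensors(buf):
    """Return {name: (dtype, shape, view, aligned)} for a safetensors blob.

    Every tensor gets its own memoryview slice taken directly from the
    original buffer, so no tensor is derived from another tensor's view.
    """
    view = memoryview(buf)
    (header_len,) = struct.unpack("<Q", view[:8])
    header = json.loads(bytes(view[8:8 + header_len]))
    data_start = 8 + header_len  # data_offsets are relative to this point
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        begin, end = info["data_offsets"]
        itemsize = DTYPE_SIZES[info["dtype"]]
        absolute = data_start + begin
        tensors[name] = (
            info["dtype"],
            tuple(info["shape"]),
            view[absolute:data_start + end],
            # Alignment relative to the file start; with an mmap the base
            # address is page-aligned, so this matches address alignment.
            absolute % itemsize == 0,
        )
    return tensors


def build_safetensors(entries):
    """Pack [(name, dtype, shape, raw_bytes)] into a blob with no alignment padding."""
    header, payload, offset = {}, b"", 0
    for name, dtype, shape, raw in entries:
        header[name] = {"dtype": dtype, "shape": list(shape),
                        "data_offsets": [offset, offset + len(raw)]}
        payload += raw
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload
```

Because the header is not padded, whether an F32 tensor ends up aligned depends only on the JSON header's length, which is exactly how misaligned third-party files arise.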
* make setattr safe for non-existent attributes
Handle the case where the attribute doesn't exist by returning a static
sentinel (distinct from None). If the sentinel is passed in as the set
value, del the attr.
* Account for dequantization and type-casts in offload costs
When measuring the cost of offload, identify weights that need a type
change or dequantization and add the size of the conversion result
to the offload cost.
This is mutually exclusive with lowvram patches, which already have
a large conservative estimate that won't overlap the dequant cost, so
don't double count.
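A minimal sketch of the accounting rule above, assuming each weight is described by its stored size, its element count, the byte width of its compute dtype, and an optional lowvram-patch estimate. The `Weight` dataclass and `offload_cost` function are hypothetical and only mirror the rule as stated: add the converted size when a cast or dequant is needed, unless a lowvram patch estimate already covers that weight.

```python
from dataclasses import dataclass


@dataclass
class Weight:  # hypothetical description of one model weight
    numel: int                   # number of elements
    stored_bytes: int            # bytes as stored (e.g. quantized)
    compute_bytes_per_elem: int  # e.g. 2 for fp16/bf16 compute
    needs_conversion: bool       # dtype cast or dequantization on use
    lowvram_patch_bytes: int = 0  # conservative patch estimate, 0 if unpatched


def offload_cost(weights):
    """Bytes needed to bring these weights onto the compute device."""
    total = 0
    for w in weights:
        total += w.stored_bytes
        if w.lowvram_patch_bytes:
            # The patch estimate is already conservative; adding the
            # conversion result as well would double count.
            total += w.lowvram_patch_bytes
        elif w.needs_conversion:
            # Room for the cast/dequantized copy that exists during compute.
            total += w.numel * w.compute_bytes_per_elem
    return total
```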
* Set the compute type on CLIP MPs
So that the loader can know the size of weights for dequant accounting.
* Support for async execution functions
This commit adds support for node execution functions defined as async. When
a node's execution function is defined as async, we can continue
executing other nodes while it is processing.
Standard uses of `await` should "just work", but people will still have
to be careful if they spawn actual threads. Because torch doesn't really
have async/await versions of functions, this won't particularly help
with most locally-executing nodes, but it does work for e.g. web
requests to other machines.
In addition to the execute function, the `VALIDATE_INPUTS` and
`check_lazy_status` functions can also be defined as async, though we'll
only resolve one node at a time right now for those.
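A minimal asyncio sketch of the idea, not the executor's real code: node functions defined with `async def` are awaited, plain functions are called directly, and independent nodes are wrapped in tasks so that one node awaiting I/O (such as a web request) lets the others proceed. `run_node` and `run_independent` are hypothetical names, and the per-node exception capture via `gather(..., return_exceptions=True)` is one possible way to avoid losing errors raised inside scheduled tasks.

```python
import asyncio
import inspect


async def run_node(fn, *args):
    """Run a node function, awaiting it only if it was defined with async def."""
    if inspect.iscoroutinefunction(fn):
        return await fn(*args)
    # Sync nodes run inline and block the event loop while they compute,
    # which is why torch-heavy local nodes gain little from this.
    return fn(*args)


async def run_independent(nodes):
    """Run independent (name, fn, args) nodes concurrently.

    Async nodes overlap while awaiting; an exception from one node is
    returned as that node's result instead of being lost inside its task.
    """
    tasks = {name: asyncio.create_task(run_node(fn, *args)) for name, fn, args in nodes}
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)
    return dict(zip(tasks.keys(), results))
```

Calling `asyncio.run(run_independent([...]))` with two async nodes that each `await asyncio.sleep(...)` shows both starting before either finishes.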
* Add the execution model tests to CI
* Add a missing file
It looks like this got caught by .gitignore? There's probably a better
place to put it, but I'm not sure what that is.
* Add the websocket library for automated tests
* Add additional tests for async error cases
Also fixes one bug that was found when an async function throws an error
after being scheduled on a task.
* Add a feature flags message to reduce bandwidth
We now only send 1 preview message of the latest type the client can
support.
We'll add a console warning when the client fails to send a feature
flags message at some point in the future.
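A minimal sketch of "send only the latest preview type the client supports", assuming the client's feature flags arrive as a dict and that preview formats can be ordered newest first. The format names, the flag name `supports_preview_metadata`, and the `send` callback are all hypothetical. A client that never sent a flags message is treated as having an empty dict and falls back to the baseline format.

```python
# Hypothetical preview formats, newest first, with the flag each one needs.
PREVIEW_FORMATS = [
    ("preview_with_metadata", "supports_preview_metadata"),
    ("preview_image", None),  # baseline: always supported
]


def pick_preview_format(client_flags):
    """Return the single newest preview format the client advertises."""
    for fmt, flag in PREVIEW_FORMATS:
        if flag is None or client_flags.get(flag, False):
            return fmt
    raise RuntimeError("no baseline preview format configured")


def send_preview(send, client_flags, image_bytes):
    # Exactly one message per preview, instead of one per known format.
    send(pick_preview_format(client_flags), image_bytes)
```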
* Add async tests to CI
* Don't actually add new tests in this PR
Will do it in a separate PR
* Resolve unit test in GPU-less runner
* Just remove the tests that GHA can't handle
* Change line endings to UNIX-style
* Avoid loading model_management.py so early
Because model_management.py has a top-level `logging.info`, we have to
be careful not to import that file before we call `setup_logging`. If we
do, we end up having the default logging handler registered in addition
to our custom one.
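The duplicate-handler problem comes from standard-library behavior: the module-level `logging.info()` calls `logging.basicConfig()` when the root logger has no handlers, which installs a default `StreamHandler`. The sketch below reproduces that with a stand-in `CustomHandler` (a hypothetical name for whatever `setup_logging` installs). It temporarily clears and then restores the root handlers so it can be run in any process.

```python
import logging


class CustomHandler(logging.Handler):
    """Stand-in for the handler setup_logging would install (hypothetical)."""

    def emit(self, record):
        pass


def root_handler_count(log_before_setup):
    """Count root handlers after 'setup', optionally logging too early."""
    root = logging.getLogger()
    saved = root.handlers[:]
    root.handlers.clear()
    try:
        if log_before_setup:
            # Simulates importing a module with a top-level logging.info():
            # with no root handlers, this implicitly runs basicConfig().
            logging.info("imported too early")
        root.addHandler(CustomHandler())  # what setup_logging would do
        return len(root.handlers)
    finally:
        for handler in root.handlers:
            if handler not in saved:
                handler.close()
        root.handlers[:] = saved
```

Logging before setup leaves two root handlers (the implicit default plus the custom one); deferring the import leaves just the custom one.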
This commit fixes the temporal tile size calculation, and removes
a redundant tile at the end of the range when its elements are
completely covered by the previous tile.
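A generic sketch of overlapping temporal tiling with the redundant-tail check, assuming tiles of `tile` frames that advance by `tile - overlap`. `temporal_tiles` is a hypothetical helper and does not reproduce the actual tile size calculation from this change; it only shows why a final tile can be dropped: when its end does not pass the previous tile's end, every frame it covers was already covered.

```python
def temporal_tiles(length, tile, overlap):
    """Return [(start, end)] frame ranges covering range(length)."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    stride = tile - overlap
    tiles = []
    for start in range(0, length, stride):
        end = min(start + tile, length)
        # The previous tile starts earlier, so if it also ends at or after
        # this one, this tile adds no new frames and would be redundant.
        if tiles and end <= tiles[-1][1]:
            continue
        tiles.append((start, end))
    return tiles
```

With 10 frames, tiles of 4 and an overlap of 2, the naive loop would emit a last tile `(8, 10)` that lies entirely inside `(6, 10)`; the check drops it.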
Co-authored-by: Andrew Kvochko <a.kvochko@lightricks.com>
* fix attention OOM in xformers
* allow passing attention mask in flux attention
* allow an attn_mask in flux
* attn masks can be done using replace patches instead of a separate dict
* fix return types
* fix return order
* enumerate
* patch the right keys
* arg names
* fix a silly bug
* fix xformers masks
* replace match with if, elif, else
* mask with image_ref_size
* remove unused import
* remove unused import 2
* fix pytorch/xformers attention
This corrects a weird inconsistency with skip_reshape.
It also allows masks of various shapes to be passed, which will be
automatically expanded (in a memory-efficient way) to a size that is
compatible with xformers or pytorch sdpa respectively.
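A shape-only sketch of how masks of different ranks could be normalized before expansion, assuming a `(batch, heads, q_len, k_len)` attention layout and assuming a 3-D mask means `(batch, q_len, k_len)`. Both conventions, and the helper name `normalize_mask_shape`, are illustrative assumptions. In torch, the final step would be `Tensor.expand`, which returns a view with stride-0 dimensions instead of allocating a full-size copy; that is what makes the expansion memory-efficient.

```python
def normalize_mask_shape(mask_shape, batch, heads, q_len, k_len):
    """Return (unsqueezed_shape, target_shape) for a rank 2-4 attention mask."""
    shape = tuple(mask_shape)
    if len(shape) == 2:    # (q, k): add batch and heads dims
        shape = (1, 1) + shape
    elif len(shape) == 3:  # assumed (batch, q, k): add a heads dim
        shape = (shape[0], 1) + shape[1:]
    elif len(shape) != 4:
        raise ValueError(f"unsupported mask rank {len(shape)}")
    target = (batch, heads, q_len, k_len)
    # Broadcasting rule: each dim must be 1 or already equal the target.
    for have, want in zip(shape, target):
        if have not in (1, want):
            raise ValueError(f"mask {tuple(mask_shape)} cannot broadcast to {target}")
    return shape, target
```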
* fix mask shapes
To use:
"Load CLIP" node with t5xxl + type mochi
"Load Diffusion Model" node with the mochi dit file.
"Load VAE" with the mochi vae file.
EmptyMochiLatentVideo node for the latent.
euler + linear_quadratic in the KSampler node.