- ImageRequestParameter now returns None or a provided default when the value of its path / URL is empty, instead of erroring
- Custom nodes which touch nodes.NODE_CLASS_MAPPINGS will once again see all the nodes available during execution, instead of only the base nodes
* mm: factor out the current stream getter
Make this a reusable function.
* ops: sync the offload stream with the consumption of w&b
This sync is nessacary as pytorch will queue cuda async frees on the
same stream as created to tensor. In the case of async offload, this
will be on the offload stream.
Weights and biases can go out of scope in python which then
triggers the pytorch garbage collector to queue the free operation on
the offload stream possible before the compute stream has used the
weight. This causes a use after free on weight data leading to total
corruption of some workflows.
So sync the offload stream with the compute stream after the weight
has been used so the free has to wait for the weight to be used.
The cast_bias_weight is extended in a backwards compatible way with
the new behaviour opt-in on a defaulted parameter. This handles
custom node packs calling cast_bias_weight and defeatures
async-offload for them (as they do not handle the race).
The pattern is now:
cast_bias_weight(... , offloadable=True) #This might be offloaded
thing(weight, bias, ...)
uncast_bias_weight(...)
* controlnet: adopt new cast_bias_weight synchronization scheme
This is nessacary for safe async weight offloading.
* mm: sync the last stream in the queue, not the next
Currently this peeks ahead to sync the next stream in the queue of
streams with the compute stream. This doesnt allow a lot of
parallelization, as then end result is you can only get one weight load
ahead regardless of how many streams you have.
Rotate the loop logic here to synchronize the end of the queue before
returning the next stream. This allows weights to be loaded ahead of the
compute streams position.
In the case of --cache-none lazy and subgraph execution can cause
anything to be run multiple times per workflow. If that rerun nodes is
in itself a subgraph generator, this will crash for two reasons.
pending_subgraph_results[] does not cleanup entries after their use.
So when a pending_subgraph_result is consumed, remove it from the list
so that if the corresponding node is fully re-executed this misses
lookup and it fall through to execute the node as it should.
Secondly, theres is an explicit enforcement against dups in the
addition of subgraphs nodes as ephemerals to the dymprompt. Remove this
enforcement as the use case is now valid.
* Implement mixed precision operations with a registry design and metadate for quant spec in checkpoint.
* Updated design using Tensor Subclasses
* Fix FP8 MM
* An actually functional POC
* Remove CK reference and ensure correct compute dtype
* Update unit tests
* ruff lint
* Implement mixed precision operations with a registry design and metadate for quant spec in checkpoint.
* Updated design using Tensor Subclasses
* Fix FP8 MM
* An actually functional POC
* Remove CK reference and ensure correct compute dtype
* Update unit tests
* ruff lint
* Fix missing keys
* Rename quant dtype parameter
* Rename quant dtype parameter
* Fix unittests for CPU build
* feat(api-nodes): implement new API client for V3 nodes
* feat(api-nodes): implement new API client for V3 nodes
* feat(api-nodes): implement new API client for V3 nodes
* converted WAN nodes to use new client; polishing
* fix(auth): do not leak authentification for the absolute urls
* convert BFL API nodes to use new API client; remove deprecated BFL nodes
* converted Google Veo nodes
* fix(Veo3.1 model): take into account "generate_audio" parameter
- Fixed and verifies compatibility with the following nodes:
ComfyUI-Manager==3.0.1
ComfyUI_LayerStyle==1.0.90
ComfyUI-Easy-Use==1.3.4 (required fix)
ComfyUI-KJNodes==1.1.7 (required mitigation)
ComfyUI_Custom_Nodes_AlekPet==1.0.88
LanPaint==1.4.1
Comfyui-Simple-Json-Node==1.1.0
- Add support to referencing files in packages.