EasyAI Code Hosting Platform

wangbo/ComfyUI

mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-05-22 06:57:36 +08:00

Commit Graph

Author	SHA1	Message	Date

Dustin

5770c7034b

Guard against None offload_stream in prefetch_queue_pop

cast_modules_with_vbar can return None when all modules in a
prefetched block are already resident. Mirrors the existing
None check in sync_stream.
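
The guard described in this commit message can be illustrated with a minimal sketch. This is not the ComfyUI implementation: `FakeStream` and `pop_and_sync` are hypothetical names invented for illustration, and the only assumption carried over from the message is that a queue entry may be `None` when a prefetched block needed no transfer.

```python
from collections import deque
from typing import Optional


class FakeStream:
    """Hypothetical stand-in for a device stream; not a ComfyUI or PyTorch class."""

    def __init__(self) -> None:
        self.synchronized = False

    def synchronize(self) -> None:
        self.synchronized = True


def pop_and_sync(queue: deque) -> Optional[FakeStream]:
    """Pop the oldest prefetch entry and wait on its stream only if one exists."""
    if not queue:
        return None
    stream = queue.popleft()
    # Guard: a fully resident block produced no transfer, so there is no stream
    # to wait on. Calling synchronize() on None would raise AttributeError.
    if stream is not None:
        stream.synchronize()
    return stream
```

Without the `is not None` check, the first entry that corresponds to an already resident block would raise instead of being skipped.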

2026-05-04 02:17:17 -04:00

rattus

783782d5d7

Implement block prefetch + LoRA async load and adopt in LTX (Speedup!) (CORE-111) (#13618)

* mm: Use Aimdo raw allocator for cast buffers

PyTorch manages allocation of growing buffers on streams poorly. PyTorch
has no Windows support for the expandable segments allocator (which is
the right tool for this job), while also segmenting the memory by
stream such that it cannot be generally re-used. So kick the problem to
aimdo, which can just grow a virtual region that's freed per stream.

* plan

* ops: move cpu handler up to the caller

* ops: split up prefetch from weight prep block prefetching API

Split up the casting and weight formatting/LoRA stuff in prep for
arbitrary prefetch support.

* ops: implement block prefetching API

allow a model to construct a prefetch list and operate it for increased
async offload.

* ltxv2: Implement block prefetching

* Implement lora async offload

Implement async offload of loras.
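
The "block prefetching API" item above describes a model building a prefetch list and working through it so that loading overlaps with compute. A generic sketch of that pattern follows. It is an assumption-laden illustration, not the code from this commit: `run_with_prefetch`, `load`, `compute`, and `depth` are hypothetical names, and a single worker thread stands in for an asynchronous offload stream.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_with_prefetch(
    blocks: Sequence[str],
    load: Callable[[str], str],
    compute: Callable[[str], str],
    depth: int = 1,
) -> List[str]:
    """Run compute over blocks while loading up to `depth` blocks ahead."""
    results: List[str] = []
    # One worker thread plays the role of the asynchronous transfer stream.
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Prime the pipeline with the first `depth` loads.
        pending = [pool.submit(load, b) for b in blocks[:depth]]
        for i in range(len(blocks)):
            # Queue the load for the block `depth` positions ahead, if any.
            nxt = i + depth
            if nxt < len(blocks):
                pending.append(pool.submit(load, blocks[nxt]))
            # Wait for the current block's load, then compute on it.
            loaded = pending.pop(0).result()
            results.append(compute(loaded))
    return results
```

With `depth=1`, the load for block `i + 1` is already in flight while block `i` is being computed, which is the overlap the commit message credits for the speedup.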

2026-05-02 19:23:24 -04:00

2 Commits