mirror of
https://github.com/comfyanonymous/ComfyUI.git
synced 2026-04-25 18:02:37 +08:00
* fix: pin SQLAlchemy>=2.0 in requirements.txt (fixes #13036) (#13316) * Refactor io to IO in nodes_ace.py (#13485) * Bump comfyui-frontend-package to 1.42.12 (#13489) * Make the ltx audio vae more native. (#13486) * feat(api-nodes): add automatic downscaling of videos for ByteDance 2 nodes (#13465) * Support standalone LTXV audio VAEs (#13499) * [Partner Nodes] added 4K resolution for Veo models; added Veo 3 Lite model (#13330) * feat(api nodes): added 4K resolution for Veo models; added Veo 3 Lite model Signed-off-by: bigcat88 <bigcat88@icloud.com> * increase poll_interval from 5 to 9 --------- Signed-off-by: bigcat88 <bigcat88@icloud.com> Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com> * Bump comfyui-frontend-package to 1.42.14 (#13493) * Add gpt-image-2 as version option (#13501) * Allow logging in comfy app files. (#13505) * chore: update workflow templates to v0.9.59 (#13507) * fix(veo): reject 4K resolution for veo-3.0 models in Veo3VideoGenerationNode (#13504) The tooltip on the resolution input states that 4K is not available for veo-3.1-lite or veo-3.0 models, but the execute guard only rejected the lite combination. Selecting 4K with veo-3.0-generate-001 or veo-3.0-fast-generate-001 would fall through and hit the upstream API with an invalid request. Broaden the guard to match the documented behavior and update the error message accordingly. 
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com> * feat: RIFE and FILM frame interpolation model support (CORE-29) (#13258) * initial RIFE support * Also support FILM * Better RAM usage, reduce FILM VRAM peak * Add model folder placeholder * Fix oom fallback frame loss * Remove torch.compile for now * Rename model input * Shorter input type name --------- * fix: use Parameter assignment for Stable_Zero123 cc_projection weights (fixes #13492) (#13518) On Windows with aimdo enabled, disable_weight_init.Linear uses lazy initialization that sets weight and bias to None to avoid unnecessary memory allocation. This caused a crash when copy_() was called on the None weight attribute in Stable_Zero123.__init__. Replace copy_() with direct torch.nn.Parameter assignment, which works correctly on both Windows (aimdo enabled) and other platforms. * Derive InterruptProcessingException from BaseException (#13523) * bump manager version to 4.2.1 (#13516) * ModelPatcherDynamic: force cast stray weights on comfy layers (#13487) the mixed_precision ops can have input_scale parameters that are used in tensor math but arent a weight or bias so dont get proper VRAM management. Treat these as force-castable parameters like the non comfy weight, random params are buffers already are. 
* Update logging level for invalid version format (#13526) * [Partner Nodes] add SD2 real human support (#13509) * feat(api-nodes): add SD2 real human support Signed-off-by: bigcat88 <bigcat88@icloud.com> * fix: add validation before uploading Assets Signed-off-by: bigcat88 <bigcat88@icloud.com> * Add asset_id and group_id displaying on the node Signed-off-by: bigcat88 <bigcat88@icloud.com> * extend poll_op to use instead of custom async cycle Signed-off-by: bigcat88 <bigcat88@icloud.com> * added the polling for the "Active" status after asset creation Signed-off-by: bigcat88 <bigcat88@icloud.com> * updated tooltip for group_id * allow usage of real human in the ByteDance2FirstLastFrame node * add reference count limits * corrected price in status when input assets contain video Signed-off-by: bigcat88 <bigcat88@icloud.com> --------- Signed-off-by: bigcat88 <bigcat88@icloud.com> * feat: SAM (segment anything) 3.1 support (CORE-34) (#13408) * [Partner Nodes] GPTImage: fix price badges, add new resolutions (#13519) * fix(api-nodes): fixed price badges, add new resolutions Signed-off-by: bigcat88 <bigcat88@icloud.com> * proper calculate the total run cost when "n > 1" Signed-off-by: bigcat88 <bigcat88@icloud.com> --------- Signed-off-by: bigcat88 <bigcat88@icloud.com> * chore: update workflow templates to v0.9.61 (#13533) * chore: update embedded docs to v0.4.4 (#13535) * add 4K resolution to Kling nodes (#13536) Signed-off-by: bigcat88 <bigcat88@icloud.com> * Fix LTXV Reference Audio node (#13531) * comfy-aimdo 0.2.14: Hotfix async allocator estimations (#13534) This was doing an over-estimate of VRAM used by the async allocator when lots of little small tensors were in play. Also change the versioning scheme to == so we can roll forward aimdo without worrying about stable regressions downstream in comfyUI core. 
* Disable sageattention for SAM3 (#13529) Causes Nans * execution: Add anti-cycle validation (#13169) Currently if the graph contains a cycle, the just inifitiate recursions, hits a catch all then throws a generic error against the output node that seeded the validation. Instead, fail the offending cycling mode chain and handlng it as an error in its own right. Co-authored-by: guill <jacob.e.segal@gmail.com> * chore: update workflow templates to v0.9.62 (#13539) --------- Signed-off-by: bigcat88 <bigcat88@icloud.com> Co-authored-by: Octopus <liyuan851277048@icloud.com> Co-authored-by: comfyanonymous <121283862+comfyanonymous@users.noreply.github.com> Co-authored-by: Comfy Org PR Bot <snomiao+comfy-pr@gmail.com> Co-authored-by: Alexander Piskun <13381981+bigcat88@users.noreply.github.com> Co-authored-by: Jukka Seppänen <40791699+kijai@users.noreply.github.com> Co-authored-by: AustinMroz <austin@comfy.org> Co-authored-by: Daxiong (Lin) <contact@comfyui-wiki.com> Co-authored-by: Matt Miller <matt@miller-media.com> Co-authored-by: blepping <157360029+blepping@users.noreply.github.com> Co-authored-by: Dr.Lt.Data <128333288+ltdrdata@users.noreply.github.com> Co-authored-by: rattus <46076784+rattus128@users.noreply.github.com> Co-authored-by: guill <jacob.e.segal@gmail.com>
259 lines
12 KiB
Python
"""FILM: Frame Interpolation for Large Motion (ECCV 2022)."""

import torch
import torch.nn as nn
import torch.nn.functional as F

import comfy.ops

ops = comfy.ops.disable_weight_init


class FilmConv2d(nn.Module):
    """Conv2d with optional LeakyReLU and FILM-style padding."""

    def __init__(self, in_channels, out_channels, size, activation=True, device=None, dtype=None, operations=ops):
        super().__init__()
        self.even_pad = not size % 2
        self.conv = operations.Conv2d(in_channels, out_channels, kernel_size=size, padding=size // 2 if size % 2 else 0, device=device, dtype=dtype)
        self.activation = nn.LeakyReLU(0.2) if activation else None

    def forward(self, x):
        if self.even_pad:
            x = F.pad(x, (0, 1, 0, 1))
        x = self.conv(x)
        if self.activation is not None:
            x = self.activation(x)
        return x


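# Illustrative sketch (hypothetical helper, not used by FILMNet): FilmConv2d
# keeps the spatial size unchanged for both odd and even kernels. Odd kernels
# use symmetric Conv2d padding of size // 2; even kernels get no Conv2d padding
# and instead one extra row/column on the bottom/right via F.pad(x, (0, 1, 0, 1)).
# Assumes stride 1 and dilation 1, which is what FilmConv2d uses.
def _example_film_conv_output_size(length, kernel_size):
    """Output length of one spatial axis after FilmConv2d's padding + conv."""
    if kernel_size % 2:
        total_pad = 2 * (kernel_size // 2)  # symmetric Conv2d padding
    else:
        total_pad = 1  # F.pad adds a single trailing row/column
    # Stride-1 convolution output length: L + total_pad - k + 1
    return length + total_pad - kernel_size + 1


# Usage: [_example_film_conv_output_size(37, k) for k in (1, 2, 3)] -> [37, 37, 37]

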
def _warp_core(image, flow, grid_x, grid_y):
    dtype = image.dtype
    H, W = flow.shape[2], flow.shape[3]
    dx = flow[:, 0].float() / (W * 0.5)
    dy = flow[:, 1].float() / (H * 0.5)
    grid = torch.stack([grid_x[None, None, :] + dx, grid_y[None, :, None] + dy], dim=3)
    return F.grid_sample(image.float(), grid, mode="bilinear", padding_mode="border", align_corners=False).to(dtype)


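# Illustrative sketch (hypothetical helpers, not used by FILMNet): the
# coordinate convention behind _warp_core and FILMNet._build_warp_grids.
# With align_corners=False, grid_sample places the centre of pixel i on an axis
# of length n at (2 * i + 1) / n - 1, which is exactly the sequence that
# torch.linspace(-(1 - 1 / n), 1 - 1 / n, n) produces. A flow of d pixels is
# therefore an offset of d / (n * 0.5) in normalised coordinates.
def _example_pixel_to_normalized(i, n):
    return (2 * i + 1) / n - 1


def _example_normalized_to_pixel(x, n):
    return ((x + 1) * n - 1) / 2


def _example_warp_source_pixel(i, flow_px, n):
    """Pixel position that backward warping samples for output pixel i."""
    x = _example_pixel_to_normalized(i, n) + flow_px / (n * 0.5)
    return _example_normalized_to_pixel(x, n)


# Usage: _example_warp_source_pixel(10, 3.0, 64) -> 13.0 (output pixel 10 reads
# source pixel 10 + 3), up to float rounding for other inputs.

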
def build_image_pyramid(image, pyramid_levels):
    pyramid = [image]
    for _ in range(1, pyramid_levels):
        image = F.avg_pool2d(image, 2, 2)
        pyramid.append(image)
    return pyramid


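# Illustrative sketch (hypothetical helper, not used by FILMNet): the spatial
# sizes produced by build_image_pyramid. F.avg_pool2d(image, 2, 2) floors odd
# sizes, so level k has size (H // 2**k, W // 2**k); with the default
# pyramid_levels=7 the coarsest level is H // 64 by W // 64.
def _example_pyramid_shapes(height, width, pyramid_levels):
    shapes = [(height, width)]
    for _ in range(1, pyramid_levels):
        height, width = height // 2, width // 2  # same flooring as avg_pool2d(2, 2)
        shapes.append((height, width))
    return shapes


# Usage: _example_pyramid_shapes(256, 200, 7)
#   -> [(256, 200), (128, 100), (64, 50), (32, 25), (16, 12), (8, 6), (4, 3)]

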
def flow_pyramid_synthesis(residual_pyramid):
    flow = residual_pyramid[-1]
    flow_pyramid = [flow]
    for residual_flow in residual_pyramid[:-1][::-1]:
        flow = F.interpolate(flow, size=residual_flow.shape[2:4], mode="bilinear", scale_factor=None).mul_(2).add_(residual_flow)
        flow_pyramid.append(flow)
    flow_pyramid.reverse()
    return flow_pyramid


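# Illustrative sketch (hypothetical helper, not used by FILMNet): what
# flow_pyramid_synthesis computes, reduced to one constant flow value per level
# (fine-to-coarse order, like residual_pyramid). Bilinear upsampling leaves a
# constant field unchanged apart from its resolution, and the flow is doubled
# because a displacement measured in pixels doubles when the resolution doubles.
def _example_synthesize_constant_flow(residuals):
    flow = residuals[-1]
    flows = [flow]
    for residual in residuals[:-1][::-1]:
        flow = flow * 2 + residual  # mirrors .mul_(2).add_(residual_flow)
        flows.append(flow)
    flows.reverse()
    return flows


# Usage: _example_synthesize_constant_flow([1.0, 0.5, 0.25]) -> [3.0, 1.0, 0.25]

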
def multiply_pyramid(pyramid, scalar):
    return [image * scalar[:, None, None, None] for image in pyramid]


def pyramid_warp(feature_pyramid, flow_pyramid, warp_fn):
    return [warp_fn(features, flow) for features, flow in zip(feature_pyramid, flow_pyramid)]


def concatenate_pyramids(pyramid1, pyramid2):
    return [torch.cat([f1, f2], dim=1) for f1, f2 in zip(pyramid1, pyramid2)]


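# Illustrative sketch (hypothetical helper, not used by FILMNet): the linear
# motion assumption applied in FILMNet.forward_multi_timestep through
# multiply_pyramid. For an intermediate time t, the flow used to read frame 0
# (bwd_flow) is scaled by t and the flow used to read frame 1 (fwd_flow) by
# 1 - t. Each pyramid level is simplified to a single (dx, dy) pair here.
def _example_scale_flow_pyramids(bwd_flow, fwd_flow, t):
    bwd_scaled = [(dx * t, dy * t) for dx, dy in bwd_flow]
    fwd_scaled = [(dx * (1 - t), dy * (1 - t)) for dx, dy in fwd_flow]
    return bwd_scaled, fwd_scaled


# Usage: at t = 0.25 a backward flow of (8.0, -4.0) becomes (2.0, -1.0) and a
# forward flow of (-8.0, 4.0) becomes (-6.0, 3.0).

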
class SubTreeExtractor(nn.Module):
    def __init__(self, in_channels=3, channels=64, n_layers=4, device=None, dtype=None, operations=ops):
        super().__init__()
        convs = []
        for i in range(n_layers):
            out_ch = channels << i
            convs.append(nn.Sequential(
                FilmConv2d(in_channels, out_ch, 3, device=device, dtype=dtype, operations=operations),
                FilmConv2d(out_ch, out_ch, 3, device=device, dtype=dtype, operations=operations)))
            in_channels = out_ch
        self.convs = nn.ModuleList(convs)

    def forward(self, image, n):
        head = image
        pyramid = []
        for i, layer in enumerate(self.convs):
            head = layer(head)
            pyramid.append(head)
            if i < n - 1:
                head = F.avg_pool2d(head, 2, 2)
        return pyramid


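# Illustrative sketch (hypothetical helper, not used by FILMNet): the
# (channels, height, width) of each tensor returned by SubTreeExtractor.forward.
# Layer i outputs channels << i features and keeps the spatial size (FilmConv2d
# pads to "same"); the head is average-pooled only while i < n - 1.
def _example_subtree_output_specs(height, width, channels=64, n_layers=4, n=4):
    specs = []
    for i in range(n_layers):
        specs.append((channels << i, height, width))
        if i < n - 1:
            height, width = height // 2, width // 2
    return specs


# Usage: _example_subtree_output_specs(64, 64)
#   -> [(64, 64, 64), (128, 32, 32), (256, 16, 16), (512, 8, 8)]

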
class FeatureExtractor(nn.Module):
    def __init__(self, in_channels=3, channels=64, sub_levels=4, device=None, dtype=None, operations=ops):
        super().__init__()
        self.extract_sublevels = SubTreeExtractor(in_channels, channels, sub_levels, device=device, dtype=dtype, operations=operations)
        self.sub_levels = sub_levels

    def forward(self, image_pyramid):
        sub_pyramids = [self.extract_sublevels(image_pyramid[i], min(len(image_pyramid) - i, self.sub_levels))
                        for i in range(len(image_pyramid))]
        feature_pyramid = []
        for i in range(len(image_pyramid)):
            features = sub_pyramids[i][0]
            for j in range(1, self.sub_levels):
                if j <= i:
                    features = torch.cat([features, sub_pyramids[i - j][j]], dim=1)
            feature_pyramid.append(features)
            # Free sub-pyramids no longer needed by future levels
            if i >= self.sub_levels - 1:
                sub_pyramids[i - self.sub_levels + 1] = None
        return feature_pyramid


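# Illustrative sketch (hypothetical helper, not used by FILMNet): the channel
# count of each level returned by FeatureExtractor.forward. Level i
# concatenates sub-level j of the subtree started at level i - j for every
# j < sub_levels with j <= i, so the count grows up to level sub_levels - 1
# and stays constant after that.
def _example_feature_channels(pyramid_levels=7, channels=64, sub_levels=4):
    result = []
    for i in range(pyramid_levels):
        result.append(sum(channels << j for j in range(min(i, sub_levels - 1) + 1)))
    return result


# Usage: _example_feature_channels() -> [64, 192, 448, 960, 960, 960, 960]

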
class FlowEstimator(nn.Module):
    def __init__(self, in_channels, num_convs, num_filters, device=None, dtype=None, operations=ops):
        super().__init__()
        self._convs = nn.ModuleList()
        for _ in range(num_convs):
            self._convs.append(FilmConv2d(in_channels, num_filters, 3, device=device, dtype=dtype, operations=operations))
            in_channels = num_filters
        self._convs.append(FilmConv2d(in_channels, num_filters // 2, 1, device=device, dtype=dtype, operations=operations))
        self._convs.append(FilmConv2d(num_filters // 2, 2, 1, activation=False, device=device, dtype=dtype, operations=operations))

    def forward(self, features_a, features_b):
        net = torch.cat([features_a, features_b], dim=1)
        for conv in self._convs:
            net = conv(net)
        return net


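# Illustrative sketch (hypothetical helper, not used by FILMNet): the layer
# list built by FlowEstimator.__init__ as (in_channels, out_channels, kernel)
# triples: num_convs 3x3 convs, a 1x1 conv down to num_filters // 2, and a
# final 1x1 conv without activation to the 2 flow channels (dx, dy).
def _example_flow_estimator_layers(in_channels, num_convs, num_filters):
    layers = []
    for _ in range(num_convs):
        layers.append((in_channels, num_filters, 3))
        in_channels = num_filters
    layers.append((in_channels, num_filters // 2, 1))
    layers.append((num_filters // 2, 2, 1))
    return layers


# Usage: _example_flow_estimator_layers(128, 3, 32)
#   -> [(128, 32, 3), (32, 32, 3), (32, 32, 3), (32, 16, 1), (16, 2, 1)]

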
class PyramidFlowEstimator(nn.Module):
    def __init__(self, filters=64, flow_convs=(3, 3, 3, 3), flow_filters=(32, 64, 128, 256), device=None, dtype=None, operations=ops):
        super().__init__()
        in_channels = filters << 1
        predictors = []
        for i in range(len(flow_convs)):
            predictors.append(FlowEstimator(in_channels, flow_convs[i], flow_filters[i], device=device, dtype=dtype, operations=operations))
            in_channels += filters << (i + 2)
        self._predictor = predictors[-1]
        self._predictors = nn.ModuleList(predictors[:-1][::-1])

    def forward(self, feature_pyramid_a, feature_pyramid_b, warp_fn):
        levels = len(feature_pyramid_a)
        v = self._predictor(feature_pyramid_a[-1], feature_pyramid_b[-1])
        residuals = [v]
        # Coarse-to-fine: shared predictor for deep levels, then specialized predictors for fine levels
        steps = [(i, self._predictor) for i in range(levels - 2, len(self._predictors) - 1, -1)]
        steps += [(len(self._predictors) - 1 - k, p) for k, p in enumerate(self._predictors)]
        for i, predictor in steps:
            v = F.interpolate(v, size=feature_pyramid_a[i].shape[2:4], mode="bilinear").mul_(2)
            v_residual = predictor(feature_pyramid_a[i], warp_fn(feature_pyramid_b[i], v))
            residuals.append(v_residual)
            v = v.add_(v_residual)
        residuals.reverse()
        return residuals


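# Illustrative sketch (hypothetical helper, not used by FILMNet): the
# coarse-to-fine order in which PyramidFlowEstimator.forward visits pyramid
# levels, plus the input channels each predictor is built with. "shared" is the
# last predictor (self._predictor), reused for the coarsest levels;
# "specialized_k" is predictor k, used only at fine level k.
def _example_flow_schedule(levels=7, num_predictors=4, filters=64):
    n_specialized = num_predictors - 1
    schedule = [(levels - 1, "shared")]  # initial estimate, no warping
    schedule += [(i, "shared") for i in range(levels - 2, n_specialized - 1, -1)]
    schedule += [(k, "specialized_%d" % k) for k in range(n_specialized - 1, -1, -1)]
    # Both feature pyramids are concatenated, hence twice the level-k feature channels.
    in_channels = [2 * sum(filters << j for j in range(k + 1)) for k in range(num_predictors)]
    return schedule, in_channels


# Usage: _example_flow_schedule() returns
#   ([(6, 'shared'), (5, 'shared'), (4, 'shared'), (3, 'shared'),
#     (2, 'specialized_2'), (1, 'specialized_1'), (0, 'specialized_0')],
#    [128, 384, 896, 1920])

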
def _get_fusion_channels(level, filters):
    # Per direction: multi-scale features + RGB image (3ch) + flow (2ch), doubled for both directions
    return (sum(filters << i for i in range(level)) + 3 + 2) * 2


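# Illustrative sketch (hypothetical helper, not used by FILMNet): why
# _get_fusion_channels(min(i + 1, sub_levels), filters) equals the channel
# count of aligned[i] in FILMNet.forward_multi_timestep. Each fusion level
# concatenates the warped RGB image plus features of both frames and the two
# scaled flows.
def _example_aligned_channels(level, filters=64, sub_levels=4):
    features = sum(filters << j for j in range(min(level, sub_levels - 1) + 1))
    warped_per_frame = 3 + features  # concatenate_pyramids(image, features)
    flows = 2 + 2  # bwd_scaled and fwd_scaled, (dx, dy) each
    return 2 * warped_per_frame + flows


# Usage: [_example_aligned_channels(i) for i in range(5)]
#   -> [138, 394, 906, 1930, 1930]

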
class Fusion(nn.Module):
    def __init__(self, n_layers=4, specialized_layers=3, filters=64, device=None, dtype=None, operations=ops):
        super().__init__()
        self.output_conv = operations.Conv2d(filters, 3, kernel_size=1, device=device, dtype=dtype)
        self.convs = nn.ModuleList()
        in_channels = _get_fusion_channels(n_layers, filters)
        increase = 0
        for i in range(n_layers)[::-1]:
            num_filters = (filters << i) if i < specialized_layers else (filters << specialized_layers)
            self.convs.append(nn.ModuleList([
                FilmConv2d(in_channels, num_filters, 2, activation=False, device=device, dtype=dtype, operations=operations),
                FilmConv2d(in_channels + (increase or num_filters), num_filters, 3, device=device, dtype=dtype, operations=operations),
                FilmConv2d(num_filters, num_filters, 3, device=device, dtype=dtype, operations=operations)]))
            in_channels = num_filters
            increase = _get_fusion_channels(i, filters) - num_filters // 2

    def forward(self, pyramid):
        net = pyramid[-1]
        for k, layers in enumerate(self.convs):
            i = len(self.convs) - 1 - k
            net = layers[0](F.interpolate(net, size=pyramid[i].shape[2:4], mode="nearest"))
            net = layers[2](layers[1](torch.cat([pyramid[i], net], dim=1)))
        return self.output_conv(net)


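# Illustrative sketch (hypothetical helper, not used by FILMNet): replays the
# channel bookkeeping of Fusion.__init__ and checks that layers[1] of each
# decoder step receives the skip connection aligned[i] plus the num_filters
# channels coming out of layers[0]. The aligned channel counts are passed in
# explicitly; the usage below assumes the defaults (filters=64, sub_levels=4).
def _example_fusion_channels(aligned_channels, n_layers=4, specialized_layers=3, filters=64):
    def fusion_channels(level):  # same formula as _get_fusion_channels
        return (sum(filters << i for i in range(level)) + 3 + 2) * 2

    steps = []
    in_channels = fusion_channels(n_layers)
    assert in_channels == aligned_channels[-1]  # layers[0] of the first step reads pyramid[-1]
    increase = 0
    for i in range(n_layers)[::-1]:
        num_filters = (filters << i) if i < specialized_layers else (filters << specialized_layers)
        concat_in = in_channels + (increase or num_filters)
        assert concat_in == aligned_channels[i] + num_filters
        steps.append((i, in_channels, concat_in, num_filters))
        in_channels = num_filters
        increase = fusion_channels(i) - num_filters // 2
    return steps


# Usage: _example_fusion_channels([138, 394, 906, 1930, 1930]) returns
#   [(3, 1930, 2442, 512), (2, 512, 1162, 256), (1, 256, 522, 128), (0, 128, 202, 64)]

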
class FILMNet(nn.Module):
    def __init__(self, pyramid_levels=7, fusion_pyramid_levels=5, specialized_levels=3, sub_levels=4,
                 filters=64, flow_convs=(3, 3, 3, 3), flow_filters=(32, 64, 128, 256), device=None, dtype=None, operations=ops):
        super().__init__()
        self.pyramid_levels = pyramid_levels
        self.fusion_pyramid_levels = fusion_pyramid_levels
        self.extract = FeatureExtractor(3, filters, sub_levels, device=device, dtype=dtype, operations=operations)
        self.predict_flow = PyramidFlowEstimator(filters, flow_convs, flow_filters, device=device, dtype=dtype, operations=operations)
        self.fuse = Fusion(sub_levels, specialized_levels, filters, device=device, dtype=dtype, operations=operations)
        self._warp_grids = {}

    def get_dtype(self):
        return self.extract.extract_sublevels.convs[0][0].conv.weight.dtype

    def _build_warp_grids(self, H, W, device):
        """Pre-compute warp grids for all pyramid levels."""
        if (H, W) in self._warp_grids:
            return
        self._warp_grids = {}  # clear old resolution grids to prevent memory leaks
        for _ in range(self.pyramid_levels):
            self._warp_grids[(H, W)] = (
                torch.linspace(-(1 - 1 / W), 1 - 1 / W, W, dtype=torch.float32, device=device),
                torch.linspace(-(1 - 1 / H), 1 - 1 / H, H, dtype=torch.float32, device=device),
            )
            H, W = H // 2, W // 2

    def warp(self, image, flow):
        grid_x, grid_y = self._warp_grids[(flow.shape[2], flow.shape[3])]
        return _warp_core(image, flow, grid_x, grid_y)

    def extract_features(self, img):
        """Extract image and feature pyramids for a single frame. Can be cached across pairs."""
        image_pyramid = build_image_pyramid(img, self.pyramid_levels)
        feature_pyramid = self.extract(image_pyramid)
        return image_pyramid, feature_pyramid

    def forward(self, img0, img1, timestep=0.5, cache=None):
        # FILM conditions on one scalar timestep. A tensor timestep (e.g. a spatial map) is
        # averaged to a single value with .item(), so it must have a batch size of 1.
        t = timestep.mean(dim=(1, 2, 3)).item() if isinstance(timestep, torch.Tensor) else timestep
        return self.forward_multi_timestep(img0, img1, [t], cache=cache)

    def forward_multi_timestep(self, img0, img1, timesteps, cache=None):
        """Compute flow once, synthesize at multiple timesteps. Expects batch=1 inputs."""
        self._build_warp_grids(img0.shape[2], img0.shape[3], img0.device)

        image_pyr0, feat_pyr0 = cache["img0"] if cache and "img0" in cache else self.extract_features(img0)
        image_pyr1, feat_pyr1 = cache["img1"] if cache and "img1" in cache else self.extract_features(img1)

        fwd_flow = flow_pyramid_synthesis(self.predict_flow(feat_pyr0, feat_pyr1, self.warp))[:self.fusion_pyramid_levels]
        bwd_flow = flow_pyramid_synthesis(self.predict_flow(feat_pyr1, feat_pyr0, self.warp))[:self.fusion_pyramid_levels]

        # Build warp targets and free full pyramids (only first fpl levels needed from here)
        fpl = self.fusion_pyramid_levels
        p2w = [concatenate_pyramids(image_pyr0[:fpl], feat_pyr0[:fpl]),
               concatenate_pyramids(image_pyr1[:fpl], feat_pyr1[:fpl])]
        del image_pyr0, image_pyr1, feat_pyr0, feat_pyr1

        results = []
        dt_tensors = torch.tensor(timesteps, device=img0.device, dtype=img0.dtype)
        for idx in range(len(timesteps)):
            batch_dt = dt_tensors[idx:idx + 1]
            bwd_scaled = multiply_pyramid(bwd_flow, batch_dt)
            fwd_scaled = multiply_pyramid(fwd_flow, 1 - batch_dt)
            fwd_warped = pyramid_warp(p2w[0], bwd_scaled, self.warp)
            bwd_warped = pyramid_warp(p2w[1], fwd_scaled, self.warp)
            aligned = [torch.cat([fw, bw, bf, ff], dim=1)
                       for fw, bw, bf, ff in zip(fwd_warped, bwd_warped, bwd_scaled, fwd_scaled)]
            del fwd_warped, bwd_warped, bwd_scaled, fwd_scaled
            results.append(self.fuse(aligned))
            del aligned
        return torch.cat(results, dim=0)
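

# Illustrative sketch (hypothetical helper, not part of ComfyUI): one way to
# drive FILMNet over a frame sequence while reusing per-frame features through
# the `cache` argument of forward_multi_timestep. `extract` and `synthesize`
# stand in for model.extract_features and model.forward_multi_timestep. The
# evenly spaced timesteps in (0, 1) are an assumption; how the actual ComfyUI
# node schedules timesteps and caching may differ.
def _example_interpolate_sequence(frames, num_between, extract, synthesize):
    timesteps = [(k + 1) / (num_between + 1) for k in range(num_between)]
    results = []  # one synthesize() result per consecutive frame pair
    prev_features = extract(frames[0])
    for img0, img1 in zip(frames[:-1], frames[1:]):
        next_features = extract(img1)
        cache = {"img0": prev_features, "img1": next_features}
        results.append(synthesize(img0, img1, timesteps, cache=cache))
        prev_features = next_features  # img1 becomes img0 of the next pair
    return results


# Usage with stand-in callables: for 4 frames, extract runs 4 times instead of
# 6, because each interior frame's features are computed once and reused.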