diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 000000000..bd6a3e5e8 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,272 @@ +## Engineering Style + +- Keep changes small and direct. Most fixes should touch the narrowest code path + that explains the bug, performance issue, dtype issue, model-format issue, or + user-facing behavior. +- Change the least amount of files possible. A change that touches many files is + more likely to be a bad change than a good one unless the broader scope is + directly required. +- Prefer practical fixes over broad architecture work. Add abstractions only + when they remove real repeated logic or match an existing ComfyUI pattern. +- Prefer fewer dependencies. Do not add new dependencies to ComfyUI unless they + are absolutely necessary. +- Delete obsolete code aggressively when newer infrastructure makes it useless. + Remove dead fallbacks, migration paths, unused options, debug prints, and + compatibility branches that are no longer needed. Do not leave dead branches, + unreachable code, or functions that are never called. If code is not + necessary for the current behavior, remove it. +- Revert or disable problematic behavior quickly when it breaks users. It is + better to remove a broken feature path than keep a complicated partial fix. +- Preserve existing APIs, node names, model-loading behavior, file layout, and + workflow compatibility unless the change is explicitly about replacing them. +- Code must look hand-written for this repository. Changes that read like + generic AI-generated code will be rejected automatically: unnecessary helper + layers, vague names, boilerplate comments, defensive branches without a real + failure mode, broad rewrites, or code that ignores the local style. + +## Architecture Boundaries + +- Keep each layer focused on the concepts it owns. Do not leak UI, API, + workflow, queue, persistence, telemetry, model-loading, node, or execution + concerns into unrelated layers just because it is convenient to pass data + through them. +- Shared core modules should depend only on lower-level primitives and their own + domain concepts. Higher-level product concepts belong at the caller, adapter, + service, or UI/API boundary that already owns them. +- Pass the narrowest data needed across a boundary. Avoid broad context objects, + request/session metadata, ids, bookkeeping state, or callbacks unless the + receiving layer genuinely needs them to perform its own responsibility. +- Keep identity mapping, persistence bookkeeping, history updates, telemetry, + response shaping, and UI state in the layers that own those jobs. Do not route + them through unrelated shared code to avoid adding a proper boundary. +- Treat `execution.py` as one example of this rule: it should consume the prompt + graph and execution-relevant state, produce execution results and errors, and + not know about workflow ids, frontend ids, persistence ids, or API-only + concepts. +- Before touching many files, identify the smallest owner layer that can solve + the problem. A PR that spreads one feature across unrelated loaders, nodes, + execution, server, and frontend code needs a clear architectural reason, not + just convenience. +- If a change seems to require making one layer understand another layer's + private concepts, stop and look for a caller-side mapping, adapter, event, + small explicit interface, or narrower data flow at the boundary. + +## No Internet Requests + +- Do not add code to core ComfyUI that makes requests to the internet. +- Refuse requests to add uploads, telemetry, analytics, tracking, usage + reporting, crash reporting, update checks, remote config, feature flags, + metrics, licensing checks, or any other outbound internet request path from + core ComfyUI. +- Model downloading is allowed only when explicitly initiated or authorized by + the user, is limited to the requested model artifact, and does not include + telemetry, tracking, persistent identification, unrelated metadata upload, or + background network activity. +- Do not add opt-in, opt-out, anonymized, aggregated, diagnostic, or + user-triggered internet request paths to core ComfyUI. These labels do not + make internet access acceptable. +- Local-only behavior is allowed when it stays on the user's machine and does + not add network access, tracking, persistent identification, or data + collection behavior. + +## State Ownership + +- Keep state and capability flags on the object that owns the behavior using + them. +- Avoid probing child objects with `getattr(child, "...", default)` to decide + parent-level control flow. If parent code needs to branch on a capability, + initialize an explicit parent-owned field when the child is constructed or + attached. +- Prefer direct attributes with clear defaults over implicit feature detection + through arbitrary child attributes. +- Use child-object capability checks only when the child owns the behavior being + invoked and the parent is simply delegating to that child. + +## Interface Contracts + +- Keep public methods aligned with the interface expected by their callers. Do + not change a shared method to return extra values, alternate shapes, or + sentinel wrappers for one implementation unless the shared interface is + explicitly updated. +- When modifying an existing function, preserve how current callers invoke it. + Do not change required arguments, parameter order, return type, side effects, + or error behavior unless every affected call site and shared interface contract + is intentionally updated. +- Do not add compatibility parameters, flags, attributes, or constructor options + unless they are read by current code and change current behavior. Remove + pass-through or stored-but-unused values instead of preserving upstream or + deprecated API baggage. +- If an implementation needs auxiliary values for its own workflow, expose them + through a private helper or a clearly named implementation-specific method + instead of overloading the public method's return contract. +- Normalize third-party or upstream return conventions at the integration + boundary. Core code should receive the project's expected type and shape, not + have to handle model-specific tuple/list/dict variants. +- Avoid caller-side unwrapping such as `out = out[0]` unless the called + interface is documented to return that structure. + +## Autograd and Model Freezing + +- Do not add `torch.no_grad`, `torch.inference_mode`, or inference-mode helper + wrappers in ComfyUI code. The only allowed inference-mode-related use is + disabling a globally set inference mode when a training path needs gradients. +- Do not add freeze, unfreeze, or trainability toggles to model classes. ComfyUI + models are always treated as frozen for inference, so explicit freeze + functionality is redundant and should not be added. +- Remove training-only behavior such as dropout from inference model code, but + preserve checkpoint and state-dict compatibility when doing so. If deleting a + module would change state-dict keys, module ordering, or checkpoint loading + behavior, replace it with a no-op such as `nn.Identity` instead of removing the + slot outright. + +## Python Style + +- Keep imports at module scope. Avoid inline imports unless they are already part + of an established optional-backend probe or are needed to avoid an import + cycle. +- Do not add unnecessary `try`/`except` blocks. Use them for optional dependency, + platform, or backend capability detection only when the program has a useful + fallback. Prefer specific exception types when changing new code. +- Remove any workarounds for PyTorch versions that ComfyUI no longer officially + supports. Deprecated workarounds include catching an exception and rerunning + the same op with the input cast to float. If a workaround does not have a + comment naming the exact PyTorch version or versions that still need it, + remove it. +- Let unsupported model formats, invalid quantization metadata, and bad states + fail with clear errors instead of silently producing lower quality output. +- Match the existing local style in the file you edit. This codebase tolerates + long lines, simple helper functions, module-level state, and direct tensor + operations when they make the code easier to follow. +- Keep comments sparse and useful. Strip useless comments that restate the code + or describe obvious behavior. Short TODOs are fine when they name the concrete + missing follow-up. + +## Model, Device, and Memory Behavior + +- Treat dtype, device placement, VRAM usage, and offloading behavior as core + correctness concerns. Check CPU, CUDA, ROCm, MPS, DirectML, XPU, NPU, and low + VRAM implications when touching shared execution or loading code. +- Prefer native ComfyUI formats and existing quantization/offload helpers over + adding parallel code paths. Use `comfy.quant_ops`, `comfy.model_management`, + `comfy.memory_management`, `comfy.pinned_memory`, `comfy_aimdo`, and + `comfy-kitchen` helpers where they already solve the problem. +- Use optimized comfy-kitchen ops in places where they improve performance + without changing the expected dtype, device, memory, or interface behavior. +- All models should use the optimized attention function selected by ComfyUI. + Treat optimized backend functions, dispatch helpers, and capability-selected + callables as opaque. Higher-level code must not inspect function identity, + names, modules, or implementation details to decide behavior. +- Apply the same opacity rule to similar patterns beyond attention: callers + should depend on the documented interface and result contract, not on which + backend implementation was selected underneath. +- Do not use custom inference ops that only duplicate an existing op while + upcasting to float32, such as custom RMSNorm variants. Use the generic ComfyUI + ops and/or native torch ops instead. +- If a model class `__init__` has an `operations` parameter, assume + `operations` is never `None`. Do not add fallback branches or default torch + ops for a missing `operations` object. +- Do not add unnecessary parameters to model, model block, or model ops related + classes. Constructor and forward signatures should carry only values that are + actually needed by that object for inference. +- Reuse existing model classes, blocks, ops, and helper modules when appropriate. + Before implementing a new version of a model component, search the existing + model code for a class or helper that already provides the behavior. +- Avoid adding `einops` usage in core inference code. Use native torch tensor + ops such as `reshape`, `view`, `permute`, `transpose`, `flatten`, `unflatten`, + `unsqueeze`, and `squeeze` instead. +- Do not use tensors as general-purpose Python data structures. Keep metadata, + bookkeeping, counters, flags, shape math, padding math, index planning, memory + estimates, and control-flow decisions in plain Python values unless the data + must participate directly in tensor computation. Avoid creating temporary + tensors just to use tensor methods for scalar or structural calculations. +- Avoid unnecessary casts and transfers. Preserve the intended compute dtype, + storage dtype, bias dtype, and original tensor shape metadata. +- Assume inputs to the main model forward are already in the compute dtype by + default, except integer inputs such as some model timestep tensors. Do not add + defensive or convenience casts in model code; it is better for invalid dtype + plumbing to error clearly than to hide it with unnecessary casts. +- Raw model parameters that are not owned by an op and may be initialized in a + dtype different from the compute dtype should be cast at use in forward or + inference code with `comfy.ops.cast_to_input` or + `comfy.model_management.cast_to` to avoid dtype mismatches. +- Model code should not care what dtype it is initialized in, and model + `__init__` methods should not contain workarounds for specific dtypes. Dtype + workaround code, such as making a model work with fp16 compute, belongs in the + execution or model-management layer that owns compute policy. +- Model code should not perform unnecessary device-to-CPU or CPU-to-device + transfers. New allocations must be created on the correct device and dtype; + never allocate on CPU and then move to GPU, or allocate in one dtype and then + convert to another. +- Model code itself should not perform memory management. Loading, unloading, + offloading, device movement, VRAM policy, cache lifetime, and cleanup belong + in the relevant model-management and execution layers, not inside model + implementations. +- Do not add global, module-level, class-level, singleton, or model-owned stores + for tensors or other large memory that persist across executions. Temporary + caches must be scoped to a single execution or forward/encode/decode call: + allocate them in the owning top-level call, pass them explicitly through the + call stack, and let them be discarded when that call returns. +- Follow the Wan VAE temporal cache pattern for temporary caches: create a local + cache such as `feat_map` for the encode/decode operation, pass it into the + blocks that need it, and do not retain it on the model or in global state. +- In model init code, prefer `torch.empty` for parameter/buffer placeholders + that are populated from the model state dict instead of zero-initializing with + `torch.zeros` or similar. If an allocation is not loaded from the state dict + and is useless for inference, do not include it. +- `nn.Parameter` tensors that are stored in and populated from the model state + dict should be initialized with `torch.empty`, not with zero, random, or + otherwise meaningful initialization. +- Model initialization should describe module structure, not fabricate + checkpoint-owned tensor contents. Parameters and buffers that are loaded from + the state dict must not be manually initialized, reassigned, or filled with + fallback values unless that value is actually used when no checkpoint key + exists. +- When slicing large tensors, copy the slice if the sliced tensor's lifetime + exceeds the current function scope. Do not keep a long-lived view into a large + backing tensor when a smaller copy would release memory sooner. +- Use fused or compound torch operations such as `addcmul` when they naturally + match the math. Reducing Python and torch dispatch overhead is a valid + optimization when it does not obscure the code or change dtype/device + behavior. +- Avoid caches that persist across different executions as much as possible. + Persistent caches are acceptable only when they use a very minimal amount of + memory and have a clear ownership and invalidation story. +- When optimizing, favor small measurable changes: fewer allocations, fewer + device transfers, less peak memory, better batching, or use of a faster + existing backend op. + +## Nodes and User-Facing Behavior + +- Follow existing node conventions: `INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, + `CATEGORY`, and registration through the local mapping used by that file. +- Keep node changes backward compatible by default. Add inputs with sensible + defaults and avoid changing output types unless the request requires it. +- Model implementations should add the minimal number of ComfyUI nodes required + to run the model. Reuse existing nodes as much as possible; adapting the model + to work with existing nodes is strongly preferred over creating new nodes. +- Node-level code must not patch model code directly. Any node behavior that + modifies, wraps, hooks, or changes model behavior must go through the model + patcher class instead of reaching into model internals. +- The official mascot of ComfyUI is a very cute anime girl with massive fennec + ears, a big fluffy tail, long blonde wavy hair, and blue eyes. Feel free to + use her in ComfyUI materials, UI text, examples, tests, generated assets, or + comments, but do not disrespect her. +- Warning and info messages should be short and actionable. Remove noisy or + misleading messages rather than adding more logging. +- Documentation and README edits should be concise, factual, and tied to the + changed behavior. + +## Commit and Review Habits + +- If asked to write commit messages, use short direct subjects like the existing + history: `Fix ...`, `Add ...`, `Support ...`, `Remove ...`, `Update ...`, + `Make ...`, `Use ...`, `Disable ...`, `Bump ...`, or `Revert ...`. +- Keep PR descriptions short and reviewable. State the problem, the behavioral + change, and the tests run; avoid long narrative explanations, implementation + diaries, or exhaustive file-by-file summaries unless the reviewer explicitly + needs that context. +- Prefer one coherent behavioral change per commit. Dependency pins, tests, and + the code that needs them may be in the same commit when they are inseparable. +- In reviews, prioritize real user impact: crashes, wrong dtype/device behavior, + memory regressions, broken model loading, workflow incompatibility, and noisy + or misleading user-facing output. diff --git a/app/assets/api/routes.py b/app/assets/api/routes.py index 7ef462f5c..53c84eff3 100644 --- a/app/assets/api/routes.py +++ b/app/assets/api/routes.py @@ -306,12 +306,15 @@ async def download_asset_content(request: web.Request) -> web.Response: 404, "FILE_NOT_FOUND", "Underlying file not found on disk." ) - _DANGEROUS_MIME_TYPES = { - "text/html", "text/html-sandboxed", "application/xhtml+xml", - "text/javascript", "text/css", - } - if content_type in _DANGEROUS_MIME_TYPES: + # User-controlled asset content must never render inline in the app origin + # (stored XSS via SVG/HTML/XML). Force dangerous types to download and + # override any requested inline disposition. Centralised through + # folder_paths.is_dangerous_content_type so this can't drift from /view and + # /userdata (the previous inline set here omitted image/svg+xml and missed + # the charset/casing/+xml-dialect bypasses). + if folder_paths.is_dangerous_content_type(content_type): content_type = "application/octet-stream" + disposition = "attachment" safe_name = (filename or "").replace("\r", "").replace("\n", "") encoded = urllib.parse.quote(safe_name) diff --git a/app/model_manager.py b/app/model_manager.py index 8f6e34b33..b0329ce17 100644 --- a/app/model_manager.py +++ b/app/model_manager.py @@ -50,21 +50,45 @@ class ModelFileManager: @routes.get("/experiment/models/preview/{folder}/{path_index}/{filename:.*}") async def get_model_preview(request): folder_name = request.match_info.get("folder", None) - path_index = int(request.match_info.get("path_index", None)) filename = request.match_info.get("filename", None) if folder_name not in folder_paths.folder_names_and_paths: return web.Response(status=404) + # The "{filename:.*}" capture also matches the empty string, which + # would resolve to the folder itself; reject it explicitly. + if not filename: + return web.Response(status=400) + + try: + path_index = int(request.match_info.get("path_index", None)) + except (TypeError, ValueError): + return web.Response(status=400) + folders = folder_paths.folder_names_and_paths[folder_name] + if path_index < 0 or path_index >= len(folders[0]): + return web.Response(status=404) folder = folders[0][path_index] - full_filename = os.path.join(folder, filename) + full_filename = os.path.normpath(os.path.join(folder, filename)) + + # Prevent path traversal: the requested file must stay within the + # configured model folder. `filename` is an unrestricted ".*" capture, + # so values like "../../../../etc/passwd" would otherwise escape it. + if not folder_paths.is_within_directory(folder, full_filename): + return web.Response(status=403) previews = self.get_model_previews(full_filename) default_preview = previews[0] if len(previews) > 0 else None if default_preview is None or (isinstance(default_preview, str) and not os.path.isfile(default_preview)): return web.Response(status=404) + # The preview is selected by a glob inside get_model_previews, so a + # companion file (e.g. "model.preview.png") could itself be a symlink + # resolving outside the model folder. Re-validate the file actually + # opened: is_within_directory realpaths it, catching symlink escape. + if isinstance(default_preview, str) and not folder_paths.is_within_directory(folder, default_preview): + return web.Response(status=403) + try: with Image.open(default_preview) as img: img_bytes = BytesIO() diff --git a/app/user_manager.py b/app/user_manager.py index 7b11e381c..de261ad39 100644 --- a/app/user_manager.py +++ b/app/user_manager.py @@ -6,6 +6,7 @@ import glob import shutil import logging import tempfile +import mimetypes from aiohttp import web from urllib import parse from comfy.cli_args import args @@ -336,7 +337,20 @@ class UserManager(): if not isinstance(path, str): return path - return web.FileResponse(path) + # User data files are arbitrary user-supplied content and are never + # meant to render inline. Disable MIME sniffing and force a download + # so uploaded markup/scripts can't execute in the app origin (stored + # XSS). Content-Disposition: attachment is the load-bearing guard; + # the content-type override and nosniff are defence in depth. + content_type = mimetypes.guess_type(path)[0] or 'application/octet-stream' + if folder_paths.is_dangerous_content_type(content_type): + content_type = 'application/octet-stream' + + return web.FileResponse(path, headers={ + "Content-Type": content_type, + "X-Content-Type-Options": "nosniff", + "Content-Disposition": "attachment", + }) @routes.post("/userdata/{file}") async def post_userdata(request): diff --git a/comfy/cli_args.py b/comfy/cli_args.py index e3099a230..4bef096fb 100644 --- a/comfy/cli_args.py +++ b/comfy/cli_args.py @@ -240,6 +240,7 @@ database_default_path = os.path.abspath( ) parser.add_argument("--database-url", type=str, default=f"sqlite:///{database_default_path}", help="Specify the database URL, e.g. for an in-memory database you can use 'sqlite:///:memory:'.") parser.add_argument("--enable-assets", action="store_true", help="Enable the assets system (API routes, database synchronization, and background scanning).") +parser.add_argument("--enable-asset-hashing", action="store_true", help="Compute blake3 content hashes when scanning assets. Hashing enables future asset-portability features (deduplication, cross-machine model resolution) but adds startup cost and per-output cost on large models directories. Off by default; enable to opt in.") parser.add_argument("--feature-flag", type=str, action='append', default=[], metavar="KEY[=VALUE]", help="Set a server feature flag. Use KEY=VALUE to set an explicit value, or bare KEY to set it to true. Can be specified multiple times. Boolean values (true/false) and numbers are auto-converted. Examples: --feature-flag show_signin_button=true or --feature-flag show_signin_button") parser.add_argument("--list-feature-flags", action="store_true", help="Print the registry of known CLI-settable feature flags as JSON and exit.") diff --git a/comfy/text_encoders/qwen3vl.py b/comfy/text_encoders/qwen3vl.py index 59c9aae6d..2082c42e7 100644 --- a/comfy/text_encoders/qwen3vl.py +++ b/comfy/text_encoders/qwen3vl.py @@ -167,7 +167,7 @@ class Qwen3VLTokenizer(sd1_clip.SD1Tokenizer): embed_count = 0 for r in tokens[key_name]: for i in range(len(r)): - if r[i][0] == 151655: # <|image_pad|> + if isinstance(r[i][0], (int, float)) and r[i][0] == 151655: # <|image_pad|> if len(images) > embed_count: r[i] = ({"type": "image", "data": images[embed_count], "original_type": "image"},) + r[i][1:] embed_count += 1 diff --git a/comfy_api_nodes/apis/gemini.py b/comfy_api_nodes/apis/gemini.py index caaba8f36..7b2543270 100644 --- a/comfy_api_nodes/apis/gemini.py +++ b/comfy_api_nodes/apis/gemini.py @@ -121,6 +121,7 @@ class GeminiGenerationConfig(BaseModel): topK: int | None = Field(None, ge=1) topP: float | None = Field(None, ge=0.0, le=1.0) thinkingConfig: GeminiThinkingConfig | None = Field(None) + responseModalities: list[str] | None = Field(None) class GeminiImageOutputOptions(BaseModel): diff --git a/comfy_api_nodes/apis/ideogram.py b/comfy_api_nodes/apis/ideogram.py index c5ad9559f..ee3256e96 100644 --- a/comfy_api_nodes/apis/ideogram.py +++ b/comfy_api_nodes/apis/ideogram.py @@ -33,53 +33,6 @@ class IdeogramColorPalette( ) -class ImageRequest(BaseModel): - aspect_ratio: Optional[str] = Field( - None, - description="Optional. The aspect ratio (e.g., 'ASPECT_16_9', 'ASPECT_1_1'). Cannot be used with resolution. Defaults to 'ASPECT_1_1' if unspecified.", - ) - color_palette: Optional[Dict[str, Any]] = Field( - None, description='Optional. Color palette object. Only for V_2, V_2_TURBO.' - ) - magic_prompt_option: Optional[str] = Field( - None, description="Optional. MagicPrompt usage ('AUTO', 'ON', 'OFF')." - ) - model: str = Field(..., description="The model used (e.g., 'V_2', 'V_2A_TURBO')") - negative_prompt: Optional[str] = Field( - None, - description='Optional. Description of what to exclude. Only for V_1, V_1_TURBO, V_2, V_2_TURBO.', - ) - num_images: Optional[int] = Field( - 1, - description='Optional. Number of images to generate (1-8). Defaults to 1.', - ge=1, - le=8, - ) - prompt: str = Field( - ..., description='Required. The prompt to use to generate the image.' - ) - resolution: Optional[str] = Field( - None, - description="Optional. Resolution (e.g., 'RESOLUTION_1024_1024'). Only for model V_2. Cannot be used with aspect_ratio.", - ) - seed: Optional[int] = Field( - None, - description='Optional. A number between 0 and 2147483647.', - ge=0, - le=2147483647, - ) - style_type: Optional[str] = Field( - None, - description="Optional. Style type ('AUTO', 'GENERAL', 'REALISTIC', 'DESIGN', 'RENDER_3D', 'ANIME'). Only for models V_2 and above.", - ) - - -class IdeogramGenerateRequest(BaseModel): - image_request: ImageRequest = Field( - ..., description='The image generation request parameters.' - ) - - class Datum(BaseModel): is_image_safe: Optional[bool] = Field( None, description='Indicates whether the image is considered safe.' @@ -113,20 +66,6 @@ class StyleCode(RootModel[str]): root: str = Field(..., pattern='^[0-9A-Fa-f]{8}$') -class Datum1(BaseModel): - is_image_safe: Optional[bool] = None - prompt: Optional[str] = None - resolution: Optional[str] = None - seed: Optional[int] = None - style_type: Optional[str] = None - url: Optional[str] = None - - -class IdeogramV3IdeogramResponse(BaseModel): - created: Optional[datetime] = None - data: Optional[List[Datum1]] = None - - class RenderingSpeed1(str, Enum): TURBO = 'TURBO' DEFAULT = 'DEFAULT' diff --git a/comfy_api_nodes/nodes_gemini.py b/comfy_api_nodes/nodes_gemini.py index a63625ada..aa992802d 100644 --- a/comfy_api_nodes/nodes_gemini.py +++ b/comfy_api_nodes/nodes_gemini.py @@ -13,7 +13,7 @@ import torch from typing_extensions import override import folder_paths -from comfy_api.latest import IO, ComfyExtension, Input, Types +from comfy_api.latest import IO, ComfyExtension, Input, InputImpl, Types from comfy_api_nodes.apis.gemini import ( GeminiContent, GeminiFileData, @@ -37,6 +37,7 @@ from comfy_api_nodes.util import ( audio_to_base64_string, bytesio_to_image_tensor, download_url_to_image_tensor, + download_url_to_video_output, get_number_of_images, sync_op, tensor_to_base64_string, @@ -45,6 +46,7 @@ from comfy_api_nodes.util import ( upload_images_to_comfyapi, upload_video_to_comfyapi, validate_string, + validate_video_duration, video_to_base64_string, ) @@ -229,10 +231,29 @@ async def get_image_from_response(response: GeminiGenerateContentResponse, thoug return torch.cat(image_tensors, dim=0) +async def get_video_from_response( + response: GeminiGenerateContentResponse, cls: type[IO.ComfyNode] | None = None +) -> InputImpl.VideoFromFile: + parts = get_parts_by_type(response, "video/*") + for part in parts: + if part.inlineData and part.inlineData.data: + return InputImpl.VideoFromFile(BytesIO(base64.b64decode(part.inlineData.data))) + if part.fileData and part.fileData.fileUri: + return await download_url_to_video_output(part.fileData.fileUri, cls=cls) + model_message = get_text_from_response(response).strip() + if model_message: + raise ValueError(f"Gemini did not generate a video. Model response: {model_message}") + raise ValueError( + "Gemini did not generate a video. Try rephrasing your prompt, " + "shortening the requested duration, or reducing the number of input images/videos." + ) + + def calculate_tokens_price(response: GeminiGenerateContentResponse) -> float | None: if not response.modelVersion: return None # Define prices (Cost per 1,000,000 tokens), see https://cloud.google.com/vertex-ai/generative-ai/pricing + output_video_tokens_price = 0.0 if response.modelVersion == "gemini-2.5-pro": input_tokens_price = 1.25 output_text_tokens_price = 10.0 @@ -249,18 +270,27 @@ def calculate_tokens_price(response: GeminiGenerateContentResponse) -> float | N input_tokens_price = 2 output_text_tokens_price = 12.0 output_image_tokens_price = 0.0 - elif response.modelVersion == "gemini-3.1-flash-lite-preview": + elif response.modelVersion in ("gemini-3.1-flash-lite-preview", "gemini-3.1-flash-lite"): input_tokens_price = 0.25 output_text_tokens_price = 1.50 output_image_tokens_price = 0.0 - elif response.modelVersion == "gemini-3-pro-image-preview": + elif response.modelVersion in ("gemini-3-pro-image-preview", "gemini-3-pro-image"): input_tokens_price = 2 output_text_tokens_price = 12.0 output_image_tokens_price = 120.0 - elif response.modelVersion == "gemini-3.1-flash-image-preview": + elif response.modelVersion in ("gemini-3.1-flash-image-preview", "gemini-3.1-flash-image"): input_tokens_price = 0.5 output_text_tokens_price = 3.0 output_image_tokens_price = 60.0 + elif response.modelVersion == "gemini-3.1-flash-lite-image": + input_tokens_price = 0.25 + output_text_tokens_price = 1.50 + output_image_tokens_price = 30.0 + elif response.modelVersion == "gemini-omni-flash-preview": + input_tokens_price = 2.145 + output_text_tokens_price = 12.87 + output_image_tokens_price = 0.0 + output_video_tokens_price = 25.025 else: return None final_price = response.usageMetadata.promptTokenCount * input_tokens_price @@ -268,6 +298,8 @@ def calculate_tokens_price(response: GeminiGenerateContentResponse) -> float | N for i in response.usageMetadata.candidatesTokensDetails: if i.modality == Modality.IMAGE: final_price += output_image_tokens_price * i.tokenCount # for Nano Banana models + elif i.modality == Modality.VIDEO: + final_price += output_video_tokens_price * i.tokenCount # for Omni Flash else: final_price += output_text_tokens_price * i.tokenCount if response.usageMetadata.thoughtsTokenCount: @@ -1302,7 +1334,7 @@ class GeminiNanoBanana2(IO.ComfyNode): ) -def _nano_banana_2_v2_model_inputs(): +def _nano_banana_2_v2_model_inputs(resolutions: list[str]): return [ IO.Combo.Input( "aspect_ratio", @@ -1329,8 +1361,8 @@ def _nano_banana_2_v2_model_inputs(): ), IO.Combo.Input( "resolution", - options=["1K", "2K", "4K"], - tooltip="Target output resolution. For 2K/4K the native Gemini upscaler is used.", + options=resolutions, + tooltip="Target output resolution.", ), IO.Combo.Input( "thinking_level", @@ -1376,7 +1408,11 @@ class GeminiNanoBanana2V2(IO.ComfyNode): options=[ IO.DynamicCombo.Option( "Nano Banana 2 (Gemini 3.1 Flash Image)", - _nano_banana_2_v2_model_inputs(), + _nano_banana_2_v2_model_inputs(resolutions=["1K", "2K", "4K"]), + ), + IO.DynamicCombo.Option( + "Nano Banana 2 Lite", + _nano_banana_2_v2_model_inputs(resolutions=["1K"]), ), ], ), @@ -1445,9 +1481,13 @@ class GeminiNanoBanana2V2(IO.ComfyNode): depends_on=IO.PriceBadgeDepends(widgets=["model", "model.resolution"]), expr=""" ( - $r := $lookup(widgets, "model.resolution"); - $prices := {"1k": 0.0696, "2k": 0.1014, "4k": 0.154}; - {"type":"usd","usd": $lookup($prices, $r), "format":{"suffix":"/Image","approximate":true}} + $contains(widgets.model, "lite") + ? {"type":"usd","usd": 0.034, "format":{"suffix":"/Image","approximate":true}} + : ( + $r := $lookup(widgets, "model.resolution"); + $prices := {"1k": 0.0696, "2k": 0.1014, "4k": 0.154}; + {"type":"usd","usd": $lookup($prices, $r), "format":{"suffix":"/Image","approximate":true}} + ) ) """, ), @@ -1468,6 +1508,8 @@ class GeminiNanoBanana2V2(IO.ComfyNode): model_choice = model["model"] if model_choice == "Nano Banana 2 (Gemini 3.1 Flash Image)": model_id = "gemini-3.1-flash-image-preview" + elif model_choice == "Nano Banana 2 Lite": + model_id = "gemini-3.1-flash-lite-image" else: model_id = model_choice @@ -1517,6 +1559,149 @@ class GeminiNanoBanana2V2(IO.ComfyNode): ) +OMNI_MAX_IMAGES = 14 +OMNI_MAX_VIDEOS = 3 + +OMNI_MODELS: dict[str, str] = { + "Omni Flash": "gemini-omni-flash-preview", +} + + +def _omni_flash_inputs() -> list[Input]: + """Per-model inputs for the Omni video DynamicCombo (prompt + reference media + sampling).""" + return [ + IO.String.Input( + "prompt", + multiline=True, + default="", + tooltip="Describe the video to generate. Specify the length and aspect ratio directly in the " + 'prompt, e.g. "a 6-second clip in 16:9". Length may be 3-10 seconds; the aspect ratio must be ' + "16:9 (landscape) or 9:16 (portrait). The output is 720p, 24 FPS, with audio.", + ), + IO.Autogrow.Input( + "images", + template=IO.Autogrow.TemplateNames( + IO.Image.Input("image"), + names=[f"image_{i}" for i in range(1, OMNI_MAX_IMAGES + 1)], + min=0, + ), + tooltip=f"Optional reference image(s) to guide or animate the video. Up to {OMNI_MAX_IMAGES} images.", + ), + IO.Autogrow.Input( + "videos", + template=IO.Autogrow.TemplateNames( + IO.Video.Input("video"), + names=[f"video_{i}" for i in range(1, OMNI_MAX_VIDEOS + 1)], + min=0, + ), + tooltip=f"Optional reference video(s) to guide or edit. Up to {OMNI_MAX_VIDEOS} videos, " + f"each up to 10 seconds long.", + ), + IO.Float.Input( + "temperature", + default=1.0, + min=0.0, + max=2.0, + step=0.01, + tooltip="Controls randomness. Lower is more focused/deterministic, higher is more varied.", + advanced=True, + ), + IO.Float.Input( + "top_p", + default=0.95, + min=0.0, + max=1.0, + step=0.01, + tooltip="Nucleus sampling: sample from the smallest token set whose cumulative probability reaches top_p.", + advanced=True, + ), + ] + + +class GeminiVideoOmni(IO.ComfyNode): + + @classmethod + def define_schema(cls): + return IO.Schema( + node_id="GeminiVideoOmni", + display_name="Google Gemini Omni (Video)", + category="partner/video/Gemini", + essentials_category="Video Generation", + description="Generate a video with audio from a text prompt using Google's Gemini Omni Flash model. " + "Optionally provide reference images and/or videos to guide or edit the result. Describe the desired " + "length (3-10s) and aspect ratio (16:9 or 9:16) directly in the prompt.", + inputs=[ + IO.DynamicCombo.Input( + "model", + options=[ + IO.DynamicCombo.Option("Omni Flash", _omni_flash_inputs()), + ], + tooltip="The Gemini video model used to generate the video.", + ), + IO.Int.Input( + "seed", + default=42, + min=0, + max=2147483647, + control_after_generate=True, + tooltip="Seed controls whether the node should re-run; " + "results are non-deterministic regardless of seed.", + ), + ], + outputs=[ + IO.Video.Output(), + IO.String.Output(), + ], + hidden=[ + IO.Hidden.auth_token_comfy_org, + IO.Hidden.api_key_comfy_org, + IO.Hidden.unique_id, + ], + is_api_node=True, + price_badge=IO.PriceBadge( + expr='{"type":"usd","usd":0.146,"format":{"suffix":"/second","approximate":true}}' + ), + ) + + @classmethod + async def execute(cls, model: dict, seed: int) -> IO.NodeOutput: + prompt = model.get("prompt") or "" + validate_string(prompt, strip_whitespace=True, min_length=1) + model_id = OMNI_MODELS[model["model"]] + + images = [t for t in (model.get("images") or {}).values() if t is not None] + videos = [v for v in (model.get("videos") or {}).values() if v is not None] + if sum(get_number_of_images(t) for t in images) > OMNI_MAX_IMAGES: + raise ValueError(f"The current maximum number of supported images is {OMNI_MAX_IMAGES}.") + if len(videos) > OMNI_MAX_VIDEOS: + raise ValueError(f"The current maximum number of supported videos is {OMNI_MAX_VIDEOS}.") + for video in videos: + validate_video_duration(video, max_duration=10) + + parts: list[GeminiPart] = [] + if images or videos: + parts.extend(await build_gemini_media_parts(cls, images, [], videos)) + parts.append(GeminiPart(text=prompt)) + response = await sync_op( + cls, + ApiEndpoint(path=f"{GEMINI_BASE_ENDPOINT}/{model_id}", method="POST"), + data=GeminiGenerateContentRequest( + contents=[GeminiContent(role=GeminiRole.user, parts=parts)], + generationConfig=GeminiGenerationConfig( + responseModalities=["TEXT", "VIDEO"], + temperature=model.get("temperature", 1.0), + topP=model.get("top_p", 0.95), + ), + ), + response_model=GeminiGenerateContentResponse, + price_extractor=calculate_tokens_price, + ) + return IO.NodeOutput( + await get_video_from_response(response, cls=cls), + get_text_from_response(response), + ) + + class GeminiExtension(ComfyExtension): @override async def get_node_list(self) -> list[type[IO.ComfyNode]]: @@ -1527,6 +1712,7 @@ class GeminiExtension(ComfyExtension): GeminiImage2, GeminiNanoBanana2, GeminiNanoBanana2V2, + GeminiVideoOmni, GeminiInputFiles, ] diff --git a/comfy_api_nodes/nodes_ideogram.py b/comfy_api_nodes/nodes_ideogram.py index 3b914a850..cc0467987 100644 --- a/comfy_api_nodes/nodes_ideogram.py +++ b/comfy_api_nodes/nodes_ideogram.py @@ -5,9 +5,7 @@ from PIL import Image import numpy as np import torch from comfy_api_nodes.apis.ideogram import ( - IdeogramGenerateRequest, IdeogramGenerateResponse, - ImageRequest, IdeogramV3Request, IdeogramV3EditRequest, IdeogramV4Request, @@ -21,101 +19,6 @@ from comfy_api_nodes.util import ( validate_string, ) -V1_V1_RES_MAP = { - "Auto":"AUTO", - "512 x 1536":"RESOLUTION_512_1536", - "576 x 1408":"RESOLUTION_576_1408", - "576 x 1472":"RESOLUTION_576_1472", - "576 x 1536":"RESOLUTION_576_1536", - "640 x 1024":"RESOLUTION_640_1024", - "640 x 1344":"RESOLUTION_640_1344", - "640 x 1408":"RESOLUTION_640_1408", - "640 x 1472":"RESOLUTION_640_1472", - "640 x 1536":"RESOLUTION_640_1536", - "704 x 1152":"RESOLUTION_704_1152", - "704 x 1216":"RESOLUTION_704_1216", - "704 x 1280":"RESOLUTION_704_1280", - "704 x 1344":"RESOLUTION_704_1344", - "704 x 1408":"RESOLUTION_704_1408", - "704 x 1472":"RESOLUTION_704_1472", - "720 x 1280":"RESOLUTION_720_1280", - "736 x 1312":"RESOLUTION_736_1312", - "768 x 1024":"RESOLUTION_768_1024", - "768 x 1088":"RESOLUTION_768_1088", - "768 x 1152":"RESOLUTION_768_1152", - "768 x 1216":"RESOLUTION_768_1216", - "768 x 1232":"RESOLUTION_768_1232", - "768 x 1280":"RESOLUTION_768_1280", - "768 x 1344":"RESOLUTION_768_1344", - "832 x 960":"RESOLUTION_832_960", - "832 x 1024":"RESOLUTION_832_1024", - "832 x 1088":"RESOLUTION_832_1088", - "832 x 1152":"RESOLUTION_832_1152", - "832 x 1216":"RESOLUTION_832_1216", - "832 x 1248":"RESOLUTION_832_1248", - "864 x 1152":"RESOLUTION_864_1152", - "896 x 960":"RESOLUTION_896_960", - "896 x 1024":"RESOLUTION_896_1024", - "896 x 1088":"RESOLUTION_896_1088", - "896 x 1120":"RESOLUTION_896_1120", - "896 x 1152":"RESOLUTION_896_1152", - "960 x 832":"RESOLUTION_960_832", - "960 x 896":"RESOLUTION_960_896", - "960 x 1024":"RESOLUTION_960_1024", - "960 x 1088":"RESOLUTION_960_1088", - "1024 x 640":"RESOLUTION_1024_640", - "1024 x 768":"RESOLUTION_1024_768", - "1024 x 832":"RESOLUTION_1024_832", - "1024 x 896":"RESOLUTION_1024_896", - "1024 x 960":"RESOLUTION_1024_960", - "1024 x 1024":"RESOLUTION_1024_1024", - "1088 x 768":"RESOLUTION_1088_768", - "1088 x 832":"RESOLUTION_1088_832", - "1088 x 896":"RESOLUTION_1088_896", - "1088 x 960":"RESOLUTION_1088_960", - "1120 x 896":"RESOLUTION_1120_896", - "1152 x 704":"RESOLUTION_1152_704", - "1152 x 768":"RESOLUTION_1152_768", - "1152 x 832":"RESOLUTION_1152_832", - "1152 x 864":"RESOLUTION_1152_864", - "1152 x 896":"RESOLUTION_1152_896", - "1216 x 704":"RESOLUTION_1216_704", - "1216 x 768":"RESOLUTION_1216_768", - "1216 x 832":"RESOLUTION_1216_832", - "1232 x 768":"RESOLUTION_1232_768", - "1248 x 832":"RESOLUTION_1248_832", - "1280 x 704":"RESOLUTION_1280_704", - "1280 x 720":"RESOLUTION_1280_720", - "1280 x 768":"RESOLUTION_1280_768", - "1280 x 800":"RESOLUTION_1280_800", - "1312 x 736":"RESOLUTION_1312_736", - "1344 x 640":"RESOLUTION_1344_640", - "1344 x 704":"RESOLUTION_1344_704", - "1344 x 768":"RESOLUTION_1344_768", - "1408 x 576":"RESOLUTION_1408_576", - "1408 x 640":"RESOLUTION_1408_640", - "1408 x 704":"RESOLUTION_1408_704", - "1472 x 576":"RESOLUTION_1472_576", - "1472 x 640":"RESOLUTION_1472_640", - "1472 x 704":"RESOLUTION_1472_704", - "1536 x 512":"RESOLUTION_1536_512", - "1536 x 576":"RESOLUTION_1536_576", - "1536 x 640":"RESOLUTION_1536_640", -} - -V1_V2_RATIO_MAP = { - "1:1":"ASPECT_1_1", - "4:3":"ASPECT_4_3", - "3:4":"ASPECT_3_4", - "16:9":"ASPECT_16_9", - "9:16":"ASPECT_9_16", - "2:1":"ASPECT_2_1", - "1:2":"ASPECT_1_2", - "3:2":"ASPECT_3_2", - "2:3":"ASPECT_2_3", - "4:5":"ASPECT_4_5", - "5:4":"ASPECT_5_4", -} V3_RATIO_MAP = { "1:3":"1x3", @@ -229,298 +132,6 @@ async def download_and_process_images(image_urls): return stacked_tensors -class IdeogramV1(IO.ComfyNode): - - @classmethod - def define_schema(cls): - return IO.Schema( - node_id="IdeogramV1", - display_name="Ideogram V1", - category="partner/image/Ideogram", - description="Generates images using the Ideogram V1 model.", - inputs=[ - IO.String.Input( - "prompt", - multiline=True, - default="", - tooltip="Prompt for the image generation", - ), - IO.Boolean.Input( - "turbo", - default=False, - tooltip="Whether to use turbo mode (faster generation, potentially lower quality)", - ), - IO.Combo.Input( - "aspect_ratio", - options=list(V1_V2_RATIO_MAP.keys()), - default="1:1", - tooltip="The aspect ratio for image generation.", - optional=True, - ), - IO.Combo.Input( - "magic_prompt_option", - options=["AUTO", "ON", "OFF"], - default="AUTO", - tooltip="Determine if MagicPrompt should be used in generation", - optional=True, - advanced=True, - ), - IO.Int.Input( - "seed", - default=0, - min=0, - max=2147483647, - step=1, - control_after_generate=True, - display_mode=IO.NumberDisplay.number, - optional=True, - ), - IO.String.Input( - "negative_prompt", - multiline=True, - default="", - tooltip="Description of what to exclude from the image", - optional=True, - ), - IO.Int.Input( - "num_images", - default=1, - min=1, - max=8, - step=1, - display_mode=IO.NumberDisplay.number, - optional=True, - ), - ], - outputs=[ - IO.Image.Output(), - ], - hidden=[ - IO.Hidden.auth_token_comfy_org, - IO.Hidden.api_key_comfy_org, - IO.Hidden.unique_id, - ], - is_api_node=True, - price_badge=IO.PriceBadge( - depends_on=IO.PriceBadgeDepends(widgets=["num_images", "turbo"]), - expr=""" - ( - $n := widgets.num_images; - $base := (widgets.turbo = true) ? 0.0286 : 0.0858; - {"type":"usd","usd": $round($base * $n, 2)} - ) - """, - ), - ) - - @classmethod - async def execute( - cls, - prompt, - turbo=False, - aspect_ratio="1:1", - magic_prompt_option="AUTO", - seed=0, - negative_prompt="", - num_images=1, - ): - # Determine the model based on turbo setting - aspect_ratio = V1_V2_RATIO_MAP.get(aspect_ratio, None) - model = "V_1_TURBO" if turbo else "V_1" - - response = await sync_op( - cls, - ApiEndpoint(path="/proxy/ideogram/generate", method="POST"), - response_model=IdeogramGenerateResponse, - data=IdeogramGenerateRequest( - image_request=ImageRequest( - prompt=prompt, - model=model, - num_images=num_images, - seed=seed, - aspect_ratio=aspect_ratio if aspect_ratio != "ASPECT_1_1" else None, - magic_prompt_option=(magic_prompt_option if magic_prompt_option != "AUTO" else None), - negative_prompt=negative_prompt if negative_prompt else None, - ) - ), - max_retries=1, - ) - - if not response.data or len(response.data) == 0: - raise Exception("No images were generated in the response") - - image_urls = [image_data.url for image_data in response.data if image_data.url] - if not image_urls: - raise Exception("No image URLs were generated in the response") - return IO.NodeOutput(await download_and_process_images(image_urls)) - - -class IdeogramV2(IO.ComfyNode): - - @classmethod - def define_schema(cls): - return IO.Schema( - node_id="IdeogramV2", - display_name="Ideogram V2", - category="partner/image/Ideogram", - description="Generates images using the Ideogram V2 model.", - inputs=[ - IO.String.Input( - "prompt", - multiline=True, - default="", - tooltip="Prompt for the image generation", - ), - IO.Boolean.Input( - "turbo", - default=False, - tooltip="Whether to use turbo mode (faster generation, potentially lower quality)", - ), - IO.Combo.Input( - "aspect_ratio", - options=list(V1_V2_RATIO_MAP.keys()), - default="1:1", - tooltip="The aspect ratio for image generation. Ignored if resolution is not set to AUTO.", - optional=True, - ), - IO.Combo.Input( - "resolution", - options=list(V1_V1_RES_MAP.keys()), - default="Auto", - tooltip="The resolution for image generation. " - "If not set to AUTO, this overrides the aspect_ratio setting.", - optional=True, - ), - IO.Combo.Input( - "magic_prompt_option", - options=["AUTO", "ON", "OFF"], - default="AUTO", - tooltip="Determine if MagicPrompt should be used in generation", - optional=True, - advanced=True, - ), - IO.Int.Input( - "seed", - default=0, - min=0, - max=2147483647, - step=1, - control_after_generate=True, - display_mode=IO.NumberDisplay.number, - optional=True, - ), - IO.Combo.Input( - "style_type", - options=["AUTO", "GENERAL", "REALISTIC", "DESIGN", "RENDER_3D", "ANIME"], - default="NONE", - tooltip="Style type for generation (V2 only)", - optional=True, - advanced=True, - ), - IO.String.Input( - "negative_prompt", - multiline=True, - default="", - tooltip="Description of what to exclude from the image", - optional=True, - ), - IO.Int.Input( - "num_images", - default=1, - min=1, - max=8, - step=1, - display_mode=IO.NumberDisplay.number, - optional=True, - ), - #"color_palette": ( - # IO.STRING, - # { - # "multiline": False, - # "default": "", - # "tooltip": "Color palette preset name or hex colors with weights", - # }, - #), - ], - outputs=[ - IO.Image.Output(), - ], - hidden=[ - IO.Hidden.auth_token_comfy_org, - IO.Hidden.api_key_comfy_org, - IO.Hidden.unique_id, - ], - is_api_node=True, - price_badge=IO.PriceBadge( - depends_on=IO.PriceBadgeDepends(widgets=["num_images", "turbo"]), - expr=""" - ( - $n := widgets.num_images; - $base := (widgets.turbo = true) ? 0.0715 : 0.1144; - {"type":"usd","usd": $round($base * $n, 2)} - ) - """, - ), - ) - - @classmethod - async def execute( - cls, - prompt, - turbo=False, - aspect_ratio="1:1", - resolution="Auto", - magic_prompt_option="AUTO", - seed=0, - style_type="NONE", - negative_prompt="", - num_images=1, - color_palette="", - ): - aspect_ratio = V1_V2_RATIO_MAP.get(aspect_ratio, None) - resolution = V1_V1_RES_MAP.get(resolution, None) - # Determine the model based on turbo setting - model = "V_2_TURBO" if turbo else "V_2" - - # Handle resolution vs aspect_ratio logic - # If resolution is not AUTO, it overrides aspect_ratio - final_resolution = None - final_aspect_ratio = None - - if resolution != "AUTO": - final_resolution = resolution - else: - final_aspect_ratio = aspect_ratio if aspect_ratio != "ASPECT_1_1" else None - - response = await sync_op( - cls, - endpoint=ApiEndpoint(path="/proxy/ideogram/generate", method="POST"), - response_model=IdeogramGenerateResponse, - data=IdeogramGenerateRequest( - image_request=ImageRequest( - prompt=prompt, - model=model, - num_images=num_images, - seed=seed, - aspect_ratio=final_aspect_ratio, - resolution=final_resolution, - magic_prompt_option=(magic_prompt_option if magic_prompt_option != "AUTO" else None), - style_type=style_type if style_type != "NONE" else None, - negative_prompt=negative_prompt if negative_prompt else None, - color_palette=color_palette if color_palette else None, - ) - ), - max_retries=1, - ) - if not response.data or len(response.data) == 0: - raise Exception("No images were generated in the response") - - image_urls = [image_data.url for image_data in response.data if image_data.url] - if not image_urls: - raise Exception("No image URLs were generated in the response") - return IO.NodeOutput(await download_and_process_images(image_urls)) - - class IdeogramV3(IO.ComfyNode): @classmethod @@ -917,8 +528,6 @@ class IdeogramExtension(ComfyExtension): @override async def get_node_list(self) -> list[type[IO.ComfyNode]]: return [ - IdeogramV1, - IdeogramV2, IdeogramV3, IdeogramV4, ] diff --git a/comfy_extras/nodes_cond.py b/comfy_extras/nodes_cond.py index b745a43af..c8091b7a4 100644 --- a/comfy_extras/nodes_cond.py +++ b/comfy_extras/nodes_cond.py @@ -8,7 +8,8 @@ class CLIPTextEncodeControlnet(io.ComfyNode): def define_schema(cls) -> io.Schema: return io.Schema( node_id="CLIPTextEncodeControlnet", - category="experimental/conditioning", + display_name="CLIP Text Encode (Controlnet)", + category="model/conditioning", inputs=[ io.Clip.Input("clip"), io.Conditioning.Input("conditioning"), @@ -35,11 +36,12 @@ class T5TokenizerOptions(io.ComfyNode): def define_schema(cls) -> io.Schema: return io.Schema( node_id="T5TokenizerOptions", - category="experimental/conditioning", + display_name="T5 Tokenizer Options", + category="model/conditioning", inputs=[ io.Clip.Input("clip"), - io.Int.Input("min_padding", default=0, min=0, max=10000, step=1, advanced=True), - io.Int.Input("min_length", default=0, min=0, max=10000, step=1, advanced=True), + io.Int.Input("min_padding", default=0, min=0, max=10000, step=1), + io.Int.Input("min_length", default=0, min=0, max=10000, step=1), ], outputs=[io.Clip.Output()], is_experimental=True, diff --git a/comfy_extras/nodes_custom_sampler.py b/comfy_extras/nodes_custom_sampler.py index c9d7e06fc..56ef5f526 100644 --- a/comfy_extras/nodes_custom_sampler.py +++ b/comfy_extras/nodes_custom_sampler.py @@ -1070,7 +1070,7 @@ class AddNoise(io.ComfyNode): def define_schema(cls): return io.Schema( node_id="AddNoise", - category="experimental/custom_sampling/noise", + category="model/sampling/noise", is_experimental=True, inputs=[ io.Model.Input("model"), @@ -1120,7 +1120,7 @@ class ManualSigmas(io.ComfyNode): return io.Schema( node_id="ManualSigmas", search_aliases=["custom noise schedule", "define sigmas"], - category="experimental/custom_sampling", + category="model/sampling/sigmas", is_experimental=True, inputs=[ io.String.Input("sigmas", default="1, 0.5", multiline=False) diff --git a/comfy_extras/nodes_photomaker.py b/comfy_extras/nodes_photomaker.py index 8a2248572..72fad1673 100644 --- a/comfy_extras/nodes_photomaker.py +++ b/comfy_extras/nodes_photomaker.py @@ -123,7 +123,8 @@ class PhotoMakerLoader(io.ComfyNode): def define_schema(cls): return io.Schema( node_id="PhotoMakerLoader", - category="experimental/photomaker", + display_name="Load PhotoMaker Model", + category="model/loaders", inputs=[ io.Combo.Input("photomaker_model_name", options=folder_paths.get_filename_list("photomaker")), ], @@ -149,7 +150,8 @@ class PhotoMakerEncode(io.ComfyNode): def define_schema(cls): return io.Schema( node_id="PhotoMakerEncode", - category="experimental/photomaker", + display_name="PhotoMaker Encode", + category="model/conditioning/photomaker", inputs=[ io.Photomaker.Input("photomaker"), io.Image.Input("image"), diff --git a/comfy_extras/nodes_stable_cascade.py b/comfy_extras/nodes_stable_cascade.py index 6a78ffb47..ddfb4f2b0 100644 --- a/comfy_extras/nodes_stable_cascade.py +++ b/comfy_extras/nodes_stable_cascade.py @@ -119,7 +119,7 @@ class StableCascade_SuperResolutionControlnet(io.ComfyNode): def define_schema(cls): return io.Schema( node_id="StableCascade_SuperResolutionControlnet", - category="experimental/stable_cascade", + category="experimental/stable cascade", is_experimental=True, inputs=[ io.Image.Input("image"), diff --git a/comfy_extras/nodes_triposplat.py b/comfy_extras/nodes_triposplat.py index 7bf4703fe..c892213e4 100644 --- a/comfy_extras/nodes_triposplat.py +++ b/comfy_extras/nodes_triposplat.py @@ -143,7 +143,7 @@ class VAEDecodeTripoSplat(IO.ComfyNode): return IO.Schema( node_id="VAEDecodeTripoSplat", display_name="TripoSplat Decode", - category="3d/latent", + category="model/latent/triposplat", description="Decode the sampled TripoSplat latent into a 3D gaussian splat. " "Modify the number of gaussians to vary the density.", inputs=[ @@ -188,7 +188,7 @@ class TripoSplatSamplingPreview(IO.ComfyNode): return IO.Schema( node_id="TripoSplatSamplingPreview", display_name="TripoSplat Sampling Preview", - category="3d/latent", + category="model/latent/triposplat", description="Patch the TripoSplat model for the standard Ksampler node to show a live decoded " "gaussian splat preview at each step.", inputs=[ diff --git a/comfyui_version.py b/comfyui_version.py index f8db561ba..8e9967f1b 100644 --- a/comfyui_version.py +++ b/comfyui_version.py @@ -1,3 +1,3 @@ # This file is automatically generated by the build process when version is # updated in pyproject.toml. -__version__ = "0.26.0" +__version__ = "0.27.0" diff --git a/folder_paths.py b/folder_paths.py index 7304e1b73..ee048b0f2 100644 --- a/folder_paths.py +++ b/folder_paths.py @@ -264,6 +264,59 @@ def annotated_filepath(name: str) -> tuple[str, str | None]: return name, base_dir +# Content types a browser may execute or render inline. File endpoints that +# serve user-controlled content must force these to download (and ideally set +# Content-Disposition: attachment) to avoid stored XSS. Centralised here so the +# /view and /userdata handlers can't drift apart. mimetypes.guess_type may +# return either the text/* or application/* spelling depending on platform, so +# both are listed. +DANGEROUS_CONTENT_TYPES = { + 'text/html', 'text/html-sandboxed', 'application/xhtml+xml', + 'text/javascript', 'application/javascript', 'application/x-javascript', + 'application/ecmascript', 'text/css', + 'image/svg+xml', 'application/xml', 'text/xml', + # message/rfc822 (.mht/.mhtml) can carry script in some browsers. + 'message/rfc822', +} + + +def is_dangerous_content_type(content_type: str | None) -> bool: + """Return True if a browser may execute or render `content_type` inline. + + Normalises before matching so the check can't be slipped past with a + charset/boundary parameter (``text/html; charset=utf-8``) or casing + (``TEXT/HTML``). Any XML dialect (``*+xml`` or ``*/xml``) is treated as + dangerous because XML can carry inline script via stylesheet/entity tricks, + which also covers the ``application/{xslt,rss,atom,rdf}+xml`` family without + enumerating each one. Endpoints serving user-controlled content should route + a dangerous type to ``application/octet-stream`` + ``Content-Disposition: + attachment`` + ``X-Content-Type-Options: nosniff``. + """ + if not content_type: + return False + normalized = content_type.split(';', 1)[0].strip().lower() + if normalized in DANGEROUS_CONTENT_TYPES: + return True + return normalized.endswith('+xml') or normalized.endswith('/xml') + + +def is_within_directory(directory: str, target: str) -> bool: + """Return True if `target` resolves to a path inside `directory`. + + Uses realpath on both operands so that a symlink placed inside `directory` + that points elsewhere cannot escape the containment check at open time. + """ + try: + directory = os.path.realpath(directory) + target = os.path.realpath(target) + return os.path.commonpath((directory, target)) == directory + except ValueError: + # ValueError is raised by realpath() on a path with an embedded null + # byte, and by commonpath() on Windows when the paths are on different + # drives. In either case the target is not safely within the directory. + return False + + def get_annotated_filepath(name: str, default_dir: str | None=None) -> str: name, base_dir = annotated_filepath(name) @@ -273,7 +326,12 @@ def get_annotated_filepath(name: str, default_dir: str | None=None) -> str: else: base_dir = get_input_directory() # fallback path - return os.path.join(base_dir, name) + filepath = os.path.abspath(os.path.join(base_dir, name)) + # Prevent path traversal: the resolved path must stay within base_dir. + # repr() the name in the message so a crafted value can't inject log lines. + if not is_within_directory(base_dir, filepath): + raise ValueError("Invalid file path: {!r}".format(name)) + return filepath def exists_annotated_filepath(name) -> bool: @@ -282,7 +340,10 @@ def exists_annotated_filepath(name) -> bool: if base_dir is None: base_dir = get_input_directory() # fallback path - filepath = os.path.join(base_dir, name) + filepath = os.path.abspath(os.path.join(base_dir, name)) + # Treat traversal attempts as non-existent rather than probing the filesystem. + if not is_within_directory(base_dir, filepath): + return False return os.path.exists(filepath) diff --git a/main.py b/main.py index aa4ee2adb..20ec83c9e 100644 --- a/main.py +++ b/main.py @@ -403,7 +403,7 @@ def prompt_worker(q, server_instance): hook_breaker_ac10a0.restore_functions() if not asset_seeder.is_disabled(): - asset_seeder.enqueue_enrich(roots=("output",), compute_hashes=True) + asset_seeder.enqueue_enrich(roots=("output",), compute_hashes=args.enable_asset_hashing) asset_seeder.resume() @@ -458,7 +458,7 @@ def setup_database(): if dependencies_available(): init_db() if args.enable_assets: - if asset_seeder.start(roots=("models", "input", "output"), prune_first=True, compute_hashes=True): + if asset_seeder.start(roots=("models", "input", "output"), prune_first=True, compute_hashes=args.enable_asset_hashing): logging.info("Background asset scan initiated for models, input, output") except Exception as e: if "database is locked" in str(e): diff --git a/nodes.py b/nodes.py index 77c577b9a..9043a8d0a 100644 --- a/nodes.py +++ b/nodes.py @@ -349,7 +349,7 @@ class VAEDecodeTiled: RETURN_TYPES = ("IMAGE",) FUNCTION = "decode" - CATEGORY = "experimental" + CATEGORY = "model/latent" def decode(self, vae, samples, tile_size, overlap=64, temporal_size=64, temporal_overlap=8): if tile_size < overlap * 4: @@ -396,7 +396,7 @@ class VAEEncodeTiled: RETURN_TYPES = ("LATENT",) FUNCTION = "encode" - CATEGORY = "experimental" + CATEGORY = "model/latent" def encode(self, vae, pixels, tile_size, overlap, temporal_size=64, temporal_overlap=8): t = vae.encode_tiled(pixels, tile_x=tile_size, tile_y=tile_size, overlap=overlap, tile_t=temporal_size, overlap_t=temporal_overlap) @@ -514,7 +514,7 @@ class SaveLatent: OUTPUT_NODE = True - CATEGORY = "experimental" + CATEGORY = "model/latent" def save(self, samples, filename_prefix="ComfyUI", prompt=None, extra_pnginfo=None): full_output_folder, filename, counter, subfolder, filename_prefix = folder_paths.get_save_image_path(filename_prefix, self.output_dir) @@ -559,7 +559,7 @@ class LoadLatent: files = [f for f in os.listdir(input_dir) if os.path.isfile(os.path.join(input_dir, f)) and f.endswith(".latent")] return {"required": {"latent": [sorted(files), ]}, } - CATEGORY = "experimental" + CATEGORY = "model/latent" RETURN_TYPES = ("LATENT", ) FUNCTION = "load" @@ -2155,6 +2155,8 @@ NODE_DISPLAY_NAME_MAPPINGS = { "GLIGENTextBoxApply": "Apply GLIGEN Text Box", "ConditioningZeroOut": "Conditioning Zero Out", # Latent + "LoadLatent": "Load Latent", + "SaveLatent": "Save Latent", "VAEEncodeForInpaint": "VAE Encode (for Inpainting)", "SetLatentNoiseMask": "Set Latent Noise Mask", "VAEDecode": "VAE Decode", @@ -2189,7 +2191,6 @@ NODE_DISPLAY_NAME_MAPPINGS = { "ImageSharpen": "Sharpen Image", "ImageScaleToTotalPixels": "Scale Image to Total Pixels", "GetImageSize": "Get Image Size", - # experimental "VAEDecodeTiled": "VAE Decode (Tiled)", "VAEEncodeTiled": "VAE Encode (Tiled)", } diff --git a/pyproject.toml b/pyproject.toml index 2e8a85d3f..8c17e410e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "ComfyUI" -version = "0.26.0" +version = "0.27.0" readme = "README.md" license = { file = "LICENSE" } requires-python = ">=3.10" diff --git a/requirements.txt b/requirements.txt index eb7230b49..1d9fe4137 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,5 +1,5 @@ comfyui-frontend-package==1.45.20 -comfyui-workflow-templates==0.10.7 +comfyui-workflow-templates==0.11.1 comfyui-embedded-docs==0.5.6 torch torchsde @@ -22,7 +22,7 @@ alembic SQLAlchemy>=2.0.0 filelock av>=16.0.0 -comfy-kitchen==0.2.15 +comfy-kitchen==0.2.16 comfy-aimdo==0.4.10 requests simpleeval>=1.0.0 diff --git a/server.py b/server.py index 361850f38..461ebe2f6 100644 --- a/server.py +++ b/server.py @@ -127,6 +127,7 @@ def create_cors_middleware(allowed_origin: str): return cors_middleware + def is_loopback(host): if host is None: return False @@ -616,15 +617,30 @@ class PromptServer(): or 'application/octet-stream' ) - # For security, force certain mimetypes to download instead of display - if content_type in {'text/html', 'text/html-sandboxed', 'application/xhtml+xml', 'text/javascript', 'text/css'}: - content_type = 'application/octet-stream' # Forces download + # For security, force renderable/active types (HTML, JS, + # CSS, SVG, XML — anything that can carry inline ' + files = {"file": ("evil.svg", svg, "image/svg+xml")} + form_data = { + "tags": json.dumps(["models", "checkpoints", "unit-tests", "svgxss"]), + "name": "evil.svg", + } + up = http.post(api_base + "/api/assets", files=files, data=form_data, timeout=120) + body = up.json() + assert up.status_code in (200, 201), body + aid = body["id"] + try: + r = http.get(f"{api_base}/api/assets/{aid}/content?disposition=inline", timeout=120) + r.content + assert r.status_code == 200 + ct = r.headers.get("Content-Type", "").lower() + cd = r.headers.get("Content-Disposition", "").lower() + assert "svg" not in ct, f"SVG served with a renderable content type: {ct!r}" + assert ct.startswith("application/octet-stream"), f"expected octet-stream, got {ct!r}" + assert "attachment" in cd, f"inline disposition not overridden to attachment: {cd!r}" + assert r.headers.get("X-Content-Type-Options", "").lower() == "nosniff" + finally: + with contextlib.suppress(Exception): + http.delete(f"{api_base}/api/assets/{aid}", timeout=30) + + def test_download_attachment_and_inline(http: requests.Session, api_base: str, seeded_asset: dict): aid = seeded_asset["id"] diff --git a/tests-unit/comfy_test/folder_path_test.py b/tests-unit/comfy_test/folder_path_test.py index 775e15c36..3b398e60b 100644 --- a/tests-unit/comfy_test/folder_path_test.py +++ b/tests-unit/comfy_test/folder_path_test.py @@ -53,8 +53,11 @@ def test_annotated_filepath(): def test_get_annotated_filepath(): default_dir = "/default/dir" - assert folder_paths.get_annotated_filepath("test.txt", default_dir) == os.path.join(default_dir, "test.txt") - assert folder_paths.get_annotated_filepath("test.txt [output]") == os.path.join(folder_paths.get_output_directory(), "test.txt") + # get_annotated_filepath now normalizes with os.path.abspath (part of the + # GHSA-779p traversal hardening), so compare against the normalized form — + # on Windows abspath also prepends the current drive letter. + assert folder_paths.get_annotated_filepath("test.txt", default_dir) == os.path.abspath(os.path.join(default_dir, "test.txt")) + assert folder_paths.get_annotated_filepath("test.txt [output]") == os.path.abspath(os.path.join(folder_paths.get_output_directory(), "test.txt")) def test_add_model_folder_path_append(clear_folder_paths): folder_paths.add_model_folder_path("test_folder", "/default/path", is_default=True) diff --git a/tests-unit/security_test/__init__.py b/tests-unit/security_test/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/tests-unit/security_test/test_ghsa_779p_02_preview_traversal.py b/tests-unit/security_test/test_ghsa_779p_02_preview_traversal.py new file mode 100644 index 000000000..f17fd26ea --- /dev/null +++ b/tests-unit/security_test/test_ghsa_779p_02_preview_traversal.py @@ -0,0 +1,192 @@ +"""CI unit tests for FIX #2 of GHSA-779p-m5rp-r4h4. + +Path traversal / hardening in app/model_manager.py get_model_preview +(route /experiment/models/preview/{folder}/{path_index}/{filename:.*}). + +Reference: https://github.com/Comfy-Org/ComfyUI/security/advisories/GHSA-779p-m5rp-r4h4 +""" +import pytest +import yarl +from io import BytesIO +from PIL import Image +from aiohttp import web +from unittest.mock import patch +from app.model_manager import ModelFileManager + +pytestmark = ( + pytest.mark.asyncio +) # This applies the asyncio mark to all test functions in the module + +@pytest.fixture +def model_manager(): + return ModelFileManager() + +@pytest.fixture +def app(model_manager): + app = web.Application() + routes = web.RouteTableDef() + model_manager.add_routes(routes) + app.add_routes(routes) + return app + + +async def test_legit_preview_returns_200(aiohttp_client, app, tmp_path): + """Sanity: a real preview PNG inside the model folder is served as webp 200.""" + img = Image.new('RGB', (16, 16), color=(255, 0, 128)) + img.save(tmp_path / "test_model.png", format='PNG') + + with patch('folder_paths.folder_names_and_paths', { + 'test_folder': ([str(tmp_path)], None) + }): + client = await aiohttp_client(app) + response = await client.get('/experiment/models/preview/test_folder/0/test_model.png') + + assert response.status == 200 + assert response.content_type == 'image/webp' + + img_bytes = BytesIO(await response.read()) + served = Image.open(img_bytes) + assert served.format + assert served.format.lower() == 'webp' + served.close() + + +async def test_non_integer_path_index_returns_400(aiohttp_client, app, tmp_path): + """A non-integer path_index segment must be rejected with 400.""" + with patch('folder_paths.folder_names_and_paths', { + 'test_folder': ([str(tmp_path)], None) + }): + client = await aiohttp_client(app) + response = await client.get('/experiment/models/preview/test_folder/abc/test_model.png') + + assert response.status == 400 + + +async def test_out_of_range_path_index_returns_404(aiohttp_client, app, tmp_path): + """A path_index beyond the configured folder list must return 404.""" + with patch('folder_paths.folder_names_and_paths', { + 'test_folder': ([str(tmp_path)], None) + }): + client = await aiohttp_client(app) + response = await client.get('/experiment/models/preview/test_folder/99/test_model.png') + + assert response.status == 404 + + +async def test_empty_filename_returns_400(aiohttp_client, app, tmp_path): + """The "{filename:.*}" capture also matches the empty string (trailing + slash). It would resolve to the folder itself and must be rejected with 400.""" + with patch('folder_paths.folder_names_and_paths', { + 'test_folder': ([str(tmp_path)], None) + }): + client = await aiohttp_client(app) + response = await client.get('/experiment/models/preview/test_folder/0/') + + assert response.status == 400 + + +async def test_path_traversal_in_filename_returns_403(aiohttp_client, app, tmp_path): + """Path traversal in {filename} must be rejected with 403 and must NOT read + a file outside the configured model directory. + + GOTCHA: aiohttp/yarl collapses literal ``../`` dot-segments out of the URL + path before it reaches the handler, which would make this test vacuously + pass (the request would hit a different/non-existent route). We percent-encode + the dots and slashes (``%2e%2e%2f``) and send the URL with + ``yarl.URL(..., encoded=True)`` so the bytes survive client-side normalization + untouched; aiohttp's router then percent-decodes them into ``match_info``, + delivering the literal ``../`` traversal to the handler's ``{filename:.*}`` + capture. + + Without the fix the handler computes + ``os.path.normpath(os.path.join(folder, "../../../../etc/hosts"))``, which + escapes ``tmp_path`` and would be passed straight to get_model_previews -> + Image.open, serving bytes from outside the model dir (200/served bytes). The + is_within_directory() containment check is the load-bearing fix that turns + that escape into a 403. + """ + # Sanity-anchor: a legit preview exists inside tmp_path, so a 200 path is + # genuinely reachable — proving the 403 below is the containment check + # firing, not an unrelated 404. + img = Image.new('RGB', (16, 16), color=(255, 0, 128)) + img.save(tmp_path / "test_model.png", format='PNG') + + # Percent-encoded "../../../../etc/hosts" so yarl does not collapse the + # dot-segments before the request leaves the client. + encoded_traversal = '%2e%2e%2f' * 4 + 'etc%2fhosts' + raw_path = '/experiment/models/preview/test_folder/0/' + encoded_traversal + url = yarl.URL(raw_path, encoded=True) + + with patch('folder_paths.folder_names_and_paths', { + 'test_folder': ([str(tmp_path)], None) + }): + client = await aiohttp_client(app) + response = await client.get(url) + + # Confirm the traversal actually reached the handler intact: a 200 here + # would mean either normalization stripped the ``../`` (vacuous pass) or + # the containment check failed open and served outside-dir bytes. + assert response.status == 403, ( + f"expected 403 from is_within_directory() containment check, " + f"got {response.status}; traversal may have been normalized away " + f"or the fix failed open" + ) + body = await response.read() + assert body == b"", "403 response must not carry any file bytes" + + +async def test_symlink_companion_preview_returns_403(aiohttp_client, app, tmp_path): + """A companion preview file is selected by a glob inside get_model_previews + and then opened. If that companion is a symlink whose path is in-dir but + whose target escapes the model folder, it must be rejected with 403 — not + served. The requested path itself stays in-dir (so the first containment + check passes); the load-bearing fix is the SECOND is_within_directory check + on the file actually opened. + """ + model_dir = tmp_path / "models" + model_dir.mkdir() + secret_dir = tmp_path / "secret" + secret_dir.mkdir() + # A real image OUTSIDE the model dir — valid, so without the fix Image.open + # would succeed and its bytes would be served (200). + secret = secret_dir / "secret.png" + Image.new('RGB', (8, 8), color=(0, 0, 0)).save(secret, format='PNG') + # Companion preview, in-dir by name but a symlink escaping the model dir. + # (No real model file is needed — get_model_previews globs companions by + # basename, and omitting a .safetensors avoids the metadata-header read.) + companion = model_dir / "model.preview.png" + try: + companion.symlink_to(secret) + except (OSError, NotImplementedError): + pytest.skip("symlinks not supported on this platform/filesystem") + + with patch('folder_paths.folder_names_and_paths', { + 'test_folder': ([str(model_dir)], None) + }): + client = await aiohttp_client(app) + response = await client.get('/experiment/models/preview/test_folder/0/model.safetensors') + + assert response.status == 403, ( + f"expected 403 — the globbed companion preview is a symlink resolving " + f"outside the model dir and must not be served; got {response.status}" + ) + assert await response.read() == b"" + + +async def test_null_byte_in_filename_no_500(aiohttp_client, app, tmp_path): + """A NUL byte in the filename must yield a clean client rejection, not a 500 + from an uncaught ValueError in is_within_directory's realpath() call.""" + raw_path = '/experiment/models/preview/test_folder/0/' + 'a%00b' + url = yarl.URL(raw_path, encoded=True) + + with patch('folder_paths.folder_names_and_paths', { + 'test_folder': ([str(tmp_path)], None) + }): + client = await aiohttp_client(app) + response = await client.get(url) + + assert response.status != 500, ( + f"NUL byte produced a 500 (uncaught ValueError); expected a clean " + f"4xx rejection, got {response.status}" + ) + assert 400 <= response.status < 500 diff --git a/tests-unit/security_test/test_ghsa_779p_03_annotated_traversal.py b/tests-unit/security_test/test_ghsa_779p_03_annotated_traversal.py new file mode 100644 index 000000000..88102760c --- /dev/null +++ b/tests-unit/security_test/test_ghsa_779p_03_annotated_traversal.py @@ -0,0 +1,165 @@ +"""Security tests for GHSA-779p-m5rp-r4h4 — FIX #3. + +Path traversal in folder_paths.get_annotated_filepath / exists_annotated_filepath, +plus the shared is_within_directory() containment helper. + +These are pure-function tests (no running server). The input/output/temp +directories are pointed at tmp_path via the folder_paths setters, so a crafted +name containing `../`, an absolute path, or a symlink that escapes the base +directory must be rejected. + +Reference: https://github.com/Comfy-Org/ComfyUI/security/advisories/GHSA-779p-m5rp-r4h4 +""" +import os + +import pytest + +import folder_paths +from comfy.options import enable_args_parsing +enable_args_parsing() + + +@pytest.fixture +def sandbox(tmp_path): + """Point folder_paths' input/output/temp dirs at a real temp sandbox. + + Yields the realpath'd base, input, output and temp directories. The original + directory values are restored afterward so tests stay isolated. + """ + base = os.path.realpath(str(tmp_path)) + input_dir = os.path.join(base, "input") + output_dir = os.path.join(base, "output") + temp_dir = os.path.join(base, "temp") + for d in (input_dir, output_dir, temp_dir): + os.makedirs(d, exist_ok=True) + + orig_input = folder_paths.get_input_directory() + orig_output = folder_paths.get_output_directory() + orig_temp = folder_paths.get_temp_directory() + + folder_paths.set_input_directory(input_dir) + folder_paths.set_output_directory(output_dir) + folder_paths.set_temp_directory(temp_dir) + + yield { + "base": base, + "input": input_dir, + "output": output_dir, + "temp": temp_dir, + } + + folder_paths.set_input_directory(orig_input) + folder_paths.set_output_directory(orig_output) + folder_paths.set_temp_directory(orig_temp) + + +# --------------------------------------------------------------------------- +# is_within_directory() — the shared containment helper +# --------------------------------------------------------------------------- + +def test_is_within_directory_legit_child(sandbox): + base = sandbox["input"] + child = os.path.join(base, "sub", "image.png") + assert folder_paths.is_within_directory(base, child) is True + + +def test_is_within_directory_dotdot_escape(sandbox): + base = sandbox["input"] + escape = os.path.join(base, "..", "..", "etc", "passwd") + assert folder_paths.is_within_directory(base, escape) is False + + +def test_is_within_directory_symlink_escape(sandbox): + """A symlink created INSIDE base that points OUTSIDE base must not pass. + + This is the key new hardening: is_within_directory realpath()s both operands, + so a symlink planted in the base directory can't be used to read files + elsewhere. We create a real on-disk symlink and a real secret target to + verify the check actually resolves the link. + """ + base = sandbox["input"] + + # A directory living outside the base, holding a secret file. + outside = os.path.join(sandbox["base"], "outside_secret_dir") + os.makedirs(outside, exist_ok=True) + secret = os.path.join(outside, "secret.txt") + with open(secret, "w") as f: + f.write("top secret") + + # Plant a symlink inside base that points at the outside directory. + # symlink creation can require elevated privileges / Developer Mode on + # Windows, so skip cleanly where it isn't available (same guard as the + # sibling test in test_ghsa_779p_02_preview_traversal.py). + link = os.path.join(base, "escape_link") + try: + os.symlink(outside, link) + except (OSError, NotImplementedError): + pytest.skip("symlinks not supported on this platform/filesystem") + + # Accessing the secret "through" the in-base symlink must be rejected. + target_via_link = os.path.join(link, "secret.txt") + assert folder_paths.is_within_directory(base, target_via_link) is False + + +# --------------------------------------------------------------------------- +# get_annotated_filepath() +# --------------------------------------------------------------------------- + +def test_get_annotated_filepath_legit_name(sandbox): + result = folder_paths.get_annotated_filepath("image.png") + assert result == os.path.join(sandbox["input"], "image.png") + assert folder_paths.is_within_directory(sandbox["input"], result) + + +def test_get_annotated_filepath_input_annotation(sandbox): + result = folder_paths.get_annotated_filepath("image.png [input]") + assert result == os.path.join(sandbox["input"], "image.png") + + +def test_get_annotated_filepath_output_annotation(sandbox): + result = folder_paths.get_annotated_filepath("image.png [output]") + assert result == os.path.join(sandbox["output"], "image.png") + + +def test_get_annotated_filepath_temp_annotation(sandbox): + result = folder_paths.get_annotated_filepath("image.png [temp]") + assert result == os.path.join(sandbox["temp"], "image.png") + + +def test_get_annotated_filepath_dotdot_raises(sandbox): + with pytest.raises(ValueError): + folder_paths.get_annotated_filepath("../etc/passwd") + + +def test_get_annotated_filepath_dotdot_with_annotation_raises(sandbox): + with pytest.raises(ValueError): + folder_paths.get_annotated_filepath("../../etc/passwd [output]") + + +def test_get_annotated_filepath_absolute_escape_raises(sandbox): + with pytest.raises(ValueError): + folder_paths.get_annotated_filepath("/etc/passwd") + + +# --------------------------------------------------------------------------- +# exists_annotated_filepath() +# --------------------------------------------------------------------------- + +def test_exists_annotated_filepath_existing_legit_file(sandbox): + real = os.path.join(sandbox["input"], "real.png") + with open(real, "w") as f: + f.write("data") + assert folder_paths.exists_annotated_filepath("real.png") is True + + +def test_exists_annotated_filepath_traversal_returns_false(sandbox): + """A traversal name must return False without raising and without probing + outside the base directory (must never reach os.path.exists for the escape). + """ + # /etc/passwd exists on POSIX; the function must still report False because + # the resolved path escapes the input directory. + assert folder_paths.exists_annotated_filepath("../../../../../../etc/passwd") is False + + +def test_exists_annotated_filepath_absolute_returns_false(sandbox): + assert folder_paths.exists_annotated_filepath("/etc/passwd") is False diff --git a/tests-unit/security_test/test_ghsa_779p_04_userdata_xss.py b/tests-unit/security_test/test_ghsa_779p_04_userdata_xss.py new file mode 100644 index 000000000..aa1250327 --- /dev/null +++ b/tests-unit/security_test/test_ghsa_779p_04_userdata_xss.py @@ -0,0 +1,147 @@ +""" +CI unit tests for FIX #4 of GHSA-779p-m5rp-r4h4. + +Stored-XSS hardening on GET /userdata/{file} in app/user_manager.py. + +User data files are arbitrary user-supplied content and must never render +inline in the app origin. The getuserdata handler: + - forces Content-Type to application/octet-stream for any type in + folder_paths.DANGEROUS_CONTENT_TYPES (text/html, image/svg+xml, + text/javascript, ...), + - sets X-Content-Type-Options: nosniff, + - sets Content-Disposition: attachment. + +These tests pre-create files in tmp_path and GET them back, asserting the +secure response headers. They mirror the aiohttp_client pattern in +tests-unit/prompt_server_test/user_manager_test.py. +""" + +import pytest +import os +from aiohttp import web +from app.user_manager import UserManager + +pytestmark = ( + pytest.mark.asyncio +) # This applies the asyncio mark to all test functions in the module + + +@pytest.fixture +def user_manager(tmp_path): + um = UserManager() + um.get_request_user_filepath = lambda req, file, **kwargs: os.path.join( + tmp_path, file + ) if file else tmp_path + return um + + +@pytest.fixture +def app(user_manager): + app = web.Application() + routes = web.RouteTableDef() + user_manager.add_routes(routes) + app.add_routes(routes) + return app + + +async def test_html_served_as_octet_stream(aiohttp_client, app, tmp_path): + (tmp_path / "evil.html").write_text( + "" + ) + + client = await aiohttp_client(app) + resp = await client.get("/userdata/evil.html") + + assert resp.status == 200 + ct = resp.headers.get("Content-Type", "") + # The load-bearing assertion: a .html file must NOT be served as text/html. + assert "text/html" not in ct.lower(), ( + f"Content-Type {ct!r} would let a browser render/execute the file (stored XSS)." + ) + assert ct == "application/octet-stream" + assert resp.headers.get("X-Content-Type-Options") == "nosniff" + assert "attachment" in resp.headers.get("Content-Disposition", "") + + +async def test_svg_served_as_octet_stream(aiohttp_client, app, tmp_path): + (tmp_path / "evil.svg").write_text( + '' + '' + '' + "" + ) + + client = await aiohttp_client(app) + resp = await client.get("/userdata/evil.svg") + + assert resp.status == 200 + ct = resp.headers.get("Content-Type", "") + # SVG can carry inline