Distributed setup now defaults to panicking when out of memory, to facilitate graceful recovery

doctorpangloss 2025-02-18 15:07:02 -08:00
parent 3ddec8ae90
commit e65faca817
3 changed files with 4 additions and 2 deletions

@@ -194,7 +194,7 @@ def _create_parser() -> EnhancedConfigArgParser:
 '--panic-when',
 action='append',
 help="""
-List of fully qualified exception class names to panic (os.exit(1)) when a workflow raises it.
+List of fully qualified exception class names to panic (sys.exit(1)) when a workflow raises it.
 Example: --panic-when=torch.cuda.OutOfMemoryError. Can be specified multiple times or as a
 comma-separated list.""",
 type=str,
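
Review note: the help text above says the option can be given multiple times or as a comma-separated list. The sketch below shows one way such values could be flattened into a single list of names. It uses plain `argparse` as a stand-in for `EnhancedConfigArgParser`, and `flatten_panic_when` is a hypothetical helper, not a function from this repository.

```python
import argparse


def flatten_panic_when(values):
    """Flatten repeated and comma-separated values into one list of names.

    Hypothetical helper for illustration; the project's own parsing may differ.
    """
    names = []
    for value in values or []:
        # Each occurrence of the flag may itself hold several comma-separated names.
        names.extend(part.strip() for part in value.split(",") if part.strip())
    return names


# Plain argparse stands in for the project's EnhancedConfigArgParser here.
parser = argparse.ArgumentParser()
parser.add_argument("--panic-when", action="append", type=str, default=None)

args = parser.parse_args([
    "--panic-when=torch.cuda.OutOfMemoryError,builtins.MemoryError",
    "--panic-when=builtins.KeyboardInterrupt",
])
panic_when = flatten_panic_when(args.panic_when)
```

With `action="append"`, each occurrence of the flag adds one string to a list, so the comma splitting has to happen afterwards, as above.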

@@ -122,7 +122,7 @@ class Configuration(dict):
 anthropic_api_key (str): Configures the Anthropic API key for its nodes related to Claude functionality. Visit https://console.anthropic.com/settings/keys to create this key.
 user_directory (Optional[str]): Set the ComfyUI user directory with an absolute path.
 log_stdout (bool): Send normal process output to stdout instead of stderr (default)
-panic_when (list[str]): List of fully qualified exception class names to panic (os.exit(1)) when a workflow raises it.
+panic_when (list[str]): List of fully qualified exception class names to panic (sys.exit(1)) when a workflow raises it.
 """
 def __init__(self, **kwargs):

@@ -16,6 +16,8 @@ services:
 capabilities: [ gpu ]
 environment:
 - COMFYUI_DISTRIBUTED_QUEUE_CONNECTION_URI=amqp://guest:guest@rabbitmq:5672
+- COMFYUI_EXECUTOR_FACTORY=ProcessPoolExecutor
+- COMFYUI_PANIC_WHEN=torch.cuda.OutOfMemoryError
 - COMFYUI_LOGGING_LEVEL=ERROR
 command:
 - comfyui-worker
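
Review note: the worker is now configured to exit when a workflow raises `torch.cuda.OutOfMemoryError`. The sketch below illustrates the general idea of panicking on a configured exception: resolve each fully qualified name to a class, then call `sys.exit(1)` if the raised exception is an instance of one of them. The names `resolve_exception_class`, `should_panic`, and `run_workflow` are hypothetical, and the sketch uses `builtins.MemoryError` so that it runs without torch. How the worker actually matches exceptions is not shown in this diff.

```python
import importlib
import sys


def resolve_exception_class(name):
    """Import the class named by a fully qualified string such as 'builtins.MemoryError'."""
    module_name, _, class_name = name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def should_panic(exc, panic_when):
    """Return True if exc is an instance of any configured exception class."""
    classes = tuple(resolve_exception_class(name) for name in panic_when)
    return isinstance(exc, classes)


def run_workflow(workflow, panic_when):
    """Run a workflow callable, exiting the process on a configured exception."""
    try:
        return workflow()
    except Exception as exc:
        if should_panic(exc, panic_when):
            # Exit with a non-zero status so a supervisor can notice the failure.
            sys.exit(1)
        raise


def out_of_memory_workflow():
    # Stand-in for a workflow that runs out of memory.
    raise MemoryError("simulated out-of-memory condition")
```

Calling `run_workflow(out_of_memory_workflow, ["builtins.MemoryError"])` raises `SystemExit` with code 1, while exceptions that are not listed propagate unchanged. Recovery then depends on something outside the process restarting the worker, for example a container restart policy, which this compose fragment does not show.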