Distributed AI video creation pipeline — connect ComfyUI (image/video generation), Kokoro TTS (voiceover), and audio tools into an automated workflow.
Creative Studio Pipeline orchestrates video production across multiple machines:
- GPU Worker — runs ComfyUI with FLUX (image generation) and Wan2.2 (image-to-video animation)
- Audio Worker — runs ACE-Step for music generation, Kokoro TTS for voiceover
- Orchestrator — coordinates jobs, transfers assets, assembles final video with ffmpeg
Works with 1 machine or 3. All communication is SSH-based — no cloud dependencies, no paid APIs.
┌─────────────┐ ┌──────────────┐ ┌──────────────┐
│ Orchestrator │────▶│ GPU Worker │ │ Audio Worker │
│ (your dev │ SSH │ (ComfyUI) │ SSH │ (TTS/Music) │
│ machine) │◀────│ FLUX + Wan2.2│ │ Kokoro/ACE │
└─────────────┘ └──────────────┘ └──────────────┘
│ │
└──────────── ffmpeg assembly ───────────┘
│
┌────┴────┐
│ Output │
│ video │
└─────────┘
The pipeline:
- Plan — an LLM generates a creative brief with scene descriptions, music prompts, narration
- Generate images — FLUX on the GPU worker creates stills for each scene
- Animate — Wan2.2 I2V turns stills into short video clips with motion
- Voiceover — Kokoro TTS generates narration audio for each scene
- Music — ACE-Step or ffmpeg procedural audio creates the soundtrack
- Assemble — ffmpeg concatenates clips, adds transitions, mixes audio, applies color grade
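The assembly step ultimately reduces to one ffmpeg invocation. A hedged sketch of building the concat-plus-audio-mix command — the flags are standard ffmpeg (concat demuxer, `amix` filter, libx264/aac encoders), but the exact filter graph the pipeline uses may differ:

```python
def build_assemble_cmd(concat_list: str, music: str, voice: str, out: str) -> list[str]:
    """Concatenate clips from a concat-demuxer list file, mix music under voiceover."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", concat_list,  # text file listing clips
        "-i", music,
        "-i", voice,
        # Duck the music, then mix it with the narration track.
        "-filter_complex",
        "[1:a]volume=0.25[m];[m][2:a]amix=inputs=2:duration=first[a]",
        "-map", "0:v", "-map", "[a]",
        "-c:v", "libx264", "-c:a", "aac",
        out,
    ]

cmd = build_assemble_cmd("clips.txt", "music.wav", "voice.wav", "final.mp4")
```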
- Python 3.10+
- ffmpeg (with libx264, aac support)
- ComfyUI with FLUX GGUF models + Wan2.2 I2V (on the GPU worker)
- Kokoro TTS server (or any OpenAI-compatible TTS API)
- SSH access to remote machines (optional — works fully local too)
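The preflight command verifies exactly these prerequisites. A simplified, local-only sketch of the idea (the real command also checks remote connectivity, which is omitted here):

```python
import shutil

def preflight(binaries=("ffmpeg", "ssh", "python3")) -> list[str]:
    """Return the required executables that are missing from PATH."""
    return [b for b in binaries if shutil.which(b) is None]

missing = preflight()
if missing:
    print("missing prerequisites:", ", ".join(missing))
```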
# 1. Clone
git clone https://github.com/optimizedwf/creative-studio-pipeline.git
cd creative-studio-pipeline
# 2. Set up environment
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your machine addresses and paths
# 3. Run a full pipeline
python scripts/creative_studio_local.py music --artifacts ./runs/my-video
python scripts/creative_studio_local.py tts --artifacts ./runs/my-video
python scripts/creative_studio_local.py images --artifacts ./runs/my-video
python scripts/creative_studio_local.py animate --artifacts ./runs/my-video
python scripts/creative_studio_local.py enhance --artifacts ./runs/my-video
python scripts/creative_studio_local.py assemble --artifacts ./runs/my-video

Or use the Archon workflow:
# Requires Archon (separate tool)
archon run creative-studio-local --args "your creative brief here"

| Command | Description |
|---|---|
| save-plan | Save an LLM-generated plan to artifacts |
| preflight | Check all dependencies and remote connectivity |
| music | Generate music (ACE-Step remote or ffmpeg procedural) |
| tts | Generate voiceover audio (Kokoro or Edge-TTS) |
| images | Generate FLUX stills on the GPU worker |
| animate | Run Wan2.2 I2V animation on the GPU worker |
| enhance | Post-process clips (upscale + frame interpolation) |
| assemble | Concatenate clips, add transitions, mix audio |
| qa | Validate the final video (duration, streams, quality) |
| critique | Per-scene quality analysis (brightness, motion, duration) |
| detect-beats | Detect beat/onset times from music for alignment |
| regen-scene | Regenerate a single failed scene |
| Variable | Default | Description |
|---|---|---|
| PIPBOY_SSH | user@comfy-host | SSH target for the GPU worker |
| PIPBOY_HOST | comfy-host.local | Hostname/IP of GPU worker |
| PIPBOY_COMFY_PATH | /path/to/ComfyUI | ComfyUI directory on GPU worker |
| PIPBOY_RUNS_DIR | /tmp/creative-studio-runs | Working directory on GPU worker |
| PIPBOY_WORKSPACE | /home/user | Remote workspace root |
| PIPBOY_MEDIA_DIR | /home/user/remotion-render/public/media | Media directory on remote |
| PIPBOY_BRIDGE_PATH | /home/user/comfyui-bridge/bridge.py | Remote bridge script path |
| PIPBOY_WIN_USER | User | Windows username for WSL file transfer |
| PIPBOY_WSL_DISTRO | Ubuntu | WSL distribution name |
| PIPBOY_KEY | $HOME/.ssh/id_ed25519 | SSH key path |
| DELL_SSH | user@audio-worker | SSH target for the audio worker |
| DELL_PORT | 22 | SSH port for audio worker |
| ACE_STEP_ROOT | /opt/ACE-Step-1.5 | ACE-Step installation directory |
| KOKORO_URL | http://localhost:8765 | Kokoro TTS server URL |
| KOKORO_VOICE | af_heart | Default TTS voice |
| CREATIVE_PUBLIC_ROOT | ./output | Output directory for final videos |
| CREATIVE_QUALITY | draft | Quality preset (draft / final) |
| CREATIVE_UPSCALE | none | Upscale mode (none / hd / 2x) |
| CREATIVE_INTERPOLATE | none | Frame interpolation (none / film / 2x) |
| CREATIVE_AUTO_REGEN | false | Auto-regenerate failed scenes |
| CREATIVE_SNAP_TO_BEATS | true | Snap transitions to detected beats |
See .env.example for all available environment variables.
The config.yaml file can be used for workflow-level configuration.
- No embedded secrets. All credentials, API keys, and machine addresses come from environment variables or .env files.
- SSH-based only. No cloud APIs, no telemetry, no external callbacks.
- Local-first. Binds to localhost by default. Run on a single machine with no network for full isolation.
- File paths are configurable. No hardcoded paths — everything is env-var driven.
MIT — see LICENSE.
This software generates AI media content. You are responsible for:
- Complying with the terms of service of any APIs you connect (ComfyUI, Kokoro, etc.)
- Ensuring your content does not infringe on others' rights
- Using appropriate safety measures when running on production systems
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.