---
name: scrollclaw-system
description: ScrollClaw system — core doctrine, format selection, pipeline routing, and anti-patterns. Loaded into every UGC conversation.
---
AI video defaults to cinematic. Sweeping drone shots. Perfect lighting. Orchestral energy. Nobody scrolls past that thinking "real person" — they think "ad" and keep moving. UGC works because it looks like someone pulled out their phone and talked.
Messaging is the actual skill. The tools improve weekly — visuals, voice, motion are getting solved. AI cannot solve having something worth saying. Most AI UGC fails not because it looks like AI but because the script sounds like a copywriter, not a customer. Persona research before production. Real language before prompts.
Anti-polish is the product. Every decision optimizes for "could a real person have shot this on their phone." Impressive is the enemy.
Creators are persistent. AI creators have locked identities — same face, same hair, same build across every clip. First-frame consistency: generate one canonical face with Nano Banana, feed it to every Sora/Kling i2v generation. Read references/creator-system.md.
First frame controls everything. Generate frame 1 with Nano Banana 2 (composition, character, environment, color locked), then animate with Sora 2 i2v. Text-to-video is the fallback.
Save first, watch later. Replicate URLs are ephemeral. Every output gets downloaded immediately.
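The "save first" rule can be sketched as a tiny download helper. This is illustrative, not part of ScrollClaw: `save_output`, the URL handling, and the path layout are assumptions, though the layout mirrors the workspace structure described later.

```shell
# Hypothetical helper: download a generation result the moment its URL arrives.
# Strips any query string before deriving the filename, so signed URLs save cleanly.
save_output() {
  local url="$1" slug="$2"
  local dest="workspace/campaigns/${slug}/clips/$(basename "${url%%\?*}")"
  mkdir -p "$(dirname "$dest")"
  curl -fsSL "$url" -o "$dest" && echo "$dest"
}
```

Calling this immediately after each generation returns means a dead Replicate URL costs nothing but a reroll.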
System not clip. Every run leaves reusable creator profiles, prompt logs, color references. Campaign 10 takes a fraction of campaign 1.
Setting. Cadence. Influencer realism. Story. Product. Audience. Every decision routes through these. Most people skip consumer psychology — that's why their AI looks obvious.
Ask: what are we making? Then route to the right format.
| If the goal is... | Format | What user provides | What AI generates |
|---|---|---|---|
| Product review / honest take | Talking Head | Brand context | Everything (face, voice, video) |
| App/tool demo (scroll-stopper) | Hook Face + Demo | Screen recording (recommended) or screenshots | Hook face + captions. Demo = real footage. |
| Authority / credibility | Podcast Clip | Brand context | Everything (face, set, voice) |
| Quick visual transformation | Visual Transformation | Brand context, before/after concept | All imagery + animation |
| Complex mechanism explanation | Hybrid Transformation | Brand context, deep persona research | Talking head bookends + slideshow |
| Hot take / faceless | Wall of Text | The text content | Static image + text overlay |
Read references/format-library.md for shot-by-shot blueprints. Read references/hook-emotions.md for the emotion taxonomy.
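The routing table above can be expressed as a lookup. This is a sketch only — the goal keywords and format slugs are illustrative shorthand, not official ScrollClaw identifiers:

```shell
# Hypothetical router mirroring the format table; keys and slugs are illustrative.
route_format() {
  case "$1" in
    review|honest-take)     echo "talking-head" ;;
    demo|scroll-stopper)    echo "hook-face-demo" ;;
    authority|credibility)  echo "podcast-clip" ;;
    before-after)           echo "visual-transformation" ;;
    mechanism)              echo "hybrid-transformation" ;;
    hot-take|faceless)      echo "wall-of-text" ;;
    *)                      echo "unknown"; return 1 ;;
  esac
}
```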
| Step | Skill | What happens |
|---|---|---|
| 0. Brand setup (one-time) | `/brand-setup` | Research brand, generate voice/positioning/audience files |
| 1. Persona research | `/persona` | Mine reviews, extract real language |
| 2. Brand context | `/persona` | Load brand voice, know the product |
| 3. Creator profiles | `/persona` | Lock identity in `workspace/campaigns/<slug>/creators/` or `workspace/creators/` |
| 4. Format + Script | `/persona` | Choose format, write script with visual beats |
| 5. First frame | `/first-frame` | Nano Banana 2 → canonical face image |
| 6. Animate (A-roll) | `/animate` | Sora 2 i2v → talking head clips |
| 7. B-roll | `/b-roll` | Kling 3 → environment/product shots |
| 8. Stitch + Audio | `/assemble` | Multi-clip stitching, ElevenLabs S2S voice |
| 9. Post-production | `/assemble` | Color grade, grain, frame rate, phone test |
| 10. Captions | `/assemble` | Native-style caption overlays (LAST step) |
| 11. Score | `/score` | Virality scoring — 70+ to publish |
Visual: Cinematic drift (override with handheld energy), model-pretty creators (specify normal-looking), clean room syndrome (demand real clutter), porcelain skin (specify visible pores), smooth steadicam (phone has micro-shake).
Color: Grey-scale Nano Banana default (fix with JSON color prompts). Too-clean color (lift shadows, slight fade). See references/color-reference-system.md.
Audio: Clean studio audio (add room ambience). Library ElevenLabs voices (voice design or instant clone only). Stock music (real UGC rarely has music).
Script: Testimonial cadence ("I've been using this for three weeks..."). Wardrobe drift across same-day clips.
Pipeline: Output loss (download immediately). Hallucinated text (add "no text" to negative prompt). Extra limbs (reroll, don't fix prompt). Resolution mismatch (normalize before stitching). Multi-clip voice drift (use ElevenLabs S2S).
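For the resolution-mismatch trap, one way to normalize before stitching is a pass like this. The paths and the 1080×1920 / 30 fps targets are assumptions (9:16 vertical per the defaults below); the ffmpeg filters themselves are standard:

```shell
# Hypothetical normalization pass: force every clip to the same 9:16 frame and
# frame rate so concat doesn't fail or jitter. Scale preserves aspect, pad centers.
for clip in clips/a-roll-*.mp4 clips/b-roll-*.mp4; do
  ffmpeg -y -i "$clip" \
    -vf "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,fps=30" \
    -c:v libx264 -pix_fmt yuv420p -c:a aac \
    "clips/norm-$(basename "$clip")"
done
```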
Read references/taste-calibration.md for before/after examples that show what "anti-polish" actually sounds and looks like in practice.
Input: product/brand context (URL, description, or brand voice file). Optional: creator profiles, format preference, color references.
Output: video clips (MP4), first-frame images (PNG), creator profiles, scripts, prompt logs. Default 9:16 vertical.
Env: FAL_KEY (primary), REPLICATE_API_TOKEN (Nano Banana + fallback), ELEVENLABS_API_KEY (multi-clip voice), OPENROUTER_API_KEY (Gemini virality scoring).
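A minimal sketch of what checking those variables looks like — `check_keys` is illustrative; `scripts/check-deps.sh` is the real entry point:

```shell
# Hypothetical key check: report any of the required env vars that is unset or empty.
check_keys() {
  local missing=0
  for key in FAL_KEY REPLICATE_API_TOKEN ELEVENLABS_API_KEY OPENROUTER_API_KEY; do
    if [ -z "$(printenv "$key")" ]; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```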
ScrollClaw persists work across sessions using a structured workspace. Campaign 10 takes a fraction of campaign 1 because creator profiles, brand context, and learnings accumulate.
Full protocol: Read references/brand-campaign-context.md.
workspace/
├── brand/ ← Read-only for ScrollClaw (written by /brand-setup, GrowthClaw, or manually)
│ ├── voice-profile.md ← Brand voice → informs script tone
│ ├── positioning.md ← Differentiation → informs persona research
│ └── audience.md ← ICP → informs creator archetype selection
├── creators/ ← Global creator profiles (reusable across campaigns)
└── campaigns/<slug>/
├── brief.md ← Campaign brief (from assets/campaign-brief-template.md)
├── persona-research.md ← Written by /persona
├── creators/ ← Campaign-specific creator overrides
├── scripts/ ← Approved scripts
├── frames/ ← First frames + context frames
├── clips/ ← A-roll, B-roll, assembled finals
├── scores/ ← Virality score cards
├── output-log.md ← Prompt log, generation params (append-only)
└── learnings.md ← What worked, what didn't (append-only)
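Scaffolding a new campaign to match the tree above could look like this — `new_campaign` is an illustrative helper, not a ScrollClaw script:

```shell
# Hypothetical scaffold: create the campaign directories from the workspace tree
# and the two append-only logs. Brand files are left to /brand-setup.
new_campaign() {
  local slug="$1"
  local base="workspace/campaigns/$slug"
  mkdir -p workspace/brand workspace/creators \
    "$base/creators" "$base/scripts" "$base/frames" "$base/clips" "$base/scores"
  touch "$base/output-log.md" "$base/learnings.md"
}
```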
| Skill | Reads | Writes |
|---|---|---|
| `/brand-setup` | Brand website, social profiles, reviews, competitors (scraped) | `brand/voice-profile.md`, `brand/positioning.md`, `brand/audience.md` |
| `/persona` | `brand/{voice-profile,positioning,audience}.md`, campaign brief | `persona-research.md`, `creators/`, `scripts/` |
| `/first-frame` | `creators/`, `scripts/`, campaign brief | `frames/`, `output-log.md` |
| `/animate` | `frames/`, `scripts/`, `creators/` | `clips/a-roll-*.mp4`, `output-log.md` |
| `/b-roll` | `frames/`, `clips/a-roll-*`, `scripts/` | `clips/b-roll-*.mp4`, `output-log.md` |
| `/assemble` | `clips/*`, `scripts/`, `creators/` | `clips/final-*.mp4`, `output-log.md` |
| `/score` | `clips/final-*`, campaign brief, `persona-research.md` | `scores/`, `learnings.md` |
- Check if each brand file exists before reading. Never error on missing files.
- Show what was loaded: `✓ Loaded brand voice: conversational-direct` / `✗ No audience file — proceeding standalone`
- ScrollClaw reads `workspace/brand/` but never writes there.
- `output-log.md` and `learnings.md` are append-only — never overwrite.
- Creator profiles: global ones go in `workspace/creators/`, campaign overrides in `workspace/campaigns/<slug>/creators/`.
- Skills own their outputs. `/persona` owns `persona-research.md`. `/score` owns `scores/` and `learnings.md`.
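The missing-file rule can be sketched as a loader that reports instead of failing. `load_brand_file` and its output strings are illustrative, modeled on the loaded/missing messages above:

```shell
# Hypothetical loader: report each brand file's status, never error on absence.
load_brand_file() {
  local f="workspace/brand/$1"
  if [ -f "$f" ]; then
    echo "✓ Loaded $1"
  else
    echo "✗ No $1 — proceeding standalone"
  fi
}
```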
These mistakes have been made in real sessions. Every one of them wasted time and tokens. Do not repeat them.
The creator system exists to lock visual identity. Every first frame must be generated from the creator's canonical reference image using i2v. Generating 7 random people instead of using Jess's locked reference was the biggest waste in session 2.
- Wrong: text-to-image → random person → animate
- Right: reference image → Nano Banana 2 → first frame → Sora 2 i2v
t2v generates a completely different person every time. Once a creator has a reference image, always use image-to-video. The only acceptable use of t2v is initial exploration before a reference is locked.
Grain degrades caption text. Color grading shifts caption colors. The order is non-negotiable:
raw clips → stitch → post-produce → THEN captions (LAST)
Use Gemini Flash to scrub every clip frame-by-frame after stitching. It catches content drift (wrong person appearing mid-clip), hand artifacts, and scene inconsistencies that are easy to miss on a quick watch. Don't wait for the user to catch problems.
Standard encode flags: `-c:v libx264 -profile:v main -pix_fmt yuv420p -crf 23 -preset medium -movflags +faststart -c:a aac -b:a 128k -ar 44100`. yuv444p and the High 4:4:4 Predictive profile break Telegram Web, many mobile browsers, and some social platforms. Always use yuv420p + main profile + faststart. Do this from the first encode, not as a fix after delivery fails.
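Applied to a finished clip, the full command might look like this (the input and output paths are illustrative; the flags are the ones listed above):

```shell
# Compatibility encode for a stitched final; guarantees yuv420p + main profile + faststart.
ffmpeg -i clips/final-001-raw.mp4 \
  -c:v libx264 -profile:v main -pix_fmt yuv420p \
  -crf 23 -preset medium -movflags +faststart \
  -c:a aac -b:a 128k -ar 44100 \
  clips/final-001.mp4
```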
Run `bash scripts/pre-flight.sh <campaign-slug>` to validate workspace structure, creator profiles, reference images, scripts, and caption plans. It catches missing prerequisites before you waste API calls.
Read references/production-pipeline.md for the strict step-by-step sequence. Every step depends on the previous one being done correctly.
Run `scripts/check-deps.sh` to verify all API keys and dependencies.