Give every AI coder a strict father.
yanfu is a Claude Code template that automatically triggers an E2E validation agent every time your coding agent tries to say "Done." Not a code reviewer -- a QA engineer that opens the browser, submits forms, hits APIs, and queries your database.
AI coding agents (Claude Code, Codex, Cursor) write code, run unit tests, and declare victory. But "code compiles + tests pass" does not mean "feature works."
A real developer adding a phone number field would:
- Open the page, check it renders correctly
- Fill the form, submit, see if it works
- Check the API received the data
- Query the database to confirm persistence
- Refresh the page to verify data loads back
AI skips all of this 90% of the time. Not because it can't -- it has Playwright MCP, terminal access, database tools. It skips because nobody told it that "Done" means all of the above.
yanfu fixes this by intercepting every "Done" and forcing a real QA pass.
Coder Agent (Claude Code) writes code
               |
               v  tries to stop
         +-----------+
         | Stop Hook |  <-- yanfu intercepts here
         +-----+-----+
               v
+-----------------------------+
| yanfu QA Agent              |
|                             |
| 1. Reads task + git diff    |
| 2. Determines what to       |
|    validate (dynamic,       |
|    not hardcoded rules)     |
| 3. Executes validations:    |
|    - Playwright -> render   |
|    - Playwright -> interact |
|    - curl -> API verify     |
|    - SQL -> DB check        |
| 4. Collects evidence        |
| 5. Verdict: PASS or FAIL    |
+-----------------------------+
               |
PASS -> exit 0 -> truly done
FAIL -> exit 2 -> coder must fix, then retry
The QA agent is not a linter. It's not reading your diff and guessing. It runs your application and checks it works, the same way a human QA engineer would.
| | Code Review Tools | Smoke Tests | yanfu |
|---|---|---|---|
| How it validates | Reads diff | Fixed checks | Dynamic E2E based on task semantics |
| Who decides what to check | Hardcoded rules | Hardcoded rules | AI infers from task + diff |
| Opens a browser | No | Yes (basic) | Yes -- simulates real user flows |
| Queries database | No | No | Yes -- when changes touch data layer |
| Hits APIs | No | No | Yes -- when changes touch API layer |
| Adapts per task | No | No | Yes -- every validation plan is unique |
- Claude Code CLI installed
- Playwright MCP configured (for browser validation)
- Your project's dev server runnable locally
# From your project root:
curl -sSL https://raw.githubusercontent.com/spytensor/yanfu/main/install.sh | bash
Or manually:
# 1. Copy the hook configuration
cp yanfu/.claude/settings.json .claude/settings.json
# (merge with existing settings.json if you have one)
# 2. Copy the hook script
cp yanfu/.claude/hooks/yanfu-gate.sh .claude/hooks/yanfu-gate.sh
chmod +x .claude/hooks/yanfu-gate.sh
# 3. Copy the QA agent definition
mkdir -p .claude/agents
cp yanfu/agents/yanfu-qa.md .claude/agents/yanfu-qa.md
# 4. Add yanfu rules to your CLAUDE.md (optional but recommended)
cat yanfu/CLAUDE.md.template >> CLAUDE.md
Edit .claude/hooks/yanfu-gate.sh to set your project-specific values:
# Your dev server URL
DEV_SERVER_URL="http://localhost:3000"
# Database query command (optional)
DB_QUERY_CMD="psql -U postgres -d myapp -c"
yanfu's QA agent uses Playwright MCP for browser validation. The claude -p subagent inherits MCP servers from your project or user-level settings. Add Playwright MCP to .claude/settings.json:
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
}
},
"hooks": {
"...": "..."
}
}
Or add it to ~/.claude/settings.json to make it available across all projects.
When Claude Code's coder agent finishes work and tries to hand control back to you, yanfu's Stop hook intercepts. It reads the JSON payload from stdin (provided by Claude Code) and collects five pieces of context:
- Original user task -- extracted from the session transcript (the first user message)
- Coder agent's completion message -- what the agent claims it did (last_assistant_message from the Stop hook input)
- Git diff -- what actually changed in the code
- Change scope -- which layers were affected (frontend, backend, database, config)
- Project context -- framework type, CLAUDE.md contents, dev server URL
All of this is passed to the QA agent, so it knows both what was asked and what was done. This is critical -- without the task context, the QA agent would only see the diff and have to guess the intent.
Dependency: jq is recommended for reliable JSON parsing. Without it, the hook uses a regex fallback that handles simple cases but may miss multiline messages.
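The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of yanfu-gate.sh: the helper name extract_field and the sample payload are hypothetical, and the only field name taken from the text is last_assistant_message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull one top-level string field out of a JSON payload.
# Prefers jq; otherwise falls back to a regex that only handles simple,
# single-line values without escaped quotes.
extract_field() {
  local field="$1" payload="$2"
  if command -v jq >/dev/null 2>&1; then
    printf '%s' "$payload" | jq -r --arg f "$field" '.[$f] // empty'
  else
    printf '%s' "$payload" |
      sed -n 's/.*"'"$field"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
  fi
}

# Illustrative payload; in a real hook it would arrive on stdin.
sample='{"last_assistant_message": "Added the phone number field. Done."}'
extract_field last_assistant_message "$sample"
```

In a hook script the payload would typically be captured once with something like payload="$(cat)" and then passed to the helper for each field. The fallback branch shows why jq is the safer choice: a message containing a newline or an escaped quote would not match the regex.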
The QA agent is a separate Claude instance with a strict QA persona. It has access to:
- Playwright MCP -- browser control (navigate, click, fill forms, screenshot)
- Terminal -- run commands (curl, database queries, test suites)
- File system -- read configs, check file existence
The agent:
- Analyzes the task + diff to determine which layers were affected (UI, API, DB, all)
- Generates a dynamic validation plan (not hardcoded -- inferred from context)
- Executes each validation step, collecting evidence (screenshots, responses, query results)
- Returns PASS (exit 0) or FAIL with specific feedback (exit 2)
If FAIL, the feedback is injected back into the coder agent's context, forcing it to address the issues before trying to complete again.
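How a verdict might translate into the hook's exit status can be sketched as below. The function name verdict_to_exit and the "PASS" / "FAIL: ..." string format are assumptions made for illustration. The exit codes 0 and 2 come from the description above, and writing the feedback to stderr reflects how Claude Code hands a blocking hook's message back to the agent.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from a QA verdict string to the hook's exit status.
verdict_to_exit() {
  local verdict="$1"
  case "$verdict" in
    PASS*)
      return 0   # let the coder agent stop
      ;;
    *)
      # Anything that is not an explicit PASS blocks completion.
      printf 'yanfu QA failed: %s\n' "$verdict" >&2
      return 2
      ;;
  esac
}

# Demo with made-up verdicts; a real hook would `exit` with this status.
verdict_to_exit "PASS" && echo "pass -> 0"
verdict_to_exit "FAIL: phone field not persisted" 2>/dev/null || echo "fail -> $?"
```

Treating every non-PASS result as a failure is a choice of this sketch: a garbled or empty verdict then blocks completion instead of slipping through.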
yanfu validates across the full stack, but only the layers that are relevant to each change:
+---------------------------------------------+
| Layer 4: Data Persistence |
| Database queries, file system checks, |
| cache verification |
+---------------------------------------------+
| Layer 3: API / Backend |
| HTTP requests, response schema validation, |
| error handling, auth flows |
+---------------------------------------------+
| Layer 2: User Interaction |
| Form submission, button clicks, navigation, |
| error states, loading states |
+---------------------------------------------+
| Layer 1: Visual Rendering |
| Page loads, component renders, no console |
| errors, layout correctness |
+---------------------------------------------+
| Layer 0: Build & Types (always runs) |
| TypeScript, linting, unit tests, build |
+---------------------------------------------+
The QA agent determines which layers to validate based on the git diff:
- Changed .tsx/.vue/.svelte -> Layers 0-2 minimum
- Changed API routes/handlers -> Layers 0, 3
- Changed migrations/schema -> Layers 0, 3-4
- Changed form + API + migration -> All layers (full-stack change)
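The mapping above can be approximated with a static classifier, shown here only to make the rules concrete. The path patterns and the function names layers_for_file and layers_for_diff are hypothetical; per the description, the real QA agent infers the layers from context and does not use a fixed table like this.

```shell
#!/usr/bin/env bash
# Hypothetical classifier: map one changed path to the layers it implies.
# The patterns are illustrative guesses, not yanfu's actual rules.
layers_for_file() {
  case "$1" in
    *.tsx|*.vue|*.svelte)          echo "0 1 2" ;;  # UI components
    *migrations/*|*schema*)        echo "0 3 4" ;;  # migrations / schema
    *api/*|*routes/*|*handlers/*)  echo "0 3"   ;;  # API routes / handlers
    *)                             echo "0"     ;;  # Layer 0 always runs
  esac
}

# Union of layers over a list of changed files (one path per line on stdin).
layers_for_diff() {
  while IFS= read -r path; do
    layers_for_file "$path"
  done | tr ' ' '\n' | sort -un | paste -sd ' ' -
}

# Demo with made-up paths; the list could come from `git diff --name-only`.
printf '%s\n' \
  src/components/PhoneField.tsx \
  src/api/users.ts \
  prisma/migrations/001_add_phone.sql | layers_for_diff
```

With a component, an API route, and a migration in the same diff, the union covers Layers 0 through 4, which matches the full-stack case in the list.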
See the examples/ directory for project-specific configurations:
- Next.js + Prisma -- Full-stack React with database
- Express + PostgreSQL -- REST API with SQL database
- Django + DRF -- Python full-stack
- Astro + Supabase -- Static site with backend-as-a-service
In .claude/agents/yanfu-qa.md, you can adjust the QA agent's strictness:
## Strictness Level: strict (default)
- strict: Validate ALL affected layers. Any failure blocks completion.
- moderate: Validate critical paths only. Warnings don't block.
- smoke: Quick checks only -- page loads, no console errors, types pass.
All behavior can be controlled via environment variables:
| Variable | Default | Description |
|---|---|---|
| YANFU_SKIP | 0 | Set to 1 to skip validation entirely |
| YANFU_STRICTNESS | strict | strict / moderate / smoke |
| YANFU_MODEL | (default) | Model for QA agent, e.g. claude-haiku-4-5-20251001 |
| YANFU_DEV_URL | http://localhost:3000 | Dev server URL |
| YANFU_DB_CMD | (empty) | Database query command |
| YANFU_MAX_BUDGET | 0.50 | Max budget in USD for QA agent |
| YANFU_COMMIT_WINDOW | 30 | Minutes to look back for recent commits |
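One way a script could resolve these variables is sketched below, using the defaults from the table. The function name load_yanfu_config and the strictness check are assumptions of this sketch, not a description of what yanfu-gate.sh does. YANFU_MODEL is left untouched because its default is whatever model the CLI would otherwise pick.

```shell
#!/usr/bin/env bash
# Sketch: fill in unset yanfu variables with the documented defaults.
load_yanfu_config() {
  : "${YANFU_SKIP:=0}"
  : "${YANFU_STRICTNESS:=strict}"
  : "${YANFU_DEV_URL:=http://localhost:3000}"
  : "${YANFU_DB_CMD:=}"
  : "${YANFU_MAX_BUDGET:=0.50}"
  : "${YANFU_COMMIT_WINDOW:=30}"

  # Assumption: reject values outside the three documented levels.
  case "$YANFU_STRICTNESS" in
    strict|moderate|smoke) ;;
    *) echo "unknown YANFU_STRICTNESS: $YANFU_STRICTNESS" >&2; return 1 ;;
  esac
}

if load_yanfu_config; then
  if [ "$YANFU_SKIP" = "1" ]; then
    echo "yanfu: validation skipped"
  else
    echo "yanfu: strictness=$YANFU_STRICTNESS url=$YANFU_DEV_URL"
  fi
fi
```

The `: "${VAR:=default}"` idiom only assigns when the variable is unset or empty, so values exported in the calling shell always win over the defaults.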
Example: cheap QA with Haiku, moderate strictness:
YANFU_MODEL=claude-haiku-4-5-20251001 YANFU_STRICTNESS=moderate claude
yanfu spawns a separate Claude instance for each validation pass. Typical cost:
- Simple frontend change: ~5K tokens (Layer 0-1 only)
- Full-stack feature: ~15-25K tokens (all layers)
- Average across mixed workload: ~10K tokens per validation
The QA agent is designed to be efficient -- it validates only what changed, not the entire application.
yanfu builds on top of the existing ecosystem:
| Tool | Role in yanfu |
|---|---|
| Playwright MCP | Browser control for the QA agent |
| Claude Code Hooks | Stop hook mechanism for interception |
| Claude Code subagents | The QA agent runs as a subagent |
Inspired by:
- claude-review-loop -- Stop hook + cross-model code review (but review only, no E2E)
- super-smoke-test -- Stop hook + Playwright smoke (but shallow, no data flow validation)
- Meta-Harness -- The insight that Agent = Model + Harness and harness determines performance
- Ralph -- Autonomous implement-verify-commit loop
yanfu fills the gap: automated, dynamic, full-stack E2E validation as a Stop hook.
Why 90% of AI coding sessions fail to complete the full validation loop:
If the AI has an 85% chance of completing each validation step:
- 1 step: 85% success
- 3 steps: 61% success
- 5 steps: 44% success
- 10 steps: 20% success
The compound failure rate explains why AI agents that seem capable on individual tasks consistently fail at end-to-end workflows. yanfu breaks this by making validation mandatory and automated rather than optional and manual.
Source: Verification Debt (ACM)
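The figures above follow from multiplying the per-step success rate by itself once per step, which assumes the steps succeed or fail independently. A short check of that arithmetic, with a hypothetical helper name compound_success:

```shell
#!/usr/bin/env bash
# Probability that n independent steps all succeed, each with probability p,
# printed as a rounded percentage.
compound_success() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.0f%%\n", 100 * p ^ n }'
}

for n in 1 3 5 10; do
  printf '%2d steps: %s\n' "$n" "$(compound_success 0.85 "$n")"
done
```

Run with 0.85 as the per-step rate, this reproduces the 85%, 61%, 44%, and 20% values in the list.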
"Everyone has a plan until they get punched in the mouth." -- Mike Tyson
Every AI coder has a plan until the strict father checks their homework.
The name yanfu (严父, yán fù) means "strict father" in Chinese. In Chinese internet culture, it refers to the parent who never accepts "trust me, it works" -- they check everything themselves.
Your AI coder is the child. yanfu is the father. The child cannot leave the table until the father has verified the homework is actually correct.
This is an early-stage project. Contributions welcome:
- More framework examples (Vue, SvelteKit, Rails, Spring Boot, Go)
- Validation evidence report generation
- Cross-session learning (remember validation patterns per project)
- Integration with CI/CD pipelines
- Support for Codex and other coding agents
MIT
- Meta-Harness: End-to-End Optimization of Model Harnesses -- Stanford IRIS Lab, 2026
- Verification Debt: When Generative AI Speeds Change Faster Than Proof -- ACM, 2026
- The 80% Problem in Agentic Coding -- Addy Osmani
- Auto-Reviewing Claude's Code -- O'Reilly
- Spotify: Feedback Loops for Background Coding Agents
- Playwright MCP -- Microsoft
- claude-review-loop -- Hamel Husain
- super-smoke-test