feat: gate pilot — LLM at batch decision boundaries by logpie · Pull Request #1 · logpie/otto

logpie · 2026-03-30T04:05:12Z

Summary

Adds a gate pilot that replaces the simple replan() call after batch failures with richer failure analysis, retry strategies, context routing, and skip recommendations.
Stateless design: the pilot reads disk artifacts and returns structured JSON, which the orchestrator validates and applies.
Falls back to replan() on failure. Set the config flag `pilot: false` to disable it. Zero overhead when there are no failures.
Codex-reviewed (3 rounds, APPROVED). 484 unit tests pass. 53 tasks across 18 e2e runs, 0 regressions.
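A minimal sketch of the validate-or-fall-back step, assuming the pilot's JSON carries `analysis`, `retry`, `context`, and `skip` fields. Those field names, `PilotDecision`, `parse_pilot_decision`, and `decide_or_replan` are hypothetical and not taken from `otto/pilot.py`. The idea shown is that malformed or inconsistent pilot output degrades to replan() instead of being applied.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PilotDecision:
    # Hypothetical schema: the PR only says the pilot returns structured JSON.
    analysis: str
    retry: dict = field(default_factory=dict)    # task_id -> retry strategy
    context: dict = field(default_factory=dict)  # task_id -> routed context
    skip: list = field(default_factory=list)     # task_ids recommended for skipping


def parse_pilot_decision(raw: str, known_tasks: set) -> Optional[PilotDecision]:
    """Return a validated decision, or None if the pilot output is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("analysis"), str):
        return None
    retry = data.get("retry", {})
    context = data.get("context", {})
    skip = data.get("skip", [])
    if not (isinstance(retry, dict) and isinstance(context, dict) and isinstance(skip, list)):
        return None
    if not all(isinstance(t, str) for t in skip):
        return None
    # Never apply advice about tasks the orchestrator does not know about.
    if not (set(retry) | set(context) | set(skip)) <= known_tasks:
        return None
    return PilotDecision(data["analysis"], retry, context, skip)


def decide_or_replan(raw: str, known_tasks: set, replan: Callable[[], object]) -> object:
    """Use the pilot's decision when it validates; otherwise fall back to replan()."""
    decision = parse_pilot_decision(raw, known_tasks)
    return decision if decision is not None else replan()
```

Keeping validation on the orchestrator side means a bad LLM response can never do worse than the pre-pilot behavior.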

Status: NOT validated on real failures

The pilot never fired during benchmarking because the coding agent passed every task. This is expected: the pilot's value shows at i2p scale (8+ tasks, multiple batches, partial failures). Shipping as a safe no-op upgrade.

What's new

| File | What |
|---|---|
| `otto/pilot.py` | Gate pilot module: context assembly, LLM call, decision parsing |
| `otto/orchestrator.py` | Pilot at batch boundaries, fallback to replan(), `pilot` config flag |
| `otto/runner.py` | Pilot guidance kept separate in retry prompts |
| `tests/test_pilot.py` | 22 unit tests |
| `tests/test_pilot_benchmark.py` | 6 scenario tests |
| `bench/pilot-benchmark.sh` | A/B benchmark runner |
| `bench/pressure/projects/pilot-test-*` | 3 synthetic test projects |

Design docs

Spec: docs/superpowers/specs/2026-03-29-gate-pilot.md
Plan: docs/superpowers/plans/2026-03-29-gate-pilot-stage1.md
i2p spec: docs/superpowers/specs/2026-03-26-otto-intent-to-product.md

Test plan

484 unit tests pass (0 new failures)
Codex adversarial review: 3 rounds, APPROVED
18 e2e runs (6 projects × baseline/pilot): zero overhead, zero regressions
4 real-world combined runs (ufo, humanize, camelcase, pre-commit)
5-task greenfield run with merge conflicts
Pending: real-world pilot invocation. This needs a run where a batch has mixed pass/fail results and tasks remain. Monitor pilot.log on the next failure.
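The invocation condition described in this PR (a batch boundary with failures and remaining tasks, unless `pilot: false` is set) reduces to a small predicate, which is also why clean runs pay no pilot overhead. This sketch assumes the parsed otto.yaml is a plain dict and that the pilot defaults to enabled; `BatchResult` and `should_invoke_pilot` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchResult:
    # Hypothetical container for one batch's outcome.
    passed: List[str]
    failed: List[str]


def should_invoke_pilot(config: dict, batch: BatchResult, remaining: List[str]) -> bool:
    # `pilot: false` in otto.yaml disables the pilot (assumed default: enabled).
    if not config.get("pilot", True):
        return False
    # No failures: no LLM call at all, so passing runs have zero overhead.
    if not batch.failed:
        return False
    # No remaining tasks: there is nothing left to re-route, retry or re-batch.
    return bool(remaining)
```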

🤖 Generated with Claude Code

Adds a gate pilot that replaces the simple replan() call after batch failures. The pilot reads disk artifacts (verify logs, QA verdicts, task summaries, learnings) and returns structured decisions: failure analysis, retry strategies, routed context for upcoming tasks, skip recommendations, and re-batching.

Key design:
- Stateless: reconstructs context from files on each invocation
- No telephone game: the pilot makes system-level decisions; coding agents interpret their own errors directly
- Structured JSON output; the orchestrator validates and applies it
- Same model as the planner (configurable via planner_model)
- Falls back to replan() on parse failure
- Config flag: pilot: false in otto.yaml to disable
- Zero overhead when there are no failures (the pilot is only invoked at a batch boundary with failures + remaining tasks)

Codex-reviewed: 3 rounds, all CRITICAL/IMPORTANT findings fixed, APPROVED.

Benchmark: 53 tasks across 18 runs, 0 regressions, 0 pilot overhead. The pilot is not yet validated on real failures; shipping as a safe no-op upgrade for i2p readiness. Will prove its value at scale (5+ tasks, multiple batches).

New files:
- otto/pilot.py: context assembly, LLM invocation, decision parsing
- tests/test_pilot.py: 22 unit tests
- tests/test_pilot_benchmark.py: 6 scenario benchmark tests
- bench/pilot-benchmark.sh: A/B benchmark runner
- bench/pressure/projects/pilot-test-*: 3 synthetic test projects

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
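Applying skip recommendations and re-batching is where orchestrator-side validation matters most. A sketch, assuming the pilot proposes re-batching as a list of task-ID lists; `apply_decision` and its acceptance rule (the proposal must be an exact permutation of the non-skipped tasks) are illustrative assumptions, not the PR's code.

```python
from typing import List, Optional


def apply_decision(
    remaining_batches: List[List[str]],
    skip: List[str],
    rebatch: Optional[List[List[str]]] = None,
) -> List[List[str]]:
    """Return the next batch schedule after applying validated pilot advice."""
    skipped = set(skip)
    kept = [t for batch in remaining_batches for t in batch if t not in skipped]
    if rebatch is not None:
        proposed = [t for batch in rebatch for t in batch]
        # Accept re-batching only if it is a permutation of the kept tasks:
        # nothing dropped, duplicated or invented, and no empty batches.
        if sorted(proposed) == sorted(kept) and all(rebatch):
            return [list(batch) for batch in rebatch]
    # Otherwise keep the original order, minus skipped tasks and emptied batches.
    pruned = [[t for t in batch if t not in skipped] for batch in remaining_batches]
    return [batch for batch in pruned if batch]
```

Rejecting the whole re-batch on any mismatch, rather than patching it, keeps the fallback identical to the pre-pilot schedule.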

coderabbitai · 2026-03-30T04:05:19Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Supersedes the gates + gate pilot approach. Simplified to 5 steps: classify → plan → execute → verify → fix-or-replan.

Key decisions:
- Single-task is a valid plan (no forced decomposition)
- Product artifacts at project root (not otto_arch/)
- Persistent context.md accumulates across tasks
- Vertical slices over horizontal layers
- User journeys from the user's perspective, not a feature list
- Fix rounds continue while making progress; replan on planning failures
- Codex-reviewed design

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
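The five-step flow can be sketched as a loop. The stage callables, `run_pipeline`, `max_replans`, and the progress measure (the number of failing checks must strictly shrink each fix round, otherwise the stall is treated as a planning failure) are assumptions for illustration; the commit message does not say how progress or a planning failure is detected.

```python
from typing import Callable, List


def run_pipeline(
    classify: Callable[[str], str],
    plan: Callable[[str, str], List[str]],
    execute: Callable[[str], None],
    verify: Callable[[], List[str]],
    fix: Callable[[List[str]], None],
    intent: str,
    max_replans: int = 2,  # hypothetical cap on replanning
) -> bool:
    kind = classify(intent)
    for _ in range(max_replans + 1):
        tasks = plan(intent, kind)  # a single task is a valid plan
        for task in tasks:
            execute(task)
        failures = verify()
        # Fix rounds continue only while the failure count keeps shrinking.
        while failures:
            before = len(failures)
            fix(failures)
            failures = verify()
            if len(failures) >= before:
                break  # stalled: assume the plan is at fault and replan
        if not failures:
            return True
    return False
```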

logpie wants to merge 2 commits into main from worktree-gate-pilot
