Adds an opt-in per-commit decision (heuristics + schema-constrained
classifier fallback) that routes design reviews automatically.
Disabled by default; enable via `[auto_design_review] enabled = true`
in ~/.roborev/config.toml or .roborev.toml.
Heuristics layer (internal/review/autotype):
- Path globs via doublestar/v4: trigger_paths (e.g. **/migrations/**,
**/*.sql); skip_paths (e.g. **/*.md, **/testdata/**).
- Diff size: large_diff_lines triggers; min_diff_lines below skips.
- Commit subject regex: trigger_message_patterns
(refactor|redesign|...); skip_message_patterns (^docs:|^test:|...).
- Trigger rules run before skip rules; both before classifier fallback.
- Reasons sanitized before storage: control characters stripped,
length capped (200 runes) so untrusted filenames/messages cannot
inject terminal escapes or markdown into PR comments and the TUI.
Schema-constrained classifier:
- New SchemaAgent capability. Only claude-code implements it today
via `--json-schema`. Claude classify runs with `--tools ""` (no
file or shell access), validated at runtime via
claudeSupportsToolsFlag so an older CLI that ignores the flag
fails closed. Env vars are stripped and proxy auth handled via a
shared buildClaudeEnv so inherited ANTHROPIC_API_KEY /
BASE_URL / AUTH_TOKEN cannot leak into classifier traffic.
- Codex is deliberately NOT a SchemaAgent: `codex exec
--sandbox read-only` blocks writes but not reads, and codex has no
equivalent to `--tools ""`. ValidateClassifyAgent rejects codex as
classify_agent.
- Classifier prompt wraps commit subject/body/diff-stat/diff in
context-only XML tags, XML-escapes all wrapped content (including
diff body), drops the markdown fence so a crafted diff can't close
it, and warns the model that tag contents are data not instructions.
Storage:
- New JobStatusSkipped terminal state and JobTypeClassify, threaded
through every hardcoded terminal-state filter (GetStaleBatches,
ReconcileBatch, sync, ReenqueueJob, HasMeaningfulBatchResult,
GetExpiredBatches, GetNonTerminalBatchJobIDs). Terminal-status
classification (done/skipped/applied/rebased as completed;
failed/canceled as failed) is aligned across AttachJobAndBumpTotal,
ReconcileBatch, and the filter helpers.
- New skip_reason and source columns on review_jobs plus two partial
unique indexes scoped to source='auto_design' (commit-backed and
ref-only). SQLite migration + Postgres v12 schema + v11->v12
migration step. Dedup indexes created in the currentVersion==0
first-time block so fresh Postgres bootstraps get them without
running the migration, and not in the embedded v12.sql so a
v1->v12 migration does not reference the source column before
it's added. Partial unique index widened to
(repo_id, git_ref, review_type) so the commit-backed + commitless
cross-case race is enforced at the storage layer; migration falls
back to the narrow form if pre-existing duplicates would block.
- Lifecycle helpers (PromoteClassifyToDesignReview,
MarkClassifyAsSkippedDesign, InsertSkippedDesignJob,
EnqueueAutoDesignJob, HasAutoDesignSlotForCommit, ListJobsByStatus,
AttachJobAndBumpTotal, ListCIBatchesByJobID). Classify rows convert
in place via UPDATE so the dedup index isn't fought.
AutoDesignAgentSentinel fills the NOT NULL agent column; Promote
rewrites agent/model to the real design-workflow agent at
promotion time. AttachJobAndBumpTotal is idempotent per
(batch_id, job_id) via a unique index plus INSERT OR IGNORE; the
migration dedupes pre-existing duplicate link rows and
recalculates inflated total/completed/failed counters.
- ReenqueueJob accepts 'skipped' and clears skip_reason so a
reenqueued auto-design row doesn't carry the stale reason forward.
Daemon integration:
- Worker handler for job_type='classify' runs the classifier and
converts the row in place. Classify uses agent.Get (not
GetAvailable) so an unknown / uninstalled classify_agent fails
closed — only the explicitly configured classify_backup_agent is
an allowed fallback. Classifier failures mark 'skipped' (not
'failed') so CI batch accounting stays accurate. err.Error() is
redacted out of skip_reason (maps to {timed out, unavailable,
failed} public categories) while the full err plus any
backup-resolve error is persisted to job.error for operators.
- Skip paths broadcast review.completed so CI batches and other
subscribers advance; otherwise the row would stay short on
completed_jobs until stale-batch reconciliation.
review.completed/review.failed handlers iterate every linked batch
via ListCIBatchesByJobID since auto-design rows are shared across
batches that dedup onto them.
- Enqueue handler dispatches auto-design for default-typed
single-commit review jobs only (dirty excluded: no ChangedFiles
list means a small uncommitted migration change would bypass
TriggerPaths; re-enable when dirty file-list extraction lands).
Opportunistic and never blocks the caller's HTTP response.
Explicit review_type=design or security bypasses. Follow-up
design reviews are persisted with the resolved design-workflow
agent/model.
- CI poller runs the same path per-commit across the PR range
(git rev-list mergeBase..head). Each auto-design outcome is
attached to the CI batch via AttachJobAndBumpTotal so the batch
total reflects the new rows and synthesis waits for them. When an
auto_design row already exists for the head commit, the poller
attaches the existing row to the new batch instead of returning
early.
Test-hook hygiene:
- SetTestClassifierVerdict lives in a _test.go file so the setter
is only compiled into the test binary; production code cannot
flip the hook to force an auto-design routing decision.
User surface:
- Skipped rows render in the TUI with a dimmed style and truncated
reason.
- PR synthesis includes skipped rows as a short
"Auto-design-review skipped: <reason>" section.
- /api/status surfaces a SkippedJobs aggregate count and, when the
feature is effectively enabled, an auto_design subobject with
five per-outcome counters (triggered/skipped x
heuristic/classifier, plus classifier_failed). Subobject is
omitted from JSON when disabled. Counters update from all four
producers via a process-global AutoDesignMetrics with atomic ops.
Heuristic engine benchmark on AMD Ryzen 9 9950X — well under the
25 ms spec target:
HeuristicTrigger 113.5 ns/op
HeuristicSkip 5006 ns/op
Ambiguous 7668 ns/op
Testing:
- Unit coverage in internal/review/autotype, internal/agent,
internal/daemon, internal/storage: every heuristic branch,
schema-agent validation, classify-row lifecycle transitions,
dedup invariants, observability counters, reason sanitization,
backup-config error composition, skip-path event broadcast,
multi-batch list lookup.
- End-to-end coverage in internal/daemon/e2e_auto_design_test.go
against real git repos: heuristic triggers (path / message /
large diff / large file count), heuristic skips (trivial /
doc-only / conventional prefix), classifier promotions + skips,
dedup, classifier failure, disabled, and HTTP-level bypass for
explicit --type design / security.
- CI-path e2e in internal/daemon/e2e_ci_auto_design_test.go:
attach-to-batch for trigger / skip / existing-row, disabled
no-op, listCommitsInRange multi-commit and empty range.
User docs at docs/auto-design-review.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
internal/review/autotypepackage: doublestar path globs, diff-size thresholds, commit-subject regexes, with trigger-before-skip-before-classifier precedence.SchemaAgentagent capability returning JSON conforming to an embedded schema viaclaude-code --json-schemaandcodex exec --output-schema; non-schema agents rejected at config-resolve time.JobStatusSkippedterminal state andJobTypeClassify, plumbed through every existing terminal-state filter (GetStaleBatches,ReconcileBatch, sync,ReenqueueJob,HasMeaningfulBatchResult,GetExpiredBatches).skip_reasonandsourcecolumns onreview_jobsplus two partial unique indexes scoped tosource='auto_design'(commit-backed and ref-only); SQLite migration + Postgres v12 schema mirror.PromoteClassifyToDesignReview,MarkClassifyAsSkippedDesign,InsertSkippedDesignJob,EnqueueAutoDesignJob) — classify rows convert in place via UPDATE so the dedup index isn't fought.classifyjobs; enqueue-handler dispatches the auto-design path for default-typedreview/range/dirtyjobs only; CI poller runs the same path on the PR head SHA whendesignisn't already in the matrix; heuristic-input failures degrade to classifier (not skip).Auto-design-review skipped: <reason>section;/api/statusexposes aSkippedJobsaggregate count and, when the feature is enabled anywhere, anauto_designsubobject with five per-outcome counters. Subobject omitted from JSON when disabled.docs/superpowers/specs/2026-04-17-auto-design-review-design.md, plan atdocs/superpowers/plans/2026-04-17-auto-design-review.md, user docs atdocs/auto-design-review.md.