feat: opt-in automatic design-review router by cpcloud · Pull Request #661 · roborev-dev/roborev

cpcloud · 2026-04-19T15:56:24Z

Summary

Adds an opt-in per-commit decision (heuristics + schema-constrained classifier fallback) that routes design reviews automatically; off by default.
New internal/review/autotype package: doublestar path globs, diff-size thresholds, commit-subject regexes, with trigger-before-skip-before-classifier precedence.
New SchemaAgent agent capability returning JSON conforming to an embedded schema via claude-code --json-schema and codex exec --output-schema; non-schema agents rejected at config-resolve time.
New JobStatusSkipped terminal state and JobTypeClassify, plumbed through every existing terminal-state filter (GetStaleBatches, ReconcileBatch, sync, ReenqueueJob, HasMeaningfulBatchResult, GetExpiredBatches).
New skip_reason and source columns on review_jobs plus two partial unique indexes scoped to source='auto_design' (commit-backed and ref-only); SQLite migration + Postgres v12 schema mirror.
Classify-row lifecycle helpers (PromoteClassifyToDesignReview, MarkClassifyAsSkippedDesign, InsertSkippedDesignJob, EnqueueAutoDesignJob) — classify rows convert in place via UPDATE so the dedup index isn't fought.
Daemon integration: worker handler for classify jobs; enqueue-handler dispatches the auto-design path for default-typed review/range/dirty jobs only; CI poller runs the same path on the PR head SHA when design isn't already in the matrix; heuristic-input failures degrade to classifier (not skip).
TUI renders skipped design rows with a dimmed style and truncated reason; PR synthesis includes a short Auto-design-review skipped: <reason> section; /api/status exposes a SkippedJobs aggregate count and, when the feature is enabled anywhere, an auto_design subobject with five per-outcome counters. Subobject omitted from JSON when disabled.
Heuristic-engine benchmark on the spec's target hardware: trigger 113 ns/op, skip 5 µs/op, ambiguous 7.7 µs/op (spec target: < 25 ms).
Spec at docs/superpowers/specs/2026-04-17-auto-design-review-design.md, plan at docs/superpowers/plans/2026-04-17-auto-design-review.md, user docs at docs/auto-design-review.md.

roborev-ci · 2026-04-19T16:01:13Z

roborev: Combined Review (`5b14876`)

Verdict: 2 review findings require changes before merge (1 High, 1 Medium).

High

internal/agent/claude.go:522-526
Reachable from internal/daemon/server_auto_design.go:160-161 and internal/daemon/ci_poller.go:776-783: the new classifier path always launches Claude with --dangerously-skip-permissions and does not apply the read-only --allowedTools restriction used in normal review mode. Because auto-design classification ingests commit messages and diffs from shared repos, a prompt-injected commit could turn this classifier step into arbitrary Bash or edit tool execution on the reviewer’s machine.
Suggested remediation: do not use --dangerously-skip-permissions for classify jobs; restrict tools to the same read-only set used by normal review (Read, Glob, Grep) or disable tool use entirely for schema classification.

Medium

internal/agent/claude.go:577-580
Compared with the sanitized review path at internal/agent/claude.go:288-330, ClassifyWithSchema starts a fresh subprocess without the existing Anthropic env filtering and proxy validation logic. Inherited vars such as ANTHROPIC_API_KEY, ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and related routing settings can therefore affect classifier traffic and may send repo contents to an unintended endpoint. Combined with the issue above, any prompt-injected tool use would also inherit those secrets.
Suggested remediation: factor the existing Claude env/model setup into a shared helper and reuse it here so classification strips inherited auth/routing vars and only injects validated proxy settings explicitly.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-19T16:22:04Z

roborev: Combined Review (`4cdbfdc`)

Verdict: Changes are not safe to merge yet due to one high-severity security issue and one medium-severity agent-selection issue.

High

Prompt-injection risk can exfiltrate local secrets through classifier output
- Files: internal/agent/claude.go:520-530, internal/agent/codex.go:400-463, internal/daemon/worker_classify.go:55-88, internal/daemon/ci_poller.go:736-754, internal/review/synthesis.go:62-67
- The new classifier feeds untrusted commit subjects, bodies, and diffs into agents that still retain filesystem read capability. A malicious diff or commit message could instruct the model to read sensitive local files and include their contents in the JSON reason field. That reason is then persisted as skip_reason and can be rendered into PR comments, creating a direct exfiltration path.
- This needs to run with no file/tool access, or inside a sandbox limited to the checked-out repo with $HOME and network blocked. Classifier output should also be treated as untrusted and redacted or strictly allowlisted before storage or publication.

Medium

Classifier agent selection can silently downgrade to a different backend
- Files: internal/daemon/worker_classify.go:70-97, internal/agent/agent.go:209-258
- The classify flow validates classify_agent, but then uses agent.GetAvailable(name), which may substitute a different installed agent if the requested one is unavailable. That changes the security and behavior profile for processing untrusted commit metadata without explicit user intent.
- The code should use the exact configured agent after availability checks, fall back only to an explicitly configured classify_backup_agent, and otherwise fail closed.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-19T16:29:14Z

roborev: Combined Review (`29ae171`)

Verdict: Changes are not ready to merge due to 2 High and 2 Medium severity issues.

High

internal/storage/jobs.go (InsertSkippedDesignJob, EnqueueAutoDesignJob): New inserts into review_jobs do not provide agent, but the schema still requires agent TEXT NOT NULL. As written, auto-design skip/classify/follow-up design job creation will fail at runtime.
Fix: Pass agent information into these helpers and include it in the INSERTs; persist any other required execution fields for follow-up review jobs as needed.
internal/daemon/ci_poller.go around processPR and maybeDispatchAutoDesignForCI: Auto-design CI jobs are enqueued only after CreateBatchWithJobs creates the batch, so those jobs are not added to ci_pr_batch_jobs and are not counted in total_jobs. Batch reconciliation and PR synthesis can therefore finish without waiting for or reporting auto-design results.
Fix: Decide/create auto-design jobs before batch creation so they are included up front, or explicitly attach them to the batch and update batch counts when added.

Medium

internal/daemon/ci_poller.go (processPR, maybeDispatchAutoDesignForCI): CI auto-design logic only evaluates pr.HeadRefOid. On multi-commit PRs, this yields at most one auto-design outcome for the entire PR instead of one per commit, so earlier commits never receive their own design review/classify/skipped record.
Fix: Move auto-design routing into the existing per-commit processing so each commit SHA gets its own result.
internal/storage/models_test.go: Newly added tests do not compile. The file is in package storage, but the assertions reference storage.JobStatusSkipped / storage.ReviewJob, and the required assert import is missing.
Fix: Either switch to package storage_test and import storage, or keep package storage, remove the storage. qualifiers, and add the missing testify/assert import.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-19T16:37:58Z

roborev: Combined Review (`41334db`)

Verdict: changes are not ready to merge due to one high-severity security issue and one medium-severity output-sanitization issue.

High

internal/agent/codex.go:401-406, internal/agent/codex.go:430-463
The new Codex classifier path processes untrusted commit messages/diffs from shared repos with --sandbox read-only, but it does not disable tool or file access. Unlike the Claude classifier path, there is no allowlist/denylist to prevent the model from reading local files or invoking other tools before returning the JSON result. A prompt-injected commit could coerce Codex into reading sensitive local data and exfiltrating it through the "reason" field or via tool/network side effects.
Suggested fix: make Codex classify mode no-tools/no-files by default, or reject codex as classify_agent until equivalent hardening exists. Add a regression test asserting classify args disable all tools, mirroring the Claude coverage.

Medium

internal/review/autotype/autotype.go:66-76, internal/review/autotype/autotype.go:96-103, internal/review/autotype/autotype.go:126-133, cmd/roborev/tui/render_queue.go:682-705, internal/review/synthesis.go:62-68
Heuristic skip/trigger reasons are built from raw external repo data (ChangedFiles and commit-subject regex matches) and then rendered directly into the TUI and synthesis prompt without sanitization. Git paths and commit subjects can contain control characters or ANSI escapes, so a malicious shared repo can inject terminal control sequences or spoofed multiline content into terminal output and downstream PR-comment generation.
Suggested fix: sanitize heuristic reasons before storage/rendering the same way classifier reasons are sanitized in internal/daemon/classifier.go: strip control characters, fold newlines/tabs to spaces, cap length, and escape or quote embedded filenames/matches before displaying them.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-19T16:49:26Z

roborev: Combined Review (`0ab9705`)

Verdict: Changes are not ready to merge due to 1 High and 2 Medium-severity issues in the auto-design review flow.

High

Unexecutable auto-design jobs
- Location: internal/storage/jobs.go:1090-1137, internal/storage/jobs.go:1151-1168, internal/daemon/worker.go:481-485
- Auto-generated design-review rows are persisted with agent = "auto-design", but workers execute jobs via agent.GetAvailableWithConfig(job.Agent, cfg). Since "auto-design" is not a registered agent and promotion does not rewrite it to a real executable agent, heuristic-triggered and classifier-promoted design reviews will fail when claimed by a worker.
- Suggested fix: Persist the actual execution agent/model when enqueueing or promoting these jobs, or translate the sentinel value to the configured design/review agent before worker lookup.

Medium

CI poller only evaluates the PR head commit for auto-design routing
- Location: internal/daemon/ci_poller.go:719-760
- The CI path checks only pr.HeadRefOid. On multi-commit PRs, earlier commits are never classified, so qualifying design-review commits can be skipped and intended per-commit skipped outcomes are not recorded.
- Suggested fix: Iterate all PR commits and make the auto-design decision per SHA, attaching or enqueueing one outcome per commit.
Existing auto-design rows are not attached to new CI batches
- Location: internal/daemon/ci_poller.go:727-728
- If an auto-design row already exists for the head commit, the CI poller returns early instead of attaching that existing job to the new batch. The batch can then complete without the design-review result, causing the PR comment to omit it.
- Suggested fix: When HasAutoDesignSlotForCommit is true, resolve the existing auto-design job ID and attach it to the batch rather than returning early.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-19T17:00:22Z

roborev: Combined Review (`0036dca`)

Verdict: No medium, high, or critical findings.

The reviewed changes do not raise any issues at Medium severity or above.

Low-severity notes were reported by one reviewer, but they are omitted here per instructions.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-19T17:07:43Z

roborev: Combined Review (`a1bfd15`)

Verdict: 2 medium-severity issues need to be addressed before merge.

Medium

internal/agent/claude.go:201-233,520-531 The classifier path disables Claude tool access via --tools "", but the rest of the integration uses --allowedTools, and there is no compatibility check proving the installed Claude CLI treats --tools "" as deny-all. Because auto-design classification processes commit messages/diffs from shared repos, a prompt-injected commit could regain Read/Bash access if this flag is ignored or interpreted differently, allowing local secret exfiltration via the classifier JSON reason. Use a verified no-tools mechanism, add a runtime/startup capability check, and fail closed if tool disablement cannot be proven.
internal/agent/claude.go:621-623, internal/daemon/worker_classify.go:109-110,153-156, internal/review/synthesis.go:62-67, cmd/roborev/tui/render_queue.go:682-705 On classifier failure, raw error/stderr text is copied into skip_reason and later rendered verbatim in PR synthesis and the TUI. Since this path can contain externally influenced text, it creates an injection surface for ANSI escape sequences in terminals and markdown/@mention spam in posted PR comments. Sanitize and length-cap failure-derived skip_reason values the same way as successful classifier reasons, and escape or quote them before rendering.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-19T17:16:39Z

roborev: Combined Review (`ce63446`)

Verdict: No medium-or-higher issues found in the reviewed diff.

The reviewed outputs did not identify any Medium, High, or Critical findings.

Security review also reported no issues, including no noted vulnerability advisory for github.com/bmatcuk/doublestar/v4 on pkg.go.dev at review time.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-20T12:11:40Z

roborev: Combined Review (`bab555b`)

No medium-or-higher findings were reported.

All reviewed outputs were clean at Medium severity or above.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-20T12:19:49Z

roborev: Combined Review (`8f3180c`)

Verdict: No medium-or-higher findings; the reviewed changes look clean based on the provided agent outputs.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-20T12:25:42Z

roborev: Combined Review (`d8b1873`)

Verdict: No medium-or-higher issues found in the reviewed changes.

The submitted review outputs did not identify any Medium, High, or Critical findings.

Security review specifically found no concrete vulnerabilities and noted the new classifier path appears intentionally hardened.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-20T12:49:40Z

roborev: Combined Review (`142f80b`)

Verdict: changes are not ready to merge due to two build-breaking regressions and two medium-severity correctness issues.

High

Build break from missing symbols
- Location: internal/agent/schema_agent.go:39-55, internal/daemon/server.go:1198-1206, internal/daemon/server_auto_design.go:193-227, internal/daemon/worker_classify.go:137-149
- The auto-design path references identifiers that do not exist in the diff or current tree: registryMu, config.IsDefaultReviewType, storage.JobTypeReview, storage.JobTypeDirty, agent.ResolveWorkflowConfig, agent.GetAvailableWithConfig, and ModelForSelectedAgent. This leaves the branch uncompilable.
Build break from GetJobCounts() signature change
- Location: internal/storage/jobs.go:929-958, internal/daemon/server.go:1114-1128, internal/daemon/server_test.go:1763-1788, internal/storage/db_test.go:587-596
- GetJobCounts() now returns an extra skipped value, but existing callers still destructure the old return list. The status handler and affected tests will not compile until all call sites are updated.

Medium

CI batch accounting can leave attached terminal jobs permanently undercounted
- Location: internal/storage/ci.go:190-212, internal/daemon/ci_poller.go:785-868
- AttachJobAndBumpTotal() only increments total_jobs. If CI attaches an already-terminal auto-design job, especially a skipped one, completed_jobs is not incremented, so the batch can remain short until stale-batch reconciliation runs. That can delay or stall PR synthesis/status.
Backup classifier is not capability-validated
- Location: internal/config/config.go:487-545, internal/daemon/worker_classify.go:67-107, docs/auto-design-review.md:18-31
- Only classify_agent is validated for SchemaAgent support; classify_backup_agent is not. The documented backup configuration can therefore fall back to an agent that cannot run this workflow, causing skips instead of a real failover.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-20T13:01:44Z

roborev: Combined Review (`838a54b`)

Verdict: 1 medium-severity issue found; otherwise no actionable medium-or-higher findings were reported.

Medium

internal/daemon/worker_classify.go:113, internal/review/synthesis.go:62, internal/review/synthesis.go:154
Raw classifier errors are copied into skip_reason and then rendered into PR comments. In shared repos, a crafted ambiguous commit could cause CI to expose operational details from the review host, such as local paths, missing tooling, or backend/proxy stderr.
Recommended fix: keep detailed classifier errors in logs or job.Error only, and map public-facing skip reasons to a fixed set like classifier unavailable, classifier timed out, or classifier failed.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-20T13:16:24Z

roborev: Combined Review (`9413084`)

Verdict: changes are not clean; there is 1 medium-severity issue that should be addressed before merge.

Medium

internal/daemon/worker_classify.go:99-105,144-150, internal/agent/claude.go:650-651

Classifier error redaction is incomplete. While skip_reason is now sanitized, classifier failures can still be logged verbatim via ClassifyWithSchema error strings and %v logging in worker failure paths. In CI, those daemon logs may be exposed in runner output, which can leak local paths or backend/proxy diagnostics from attacker-influenced inputs.

Suggested fix: log only a redacted public category such as classifier timed out, classifier unavailable, or classifier failed by default, and restrict full stderr/details to an explicit debug or verbose path after additional scrubbing.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-21T12:19:45Z

roborev: Combined Review (`b0cc217`)

Summary verdict: One medium-severity issue found; no high or critical findings reported.

Medium

Prompt injection can suppress design reviews
internal/prompt/classify.go:35

The classifier prompt embeds untrusted commit subject, body, and diff content directly into the model instruction stream, then uses the model’s design_review decision. A PR author could add prompt-injection text in commit metadata or diffs to influence the classifier toward {"design_review": false}, suppressing automatic design review for externally supplied changes.

Suggested remediation: treat classifier inputs as hostile data. Wrap untrusted sections with explicit data-only boundaries, neutralize Markdown fence breaks, and prefer fail-open behavior where ambiguous, suspicious, or failed classifier results run the design review instead of skipping it. Consider deterministic triggers for high-risk file patterns so external text cannot suppress review through the model path.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-21T12:31:34Z

roborev: Combined Review (`06af0d6`)

Review verdict: three medium-severity findings need attention; no high or critical issues were reported.

Medium

internal/daemon/worker_classify.go:89
Classify jobs use Diff: derefString(job.DiffContent), but auto-design classify rows are enqueued without diff_content. Commit-backed ambiguous classifications can therefore run with an empty diff and a misleading ~0 line changes stat.

Fix: Fetch git.GetDiff(job.RepoPath, job.GitRef) in processClassifyJob when DiffContent is nil for commit-backed jobs, or persist the diff when enqueueing the classify row.
internal/daemon/ci_poller.go:872
If EnqueueAutoDesignJob returns 0 because another producer won the dedup race after HasAutoDesignSlotForCommit, attachAutoDesignToBatch returns without linking the existing auto-design row to the CI batch. The batch may synthesize without waiting for or including that auto-design outcome.

Fix: When jobID == 0, look up the existing auto-design job ID for (repoID, headSHA) and attach it to the batch.
internal/prompt/classify.go:41
The classifier prompt wraps untrusted diff text in <diff context-only="true">, but the diff content is not escaped. A malicious commit can include a closing tag or fence, then inject instructions outside the intended wrapper. Because this classifier decides whether to suppress a design review, commit-controlled prompt injection can bias the router toward design_review=false.

Fix: Escape or encode diff content before embedding it in XML-like tags. At minimum, escape <, >, and & in Diff and DiffStat, and add regression tests with injected </diff> and code-fence breakout content.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-21T12:40:52Z

roborev: Combined Review (`8cfd017`)

Medium severity issues were found; no Critical or High findings were reported.

Medium

internal/daemon/ci_poller.go:737 - Auto-design deduplication can miss existing commitless rows. If an earlier CI dispatch inserted an auto_design row with commit_id = NULL after commit metadata lookup failed, a later dispatch that resolves commit_id will not be caught by HasAutoDesignSlotForCommit, and the (repo_id, commit_id, review_type) index can allow a duplicate row for the same git_ref.

Suggested fix: ensure CI always creates or uses a commit row before auto-design inserts, or make the dedup check/index cover git_ref for all auto-design rows, not only commit_id IS NULL.
internal/prompt/classify.go:70 - BuildClassifyPrompt XML-escapes untrusted diff content but still embeds it inside a Markdown code fence. A PR author can include triple backticks in a diff to close the fence and make injected text appear as normal prompt text inside the <diff> block, potentially manipulating the classifier into returning design_review: false and bypassing auto-design review routing.

Suggested fix: avoid Markdown fences for untrusted classifier inputs. Encode commit message, diff stat, and diff as JSON string fields, base64, or a strict length-delimited format. If fences are retained, escape or replace triple-backtick sequences and add regression tests.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-21T12:49:45Z

roborev: Combined Review (`b58c6f9`)

Summary verdict: changes are close, but two Medium issues should be fixed before merge.

Medium

internal/daemon/server_auto_design.go:136 - Dirty jobs routed through auto-design do not populate ChangedFiles when DiffContent is present. That means path heuristics cannot trigger for dirty diffs, so a small dirty migration/schema change can be incorrectly recorded as skipped by the trivial-diff rule.
- Fix: exclude dirty jobs from auto-design until file metadata is available, or parse changed paths from diff headers and pass them into autotype.Input.
internal/storage/sync.go:349 - Skipped jobs are syncable, but sync structs, SQLite selection, Postgres upsert, and pull paths do not carry skip_reason or source. A synced skipped auto-design row can lose its audit reason and no longer be marked as auto_design in Postgres.
- Fix: add skip_reason and source to the sync model/query/upsert paths, plus round-trip tests.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-21T14:19:46Z

roborev: Combined Review (`8b21a9f`)

Auto-design routing has several medium-severity issues that should be addressed before merge.

Medium

internal/daemon/server_auto_design.go:131
Dirty review jobs are sent through auto-design, but ChangedFiles is only populated for JobTypeReview. As a result, dirty diffs do not trigger path-based classification rules, and a small dirty migration/schema change can be incorrectly marked skipped by the trivial-diff path.
Fix: Exclude JobTypeDirty from auto-design until dirty changed-file collection exists, or populate dirty changed files before calling autotype.Classify.
internal/daemon/worker_classify.go:62
SetTestClassifierVerdict is compiled into production code and globally overrides classifier outcomes. Because it is exported from a non-test file, in-process package code or future integrations could bypass the configured classifier and force auto-design routing decisions.
Fix: Move the test override behind _test.go files or a test-only build tag, and inject classifier behavior through test fixtures.
docs/auto-design-review.md:23
The docs list codex as a supported classifier and backup classifier, but CodexAgent does not implement SchemaAgent and ValidateClassifyAgent("codex") rejects it. Users following the documented config will hit validation errors or configure a nonfunctional backup path.
Fix: Update the docs to state that only supported schema-capable classifiers should be used, currently excluding Codex unless a safe no-tools schema classification mode is implemented.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-21T14:30:48Z

roborev: Combined Review (`11210d1`)

Review verdict: two Medium correctness issues remain; no High or Critical findings were reported.

Medium

internal/daemon/ci_poller.go:735 - CI auto-design dispatch reloads global config from disk instead of using the repo/poller-resolved cfg. This can incorrectly skip auto-design in CI when auto_design_review.enabled or heuristics are supplied by the CI-loaded repo config rather than local/global config.
- Fix: pass the resolved cfg from processPR into maybeDispatchAutoDesignForCI and use it for ResolveAutoDesign* / classify settings instead of calling config.LoadGlobal().
internal/daemon/server.go:1216 - The enqueue gate comments say dirty reviews are excluded, but the condition only checks job_type == review and default review type. Dirty review jobs can still reach maybeDispatchAutoDesign, and missing ChangedFiles can cause path-trigger changes to fall through to an incorrect trivial skip/classify decision.
- Fix: explicitly exclude dirty jobs at the gate, for example by checking job.GitRef != "dirty" and any existing dirty/range indicators before calling maybeDispatchAutoDesign.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-21T15:06:49Z

roborev: Combined Review (`1f98ad1`)

Review found 2 Medium issues that can leave CI batches incomplete until stale reconciliation.

Medium

internal/daemon/worker_classify.go:185 - CI-attached classify jobs marked as skipped do not appear to broadcast a terminal event or increment the linked CI batch’s completed counter. Because AttachJobAndBumpTotal only increments completed counts for jobs already terminal at attach time, queued classify rows that later resolve to skip can leave the batch waiting instead of synthesizing promptly.
Suggested fix: After MarkClassifyAsSkippedDesign succeeds, update the linked CI batch as completed and trigger the same synthesis path used for review.completed, or emit a terminal event that CI polling handles for skipped jobs.
internal/daemon/ci_poller.go:747 - Existing auto-design jobs can be attached to multiple CI batches, but completion handling still uses GetCIBatchByJobID, which returns only one batch. When that shared job completes, only one attached batch is advanced; other linked batches can remain incomplete until stale reconciliation.
Suggested fix: Fetch and update all unsynthesized batches linked to the job, or prevent one non-terminal job from being linked to multiple batches.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-04-22T00:42:17Z

roborev: Combined Review (`9e7bb93`)

Auto-design dispatch has one actionable issue: dirty/range review refs are not fully excluded before dispatch.

High

internal/daemon/server.go:1217 / around handleEnqueue line 1272
- Problem: The auto-design dispatch condition only checks job.JobType == storage.JobTypeReview and config.IsDefaultReviewType(req.ReviewType), even though the intended behavior excludes dirty refs and range refs. Dirty jobs still use JobTypeReview, so they can enter maybeDispatchAutoDesign with empty or inappropriate changed-file data and create bogus skipped/classify rows.
- Fix: Add explicit guards before dispatch, such as job.GitRef != "dirty" and rejecting range syntax like .., or centralize this in a helper covered by dirty/range tests.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Adds an opt-in per-commit decision (heuristics + schema-constrained classifier fallback) that routes design reviews automatically. Disabled by default; enable via `[auto_design_review] enabled = true` in ~/.roborev/config.toml or .roborev.toml. Heuristics layer (internal/review/autotype): - Path globs via doublestar/v4: trigger_paths (e.g. **/migrations/**, **/*.sql); skip_paths (e.g. **/*.md, **/testdata/**). - Diff size: large_diff_lines triggers; min_diff_lines below skips. - Commit subject regex: trigger_message_patterns (refactor|redesign|...); skip_message_patterns (^docs:|^test:|...). - Trigger rules run before skip rules; both before classifier fallback. - Reasons sanitized before storage: control characters stripped, length capped (200 runes) so untrusted filenames/messages cannot inject terminal escapes or markdown into PR comments and the TUI. Schema-constrained classifier: - New SchemaAgent capability. Only claude-code implements it today via `--json-schema`. Claude classify runs with `--tools ""` (no file or shell access), validated at runtime via claudeSupportsToolsFlag so an older CLI that ignores the flag fails closed. Env vars are stripped and proxy auth handled via a shared buildClaudeEnv so inherited ANTHROPIC_API_KEY / BASE_URL / AUTH_TOKEN cannot leak into classifier traffic. - Codex is deliberately NOT a SchemaAgent: `codex exec --sandbox read-only` blocks writes but not reads, and codex has no equivalent to `--tools ""`. ValidateClassifyAgent rejects codex as classify_agent. - Classifier prompt wraps commit subject/body/diff-stat/diff in context-only XML tags, XML-escapes all wrapped content (including diff body), drops the markdown fence so a crafted diff can't close it, and warns the model that tag contents are data not instructions. Storage: - New JobStatusSkipped terminal state and JobTypeClassify, threaded through every hardcoded terminal-state filter (GetStaleBatches, ReconcileBatch, sync, ReenqueueJob, HasMeaningfulBatchResult, GetExpiredBatches, GetNonTerminalBatchJobIDs). Terminal-status classification (done/skipped/applied/rebased as completed; failed/canceled as failed) is aligned across AttachJobAndBumpTotal, ReconcileBatch, and the filter helpers. - New skip_reason and source columns on review_jobs plus two partial unique indexes scoped to source='auto_design' (commit-backed and ref-only). SQLite migration + Postgres v12 schema + v11->v12 migration step. Dedup indexes created in the currentVersion==0 first-time block so fresh Postgres bootstraps get them without running the migration, and not in the embedded v12.sql so a v1->v12 migration does not reference the source column before it's added. Partial unique index widened to (repo_id, git_ref, review_type) so the commit-backed + commitless cross-case race is enforced at the storage layer; migration falls back to the narrow form if pre-existing duplicates would block. - Lifecycle helpers (PromoteClassifyToDesignReview, MarkClassifyAsSkippedDesign, InsertSkippedDesignJob, EnqueueAutoDesignJob, HasAutoDesignSlotForCommit, ListJobsByStatus, AttachJobAndBumpTotal, ListCIBatchesByJobID). Classify rows convert in place via UPDATE so the dedup index isn't fought. AutoDesignAgentSentinel fills the NOT NULL agent column; Promote rewrites agent/model to the real design-workflow agent at promotion time. AttachJobAndBumpTotal is idempotent per (batch_id, job_id) via a unique index plus INSERT OR IGNORE; the migration dedupes pre-existing duplicate link rows and recalculates inflated total/completed/failed counters. - ReenqueueJob accepts 'skipped' and clears skip_reason so a reenqueued auto-design row doesn't carry the stale reason forward. Daemon integration: - Worker handler for job_type='classify' runs the classifier and converts the row in place. Classify uses agent.Get (not GetAvailable) so an unknown / uninstalled classify_agent fails closed — only the explicitly configured classify_backup_agent is an allowed fallback. Classifier failures mark 'skipped' (not 'failed') so CI batch accounting stays accurate. err.Error() is redacted out of skip_reason (maps to {timed out, unavailable, failed} public categories) while the full err plus any backup-resolve error is persisted to job.error for operators. - Skip paths broadcast review.completed so CI batches and other subscribers advance; otherwise the row would stay short on completed_jobs until stale-batch reconciliation. review.completed/review.failed handlers iterate every linked batch via ListCIBatchesByJobID since auto-design rows are shared across batches that dedup onto them. - Enqueue handler dispatches auto-design for default-typed single-commit review jobs only (dirty excluded: no ChangedFiles list means a small uncommitted migration change would bypass TriggerPaths; re-enable when dirty file-list extraction lands). Opportunistic and never blocks the caller's HTTP response. Explicit review_type=design or security bypasses. Follow-up design reviews are persisted with the resolved design-workflow agent/model. - CI poller runs the same path per-commit across the PR range (git rev-list mergeBase..head). Each auto-design outcome is attached to the CI batch via AttachJobAndBumpTotal so the batch total reflects the new rows and synthesis waits for them. When an auto_design row already exists for the head commit, the poller attaches the existing row to the new batch instead of returning early. Test-hook hygiene: - SetTestClassifierVerdict lives in a _test.go file so the setter is only compiled into the test binary; production code cannot flip the hook to force an auto-design routing decision. User surface: - Skipped rows render in the TUI with a dimmed style and truncated reason. - PR synthesis includes skipped rows as a short "Auto-design-review skipped: <reason>" section. - /api/status surfaces a SkippedJobs aggregate count and, when the feature is effectively enabled, an auto_design subobject with five per-outcome counters (triggered/skipped x heuristic/classifier, plus classifier_failed). Subobject is omitted from JSON when disabled. Counters update from all four producers via a process-global AutoDesignMetrics with atomic ops. Heuristic engine benchmark on AMD Ryzen 9 9950X — well under the 25 ms spec target: HeuristicTrigger 113.5 ns/op HeuristicSkip 5006 ns/op Ambiguous 7668 ns/op Testing: - Unit coverage in internal/review/autotype, internal/agent, internal/daemon, internal/storage: every heuristic branch, schema-agent validation, classify-row lifecycle transitions, dedup invariants, observability counters, reason sanitization, backup-config error composition, skip-path event broadcast, multi-batch list lookup. - End-to-end coverage in internal/daemon/e2e_auto_design_test.go against real git repos: heuristic triggers (path / message / large diff / large file count), heuristic skips (trivial / doc-only / conventional prefix), classifier promotions + skips, dedup, classifier failure, disabled, and HTTP-level bypass for explicit --type design / security. - CI-path e2e in internal/daemon/e2e_ci_auto_design_test.go: attach-to-batch for trigger / skip / existing-row, disabled no-op, listCommitsInRange multi-commit and empty range. User docs at docs/auto-design-review.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

roborev-ci · 2026-04-22T00:54:46Z

roborev: Combined Review (`db52639`)

Review verdict: changes have medium-risk routing issues that can create invalid auto-design review jobs.

Medium

internal/daemon/server.go:1215
The auto-design dispatch gate says range and dirty jobs are excluded, but the condition only checks for review jobs with the default review type. Normal review jobs using refs like base..head or dirty can still dispatch auto-design, which can produce bogus classify/skipped rows.
Fix: Add explicit guards before maybeDispatchAutoDesign, such as skipping refs containing .., ..., or dirty, and skipping jobs with inline dirty diff content unless file-list extraction is implemented.
internal/storage/jobs.go:581
ReenqueueJob now permits skipped jobs, but auto-design skipped rows are inserted with agent "auto-design". Re-enqueuing one clears skip_reason and queues a review job that the worker cannot execute because "auto-design" is only a sentinel, not a real agent.
Fix: Reject re-enqueue for skipped auto-design rows, or resolve and persist the real design agent/model before re-enqueuing them.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

codecov-commenter · 2026-04-22T00:58:20Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

The squash-rebase combined this branch's bmatcuk/doublestar/v4 add with main's batch dep bump (roborev-dev#666), producing a go.sum neither existing vendorHash matched. Hash copied from the nix CI job output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

roborev-ci · 2026-04-22T01:08:52Z

roborev: Combined Review (`cd1c0d4`)

Summary verdict: one Medium issue remains; no High or Critical findings were reported.

Medium

internal/daemon/server_auto_design.go:167 - Invalid auto-design glob/regex config is treated as a heuristic runtime error and converted into a skipped design row. A typo in trigger_paths, skip_paths, or message patterns can silently suppress all automatic design reviews instead of failing configuration validation.
- Fix: Validate resolved AutoDesignHeuristics with autotype.Heuristics.Validate() when loading/resolving config, and fail startup or disable auto-design clearly rather than inserting “heuristic error” skipped rows for config errors.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Invalid trigger_paths/skip_paths globs or message-pattern regexes previously flowed into Classify at dispatch time, where the returned error was coerced into a skipped "auto-design: heuristic error" row. A typo in any of these fields silently suppressed every automatic design review instead of surfacing a configuration error. Validate resolved heuristics up front: fail daemon startup when the globally enabled config is invalid, and at per-repo dispatch log the offending pattern and no-op rather than inserting a phantom skipped row.

roborev-ci · 2026-04-25T02:51:53Z

roborev: Combined Review (`217ec60`)

Verdict: Changes need fixes before merge due to build/startup breakages and a stranded-job failure path.

High

internal/daemon/server_auto_design.go (parent.CommitIDValue() calls)
*storage.ReviewJob does not define CommitIDValue(), while the field is CommitID *int64. This will fail to compile. Safely dereference parent.CommitID or add a real accessor method.
internal/storage/db.go (ensureCIPRBatchJobsUniqueIndex)
The deduplication SQL references id, but ci_pr_batch_jobs has no id column. This can fail daemon startup with no such column: id. Use SQLite rowid or another actual unique row identifier.

Medium

internal/daemon/worker_classify.go (applyClassifyVerdict)
If PromoteClassifyToDesignReview or MarkClassifyAsSkippedDesign fails, the error is only logged and the job can remain stuck in running indefinitely. Mark the job failed or otherwise abort/retry the cycle explicitly.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

If PromoteClassifyToDesignReview or MarkClassifyAsSkippedDesign returns an error, the row is still in 'running' status. Logging the error and returning leaves it stranded indefinitely: a linked CI batch waits on the row, the worker quota stays consumed, and stale-batch reconciliation is the only recovery. Add a shared failClassifyOnDBError helper that calls FailJob on the same (jobID, workerID) pair and broadcasts review.failed when the transition succeeds. broadcastClassifyFailed mirrors the existing broadcastClassifyTerminal but emits review.failed so subscribers counting failed_jobs advance instead of waiting. Wire the helper into both applyClassifyVerdict branches (promote and clean-skip) and into completeClassifyAsSkip (the classifier-failure degrade-to-skip path). Tests force the inner UPDATE's WHERE clause to miss by mutating review_jobs.source after claim — Promote/Mark pin source='auto_design' but FailJob doesn't, so the recovery path can succeed even though the primary call returns sql.ErrNoRows. Asserts: review.failed broadcast, job status transitioned to 'failed', error message identifies the failing op for operators.

cpcloud · 2026-04-25T13:29:26Z

Re: roborev-ci review on 217ec60 (comment) — the two High findings are not real bugs. The Medium is, and is fixed in 052a866.

High #1 — `parent.CommitIDValue()` "doesn't exist, will fail to compile"

The method is defined at internal/storage/models.go:110-112:

// CommitIDValue returns the commit ID as a plain int64 (0 if nil).
func (j ReviewJob) CommitIDValue() int64 {
    ...
}

go build ./... at 217ec60 exits 0; the CI test jobs on this commit (ubuntu/macos/windows) all passed.

High #2 — `ensureCIPRBatchJobsUniqueIndex` "references `id`, but `ci_pr_batch_jobs` has no `id` column"

The schema at internal/storage/db.go:107-112:

CREATE TABLE IF NOT EXISTS ci_pr_batch_jobs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  batch_id INTEGER NOT NULL REFERENCES ci_pr_batches(id),
  job_id INTEGER NOT NULL REFERENCES review_jobs(id),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

The id column exists. go test ./internal/storage/... (which calls OpenDB, which calls ensureCIPRBatchJobsUniqueIndex) passes locally and in CI.

Medium — stuck `running` classify row on DB transition error

This one is real. applyClassifyVerdict and completeClassifyAsSkip logged the PromoteClassifyToDesignReview / MarkClassifyAsSkippedDesign error and returned, leaving the row in running. Fixed in 052a866: a shared failClassifyOnDBError helper now calls FailJob and broadcasts review.failed so any linked CI batch advances instead of waiting for stale-batch reconciliation. Tests in worker_classify_test.go force the inner UPDATE's WHERE clause to miss (via source mutation) and assert the row transitions to failed with review.failed broadcast.

roborev-ci · 2026-04-25T13:32:38Z

roborev: Combined Review (`052a866`)

No Medium, High, or Critical findings were reported.

All review agents either found no issues or only reported Low severity findings, which are omitted per the requested threshold.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

The PR mistakenly added a docs/ tree to this repo. Project docs live in roborev-dev/roborev-docs and render at https://roborev.io. The auto- design-review page will be authored over there.

roborev-ci · 2026-04-25T13:55:09Z

roborev: Combined Review (`9f897db`)

Summary verdict: No Medium, High, or Critical findings were reported.

All agents agree the reviewed code is clean for reportable issues.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

cpcloud · 2026-04-25T15:08:30Z

This should be ready to go.

wesm · 2026-04-25T16:49:46Z

Ok, firing torpedoes!

cpcloud force-pushed the worktree-splendid-wishing-yao branch from 8f3180c to d8b1873 Compare April 20, 2026 12:21

cpcloud force-pushed the worktree-splendid-wishing-yao branch from d8b1873 to 142f80b Compare April 20, 2026 12:45

cpcloud force-pushed the worktree-splendid-wishing-yao branch from b58c6f9 to 8b21a9f Compare April 21, 2026 14:14

wesm force-pushed the worktree-splendid-wishing-yao branch from 9e7bb93 to db52639 Compare April 22, 2026 00:50

docs: remove docs/auto-design-review.md (lives on roborev.io)

9f897db

The PR mistakenly added a docs/ tree to this repo. Project docs live in roborev-dev/roborev-docs and render at https://roborev.io. The auto- design-review page will be authored over there.

wesm merged commit 4879fa8 into roborev-dev:main Apr 25, 2026
8 checks passed

cpcloud deleted the worktree-splendid-wishing-yao branch April 25, 2026 20:22

cpcloud mentioned this pull request Apr 25, 2026

docs: add Auto Design Review guide roborev-dev/roborev-docs#65

Open

Conversation

cpcloud commented Apr 19, 2026

Summary

Uh oh!

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (5b14876)

High

Medium

Uh oh!

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (4cdbfdc)

High

Medium

Uh oh!

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (29ae171)

High

Medium

Uh oh!

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (41334db)

High

Medium

Uh oh!

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (0ab9705)

High

Medium

Uh oh!

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (0036dca)

Uh oh!

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (a1bfd15)

Medium

Uh oh!

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (ce63446)

Uh oh!

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (bab555b)

Uh oh!

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (8f3180c)

Uh oh!

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (d8b1873)

Uh oh!

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (142f80b)

High

Medium

Uh oh!

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (838a54b)

Medium

Uh oh!

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (9413084)

Medium

Uh oh!

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (b0cc217)

Medium

Uh oh!

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (06af0d6)

Medium

Uh oh!

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (8cfd017)

Medium

Uh oh!

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (b58c6f9)

Medium

Uh oh!

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (8b21a9f)

Medium

roborev: Combined Review (`5b14876`)

roborev: Combined Review (`4cdbfdc`)

roborev: Combined Review (`29ae171`)

roborev: Combined Review (`41334db`)

roborev: Combined Review (`0ab9705`)

roborev: Combined Review (`0036dca`)

roborev: Combined Review (`a1bfd15`)

roborev: Combined Review (`ce63446`)

roborev: Combined Review (`bab555b`)

roborev: Combined Review (`8f3180c`)

roborev: Combined Review (`d8b1873`)

roborev: Combined Review (`142f80b`)

roborev: Combined Review (`838a54b`)

roborev: Combined Review (`9413084`)

roborev: Combined Review (`b0cc217`)

roborev: Combined Review (`06af0d6`)

roborev: Combined Review (`8cfd017`)

roborev: Combined Review (`b58c6f9`)

roborev: Combined Review (`8b21a9f`)

roborev: Combined Review (`11210d1`)

roborev: Combined Review (`1f98ad1`)

roborev: Combined Review (`9e7bb93`)

roborev: Combined Review (`db52639`)

roborev: Combined Review (`cd1c0d4`)

roborev: Combined Review (`217ec60`)

High #1 — `parent.CommitIDValue()` "doesn't exist, will fail to compile"

High #2 — `ensureCIPRBatchJobsUniqueIndex` "references `id`, but `ci_pr_batch_jobs` has no `id` column"

Medium — stuck `running` classify row on DB transition error

roborev: Combined Review (`052a866`)

roborev: Combined Review (`9f897db`)