Skip to content

feat: opt-in automatic design-review router#661

Merged
wesm merged 5 commits intoroborev-dev:mainfrom
cpcloud:worktree-splendid-wishing-yao
Apr 25, 2026
Merged

feat: opt-in automatic design-review router#661
wesm merged 5 commits intoroborev-dev:mainfrom
cpcloud:worktree-splendid-wishing-yao

Conversation

@cpcloud
Copy link
Copy Markdown
Contributor

@cpcloud cpcloud commented Apr 19, 2026

Summary

  • Adds an opt-in per-commit decision (heuristics + schema-constrained classifier fallback) that routes design reviews automatically; off by default.
  • New internal/review/autotype package: doublestar path globs, diff-size thresholds, commit-subject regexes, with trigger-before-skip-before-classifier precedence.
  • New SchemaAgent agent capability returning JSON conforming to an embedded schema via claude-code --json-schema and codex exec --output-schema; non-schema agents rejected at config-resolve time.
  • New JobStatusSkipped terminal state and JobTypeClassify, plumbed through every existing terminal-state filter (GetStaleBatches, ReconcileBatch, sync, ReenqueueJob, HasMeaningfulBatchResult, GetExpiredBatches).
  • New skip_reason and source columns on review_jobs plus two partial unique indexes scoped to source='auto_design' (commit-backed and ref-only); SQLite migration + Postgres v12 schema mirror.
  • Classify-row lifecycle helpers (PromoteClassifyToDesignReview, MarkClassifyAsSkippedDesign, InsertSkippedDesignJob, EnqueueAutoDesignJob) — classify rows convert in place via UPDATE so the dedup index isn't fought.
  • Daemon integration: worker handler for classify jobs; enqueue-handler dispatches the auto-design path for default-typed review/range/dirty jobs only; CI poller runs the same path on the PR head SHA when design isn't already in the matrix; heuristic-input failures degrade to classifier (not skip).
  • TUI renders skipped design rows with a dimmed style and truncated reason; PR synthesis includes a short Auto-design-review skipped: <reason> section; /api/status exposes a SkippedJobs aggregate count and, when the feature is enabled anywhere, an auto_design subobject with five per-outcome counters. Subobject omitted from JSON when disabled.
  • Heuristic-engine benchmark on the spec's target hardware: trigger 113 ns/op, skip 5 µs/op, ambiguous 7.7 µs/op (spec target: < 25 ms).
  • Spec at docs/superpowers/specs/2026-04-17-auto-design-review-design.md, plan at docs/superpowers/plans/2026-04-17-auto-design-review.md, user docs at docs/auto-design-review.md.

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (5b14876)

Verdict: 2 review findings require changes before merge (1 High, 1 Medium).

High

  • internal/agent/claude.go:522-526
    Reachable from internal/daemon/server_auto_design.go:160-161 and internal/daemon/ci_poller.go:776-783: the new classifier path always launches Claude with --dangerously-skip-permissions and does not apply the read-only --allowedTools restriction used in normal review mode. Because auto-design classification ingests commit messages and diffs from shared repos, a prompt-injected commit could turn this classifier step into arbitrary Bash or edit tool execution on the reviewer’s machine.
    Suggested remediation: do not use --dangerously-skip-permissions for classify jobs; restrict tools to the same read-only set used by normal review (Read, Glob, Grep) or disable tool use entirely for schema classification.

Medium

  • internal/agent/claude.go:577-580
    Compared with the sanitized review path at internal/agent/claude.go:288-330, ClassifyWithSchema starts a fresh subprocess without the existing Anthropic env filtering and proxy validation logic. Inherited vars such as ANTHROPIC_API_KEY, ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and related routing settings can therefore affect classifier traffic and may send repo contents to an unintended endpoint. Combined with the issue above, any prompt-injected tool use would also inherit those secrets.
    Suggested remediation: factor the existing Claude env/model setup into a shared helper and reuse it here so classification strips inherited auth/routing vars and only injects validated proxy settings explicitly.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (4cdbfdc)

Verdict: Changes are not safe to merge yet due to one high-severity security issue and one medium-severity agent-selection issue.

High

  • Prompt-injection risk can exfiltrate local secrets through classifier output
    • Files: internal/agent/claude.go:520-530, internal/agent/codex.go:400-463, internal/daemon/worker_classify.go:55-88, internal/daemon/ci_poller.go:736-754, internal/review/synthesis.go:62-67
    • The new classifier feeds untrusted commit subjects, bodies, and diffs into agents that still retain filesystem read capability. A malicious diff or commit message could instruct the model to read sensitive local files and include their contents in the JSON reason field. That reason is then persisted as skip_reason and can be rendered into PR comments, creating a direct exfiltration path.
    • This needs to run with no file/tool access, or inside a sandbox limited to the checked-out repo with $HOME and network blocked. Classifier output should also be treated as untrusted and redacted or strictly allowlisted before storage or publication.

Medium

  • Classifier agent selection can silently downgrade to a different backend
    • Files: internal/daemon/worker_classify.go:70-97, internal/agent/agent.go:209-258
    • The classify flow validates classify_agent, but then uses agent.GetAvailable(name), which may substitute a different installed agent if the requested one is unavailable. That changes the security and behavior profile for processing untrusted commit metadata without explicit user intent.
    • The code should use the exact configured agent after availability checks, fall back only to an explicitly configured classify_backup_agent, and otherwise fail closed.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (29ae171)

Verdict: Changes are not ready to merge due to 2 High and 2 Medium severity issues.

High

  • internal/storage/jobs.go (InsertSkippedDesignJob, EnqueueAutoDesignJob): New inserts into review_jobs do not provide agent, but the schema still requires agent TEXT NOT NULL. As written, auto-design skip/classify/follow-up design job creation will fail at runtime.
    Fix: Pass agent information into these helpers and include it in the INSERTs; persist any other required execution fields for follow-up review jobs as needed.

  • internal/daemon/ci_poller.go around processPR and maybeDispatchAutoDesignForCI: Auto-design CI jobs are enqueued only after CreateBatchWithJobs creates the batch, so those jobs are not added to ci_pr_batch_jobs and are not counted in total_jobs. Batch reconciliation and PR synthesis can therefore finish without waiting for or reporting auto-design results.
    Fix: Decide/create auto-design jobs before batch creation so they are included up front, or explicitly attach them to the batch and update batch counts when added.

Medium

  • internal/daemon/ci_poller.go (processPR, maybeDispatchAutoDesignForCI): CI auto-design logic only evaluates pr.HeadRefOid. On multi-commit PRs, this yields at most one auto-design outcome for the entire PR instead of one per commit, so earlier commits never receive their own design review/classify/skipped record.
    Fix: Move auto-design routing into the existing per-commit processing so each commit SHA gets its own result.

  • internal/storage/models_test.go: Newly added tests do not compile. The file is in package storage, but the assertions reference storage.JobStatusSkipped / storage.ReviewJob, and the required assert import is missing.
    Fix: Either switch to package storage_test and import storage, or keep package storage, remove the storage. qualifiers, and add the missing testify/assert import.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (41334db)

Verdict: changes are not ready to merge due to one high-severity security issue and one medium-severity output-sanitization issue.

High

  • internal/agent/codex.go:401-406, internal/agent/codex.go:430-463
    The new Codex classifier path processes untrusted commit messages/diffs from shared repos with --sandbox read-only, but it does not disable tool or file access. Unlike the Claude classifier path, there is no allowlist/denylist to prevent the model from reading local files or invoking other tools before returning the JSON result. A prompt-injected commit could coerce Codex into reading sensitive local data and exfiltrating it through the "reason" field or via tool/network side effects.
    Suggested fix: make Codex classify mode no-tools/no-files by default, or reject codex as classify_agent until equivalent hardening exists. Add a regression test asserting classify args disable all tools, mirroring the Claude coverage.

Medium


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (0ab9705)

Verdict: Changes are not ready to merge due to 1 High and 2 Medium-severity issues in the auto-design review flow.

High

  • Unexecutable auto-design jobs
    • Location: internal/storage/jobs.go:1090-1137, internal/storage/jobs.go:1151-1168, internal/daemon/worker.go:481-485
    • Auto-generated design-review rows are persisted with agent = "auto-design", but workers execute jobs via agent.GetAvailableWithConfig(job.Agent, cfg). Since "auto-design" is not a registered agent and promotion does not rewrite it to a real executable agent, heuristic-triggered and classifier-promoted design reviews will fail when claimed by a worker.
    • Suggested fix: Persist the actual execution agent/model when enqueueing or promoting these jobs, or translate the sentinel value to the configured design/review agent before worker lookup.

Medium

  • CI poller only evaluates the PR head commit for auto-design routing

    • Location: internal/daemon/ci_poller.go:719-760
    • The CI path checks only pr.HeadRefOid. On multi-commit PRs, earlier commits are never classified, so qualifying design-review commits can be skipped and intended per-commit skipped outcomes are not recorded.
    • Suggested fix: Iterate all PR commits and make the auto-design decision per SHA, attaching or enqueueing one outcome per commit.
  • Existing auto-design rows are not attached to new CI batches

    • Location: internal/daemon/ci_poller.go:727-728
    • If an auto-design row already exists for the head commit, the CI poller returns early instead of attaching that existing job to the new batch. The batch can then complete without the design-review result, causing the PR comment to omit it.
    • Suggested fix: When HasAutoDesignSlotForCommit is true, resolve the existing auto-design job ID and attach it to the batch rather than returning early.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (0036dca)

Verdict: No medium, high, or critical findings.

The reviewed changes do not raise any issues at Medium severity or above.

Low-severity notes were reported by one reviewer, but they are omitted here per instructions.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (a1bfd15)

Verdict: 2 medium-severity issues need to be addressed before merge.

Medium

  • internal/agent/claude.go:201-233,520-531 The classifier path disables Claude tool access via --tools "", but the rest of the integration uses --allowedTools, and there is no compatibility check proving the installed Claude CLI treats --tools "" as deny-all. Because auto-design classification processes commit messages/diffs from shared repos, a prompt-injected commit could regain Read/Bash access if this flag is ignored or interpreted differently, allowing local secret exfiltration via the classifier JSON reason. Use a verified no-tools mechanism, add a runtime/startup capability check, and fail closed if tool disablement cannot be proven.

  • internal/agent/claude.go:621-623, internal/daemon/worker_classify.go:109-110,153-156, internal/review/synthesis.go:62-67, cmd/roborev/tui/render_queue.go:682-705 On classifier failure, raw error/stderr text is copied into skip_reason and later rendered verbatim in PR synthesis and the TUI. Since this path can contain externally influenced text, it creates an injection surface for ANSI escape sequences in terminals and markdown/@mention spam in posted PR comments. Sanitize and length-cap failure-derived skip_reason values the same way as successful classifier reasons, and escape or quote them before rendering.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 19, 2026

roborev: Combined Review (ce63446)

Verdict: No medium-or-higher issues found in the reviewed diff.

The reviewed outputs did not identify any Medium, High, or Critical findings.

Security review also reported no issues, including no noted vulnerability advisory for github.com/bmatcuk/doublestar/v4 on pkg.go.dev at review time.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (bab555b)

No medium-or-higher findings were reported.

All reviewed outputs were clean at Medium severity or above.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (8f3180c)

Verdict: No medium-or-higher findings; the reviewed changes look clean based on the provided agent outputs.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@cpcloud cpcloud force-pushed the worktree-splendid-wishing-yao branch from 8f3180c to d8b1873 Compare April 20, 2026 12:21
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (d8b1873)

Verdict: No medium-or-higher issues found in the reviewed changes.

The submitted review outputs did not identify any Medium, High, or Critical findings.

Security review specifically found no concrete vulnerabilities and noted the new classifier path appears intentionally hardened.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@cpcloud cpcloud force-pushed the worktree-splendid-wishing-yao branch from d8b1873 to 142f80b Compare April 20, 2026 12:45
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (142f80b)

Verdict: changes are not ready to merge due to two build-breaking regressions and two medium-severity correctness issues.

High

  • Build break from missing symbols

    • Location: internal/agent/schema_agent.go:39-55, internal/daemon/server.go:1198-1206, internal/daemon/server_auto_design.go:193-227, internal/daemon/worker_classify.go:137-149
    • The auto-design path references identifiers that do not exist in the diff or current tree: registryMu, config.IsDefaultReviewType, storage.JobTypeReview, storage.JobTypeDirty, agent.ResolveWorkflowConfig, agent.GetAvailableWithConfig, and ModelForSelectedAgent. This leaves the branch uncompilable.
  • Build break from GetJobCounts() signature change

    • Location: internal/storage/jobs.go:929-958, internal/daemon/server.go:1114-1128, internal/daemon/server_test.go:1763-1788, internal/storage/db_test.go:587-596
    • GetJobCounts() now returns an extra skipped value, but existing callers still destructure the old return list. The status handler and affected tests will not compile until all call sites are updated.

Medium

  • CI batch accounting can leave attached terminal jobs permanently undercounted

    • Location: internal/storage/ci.go:190-212, internal/daemon/ci_poller.go:785-868
    • AttachJobAndBumpTotal() only increments total_jobs. If CI attaches an already-terminal auto-design job, especially a skipped one, completed_jobs is not incremented, so the batch can remain short until stale-batch reconciliation runs. That can delay or stall PR synthesis/status.
  • Backup classifier is not capability-validated

    • Location: internal/config/config.go:487-545, internal/daemon/worker_classify.go:67-107, docs/auto-design-review.md:18-31
    • Only classify_agent is validated for SchemaAgent support; classify_backup_agent is not. The documented backup configuration can therefore fall back to an agent that cannot run this workflow, causing skips instead of a real failover.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (838a54b)

Verdict: 1 medium-severity issue found; otherwise no actionable medium-or-higher findings were reported.

Medium

  • internal/daemon/worker_classify.go:113, internal/review/synthesis.go:62, internal/review/synthesis.go:154
    Raw classifier errors are copied into skip_reason and then rendered into PR comments. In shared repos, a crafted ambiguous commit could cause CI to expose operational details from the review host, such as local paths, missing tooling, or backend/proxy stderr.
    Recommended fix: keep detailed classifier errors in logs or job.Error only, and map public-facing skip reasons to a fixed set like classifier unavailable, classifier timed out, or classifier failed.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 20, 2026

roborev: Combined Review (9413084)

Verdict: changes are not clean; there is 1 medium-severity issue that should be addressed before merge.

Medium

  1. internal/daemon/worker_classify.go:99-105,144-150, internal/agent/claude.go:650-651

    Classifier error redaction is incomplete. While skip_reason is now sanitized, classifier failures can still be logged verbatim via ClassifyWithSchema error strings and %v logging in worker failure paths. In CI, those daemon logs may be exposed in runner output, which can leak local paths or backend/proxy diagnostics from attacker-influenced inputs.

    Suggested fix: log only a redacted public category such as classifier timed out, classifier unavailable, or classifier failed by default, and restrict full stderr/details to an explicit debug or verbose path after additional scrubbing.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (b0cc217)

Summary verdict: One medium-severity issue found; no high or critical findings reported.

Medium

Prompt injection can suppress design reviews
internal/prompt/classify.go:35

The classifier prompt embeds untrusted commit subject, body, and diff content directly into the model instruction stream, then uses the model’s design_review decision. A PR author could add prompt-injection text in commit metadata or diffs to influence the classifier toward {"design_review": false}, suppressing automatic design review for externally supplied changes.

Suggested remediation: treat classifier inputs as hostile data. Wrap untrusted sections with explicit data-only boundaries, neutralize Markdown fence breaks, and prefer fail-open behavior where ambiguous, suspicious, or failed classifier results run the design review instead of skipping it. Consider deterministic triggers for high-risk file patterns so external text cannot suppress review through the model path.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (06af0d6)

Review verdict: three medium-severity findings need attention; no high or critical issues were reported.

Medium

  • internal/daemon/worker_classify.go:89
    Classify jobs use Diff: derefString(job.DiffContent), but auto-design classify rows are enqueued without diff_content. Commit-backed ambiguous classifications can therefore run with an empty diff and a misleading ~0 line changes stat.

    Fix: Fetch git.GetDiff(job.RepoPath, job.GitRef) in processClassifyJob when DiffContent is nil for commit-backed jobs, or persist the diff when enqueueing the classify row.

  • internal/daemon/ci_poller.go:872
    If EnqueueAutoDesignJob returns 0 because another producer won the dedup race after HasAutoDesignSlotForCommit, attachAutoDesignToBatch returns without linking the existing auto-design row to the CI batch. The batch may synthesize without waiting for or including that auto-design outcome.

    Fix: When jobID == 0, look up the existing auto-design job ID for (repoID, headSHA) and attach it to the batch.

  • internal/prompt/classify.go:41
    The classifier prompt wraps untrusted diff text in <diff context-only="true">, but the diff content is not escaped. A malicious commit can include a closing tag or fence, then inject instructions outside the intended wrapper. Because this classifier decides whether to suppress a design review, commit-controlled prompt injection can bias the router toward design_review=false.

    Fix: Escape or encode diff content before embedding it in XML-like tags. At minimum, escape <, >, and & in Diff and DiffStat, and add regression tests with injected </diff> and code-fence breakout content.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (8cfd017)

Medium severity issues were found; no Critical or High findings were reported.

Medium

  • internal/daemon/ci_poller.go:737 - Auto-design deduplication can miss existing commitless rows. If an earlier CI dispatch inserted an auto_design row with commit_id = NULL after commit metadata lookup failed, a later dispatch that resolves commit_id will not be caught by HasAutoDesignSlotForCommit, and the (repo_id, commit_id, review_type) index can allow a duplicate row for the same git_ref.

    Suggested fix: ensure CI always creates or uses a commit row before auto-design inserts, or make the dedup check/index cover git_ref for all auto-design rows, not only commit_id IS NULL.

  • internal/prompt/classify.go:70 - BuildClassifyPrompt XML-escapes untrusted diff content but still embeds it inside a Markdown code fence. A PR author can include triple backticks in a diff to close the fence and make injected text appear as normal prompt text inside the <diff> block, potentially manipulating the classifier into returning design_review: false and bypassing auto-design review routing.

    Suggested fix: avoid Markdown fences for untrusted classifier inputs. Encode commit message, diff stat, and diff as JSON string fields, base64, or a strict length-delimited format. If fences are retained, escape or replace triple-backtick sequences and add regression tests.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (b58c6f9)

Summary verdict: changes are close, but two Medium issues should be fixed before merge.

Medium

  • internal/daemon/server_auto_design.go:136 - Dirty jobs routed through auto-design do not populate ChangedFiles when DiffContent is present. That means path heuristics cannot trigger for dirty diffs, so a small dirty migration/schema change can be incorrectly recorded as skipped by the trivial-diff rule.

    • Fix: exclude dirty jobs from auto-design until file metadata is available, or parse changed paths from diff headers and pass them into autotype.Input.
  • internal/storage/sync.go:349 - Skipped jobs are syncable, but sync structs, SQLite selection, Postgres upsert, and pull paths do not carry skip_reason or source. A synced skipped auto-design row can lose its audit reason and no longer be marked as auto_design in Postgres.

    • Fix: add skip_reason and source to the sync model/query/upsert paths, plus round-trip tests.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@cpcloud cpcloud force-pushed the worktree-splendid-wishing-yao branch from b58c6f9 to 8b21a9f Compare April 21, 2026 14:14
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (8b21a9f)

Auto-design routing has several medium-severity issues that should be addressed before merge.

Medium

  • internal/daemon/server_auto_design.go:131
    Dirty review jobs are sent through auto-design, but ChangedFiles is only populated for JobTypeReview. As a result, dirty diffs do not trigger path-based classification rules, and a small dirty migration/schema change can be incorrectly marked skipped by the trivial-diff path.
    Fix: Exclude JobTypeDirty from auto-design until dirty changed-file collection exists, or populate dirty changed files before calling autotype.Classify.

  • internal/daemon/worker_classify.go:62
    SetTestClassifierVerdict is compiled into production code and globally overrides classifier outcomes. Because it is exported from a non-test file, in-process package code or future integrations could bypass the configured classifier and force auto-design routing decisions.
    Fix: Move the test override behind _test.go files or a test-only build tag, and inject classifier behavior through test fixtures.

  • docs/auto-design-review.md:23
    The docs list codex as a supported classifier and backup classifier, but CodexAgent does not implement SchemaAgent and ValidateClassifyAgent("codex") rejects it. Users following the documented config will hit validation errors or configure a nonfunctional backup path.
    Fix: Update the docs to state that only supported schema-capable classifiers should be used, currently excluding Codex unless a safe no-tools schema classification mode is implemented.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (11210d1)

Review verdict: two Medium correctness issues remain; no High or Critical findings were reported.

Medium

  • internal/daemon/ci_poller.go:735 - CI auto-design dispatch reloads global config from disk instead of using the repo/poller-resolved cfg. This can incorrectly skip auto-design in CI when auto_design_review.enabled or heuristics are supplied by the CI-loaded repo config rather than local/global config.

    • Fix: pass the resolved cfg from processPR into maybeDispatchAutoDesignForCI and use it for ResolveAutoDesign* / classify settings instead of calling config.LoadGlobal().
  • internal/daemon/server.go:1216 - The enqueue gate comments say dirty reviews are excluded, but the condition only checks job_type == review and default review type. Dirty review jobs can still reach maybeDispatchAutoDesign, and missing ChangedFiles can cause path-trigger changes to fall through to an incorrect trivial skip/classify decision.

    • Fix: explicitly exclude dirty jobs at the gate, for example by checking job.GitRef != "dirty" and any existing dirty/range indicators before calling maybeDispatchAutoDesign.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 21, 2026

roborev: Combined Review (1f98ad1)

Review found 2 Medium issues that can leave CI batches incomplete until stale reconciliation.

Medium

  • internal/daemon/worker_classify.go:185 - CI-attached classify jobs marked as skipped do not appear to broadcast a terminal event or increment the linked CI batch’s completed counter. Because AttachJobAndBumpTotal only increments completed counts for jobs already terminal at attach time, queued classify rows that later resolve to skip can leave the batch waiting instead of synthesizing promptly.
    Suggested fix: After MarkClassifyAsSkippedDesign succeeds, update the linked CI batch as completed and trigger the same synthesis path used for review.completed, or emit a terminal event that CI polling handles for skipped jobs.

  • internal/daemon/ci_poller.go:747 - Existing auto-design jobs can be attached to multiple CI batches, but completion handling still uses GetCIBatchByJobID, which returns only one batch. When that shared job completes, only one attached batch is advanced; other linked batches can remain incomplete until stale reconciliation.
    Suggested fix: Fetch and update all unsynthesized batches linked to the job, or prevent one non-terminal job from being linked to multiple batches.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 22, 2026

roborev: Combined Review (9e7bb93)

Auto-design dispatch has one actionable issue: dirty/range review refs are not fully excluded before dispatch.

High

  • internal/daemon/server.go:1217 / around handleEnqueue line 1272
    • Problem: The auto-design dispatch condition only checks job.JobType == storage.JobTypeReview and config.IsDefaultReviewType(req.ReviewType), even though the intended behavior excludes dirty refs and range refs. Dirty jobs still use JobTypeReview, so they can enter maybeDispatchAutoDesign with empty or inappropriate changed-file data and create bogus skipped/classify rows.
    • Fix: Add explicit guards before dispatch, such as job.GitRef != "dirty" and rejecting range syntax like .., or centralize this in a helper covered by dirty/range tests.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Adds an opt-in per-commit decision (heuristics + schema-constrained
classifier fallback) that routes design reviews automatically.
Disabled by default; enable via `[auto_design_review] enabled = true`
in ~/.roborev/config.toml or .roborev.toml.

Heuristics layer (internal/review/autotype):
- Path globs via doublestar/v4: trigger_paths (e.g. **/migrations/**,
  **/*.sql); skip_paths (e.g. **/*.md, **/testdata/**).
- Diff size: large_diff_lines triggers; min_diff_lines below skips.
- Commit subject regex: trigger_message_patterns
  (refactor|redesign|...); skip_message_patterns (^docs:|^test:|...).
- Trigger rules run before skip rules; both before classifier fallback.
- Reasons sanitized before storage: control characters stripped,
  length capped (200 runes) so untrusted filenames/messages cannot
  inject terminal escapes or markdown into PR comments and the TUI.

Schema-constrained classifier:
- New SchemaAgent capability. Only claude-code implements it today
  via `--json-schema`. Claude classify runs with `--tools ""` (no
  file or shell access), validated at runtime via
  claudeSupportsToolsFlag so an older CLI that ignores the flag
  fails closed. Env vars are stripped and proxy auth handled via a
  shared buildClaudeEnv so inherited ANTHROPIC_API_KEY /
  BASE_URL / AUTH_TOKEN cannot leak into classifier traffic.
- Codex is deliberately NOT a SchemaAgent: `codex exec
  --sandbox read-only` blocks writes but not reads, and codex has no
  equivalent to `--tools ""`. ValidateClassifyAgent rejects codex as
  classify_agent.
- Classifier prompt wraps commit subject/body/diff-stat/diff in
  context-only XML tags, XML-escapes all wrapped content (including
  diff body), drops the markdown fence so a crafted diff can't close
  it, and warns the model that tag contents are data not instructions.

Storage:
- New JobStatusSkipped terminal state and JobTypeClassify, threaded
  through every hardcoded terminal-state filter (GetStaleBatches,
  ReconcileBatch, sync, ReenqueueJob, HasMeaningfulBatchResult,
  GetExpiredBatches, GetNonTerminalBatchJobIDs). Terminal-status
  classification (done/skipped/applied/rebased as completed;
  failed/canceled as failed) is aligned across AttachJobAndBumpTotal,
  ReconcileBatch, and the filter helpers.
- New skip_reason and source columns on review_jobs plus two partial
  unique indexes scoped to source='auto_design' (commit-backed and
  ref-only). SQLite migration + Postgres v12 schema + v11->v12
  migration step. Dedup indexes created in the currentVersion==0
  first-time block so fresh Postgres bootstraps get them without
  running the migration, and not in the embedded v12.sql so a
  v1->v12 migration does not reference the source column before
  it's added. Partial unique index widened to
  (repo_id, git_ref, review_type) so the commit-backed + commitless
  cross-case race is enforced at the storage layer; migration falls
  back to the narrow form if pre-existing duplicates would block.
- Lifecycle helpers (PromoteClassifyToDesignReview,
  MarkClassifyAsSkippedDesign, InsertSkippedDesignJob,
  EnqueueAutoDesignJob, HasAutoDesignSlotForCommit, ListJobsByStatus,
  AttachJobAndBumpTotal, ListCIBatchesByJobID). Classify rows convert
  in place via UPDATE so the dedup index isn't fought.
  AutoDesignAgentSentinel fills the NOT NULL agent column; Promote
  rewrites agent/model to the real design-workflow agent at
  promotion time. AttachJobAndBumpTotal is idempotent per
  (batch_id, job_id) via a unique index plus INSERT OR IGNORE; the
  migration dedupes pre-existing duplicate link rows and
  recalculates inflated total/completed/failed counters.
- ReenqueueJob accepts 'skipped' and clears skip_reason so a
  reenqueued auto-design row doesn't carry the stale reason forward.

Daemon integration:
- Worker handler for job_type='classify' runs the classifier and
  converts the row in place. Classify uses agent.Get (not
  GetAvailable) so an unknown / uninstalled classify_agent fails
  closed — only the explicitly configured classify_backup_agent is
  an allowed fallback. Classifier failures mark 'skipped' (not
  'failed') so CI batch accounting stays accurate. err.Error() is
  redacted out of skip_reason (maps to {timed out, unavailable,
  failed} public categories) while the full err plus any
  backup-resolve error is persisted to job.error for operators.
- Skip paths broadcast review.completed so CI batches and other
  subscribers advance; otherwise the row would stay short on
  completed_jobs until stale-batch reconciliation.
  review.completed/review.failed handlers iterate every linked batch
  via ListCIBatchesByJobID since auto-design rows are shared across
  batches that dedup onto them.
- Enqueue handler dispatches auto-design for default-typed
  single-commit review jobs only (dirty excluded: no ChangedFiles
  list means a small uncommitted migration change would bypass
  TriggerPaths; re-enable when dirty file-list extraction lands).
  Opportunistic and never blocks the caller's HTTP response.
  Explicit review_type=design or security bypasses. Follow-up
  design reviews are persisted with the resolved design-workflow
  agent/model.
- CI poller runs the same path per-commit across the PR range
  (git rev-list mergeBase..head). Each auto-design outcome is
  attached to the CI batch via AttachJobAndBumpTotal so the batch
  total reflects the new rows and synthesis waits for them. When an
  auto_design row already exists for the head commit, the poller
  attaches the existing row to the new batch instead of returning
  early.

Test-hook hygiene:
- SetTestClassifierVerdict lives in a _test.go file so the setter
  is only compiled into the test binary; production code cannot
  flip the hook to force an auto-design routing decision.

User surface:
- Skipped rows render in the TUI with a dimmed style and truncated
  reason.
- PR synthesis includes skipped rows as a short
  "Auto-design-review skipped: <reason>" section.
- /api/status surfaces a SkippedJobs aggregate count and, when the
  feature is effectively enabled, an auto_design subobject with
  five per-outcome counters (triggered/skipped x
  heuristic/classifier, plus classifier_failed). Subobject is
  omitted from JSON when disabled. Counters update from all four
  producers via a process-global AutoDesignMetrics with atomic ops.

Heuristic engine benchmark on AMD Ryzen 9 9950X — well under the
25 ms spec target:
  HeuristicTrigger     113.5 ns/op
  HeuristicSkip       5006   ns/op
  Ambiguous           7668   ns/op

Testing:
- Unit coverage in internal/review/autotype, internal/agent,
  internal/daemon, internal/storage: every heuristic branch,
  schema-agent validation, classify-row lifecycle transitions,
  dedup invariants, observability counters, reason sanitization,
  backup-config error composition, skip-path event broadcast,
  multi-batch list lookup.
- End-to-end coverage in internal/daemon/e2e_auto_design_test.go
  against real git repos: heuristic triggers (path / message /
  large diff / large file count), heuristic skips (trivial /
  doc-only / conventional prefix), classifier promotions + skips,
  dedup, classifier failure, disabled, and HTTP-level bypass for
  explicit --type design / security.
- CI-path e2e in internal/daemon/e2e_ci_auto_design_test.go:
  attach-to-batch for trigger / skip / existing-row, disabled
  no-op, listCommitsInRange multi-commit and empty range.

User docs at docs/auto-design-review.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wesm wesm force-pushed the worktree-splendid-wishing-yao branch from 9e7bb93 to db52639 Compare April 22, 2026 00:50
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 22, 2026

roborev: Combined Review (db52639)

Review verdict: changes have medium-risk routing issues that can create invalid auto-design review jobs.

Medium

  • internal/daemon/server.go:1215
    The auto-design dispatch gate says range and dirty jobs are excluded, but the condition only checks for review jobs with the default review type. Normal review jobs using refs like base..head or dirty can still dispatch auto-design, which can produce bogus classify/skipped rows.
    Fix: Add explicit guards before maybeDispatchAutoDesign, such as skipping refs containing .., ..., or dirty, and skipping jobs with inline dirty diff content unless file-list extraction is implemented.

  • internal/storage/jobs.go:581
    ReenqueueJob now permits skipped jobs, but auto-design skipped rows are inserted with agent "auto-design". Re-enqueuing one clears skip_reason and queues a review job that the worker cannot execute because "auto-design" is only a sentinel, not a real agent.
    Fix: Reject re-enqueue for skipped auto-design rows, or resolve and persist the real design agent/model before re-enqueuing them.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@codecov-commenter
Copy link
Copy Markdown

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

The squash-rebase combined this branch's bmatcuk/doublestar/v4 add
with main's batch dep bump (roborev-dev#666), producing a go.sum neither
existing vendorHash matched. Hash copied from the nix CI job output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 22, 2026

roborev: Combined Review (cd1c0d4)

Summary verdict: one Medium issue remains; no High or Critical findings were reported.

Medium

  • internal/daemon/server_auto_design.go:167 - Invalid auto-design glob/regex config is treated as a heuristic runtime error and converted into a skipped design row. A typo in trigger_paths, skip_paths, or message patterns can silently suppress all automatic design reviews instead of failing configuration validation.
    • Fix: Validate resolved AutoDesignHeuristics with autotype.Heuristics.Validate() when loading/resolving config, and fail startup or disable auto-design clearly rather than inserting “heuristic error” skipped rows for config errors.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Invalid trigger_paths/skip_paths globs or message-pattern regexes previously
flowed into Classify at dispatch time, where the returned error was coerced
into a skipped "auto-design: heuristic error" row. A typo in any of these
fields silently suppressed every automatic design review instead of surfacing
a configuration error.

Validate resolved heuristics up front: fail daemon startup when the globally
enabled config is invalid, and at per-repo dispatch log the offending pattern
and no-op rather than inserting a phantom skipped row.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 25, 2026

roborev: Combined Review (217ec60)

Verdict: Changes need fixes before merge due to build/startup breakages and a stranded-job failure path.

High

  • internal/daemon/server_auto_design.go (parent.CommitIDValue() calls)
    *storage.ReviewJob does not define CommitIDValue(), while the field is CommitID *int64. This will fail to compile. Safely dereference parent.CommitID or add a real accessor method.

  • internal/storage/db.go (ensureCIPRBatchJobsUniqueIndex)
    The deduplication SQL references id, but ci_pr_batch_jobs has no id column. This can fail daemon startup with no such column: id. Use SQLite rowid or another actual unique row identifier.

Medium

  • internal/daemon/worker_classify.go (applyClassifyVerdict)
    If PromoteClassifyToDesignReview or MarkClassifyAsSkippedDesign fails, the error is only logged and the job can remain stuck in running indefinitely. Mark the job failed or otherwise abort/retry the cycle explicitly.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

If PromoteClassifyToDesignReview or MarkClassifyAsSkippedDesign returns
an error, the row is still in 'running' status. Logging the error and
returning leaves it stranded indefinitely: a linked CI batch waits on
the row, the worker quota stays consumed, and stale-batch reconciliation
is the only recovery.

Add a shared failClassifyOnDBError helper that calls FailJob on the
same (jobID, workerID) pair and broadcasts review.failed when the
transition succeeds. broadcastClassifyFailed mirrors the existing
broadcastClassifyTerminal but emits review.failed so subscribers
counting failed_jobs advance instead of waiting.

Wire the helper into both applyClassifyVerdict branches (promote and
clean-skip) and into completeClassifyAsSkip (the classifier-failure
degrade-to-skip path).

Tests force the inner UPDATE's WHERE clause to miss by mutating
review_jobs.source after claim — Promote/Mark pin source='auto_design'
but FailJob doesn't, so the recovery path can succeed even though the
primary call returns sql.ErrNoRows. Asserts: review.failed broadcast,
job status transitioned to 'failed', error message identifies the
failing op for operators.
@cpcloud
Copy link
Copy Markdown
Contributor Author

cpcloud commented Apr 25, 2026

Re: roborev-ci review on 217ec60 (comment) — the two High findings are not real bugs. The Medium is, and is fixed in 052a866.

High #1parent.CommitIDValue() "doesn't exist, will fail to compile"

The method is defined at internal/storage/models.go:110-112:

// CommitIDValue returns the commit ID as a plain int64 (0 if nil).
func (j ReviewJob) CommitIDValue() int64 {
    ...
}

go build ./... at 217ec60 exits 0; the CI test jobs on this commit (ubuntu/macos/windows) all passed.

High #2ensureCIPRBatchJobsUniqueIndex "references id, but ci_pr_batch_jobs has no id column"

The schema at internal/storage/db.go:107-112:

CREATE TABLE IF NOT EXISTS ci_pr_batch_jobs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  batch_id INTEGER NOT NULL REFERENCES ci_pr_batches(id),
  job_id INTEGER NOT NULL REFERENCES review_jobs(id),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

The id column exists. go test ./internal/storage/... (which calls OpenDB, which calls ensureCIPRBatchJobsUniqueIndex) passes locally and in CI.

Medium — stuck running classify row on DB transition error

This one is real. applyClassifyVerdict and completeClassifyAsSkip logged the PromoteClassifyToDesignReview / MarkClassifyAsSkippedDesign error and returned, leaving the row in running. Fixed in 052a866: a shared failClassifyOnDBError helper now calls FailJob and broadcasts review.failed so any linked CI batch advances instead of waiting for stale-batch reconciliation. Tests in worker_classify_test.go force the inner UPDATE's WHERE clause to miss (via source mutation) and assert the row transitions to failed with review.failed broadcast.

@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 25, 2026

roborev: Combined Review (052a866)

No Medium, High, or Critical findings were reported.

All review agents either found no issues or only reported Low severity findings, which are omitted per the requested threshold.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

The PR mistakenly added a docs/ tree to this repo. Project docs live in
roborev-dev/roborev-docs and render at https://roborev.io. The auto-
design-review page will be authored over there.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 25, 2026

roborev: Combined Review (9f897db)

Summary verdict: No Medium, High, or Critical findings were reported.

All agents agree the reviewed code is clean for reportable issues.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@cpcloud
Copy link
Copy Markdown
Contributor Author

cpcloud commented Apr 25, 2026

This should be ready to go.

@wesm
Copy link
Copy Markdown
Collaborator

wesm commented Apr 25, 2026

Ok, firing torpedoes!

@wesm wesm merged commit 4879fa8 into roborev-dev:main Apr 25, 2026
8 checks passed
@cpcloud cpcloud deleted the worktree-splendid-wishing-yao branch April 25, 2026 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants