feat: oracle delta + grouping + honest "Automated review" header #31
Merged
## Why
spar#175 (the docs PR) got an oracle review with **91 findings, every one
pre-existing**, with the *same message* repeated 50+ times. The user's
reaction was "not sure if i like it, unclear what to do with it" — the
right reaction. A PR review should answer "did this PR make things worse",
not "list every diagnostic in the project".
Three independent changes, shipped together because they all serve the same
"make the review legible" theme:
## 1. Delta filter (`subtractFindings`)
Run `rivet validate` at HEAD AND at BASE. Surface only findings present in
HEAD but not in BASE. Pre-existing diagnostics get filtered. On a docs PR
this drops 91 findings to 0 — which is the correct answer.
Cost: one extra tarball fetch + extract + validate (~5–15 s). Acceptable.
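
A minimal sketch of the subtraction step, for illustration only. The `Finding` shape, the `findingKey` helper, and the choice to compare on `(source, severity, message, artifactId)` are assumptions; this description names `subtractFindings` but does not give its signature or its matching rule.

```typescript
// Hypothetical finding shape: the field names are assumptions for this sketch.
interface Finding {
  source: string;
  severity: string;
  message: string;
  artifactId: string;
}

// Identity key used to decide whether two findings are "the same" across runs.
// Which fields belong in the key is an assumption, not taken from the PR.
function findingKey(f: Finding): string {
  return JSON.stringify([f.source, f.severity, f.message, f.artifactId]);
}

// Keep only findings present at HEAD that have no match at BASE.
function subtractFindings(head: Finding[], base: Finding[]): Finding[] {
  const baseKeys = new Set<string>(base.map(findingKey));
  return head.filter((f) => !baseKeys.has(findingKey(f)));
}
```

Under these assumptions, a PR that introduces no new diagnostics yields an empty array, which matches the "91 findings to 0" outcome described for a docs PR.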
## 2. Grouping (`groupOracleFindings`)
Findings sharing the same `(source, severity, message)` collapse into one
finding listing all affected `artifact_ids`:
- Before: 50 lines of "REQ-XXX: every requirement should be satisfied by …"
- After: 1 line: "Every requirement should be satisfied by … — affecting: REQ-INST-003, REQ-TRANSFORM-002, … (+45 more)"
`message` is now stored separately on findings (was inlined into `claim`).
`maxIdsShown` defaults to 10, then "… (+N more)".
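
The grouping and truncation rules could be sketched roughly as follows. The type names, the `renderGroupedFinding` helper, and the exact output punctuation are assumptions made for illustration; only `groupOracleFindings`, `artifact_ids`, `message`, and the `maxIdsShown` default of 10 come from this description.

```typescript
// Hypothetical input and output shapes for this sketch.
interface OracleFinding {
  source: string;
  severity: string;
  message: string;
  artifactId: string;
}

interface GroupedFinding {
  source: string;
  severity: string;
  message: string;
  artifact_ids: string[];
}

// Collapse findings that share (source, severity, message) into one entry,
// preserving first-seen order and skipping duplicate artifact ids.
function groupOracleFindings(findings: OracleFinding[]): GroupedFinding[] {
  const groups = new Map<string, GroupedFinding>();
  for (const f of findings) {
    const key = JSON.stringify([f.source, f.severity, f.message]);
    let group = groups.get(key);
    if (!group) {
      group = { source: f.source, severity: f.severity, message: f.message, artifact_ids: [] };
      groups.set(key, group);
    }
    if (group.artifact_ids.indexOf(f.artifactId) === -1) {
      group.artifact_ids.push(f.artifactId);
    }
  }
  return Array.from(groups.values());
}

// Render one grouped finding, listing at most `maxIdsShown` ids and
// summarising the remainder as "(+N more)". Hypothetical helper.
function renderGroupedFinding(g: GroupedFinding, maxIdsShown: number = 10): string {
  const shown = g.artifact_ids.slice(0, maxIdsShown);
  const hidden = g.artifact_ids.length - shown.length;
  const suffix = hidden > 0 ? `, … (+${hidden} more)` : "";
  return `${g.message} \u2014 affecting: ${shown.join(", ")}${suffix}`;
}
```

Keeping `message` as its own field is what makes the grouping key cheap to compute: there is no need to parse it back out of a rendered `claim` string.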
## 3. Honest header rename
"AI Code Review" was misleading on PRs where the AI contributed nothing
beyond the summary line (most PRs to date). Now:
- Header: "## Automated review for PR #N"
- HTML marker `<!-- temper-automated-review -->` for supersede detection
  (independent of visible header text)
- Footer breakdown: "_Findings: X mechanical (rivet) · Y from local AI model._"
- `supersedePreviousReviews` matches BOTH the new marker AND the legacy
  "AI Code Review" string so old comments still get marked outdated.
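
To make the marker-based detection concrete, here is a rough sketch. The helper names (`isPreviousReview`, `renderHeader`, `renderBreakdownFooter`) are hypothetical, and the real `supersedePreviousReviews` may be structured differently; only the marker, the legacy string, and the header and footer text are taken from the list above.

```typescript
// Values quoted from the PR description.
const REVIEW_MARKER = "<!-- temper-automated-review -->";
const LEGACY_HEADER = "AI Code Review";

// Hypothetical predicate: does a comment body look like an earlier review
// from this tool, whether in the new or the legacy format?
function isPreviousReview(commentBody: string): boolean {
  return (
    commentBody.indexOf(REVIEW_MARKER) !== -1 ||
    commentBody.indexOf(LEGACY_HEADER) !== -1
  );
}

// Hypothetical renderer: the hidden marker travels with the visible header,
// so the header wording can change later without breaking detection.
function renderHeader(prNumber: number): string {
  return `${REVIEW_MARKER}\n## Automated review for PR #${prNumber}`;
}

// Hypothetical renderer for the footer breakdown line.
function renderBreakdownFooter(mechanical: number, fromModel: number): string {
  return `_Findings: ${mechanical} mechanical (rivet) · ${fromModel} from local AI model._`;
}
```

Matching on an HTML comment instead of the visible title is the design choice that matters here: a future rename only touches `renderHeader`, and older comments stay detectable through the legacy string.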
## Test plan
- [x] All 788 tests pass (was 773 — added 15 covering subtract / group /
header rename / breakdown footer / grouped finding render)
- [x] eslint clean
- [ ] After deploy: `/review-pr` on spar#175 (or any non-trivial PR) should
now show: (a) a much shorter findings list (delta only), (b) grouped
messages, (c) the new "Automated review" header with breakdown.
- [ ] On a docs-only PR the oracle should report 0 new findings, so the
  comment is either skipped (verdict: approve, findings: []) or shows just
  the model summary.
## Risk & rollout
- Risk: medium. Three behaviour changes in one release. Each is opt-in
  via existing config (`rivet_oracle.enabled`), and the verdict-from-findings
  pipeline gracefully handles 0 findings (silence > slop).
- Rollout: self-update on merge.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>