feat(review-pr): auto-filter low-risk files before chunking #215
derekmisler wants to merge 1 commit into
docker-agent
left a comment
Assessment: 🟡 NEEDS ATTENTION
This PR correctly implements a two-phase auto-filter mechanism and moves risk scoring before diff chunking. The new TEST_FILE_RE additions for Rust/Ruby patterns look correct, and the tsup.config.ts entry is properly wired. One medium-severity logic issue was found in Phase 2 of autoFilterDiff.
derekmisler
left a comment
Assessment: 🟡 NEEDS ATTENTION
No new findings from this review pass the quality gate. One confirmed bug is already flagged in an existing inline comment (line 164 of src/auto-filter-diff/auto-filter-diff.ts): the Phase 2 progressive cap uses slice(0, -1) to protect the last file in the ascending-risk sort, but when multiple unscored files exist they all sort to Infinity and only the very last is protected — the others remain removable, violating the "unknown = needs review" invariant.
All other items examined were verified clean: missing set -euo pipefail in the auto-filter step (single-command step, no functional impact), remainingLines preamble accounting (no production trigger path), TEST_FILE_RE regex (works correctly), step ordering (correct: filter → score → auto-filter → chunk), lock-check guard asymmetry (harmless: pr.diff is never created when skip=true), and extractFilePath edge cases (correct for normal paths).
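The Phase 2 issue described above can be reproduced in isolation. The following is a minimal sketch, not the code from src/auto-filter-diff/auto-filter-diff.ts: the `ScoredFile` shape and all function names are hypothetical, and `removableProtectingUnscored` is only one possible way to honor the "unknown = needs review" invariant, not the fix adopted in this PR.

```typescript
// Hypothetical shape; the real types in auto-filter-diff.ts are not shown in this thread.
interface ScoredFile {
  path: string;
  score?: number; // undefined means the file was never scored
}

// Unscored files sort as Infinity, as described in the review comment.
const riskOf = (f: ScoredFile): number => f.score ?? Number.POSITIVE_INFINITY;

// Explicit three-way compare avoids Infinity - Infinity (NaN) in the comparator.
const byRiskAsc = (a: ScoredFile, b: ScoredFile): number => {
  const ra = riskOf(a);
  const rb = riskOf(b);
  return ra === rb ? 0 : ra < rb ? -1 : 1;
};

// Pattern described in the review: only the last sorted entry is protected.
function removableWithSlice(files: ScoredFile[]): ScoredFile[] {
  return [...files].sort(byRiskAsc).slice(0, -1);
}

// One possible alternative (an assumption, not the PR's fix): never offer
// unscored files for removal, and still keep at least one file overall.
function removableProtectingUnscored(files: ScoredFile[]): ScoredFile[] {
  const scored = files.filter((f) => f.score !== undefined).sort(byRiskAsc);
  const hasUnscored = scored.length < files.length;
  return hasUnscored ? scored : scored.slice(0, -1);
}

const sample: ScoredFile[] = [
  { path: "src/a.ts", score: 1 },
  { path: "src/b.ts" }, // unscored
  { path: "src/c.ts" }, // unscored
];

// With two unscored files, the slice-based list still contains one of them.
console.log(removableWithSlice(sample).map((f) => f.path));
// The alternative only ever offers scored files.
console.log(removableProtectingUnscored(sample).map((f) => f.path));
```

With the sample input, the slice-based candidate list has two entries, so one unscored file remains removable, while the alternative offers only src/a.ts.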
da46da7 to 77bdd24
Signed-off-by: Derek Misler <derek.misler@docker.com>
77bdd24 to 0b9207e
Summary
Fixes the root cause of large-PR review timeouts by making the risk-scoring pipeline automatically filter out low-risk files before diff chunking, without requiring per-repo exclude-paths configuration.
Root causes addressed
- Risk scoring happened after chunking: scores were computed but never used to filter the diff. Every file (including test files, generated code, etc.) got chunked and processed by the full multi-agent pipeline.
- TEST_FILE_RE in score-risk.ts missed Rust/Ruby test files: Rust test/bench/spec files and Ruby spec files scored non-zero and were included in chunks unnecessarily.
Changes
Change 1: Extend TEST_FILE_RE in src/score-risk/score-risk.ts
Extended the regex to also match:
- _test.rs, _bench.rs, _spec.rs
- _spec.rb
- tests/, test/, benches/, __tests__/, or specs/ directory component
Added 9 new Vitest tests covering these new patterns.
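To make the new patterns concrete, here is an illustrative regex covering only the additions listed above. It is a sketch under assumptions: `RUST_RUBY_TEST_RE` and `looksLikeTestFile` are hypothetical names, and the actual TEST_FILE_RE in score-risk.ts (which also covers the previously supported patterns) may be written quite differently.

```typescript
// Illustrative only: NOT the actual TEST_FILE_RE from score-risk.ts.
// Matches the Rust/Ruby suffixes and test-directory components listed in Change 1.
const RUST_RUBY_TEST_RE =
  /(?:_(?:test|bench|spec)\.rs$|_spec\.rb$|(?:^|\/)(?:tests|test|benches|__tests__|specs)\/)/;

// Hypothetical helper; no global flag is used, so .test() carries no state between calls.
function looksLikeTestFile(path: string): boolean {
  return RUST_RUBY_TEST_RE.test(path);
}

console.log(looksLikeTestFile("src/parser_test.rs")); // true
console.log(looksLikeTestFile("benches/alloc.rs")); // true
console.log(looksLikeTestFile("spec/models/user_spec.rb")); // true
console.log(looksLikeTestFile("src/contest/main.rs")); // false: "contest/" is not a "test/" component
```

Anchoring the directory alternatives on `(?:^|\/)` is what keeps a path such as src/contest/main.rs from matching on the substring "test/".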
Change 2: New module src/auto-filter-diff/
Pure TypeScript module with two-phase filtering:
- maxDiffLines (default 3000), removes lowest-risk files first until it fits. Always keeps at least 1 file.
Includes 18 Vitest unit tests covering all phases and edge cases.
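The progressive cap described above (drop lowest-risk files until the diff fits, but never drop the last file) might look roughly like the sketch below. This is not the module's actual implementation: `FileDiff` and `capDiff` are hypothetical names, every file is assumed to carry a numeric score, and treating a non-positive limit as "no cap" is an assumption based on the `"0"` setting of the action input.

```typescript
// Hypothetical shape; the real module's types are not shown in this PR description.
interface FileDiff {
  path: string;
  score: number; // risk score; higher means riskier
  lines: number; // number of diff lines attributed to this file
}

// Drops lowest-risk files first until the total fits within maxDiffLines,
// always keeping at least one file. Original file order is preserved.
function capDiff(files: FileDiff[], maxDiffLines = 3000): FileDiff[] {
  if (maxDiffLines <= 0) return [...files]; // assumption: 0 disables the cap

  const byRiskAsc = [...files].sort((a, b) => a.score - b.score);
  let total = files.reduce((sum, f) => sum + f.lines, 0);
  const dropped = new Set<string>();

  for (const f of byRiskAsc) {
    const remaining = files.length - dropped.size;
    if (total <= maxDiffLines || remaining <= 1) break;
    dropped.add(f.path);
    total -= f.lines;
  }
  return files.filter((f) => !dropped.has(f.path));
}

const kept = capDiff([
  { path: "src/core.ts", score: 5, lines: 2000 },
  { path: "docs/notes.md", score: 1, lines: 1500 },
  { path: "src/util.ts", score: 3, lines: 1000 },
]);
console.log(kept.map((f) => f.path)); // ["src/core.ts", "src/util.ts"]
```

Note that the result can still exceed the limit when the single remaining file is itself larger than `maxDiffLines`; the "keep at least 1 file" rule takes priority in this sketch.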
Change 3: Wire up in review-pr/action.yml
- max-diff-lines input (default "3000", set "0" to disable)
- New step order: Filter excluded paths → Score file risk → Auto-filter → Split into chunks → Generate file history
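Because action inputs arrive as strings, the `"3000"` default and the `"0"` sentinel have to be interpreted somewhere. The sketch below shows one way that could be done; `parseMaxDiffLines` is a hypothetical helper, and both the `null` return for "disabled" and the error on malformed input are assumptions rather than behavior documented in this PR.

```typescript
// Hypothetical helper: interprets the max-diff-lines input string.
// Returns null when the cap is disabled ("0"), otherwise the numeric limit.
function parseMaxDiffLines(raw: string | undefined, fallback = 3000): number | null {
  const trimmed = (raw ?? "").trim();
  if (trimmed === "") return fallback; // unset input falls back to the default

  const n = Number(trimmed);
  if (!Number.isInteger(n) || n < 0) {
    // Assumption: reject malformed values loudly instead of guessing.
    throw new Error(`max-diff-lines must be a non-negative integer, got "${raw}"`);
  }
  return n === 0 ? null : n;
}

console.log(parseMaxDiffLines("3000")); // 3000
console.log(parseMaxDiffLines("0")); // null (disabled)
console.log(parseMaxDiffLines(undefined)); // 3000
```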
Change 4: Propagate through .github/workflows/review-pr.yml
Added max-diff-lines as a workflow-level number input (default 3000) and passed it through to the review-pr action call.
Change 5: tsup.config.ts
Added 'src/auto-filter-diff/index.ts' to the entry map so tsup bundles it as dist/auto-filter-diff.js.
Test results
Biome formatting and TypeScript type-check pass cleanly. The pre-existing actionlint warning about the node24 runner in test-e2e.yml is unrelated to this PR (present on main before these changes).