fix: always post a review comment even with zero findings#15
Conversation
The pr-review agent would exit 0 without posting any GitHub comment when review_complete=true and zero findings were found — only the review_complete=false + zero findings path explicitly posted. Changes: - pr-review.yaml step 5: 'review_complete=true AND zero findings' path now mandates posting a 🟢 APPROVE COMMENT review via gh api before exiting. - pr-review.yaml step 8 rule docker#4: clarify that COMMENT event applies even when the findings list is empty. - pr-review.yaml 'Delivering the Review': add a concrete zero-findings bash example with an empty comments array so the agent has a literal template to follow. - review-pr/action.yml post-summary step: defense-in-depth fallback — when EXIT_CODE=0 but the verbose log contains no pullrequestreview ID, post a fallback '🟢 No issues found — LGTM!' issue comment so the PR always gets feedback. Fixes the root cause (agent instruction) and adds an action-level safety net in case the agent still misses the posting step.
docker-agent
left a comment
There was a problem hiding this comment.
direction is right (the silent-exit bug was real), but the fallback comment in action.yml needs a dedup guard, otherwise every re-run stacks another LGTM on the PR. also left a couple of non-blocking notes on the agent YAML (contradictory --arg body example and ambiguous step ordering) as things to keep in mind if the zero-findings path still misbehaves after this.
| # This guards against the agent exiting 0 without calling gh api (e.g., zero-findings early exit). | ||
| if [ -n "$VERBOSE_LOG_FILE" ] && [ -f "$VERBOSE_LOG_FILE" ] && ! grep -qE 'pullrequestreview-[0-9]+' "$VERBOSE_LOG_FILE" 2>/dev/null; then | ||
| echo "::warning::Agent exited 0 but no review was posted — posting fallback LGTM comment" | ||
| gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" \ |
There was a problem hiding this comment.
issue: the fallback gh api call on line 877 posts a brand-new issues-API comment every time the workflow re-runs and the agent exits 0 without a pullrequestreview-* log entry. no dedup check at all: three re-runs = three identical 🟢 No issues found, LGTM! comments piling up.
easiest fix is embedding a sentinel marker in the body and checking for it before posting:
MARKER="<!-- docker-agent-lgtm-fallback -->"
EXISTING=$(gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" \
--jq "[.[] | select(.body | contains(\"docker-agent-lgtm-fallback\"))] | length")
if [ "$EXISTING" -eq 0 ]; then
printf '%s\n🟢 **No issues found**, LGTM! [View logs](%s).' "$MARKER" "$RUN_URL" \
| gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" --input -
fi(alternatively, delete-and-recreate, or just switch to gh pr review --approve which GitHub deduplicates naturally.)
There was a problem hiding this comment.
Fixed in the follow-up commit (780c0a1). I used a startswith prefix match on 🟢 **No issues found** rather than an embedded HTML marker — same dedup effect, slightly simpler body. If the sentinel-marker approach is preferred for robustness (e.g. if the body text ever changes), happy to switch, but the current check covers the re-run duplicate scenario.
There was a problem hiding this comment.
the startswith check works fine for this. you're right that it covers the re-run scenario, and the body is unlikely to change independently of the guard. html marker is slightly more robust against accidental body edits but not worth the extra noise here. good fix.
Before posting the defense-in-depth '🟢 No issues found' comment, query the PR's existing comments and skip posting if one matching the prefix already exists. This prevents duplicate identical comments when the workflow is re-run multiple times on the same PR.
## Summary Adds a precise, multi-criteria confidence scoring model for the PR reviewer. Each verified finding gets a 0 to 100 confidence score, a band, and a posting disposition, computed deterministically from several signals rather than a subjective guess. The model lives in two synchronized surfaces: | Surface | Role | | ------- | ---- | | `src/score-confidence/` (pure module + CLI + unit tests) | Single source of truth for the model | | `review-pr/agents/pr-review.yaml` "Confidence Scoring" section | Strict lookup table the orchestrator applies inline (the gitignored `dist/` is not available at agent runtime) | ## Criteria | Criterion | Source | Role | | --------- | ------ | ---- | | Verdict (CONFIRMED / LIKELY / DISMISSED) | verifier | Primary agreement signal | | `evidence_strength` (direct / circumstantial / speculative) | verifier (new field) | Pattern / snippet match strength | | `context_completeness` (full / partial / none) | verifier (new field) | Whether the verifier saw the code it needed | | Severity concordance | derived | Drafter vs verifier severity rank distance | | Scope (`in_diff` and `in_changed_code`) | derived | Hard gate | | Category / severity | verifier | Posting policy only (security floor, high-severity always-post) | ## Scale and threshold - Score 0 to 100 from a precomputed `verdict x evidence x context` table (`CONFIRMED 70` / `LIKELY 40` base; evidence `+18 / +8 / -4`; context `+12 / +4 / -10`), plus concordance (`+5 / 0 / -8`), clamped. - Bands: `strong >= 80`, `moderate 55..79`, `weak 30..54`, `negligible < 30`. - Default posting threshold is 55 (the moderate-band floor, so the band and the post cutoff cannot drift apart). - Only CONFIRMED can reach `strong` (LIKELY tops out at 75), a property the unit tests pin. ## Posting policy (first match wins; the 5-comment cap is applied last) | Rule | Behavior | | ---- | -------- | | Out-of-scope / DISMISSED non-security | not surfaced | | Security floor (CONFIRMED/LIKELY) | always inline, never auto-suppressed, cap-exempt | | High-severity (CONFIRMED/LIKELY) | always inline, cap-exempt | | Band strong or moderate | inline (subject to the cap) | | Band weak, non-forced | lower-confidence summary list (not silently dropped) | | Negligible band, verifier severity medium | summary (visibility floor) | | DISMISSED security | dismissed-security audit line (human-reviewable) | | Non-forced inline overflow past 5 | demoted to the summary list | ## Maps to the requested design | Requested | Implemented | | --------- | ----------- | | Verifier agreement score | verdict + drafter/verifier severity concordance | | Pattern match strength | `evidence_strength` field | | Context completeness | `context_completeness` field | | Scale (high/medium/low or 0-100) | 0 to 100 plus four named bands | | Default threshold | 55 (moderate floor) | | Precise, multiple criteria | deterministic lookup table; six criteria; 84 unit tests pin every value | ## How it was hardened A design review (three independent lenses: calibration, security policy, LLM reproducibility) locked the constants, replacing error-prone post-hoc caps with the lookup table and removing band dead-zones. An adversarial verification pass then confirmed and fixed three defects: | Defect | Severity | Resolution | | ------ | -------- | ---------- | | Verifier escalating severity could silently drop a medium-severity finding (concordance is non-monotonic in severity) | major | Medium-severity visibility floor: a still-believed medium finding is kept in the summary, never dropped | | `category` was the only enum not validated, so a misspelled value silently disabled the security floor | minor | Validate `category` via `assertEnum` | | Severity-disagreement can move a borderline finding from inline to summary | minor | Documented as intended: the finding stays visible, and confidence legitimately reflects assessor agreement (a different axis from severity) | TS-to-prompt numeric consistency (all 18 table cells, bands, threshold, posting precedence, schema) was verified clean. ## Compatibility with #15 #15 ("always post a review comment even with zero findings") and this change touch distinct regions of `pr-review.yaml` and are complementary: an empty inline set yields a `APPROVE` assessment label while the summary and audit sections still go in the review body, and the review is still posted. ## Validation | Check | Result | | ----- | ------ | | `pnpm build` | pass | | `pnpm test` (554 tests, 84 new) | pass | | `tsc --noEmit` | pass | | `biome ci` | pass | | `actionlint` | pass | ## Note on runtime placement The model is applied by the orchestrator prompt (mirroring the tested TS module) rather than invoked as a `dist` bundle at agent runtime, because `dist/` is gitignored and not present in the agent's working tree. If invoking the compiled scorer at runtime is preferred, that is a larger pipeline change (staging the bundle plus a permission) and can be done separately.
Problem
The
review-pragent exits 0 without posting any GitHub PR comment whenreview_complete=trueand zero findings. Thereview_complete=false + zero findingspath explicitly says 'post a COMMENT review', but thereview_complete=truepath jumps to step 8 and then step 9 ('Build inline comments') — with nothing to post, the agent exits silently.Changes
review-pr/agents/pr-review.yaml(primary fix)review_complete=true+ zero findings path): now explicitly mandates posting a 🟢 APPROVE COMMENT review with an empty comments array viagh apibefore exiting.commentsarray, giving the agent a literal template to follow.review-pr/action.yml(defense-in-depth)EXIT_CODE=0branch): added a check — if the verbose log exists but contains nopullrequestreview-[0-9]+reference, post a fallback issue comment (🟢 **No issues found** — LGTM!) to guarantee PR visibility.Testing
pnpm test).tests/test-job-summary.sh,tests/test-output-extraction.sh) pass.dist/.