fix: always post a review comment even with zero findings by docker-agent · Pull Request #15 · docker/docker-agent-action

docker-agent · 2026-06-24T15:39:15Z

Problem

The review-pr agent exits 0 without posting any GitHub PR comment when review_complete=true and zero findings. The review_complete=false + zero findings path explicitly says 'post a COMMENT review', but the review_complete=true path jumps to step 8 and then step 9 ('Build inline comments') — with nothing to post, the agent exits silently.

Changes

`review-pr/agents/pr-review.yaml` (primary fix)

Step 5 (review_complete=true + zero findings path): now explicitly mandates posting a 🟢 APPROVE COMMENT review with an empty comments array via gh api before exiting.
Step 8 rule rename: cagent to docker-agent-action across entire repo #4: clarified that the COMMENT event applies even when the findings list is empty — the assessment-label body must still be posted.
"Delivering the Review" section: added a concrete zero-findings bash example with an empty comments array, giving the agent a literal template to follow.

`review-pr/action.yml` (defense-in-depth)

Post-clean-summary step (EXIT_CODE=0 branch): added a check — if the verbose log exists but contains no pullrequestreview-[0-9]+ reference, post a fallback issue comment (🟢 **No issues found** — LGTM!) to guarantee PR visibility.

Testing

All 470 existing unit tests pass (pnpm test).
Shell tests (tests/test-job-summary.sh, tests/test-output-extraction.sh) pass.
No changes to TypeScript source or dist/.

The pr-review agent would exit 0 without posting any GitHub comment when review_complete=true and zero findings were found — only the review_complete=false + zero findings path explicitly posted. Changes: - pr-review.yaml step 5: 'review_complete=true AND zero findings' path now mandates posting a 🟢 APPROVE COMMENT review via gh api before exiting. - pr-review.yaml step 8 rule docker#4: clarify that COMMENT event applies even when the findings list is empty. - pr-review.yaml 'Delivering the Review': add a concrete zero-findings bash example with an empty comments array so the agent has a literal template to follow. - review-pr/action.yml post-summary step: defense-in-depth fallback — when EXIT_CODE=0 but the verbose log contains no pullrequestreview ID, post a fallback '🟢 No issues found — LGTM!' issue comment so the PR always gets feedback. Fixes the root cause (agent instruction) and adds an action-level safety net in case the agent still misses the posting step.

docker-agent

direction is right (the silent-exit bug was real), but the fallback comment in action.yml needs a dedup guard, otherwise every re-run stacks another LGTM on the PR. also left a couple of non-blocking notes on the agent YAML (contradictory --arg body example and ambiguous step ordering) as things to keep in mind if the zero-findings path still misbehaves after this.

docker-agent · 2026-06-24T15:48:09Z

+          # This guards against the agent exiting 0 without calling gh api (e.g., zero-findings early exit).
+          if [ -n "$VERBOSE_LOG_FILE" ] && [ -f "$VERBOSE_LOG_FILE" ] && ! grep -qE 'pullrequestreview-[0-9]+' "$VERBOSE_LOG_FILE" 2>/dev/null; then
+            echo "::warning::Agent exited 0 but no review was posted — posting fallback LGTM comment"
+            gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" \


issue: the fallback gh api call on line 877 posts a brand-new issues-API comment every time the workflow re-runs and the agent exits 0 without a pullrequestreview-* log entry. no dedup check at all: three re-runs = three identical 🟢 No issues found, LGTM! comments piling up.

easiest fix is embedding a sentinel marker in the body and checking for it before posting:

MARKER="" EXISTING=$(gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" \ --jq "[.[] | select(.body | contains(\"docker-agent-lgtm-fallback\"))] | length") if [ "$EXISTING" -eq 0 ]; then printf '%s\n🟢 **No issues found**, LGTM! [View logs](%s).' "$MARKER" "$RUN_URL" \ | gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" --input - fi

(alternatively, delete-and-recreate, or just switch to gh pr review --approve which GitHub deduplicates naturally.)

Fixed in the follow-up commit (780c0a1). I used a startswith prefix match on 🟢 **No issues found** rather than an embedded HTML marker — same dedup effect, slightly simpler body. If the sentinel-marker approach is preferred for robustness (e.g. if the body text ever changes), happy to switch, but the current check covers the re-run duplicate scenario.

the startswith check works fine for this. you're right that it covers the re-run scenario, and the body is unlikely to change independently of the guard. html marker is slightly more robust against accidental body edits but not worth the extra noise here. good fix.

Before posting the defense-in-depth '🟢 No issues found' comment, query the PR's existing comments and skip posting if one matching the prefix already exists. This prevents duplicate identical comments when the workflow is re-run multiple times on the same PR.

## Summary Adds a precise, multi-criteria confidence scoring model for the PR reviewer. Each verified finding gets a 0 to 100 confidence score, a band, and a posting disposition, computed deterministically from several signals rather than a subjective guess. The model lives in two synchronized surfaces: | Surface | Role | | ------- | ---- | | `src/score-confidence/` (pure module + CLI + unit tests) | Single source of truth for the model | | `review-pr/agents/pr-review.yaml` "Confidence Scoring" section | Strict lookup table the orchestrator applies inline (the gitignored `dist/` is not available at agent runtime) | ## Criteria | Criterion | Source | Role | | --------- | ------ | ---- | | Verdict (CONFIRMED / LIKELY / DISMISSED) | verifier | Primary agreement signal | | `evidence_strength` (direct / circumstantial / speculative) | verifier (new field) | Pattern / snippet match strength | | `context_completeness` (full / partial / none) | verifier (new field) | Whether the verifier saw the code it needed | | Severity concordance | derived | Drafter vs verifier severity rank distance | | Scope (`in_diff` and `in_changed_code`) | derived | Hard gate | | Category / severity | verifier | Posting policy only (security floor, high-severity always-post) | ## Scale and threshold - Score 0 to 100 from a precomputed `verdict x evidence x context` table (`CONFIRMED 70` / `LIKELY 40` base; evidence `+18 / +8 / -4`; context `+12 / +4 / -10`), plus concordance (`+5 / 0 / -8`), clamped. - Bands: `strong >= 80`, `moderate 55..79`, `weak 30..54`, `negligible < 30`. - Default posting threshold is 55 (the moderate-band floor, so the band and the post cutoff cannot drift apart). - Only CONFIRMED can reach `strong` (LIKELY tops out at 75), a property the unit tests pin. ## Posting policy (first match wins; the 5-comment cap is applied last) | Rule | Behavior | | ---- | -------- | | Out-of-scope / DISMISSED non-security | not surfaced | | Security floor (CONFIRMED/LIKELY) | always inline, never auto-suppressed, cap-exempt | | High-severity (CONFIRMED/LIKELY) | always inline, cap-exempt | | Band strong or moderate | inline (subject to the cap) | | Band weak, non-forced | lower-confidence summary list (not silently dropped) | | Negligible band, verifier severity medium | summary (visibility floor) | | DISMISSED security | dismissed-security audit line (human-reviewable) | | Non-forced inline overflow past 5 | demoted to the summary list | ## Maps to the requested design | Requested | Implemented | | --------- | ----------- | | Verifier agreement score | verdict + drafter/verifier severity concordance | | Pattern match strength | `evidence_strength` field | | Context completeness | `context_completeness` field | | Scale (high/medium/low or 0-100) | 0 to 100 plus four named bands | | Default threshold | 55 (moderate floor) | | Precise, multiple criteria | deterministic lookup table; six criteria; 84 unit tests pin every value | ## How it was hardened A design review (three independent lenses: calibration, security policy, LLM reproducibility) locked the constants, replacing error-prone post-hoc caps with the lookup table and removing band dead-zones. An adversarial verification pass then confirmed and fixed three defects: | Defect | Severity | Resolution | | ------ | -------- | ---------- | | Verifier escalating severity could silently drop a medium-severity finding (concordance is non-monotonic in severity) | major | Medium-severity visibility floor: a still-believed medium finding is kept in the summary, never dropped | | `category` was the only enum not validated, so a misspelled value silently disabled the security floor | minor | Validate `category` via `assertEnum` | | Severity-disagreement can move a borderline finding from inline to summary | minor | Documented as intended: the finding stays visible, and confidence legitimately reflects assessor agreement (a different axis from severity) | TS-to-prompt numeric consistency (all 18 table cells, bands, threshold, posting precedence, schema) was verified clean. ## Compatibility with #15 #15 ("always post a review comment even with zero findings") and this change touch distinct regions of `pr-review.yaml` and are complementary: an empty inline set yields a `APPROVE` assessment label while the summary and audit sections still go in the review body, and the review is still posted. ## Validation | Check | Result | | ----- | ------ | | `pnpm build` | pass | | `pnpm test` (554 tests, 84 new) | pass | | `tsc --noEmit` | pass | | `biome ci` | pass | | `actionlint` | pass | ## Note on runtime placement The model is applied by the orchestrator prompt (mirroring the tested TS module) rather than invoked as a `dist` bundle at agent runtime, because `dist/` is gitignored and not present in the agent's working tree. If invoking the compiled scorer at runtime is preferred, that is a larger pipeline change (staging the bundle plus a permission) and can be done separately.

docker-agent assigned derekmisler Jun 24, 2026

docker-agent marked this pull request as ready for review June 24, 2026 15:39

docker-agent commented Jun 24, 2026

View reviewed changes

Sayt-0 approved these changes Jun 24, 2026

View reviewed changes

Sayt-0 merged commit 11cb23d into docker:main Jun 24, 2026
7 checks passed

Sayt-0 mentioned this pull request Jun 24, 2026

feat(review-pr): add per-finding confidence scoring model #17

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: always post a review comment even with zero findings#15

fix: always post a review comment even with zero findings#15
Sayt-0 merged 2 commits into
docker:mainfrom
docker-agent:fix/review-always-post-comment

docker-agent commented Jun 24, 2026

Uh oh!

docker-agent left a comment

Uh oh!

docker-agent Jun 24, 2026

Uh oh!

docker-agent Jun 24, 2026

Uh oh!

docker-agent Jun 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

docker-agent commented Jun 24, 2026

Problem

Changes

review-pr/agents/pr-review.yaml (primary fix)

review-pr/action.yml (defense-in-depth)

Testing

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Uh oh!

docker-agent Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

`review-pr/agents/pr-review.yaml` (primary fix)

`review-pr/action.yml` (defense-in-depth)