Skip to content

fix: always post a review comment even with zero findings#15

Merged
Sayt-0 merged 2 commits into
docker:mainfrom
docker-agent:fix/review-always-post-comment
Jun 24, 2026
Merged

fix: always post a review comment even with zero findings#15
Sayt-0 merged 2 commits into
docker:mainfrom
docker-agent:fix/review-always-post-comment

Conversation

@docker-agent

Copy link
Copy Markdown
Contributor

Problem

The review-pr agent exits 0 without posting any GitHub PR comment when review_complete=true and zero findings. The review_complete=false + zero findings path explicitly says 'post a COMMENT review', but the review_complete=true path jumps to step 8 and then step 9 ('Build inline comments') — with nothing to post, the agent exits silently.

Changes

review-pr/agents/pr-review.yaml (primary fix)

  • Step 5 (review_complete=true + zero findings path): now explicitly mandates posting a 🟢 APPROVE COMMENT review with an empty comments array via gh api before exiting.
  • Step 8 rule rename: cagent to docker-agent-action across entire repo #4: clarified that the COMMENT event applies even when the findings list is empty — the assessment-label body must still be posted.
  • "Delivering the Review" section: added a concrete zero-findings bash example with an empty comments array, giving the agent a literal template to follow.

review-pr/action.yml (defense-in-depth)

  • Post-clean-summary step (EXIT_CODE=0 branch): added a check — if the verbose log exists but contains no pullrequestreview-[0-9]+ reference, post a fallback issue comment (🟢 **No issues found** — LGTM!) to guarantee PR visibility.

Testing

  • All 470 existing unit tests pass (pnpm test).
  • Shell tests (tests/test-job-summary.sh, tests/test-output-extraction.sh) pass.
  • No changes to TypeScript source or dist/.

The pr-review agent would exit 0 without posting any GitHub comment when
review_complete=true and zero findings were found — only the
review_complete=false + zero findings path explicitly posted.

Changes:
- pr-review.yaml step 5: 'review_complete=true AND zero findings' path now
  mandates posting a 🟢 APPROVE COMMENT review via gh api before exiting.
- pr-review.yaml step 8 rule docker#4: clarify that COMMENT event applies even
  when the findings list is empty.
- pr-review.yaml 'Delivering the Review': add a concrete zero-findings
  bash example with an empty comments array so the agent has a literal
  template to follow.
- review-pr/action.yml post-summary step: defense-in-depth fallback —
  when EXIT_CODE=0 but the verbose log contains no pullrequestreview ID,
  post a fallback '🟢 No issues found — LGTM!' issue comment so the PR
  always gets feedback.

Fixes the root cause (agent instruction) and adds an action-level safety
net in case the agent still misses the posting step.
@docker-agent docker-agent marked this pull request as ready for review June 24, 2026 15:39

@docker-agent docker-agent left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

direction is right (the silent-exit bug was real), but the fallback comment in action.yml needs a dedup guard, otherwise every re-run stacks another LGTM on the PR. also left a couple of non-blocking notes on the agent YAML (contradictory --arg body example and ambiguous step ordering) as things to keep in mind if the zero-findings path still misbehaves after this.

Comment thread review-pr/action.yml Outdated
# This guards against the agent exiting 0 without calling gh api (e.g., zero-findings early exit).
if [ -n "$VERBOSE_LOG_FILE" ] && [ -f "$VERBOSE_LOG_FILE" ] && ! grep -qE 'pullrequestreview-[0-9]+' "$VERBOSE_LOG_FILE" 2>/dev/null; then
echo "::warning::Agent exited 0 but no review was posted — posting fallback LGTM comment"
gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" \

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: the fallback gh api call on line 877 posts a brand-new issues-API comment every time the workflow re-runs and the agent exits 0 without a pullrequestreview-* log entry. no dedup check at all: three re-runs = three identical 🟢 No issues found, LGTM! comments piling up.

easiest fix is embedding a sentinel marker in the body and checking for it before posting:

MARKER="<!-- docker-agent-lgtm-fallback -->"
EXISTING=$(gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" \
  --jq "[.[] | select(.body | contains(\"docker-agent-lgtm-fallback\"))] | length")
if [ "$EXISTING" -eq 0 ]; then
  printf '%s\n🟢 **No issues found**, LGTM! [View logs](%s).' "$MARKER" "$RUN_URL" \
    | gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" --input -
fi

(alternatively, delete-and-recreate, or just switch to gh pr review --approve which GitHub deduplicates naturally.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the follow-up commit (780c0a1). I used a startswith prefix match on 🟢 **No issues found** rather than an embedded HTML marker — same dedup effect, slightly simpler body. If the sentinel-marker approach is preferred for robustness (e.g. if the body text ever changes), happy to switch, but the current check covers the re-run duplicate scenario.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the startswith check works fine for this. you're right that it covers the re-run scenario, and the body is unlikely to change independently of the guard. html marker is slightly more robust against accidental body edits but not worth the extra noise here. good fix.

Before posting the defense-in-depth '🟢 No issues found' comment, query
the PR's existing comments and skip posting if one matching the prefix
already exists. This prevents duplicate identical comments when the
workflow is re-run multiple times on the same PR.
@Sayt-0 Sayt-0 merged commit 11cb23d into docker:main Jun 24, 2026
7 checks passed
Sayt-0 added a commit that referenced this pull request Jun 25, 2026
## Summary

Adds a precise, multi-criteria confidence scoring model for the PR
reviewer. Each verified finding gets a 0 to 100 confidence score, a
band, and a posting disposition, computed deterministically from several
signals rather than a subjective guess.

The model lives in two synchronized surfaces:

| Surface | Role |
| ------- | ---- |
| `src/score-confidence/` (pure module + CLI + unit tests) | Single
source of truth for the model |
| `review-pr/agents/pr-review.yaml` "Confidence Scoring" section |
Strict lookup table the orchestrator applies inline (the gitignored
`dist/` is not available at agent runtime) |

## Criteria

| Criterion | Source | Role |
| --------- | ------ | ---- |
| Verdict (CONFIRMED / LIKELY / DISMISSED) | verifier | Primary
agreement signal |
| `evidence_strength` (direct / circumstantial / speculative) | verifier
(new field) | Pattern / snippet match strength |
| `context_completeness` (full / partial / none) | verifier (new field)
| Whether the verifier saw the code it needed |
| Severity concordance | derived | Drafter vs verifier severity rank
distance |
| Scope (`in_diff` and `in_changed_code`) | derived | Hard gate |
| Category / severity | verifier | Posting policy only (security floor,
high-severity always-post) |

## Scale and threshold

- Score 0 to 100 from a precomputed `verdict x evidence x context` table
(`CONFIRMED 70` / `LIKELY 40` base; evidence `+18 / +8 / -4`; context
`+12 / +4 / -10`), plus concordance (`+5 / 0 / -8`), clamped.
- Bands: `strong >= 80`, `moderate 55..79`, `weak 30..54`, `negligible <
30`.
- Default posting threshold is 55 (the moderate-band floor, so the band
and the post cutoff cannot drift apart).
- Only CONFIRMED can reach `strong` (LIKELY tops out at 75), a property
the unit tests pin.

## Posting policy (first match wins; the 5-comment cap is applied last)

| Rule | Behavior |
| ---- | -------- |
| Out-of-scope / DISMISSED non-security | not surfaced |
| Security floor (CONFIRMED/LIKELY) | always inline, never
auto-suppressed, cap-exempt |
| High-severity (CONFIRMED/LIKELY) | always inline, cap-exempt |
| Band strong or moderate | inline (subject to the cap) |
| Band weak, non-forced | lower-confidence summary list (not silently
dropped) |
| Negligible band, verifier severity medium | summary (visibility floor)
|
| DISMISSED security | dismissed-security audit line (human-reviewable)
|
| Non-forced inline overflow past 5 | demoted to the summary list |

## Maps to the requested design

| Requested | Implemented |
| --------- | ----------- |
| Verifier agreement score | verdict + drafter/verifier severity
concordance |
| Pattern match strength | `evidence_strength` field |
| Context completeness | `context_completeness` field |
| Scale (high/medium/low or 0-100) | 0 to 100 plus four named bands |
| Default threshold | 55 (moderate floor) |
| Precise, multiple criteria | deterministic lookup table; six criteria;
84 unit tests pin every value |

## How it was hardened

A design review (three independent lenses: calibration, security policy,
LLM reproducibility) locked the constants, replacing error-prone
post-hoc caps with the lookup table and removing band dead-zones. An
adversarial verification pass then confirmed and fixed three defects:

| Defect | Severity | Resolution |
| ------ | -------- | ---------- |
| Verifier escalating severity could silently drop a medium-severity
finding (concordance is non-monotonic in severity) | major |
Medium-severity visibility floor: a still-believed medium finding is
kept in the summary, never dropped |
| `category` was the only enum not validated, so a misspelled value
silently disabled the security floor | minor | Validate `category` via
`assertEnum` |
| Severity-disagreement can move a borderline finding from inline to
summary | minor | Documented as intended: the finding stays visible, and
confidence legitimately reflects assessor agreement (a different axis
from severity) |

TS-to-prompt numeric consistency (all 18 table cells, bands, threshold,
posting precedence, schema) was verified clean.

## Compatibility with #15

#15 ("always post a review comment even with zero findings") and this
change touch distinct regions of `pr-review.yaml` and are complementary:
an empty inline set yields a `APPROVE` assessment label while the
summary and audit sections still go in the review body, and the review
is still posted.

## Validation

| Check | Result |
| ----- | ------ |
| `pnpm build` | pass |
| `pnpm test` (554 tests, 84 new) | pass |
| `tsc --noEmit` | pass |
| `biome ci` | pass |
| `actionlint` | pass |

## Note on runtime placement

The model is applied by the orchestrator prompt (mirroring the tested TS
module) rather than invoked as a `dist` bundle at agent runtime, because
`dist/` is gitignored and not present in the agent's working tree. If
invoking the compiled scorer at runtime is preferred, that is a larger
pipeline change (staging the bundle plus a permission) and can be done
separately.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants