fix(pr-management-triage): four classifier heuristic fixes from a live session#344
Merged
Merged
Conversation
…]` suffix The `copilot_review_stale` precondition (row 2 of the decision table) listed `copilot-pull-request-reviewer[bot]`, `copilot[bot]`, etc. as the matchable logins. GitHub's GraphQL `Actor.login` field returns some of these without the `[bot]` suffix — most visibly for `copilot-pull-request-reviewer` itself, which appears as `copilot-pull-request-reviewer` (no suffix) on the PR's `author.login` even though the underlying account is the same bot. Strict `[bot]`-suffix matching excludes those threads from row 2. Real-world impact seen during triage on `apache/airflow`: a PR with two stale (55-day-old) Copilot review threads classified as `deterministic_flag → comment` (only static-check failure surfaced) instead of `stale_copilot_review → draft`. The maintainer had to manually re-route the action. Relax the matcher to "case-insensitive substring `copilot` in login; `[bot]` suffix optional". The 7-day age threshold and the no-author-reply qualifier are unchanged — only the login-string match is widened to align with what GraphQL actually returns.
…ent`, not `rerun` Insert decision row 12b between rows 12 and 13. The original ordering left a hole: row 12 fires only when *every* failed check is a static check; if even one non-static failure was mixed in, row 12 missed and row 13 (`failed_count <= 2 -> rerun`) caught the case and proposed a rerun. Rerunning is a waste when at least one of the failures is a static check that needs a code fix to pass. Real-world impact seen during triage on `apache/airflow`: a PR with `CI image checks / Static checks: FAILURE` plus `provider distributions tests / Compat 3.0.6: FAILURE` (one static + one possibly-flaky compat test, total = 2 failures) classified as `deterministic_flag -> rerun`. The maintainer overrode to `comment` because the static-check failure will deterministically re-fail without a code change. Row 12b is intentionally placed *after* row 12 so the all-static case still takes the more specific row (same action, different reason template clarifies which case the maintainer is seeing).
…comments
F5b previously inspected only the *most recent* collaborator comment
for an unanswered `@`-mention to another maintainer. That misses the
common case where:
1. Maintainer A pings maintainer B for an opinion ("@b what do
you think of this approach?") in comment #N
2. Maintainer C (often the viewer) posts a quality-criteria
comment in comment #N+1 because of an orthogonal issue (CI
red, merge conflicts, etc.)
The A->B ping is still outstanding — B has not replied — but F5b
only looks at comment #N+1, doesn't see the older mention, and the
PR slides into the decision table. The maintainer ends up auto-
proposing actions on a PR where two maintainers are mid-conversation.
Real-world impact seen during triage on `apache/airflow`: a PR
where one maintainer asked another for an opinion 2 days ago, then
the viewer drafted the PR yesterday for quality issues. The
classifier proposed `request-author-confirmation` today — talking
over the older unanswered ping.
Widen F5b to walk the last 5 collaborator comments, newest-first,
and fire on the first one matching the unanswered-`@`-mention
condition. The 5-comment window matches the bounded depth the
batch query already fetches; no additional GraphQL cost.
…imer PRs Decision row 1 fires `approve-workflow` for any first-time-contributor PR with no real CI run, regardless of how long ago the maintainer already gave the contributor feedback or how long the contributor has been silent. The result: a PR that was triaged 8+ weeks ago, with no new commits since, keeps surfacing every triage sweep and the maintainer either has to re-approve CI on stalled code (which then re-fails on the same unaddressed quality issues) or override to `skip` manually. Add row 0 + the `first_time_stale_abandoned` precondition. The row fires *before* row 1 and routes to `skip` when all of these hold: - author is FIRST_TIME_CONTRIBUTOR / FIRST_TIMER, - a prior viewer triage marker exists, - no commits have been pushed since the marker, and - the PR's head commit is >= 30 days old. The PR is then left to the stale-sweep flow (sweep 1b — "untriaged draft >= 14 days" — or its successor), which is the appropriate closure pathway. The 30-day threshold is deliberately longer than the grace windows (24h / 96h) and the F5a cooldown (72h). It captures *abandonment*, not slow response — a contributor who replies within a week and then stalls for another week is not abandoned, just busy. Real-world impact seen during triage on `apache/airflow`: a first-time-contributor PR triaged on 2026-04-02 with no push since classified as `approve-workflow` 8 weeks later. The maintainer skipped it manually; row 0 retires it automatically.
1 task
potiuk
added a commit
to apache/airflow
that referenced
this pull request
May 28, 2026
* Update apache-steward snapshot to 5c211a4 Bumps the local apache-steward snapshot from 339d3eb to 5c211a4 (22 upstream commits). The only committed change in this PR is a 1-line frontmatter addition (capability: capability:setup) to .github/skills/setup-steward/SKILL.md, propagated from the new framework version via /setup-steward upgrade. Everything else lives in the gitignored .apache-steward/ snapshot. Highlights from upstream (apache/airflow-steward): - pr-management-triage: session-history gist persistence Step 6b (apache/magpie#343), four classifier heuristic fixes (apache/magpie#344), fetch-all-upfront pattern (apache/magpie#346) - security-issue-triage: fetch-all-upfront analogue (apache/magpie#347) - Framework labels + capability taxonomy (apache/magpie#340) — the source of the frontmatter line in this PR - New skill pairing-self-review and tool spec-status-index - claude-code pin 2.1.141 -> 2.1.150 /setup-steward upgrade ran cleanly locally: snapshot refreshed, symlinks resolve, post-checkout hook in sync, sandbox-add-project-root reconciled across 3 worktrees. .apache-steward.local.lock updated to fetched_commit 5c211a4. All .apache-steward-overrides/ files unchanged. * Gitignore .apache-steward.session-state.json Adds the per-machine session-state file to .gitignore. The file is written by steward skills that maintain adopter-local persistence anchors — currently pr-management-triage Step 6b's session-history gist URL (apache/magpie#343), but the structure is deliberately shared so other skills can add their own keys later. The file is per-user, per-machine state; it should never be committed even when a contributor stages everything with `git add -A`.
choo121600
pushed a commit
to apache/airflow
that referenced
this pull request
May 29, 2026
* Update apache-steward snapshot to 5c211a4 Bumps the local apache-steward snapshot from 339d3eb to 5c211a4 (22 upstream commits). The only committed change in this PR is a 1-line frontmatter addition (capability: capability:setup) to .github/skills/setup-steward/SKILL.md, propagated from the new framework version via /setup-steward upgrade. Everything else lives in the gitignored .apache-steward/ snapshot. Highlights from upstream (apache/airflow-steward): - pr-management-triage: session-history gist persistence Step 6b (apache/magpie#343), four classifier heuristic fixes (apache/magpie#344), fetch-all-upfront pattern (apache/magpie#346) - security-issue-triage: fetch-all-upfront analogue (apache/magpie#347) - Framework labels + capability taxonomy (apache/magpie#340) — the source of the frontmatter line in this PR - New skill pairing-self-review and tool spec-status-index - claude-code pin 2.1.141 -> 2.1.150 /setup-steward upgrade ran cleanly locally: snapshot refreshed, symlinks resolve, post-checkout hook in sync, sandbox-add-project-root reconciled across 3 worktrees. .apache-steward.local.lock updated to fetched_commit 5c211a4. All .apache-steward-overrides/ files unchanged. * Gitignore .apache-steward.session-state.json Adds the per-machine session-state file to .gitignore. The file is written by steward skills that maintain adopter-local persistence anchors — currently pr-management-triage Step 6b's session-history gist URL (apache/magpie#343), but the structure is deliberately shared so other skills can add their own keys later. The file is per-user, per-machine state; it should never be committed even when a contributor stages everything with `git add -A`. (cherry picked from commit c521078) Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
vatsrahul1001
pushed a commit
to apache/airflow
that referenced
this pull request
May 29, 2026
* Update apache-steward snapshot to 5c211a4 Bumps the local apache-steward snapshot from 339d3eb to 5c211a4 (22 upstream commits). The only committed change in this PR is a 1-line frontmatter addition (capability: capability:setup) to .github/skills/setup-steward/SKILL.md, propagated from the new framework version via /setup-steward upgrade. Everything else lives in the gitignored .apache-steward/ snapshot. Highlights from upstream (apache/airflow-steward): - pr-management-triage: session-history gist persistence Step 6b (apache/magpie#343), four classifier heuristic fixes (apache/magpie#344), fetch-all-upfront pattern (apache/magpie#346) - security-issue-triage: fetch-all-upfront analogue (apache/magpie#347) - Framework labels + capability taxonomy (apache/magpie#340) — the source of the frontmatter line in this PR - New skill pairing-self-review and tool spec-status-index - claude-code pin 2.1.141 -> 2.1.150 /setup-steward upgrade ran cleanly locally: snapshot refreshed, symlinks resolve, post-checkout hook in sync, sandbox-add-project-root reconciled across 3 worktrees. .apache-steward.local.lock updated to fetched_commit 5c211a4. All .apache-steward-overrides/ files unchanged. * Gitignore .apache-steward.session-state.json Adds the per-machine session-state file to .gitignore. The file is written by steward skills that maintain adopter-local persistence anchors — currently pr-management-triage Step 6b's session-history gist URL (apache/magpie#343), but the structure is deliberately shared so other skills can add their own keys later. The file is per-user, per-machine state; it should never be committed even when a contributor stages everything with `git add -A`. (cherry picked from commit c521078) Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Four small, independent classifier fixes calibrated from a real triage session on
apache/airflow(2026-05-27). Each was a case where the rule fired technically correctly but the maintainer manually overrode the proposed action because the rule didn't match real-world signal. Each commit is independently revertible.Changes
9afd558[bot]suffix in loginActor.loginreturnscopilot-pull-request-reviewerwithout the[bot]suffix; strict suffix matching excluded real Copilot threads from row 2 (stale_copilot_review). Saw a PR with 55-day-old unresolved Copilot threads classify ascommentinstead ofdraft.91b5eadcomment, notrerunreruneven when one of two failures was a static check. Rerunning a static-check failure is wasted — needs a code fix. Saw a PR withStatic checks: FAILURE+Compat 3.0.6: FAILURE(one static + one possibly-flaky) classify asrerun.878ab1f@-mentions. Missed the case where maintainer A pings B in comment N, viewer posts a quality comment in N+1 (orthogonal), and B's reply is still pending. Saw on a PR where vincbeck pinged bbovenzi 2d ago, viewer drafted yesterday for quality issues, classifier proposedrequest-author-confirmationtoday — talking over the older ping.b284afffirst_time_stale_abandoned→skipapprove-workflow) fired for first-time-contributor PRs with no push for 2+ months since prior triage. Re-approving CI on stalled code re-fails on the same unaddressed quality issues. New row 0 (FIRST_TIMER + viewer triage marker + no commits since marker + ≥30d) routes toskip; stale-sweep retires it.Origin
Came out of the same triage session on
apache/airflow(2026-05-27) that produced PR #343 (session-history gist persistence). The session's gist captured each manual override and the four patterns above stood out as systematic classifier mis-calibrations rather than per-PR judgment calls. PR #343 builds the infrastructure to make this kind of cross-session signal repeatable.Test plan
copilot-pull-request-reviewer(no[bot]suffix) ≥ 7d old → routes tostale_copilot_reviewcomment@-mention 5 comments ago, viewer comment in between → routes to F5b skipskip, NOTapprove-workflow🤖 Generated with Claude Code (Opus 4.7)