From 9afd5580d20ef4dca8d84b3668ffeacf91e3df88 Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Wed, 27 May 2026 22:41:48 +0200 Subject: [PATCH 1/4] fix(pr-management-triage): match Copilot bots without requiring `[bot]` suffix MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The `copilot_review_stale` precondition (row 2 of the decision table) listed `copilot-pull-request-reviewer[bot]`, `copilot[bot]`, etc. as the matchable logins. GitHub's GraphQL `Actor.login` field returns some of these without the `[bot]` suffix — most visibly for `copilot-pull-request-reviewer` itself, which appears as `copilot-pull-request-reviewer` (no suffix) on the PR's `author.login` even though the underlying account is the same bot. Strict `[bot]`-suffix matching excludes those threads from row 2. Real-world impact seen during triage on `apache/airflow`: a PR with two stale (55-day-old) Copilot review threads classified as `deterministic_flag → comment` (only static-check failure surfaced) instead of `stale_copilot_review → draft`. The maintainer had to manually re-route the action. Relax the matcher to "case-insensitive substring `copilot` in login; `[bot]` suffix optional". The 7-day age threshold and the no-author-reply qualifier are unchanged — only the login-string match is widened to align with what GraphQL actually returns. --- .../pr-management-triage/classify-and-act.md | 22 ++++++++++++++----- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/.claude/skills/pr-management-triage/classify-and-act.md b/.claude/skills/pr-management-triage/classify-and-act.md index 7d5df4c6..18854f8e 100644 --- a/.claude/skills/pr-management-triage/classify-and-act.md +++ b/.claude/skills/pr-management-triage/classify-and-act.md @@ -379,12 +379,22 @@ with the reply visible. ### `copilot_review_stale` The PR has at least one unresolved review thread whose first -comment author matches a Copilot bot login (case-insensitive: -`copilot-pull-request-reviewer[bot]`, `copilot[bot]`, -`github-copilot[bot]`, or any login matching `copilot*[bot]` / -`github-copilot*[bot]`) AND that comment's `createdAt` is ≥ 7 -days ago AND no author reply in the same thread or on the PR -after that timestamp. +comment author matches a Copilot bot login (case-insensitive +substring match on the login). The skill matches any of: +`copilot-pull-request-reviewer`, `copilot`, `github-copilot`, +or any login containing the substring `copilot` followed +optionally by `[bot]`. The `[bot]` suffix is matched when +present but NOT required — GitHub's GraphQL `Actor.login` +returns the bot username without the `[bot]` suffix for some +Copilot integrations (e.g. `copilot-pull-request-reviewer`), +even though the same bot appears as `copilot-pull-request-reviewer[bot]` +on the REST API. Requiring `[bot]` excludes real Copilot +threads from this rule and lets them age past the 7-day +threshold without triggering. The threshold itself is unchanged. + +The thread must also satisfy: that comment's `createdAt` is +≥ 7 days ago AND no author reply in the same thread or on the +PR after that timestamp. ### `static_check` From 91b5eadf90e0f2371b54a991ef16544f0992f667 Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Wed, 27 May 2026 22:42:19 +0200 Subject: [PATCH 2/4] fix(pr-management-triage): route mixed static-check failures to `comment`, not `rerun` Insert decision row 12b between rows 12 and 13. The original ordering left a hole: row 12 fires only when *every* failed check is a static check; if even one non-static failure was mixed in, row 12 missed and row 13 (`failed_count <= 2 -> rerun`) caught the case and proposed a rerun. Rerunning is a waste when at least one of the failures is a static check that needs a code fix to pass. Real-world impact seen during triage on `apache/airflow`: a PR with `CI image checks / Static checks: FAILURE` plus `provider distributions tests / Compat 3.0.6: FAILURE` (one static + one possibly-flaky compat test, total = 2 failures) classified as `deterministic_flag -> rerun`. The maintainer overrode to `comment` because the static-check failure will deterministically re-fail without a code change. Row 12b is intentionally placed *after* row 12 so the all-static case still takes the more specific row (same action, different reason template clarifies which case the maintainer is seeing). --- .claude/skills/pr-management-triage/classify-and-act.md | 1 + 1 file changed, 1 insertion(+) diff --git a/.claude/skills/pr-management-triage/classify-and-act.md b/.claude/skills/pr-management-triage/classify-and-act.md index 18854f8e..d587ce4c 100644 --- a/.claude/skills/pr-management-triage/classify-and-act.md +++ b/.claude/skills/pr-management-triage/classify-and-act.md @@ -85,6 +85,7 @@ Action verbs are defined in [`actions.md`](actions.md). | 10 | [`ci_failures_only`](#ci_failures_only) AND every failure ∈ `recent_main_failures` | `deterministic_flag` | `rerun` | All N CI failures also appear in recent main-branch PRs — likely systemic, suggest rerun | | 11 | [`ci_failures_only`](#ci_failures_only) AND any failure ∈ `recent_main_failures` | `deterministic_flag` | `rerun` | K/N CI failures match recent main-branch PRs — likely systemic | | 12 | [`ci_failures_only`](#ci_failures_only) AND every failed check is a [static check](#static_check) | `deterministic_flag` | `comment` | Only static-check failures — needs a code fix, not a rerun | +| 12b | [`ci_failures_only`](#ci_failures_only) AND **any** failed check is a [static check](#static_check) (and not all — row 12 captured that case) | `deterministic_flag` | `comment` | Mixed failures including a static-check failure — code fix needed; a rerun would re-fail on the static check | | 13 | [`ci_failures_only`](#ci_failures_only) AND `failed_count <= 2` AND `commits_behind <= 50` | `deterministic_flag` | `rerun` | N CI failure(s) on otherwise clean PR — likely flaky, suggest rerun | | 14a | [`author_confirmation_received`](#author_confirmation_received) | `author_confirmed_ready` | `mark-ready` | Author confirmed PR is ready for maintainer review — apply label | | 14b | [`pending_author_confirmation`](#pending_author_confirmation) | `awaiting_author_confirmation` | `skip` | Awaiting author confirmation requested M days ago | From 878ab1f09db02011597aef2c9682c07e2a759de9 Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Wed, 27 May 2026 22:42:51 +0200 Subject: [PATCH 3/4] fix(pr-management-triage): widen F5b to scan the last 5 collaborator comments MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit F5b previously inspected only the *most recent* collaborator comment for an unanswered `@`-mention to another maintainer. That misses the common case where: 1. Maintainer A pings maintainer B for an opinion ("@B what do you think of this approach?") in comment #N 2. Maintainer C (often the viewer) posts a quality-criteria comment in comment #N+1 because of an orthogonal issue (CI red, merge conflicts, etc.) The A->B ping is still outstanding — B has not replied — but F5b only looks at comment #N+1, doesn't see the older mention, and the PR slides into the decision table. The maintainer ends up auto- proposing actions on a PR where two maintainers are mid-conversation. Real-world impact seen during triage on `apache/airflow`: a PR where one maintainer asked another for an opinion 2 days ago, then the viewer drafted the PR yesterday for quality issues. The classifier proposed `request-author-confirmation` today — talking over the older unanswered ping. Widen F5b to walk the last 5 collaborator comments, newest-first, and fire on the first one matching the unanswered-`@`-mention condition. The 5-comment window matches the bounded depth the batch query already fetches; no additional GraphQL cost. --- .claude/skills/pr-management-triage/classify-and-act.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.claude/skills/pr-management-triage/classify-and-act.md b/.claude/skills/pr-management-triage/classify-and-act.md index d587ce4c..1f0d5825 100644 --- a/.claude/skills/pr-management-triage/classify-and-act.md +++ b/.claude/skills/pr-management-triage/classify-and-act.md @@ -46,7 +46,7 @@ filter is skipped silently from the main triage flow. | F3 | Draft and not stale | `isDraft == true` and any activity within the last 14 days. Stale-sweep classifications in [`stale-sweeps.md`](stale-sweeps.md) may still pull the PR back in. | | F4 | Already marked ready, no regression | `labels` contains `ready for maintainer review` AND CI green AND `mergeable != CONFLICTING` AND no unresolved **collaborator** threads (same collaborator-author qualifier as rows 19/20 / [`unresolved_threads_only`](#unresolved_threads_only) — contributor-author threads alone don't count as a regression for an already-ready PR). **Regression bypasses this filter** — any of: CI red, new conflict, or a new unresolved collaborator thread whose triggering event (failing check `startedAt`, conflict detection, thread `createdAt`) is *after* the label-add timestamp. The typical case is a contributor pushing a rebase or fixup commit to a ready-for-review PR that re-introduces deterministic failures, OR a maintainer leaving a new review thread post-label-add. PRs bypassing F4 fall through to the decision table normally; the cross-cutting [`strip-ready-on-downgrade` hard rule](#hard-rules-cross-cutting-the-table) ensures the label comes off if a `deterministic_flag` row fires. | | F5a | Recent collaborator comment (author cooldown) | Most recent comment from the **union of** general-issue comments (`comments(last:10)`) and **review-thread comments** (`reviewThreads.nodes.comments`) is by a `COLLABORATOR`/`MEMBER`/`OWNER`, `createdAt < 72h` ago, AND posted after `commits(last:1).committedDate`. The review-thread leg is essential — a maintainer asking a clarifying question in-thread is just as much an active conversation as a top-level comment, and treating only the latter routes the PR to `ping` / `request-author-confirmation` while the maintainer is still mid-sentence. | -| F5b | Maintainer-to-maintainer ping unanswered | Most recent collaborator comment from the **union** described in F5a above `@`-mentions one or more logins other than the PR author AND none of those mentioned logins have posted on the PR (general comments **or** review threads) or in `latestReviews` after that comment. Team mentions (e.g. `@-committers`) are conservatively treated as F5b matches. | +| F5b | Maintainer-to-maintainer ping unanswered | **Walk the most recent 5 collaborator comments** from the **union** described in F5a above (newest-first). For each, if it `@`-mentions one or more logins other than the PR author AND none of those mentioned logins have posted on the PR (general comments **or** review threads) or in `latestReviews` after that comment, F5b fires. The 5-comment window catches the case where maintainer A pings maintainer B, then a different maintainer C (often the viewer) posts a quality-criteria comment on the PR — the more-recent quality comment is not itself a ping, but the older A→B ping is still unanswered and the PR is still mid-conversation between maintainers. Without the deep scan F5b only inspects the most recent collaborator comment and the older ping slides past unfiltered. The 5-comment cap is the same one driving `comments(last:10)` / `reviewThreads.comments(first:5)` — bounded scan, no extra GraphQL. Team mentions (e.g. `@-committers`) are conservatively treated as F5b matches. | | F6 | Maintainer co-drafted | `isDraft == true` AND any of: (a) `latestReviews` has a node with `authorAssociation ∈ {OWNER, MEMBER, COLLABORATOR}` AND `author.login ≠ ` AND `state ∈ {COMMENTED, CHANGES_REQUESTED, APPROVED}` AND `submittedAt > commits(last:1).committedDate` AND review body is non-empty (avoids the "review with only inline thread comments and an empty top-level body" false positive — those are already counted by row 14/15 unresolved-thread logic); (b) `comments(last:10)` has a node with `authorAssociation ∈ {OWNER, MEMBER, COLLABORATOR}` AND `author.login ≠ ` AND `length(bodyText) ≥ 80` AND `createdAt > commits(last:1).committedDate`. Trivial signals (emoji-only, `+1`, `lgtm`, pure `@team` pings without prose) do not count — those are already covered by F5a/F5b or are below the substantive-engagement threshold. **Stale-sweep classifications in [`stale-sweeps.md`](stale-sweeps.md) may still pull the PR back in** — F6 only suppresses duplicate-proposal rows from the decision table, not eventual-resurfacing on a different action. | F5a, F5b, and F6 override every signal in the decision table — From b284aff73d3b7ab052656978d424a33c97c6f10f Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Wed, 27 May 2026 22:44:00 +0200 Subject: [PATCH 4/4] feat(pr-management-triage): add row 0 to skip stale-abandoned first-timer PRs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Decision row 1 fires `approve-workflow` for any first-time-contributor PR with no real CI run, regardless of how long ago the maintainer already gave the contributor feedback or how long the contributor has been silent. The result: a PR that was triaged 8+ weeks ago, with no new commits since, keeps surfacing every triage sweep and the maintainer either has to re-approve CI on stalled code (which then re-fails on the same unaddressed quality issues) or override to `skip` manually. Add row 0 + the `first_time_stale_abandoned` precondition. The row fires *before* row 1 and routes to `skip` when all of these hold: - author is FIRST_TIME_CONTRIBUTOR / FIRST_TIMER, - a prior viewer triage marker exists, - no commits have been pushed since the marker, and - the PR's head commit is >= 30 days old. The PR is then left to the stale-sweep flow (sweep 1b — "untriaged draft >= 14 days" — or its successor), which is the appropriate closure pathway. The 30-day threshold is deliberately longer than the grace windows (24h / 96h) and the F5a cooldown (72h). It captures *abandonment*, not slow response — a contributor who replies within a week and then stalls for another week is not abandoned, just busy. Real-world impact seen during triage on `apache/airflow`: a first-time-contributor PR triaged on 2026-04-02 with no push since classified as `approve-workflow` 8 weeks later. The maintainer skipped it manually; row 0 retires it automatically. --- .../pr-management-triage/classify-and-act.md | 35 +++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/.claude/skills/pr-management-triage/classify-and-act.md b/.claude/skills/pr-management-triage/classify-and-act.md index 1f0d5825..2c76e631 100644 --- a/.claude/skills/pr-management-triage/classify-and-act.md +++ b/.claude/skills/pr-management-triage/classify-and-act.md @@ -72,6 +72,7 @@ Action verbs are defined in [`actions.md`](actions.md). | # | Precondition (all must hold) | Classification | Action | Reason template | |----|-----------------------------------------------------------------------------------------------|---------------------------|------------------------|-----------------| +| 0 | [`first_time_stale_abandoned`](#first_time_stale_abandoned) — first-time contributor PR with a prior viewer triage marker and no commits since the marker, ≥ 30 days old | `first_time_stale_abandoned` | `skip` | First-time contributor's PR was triaged ≥ 30d ago, no push since — let the stale-sweep retire it rather than re-approving CI | | 1 | `head_sha` appears in the per-page `action_required` REST index, OR ([`first_time_no_real_ci`](#first_time_no_real_ci)) | `pending_workflow_approval` | `approve-workflow` | First-time contributor — review the diff and approve CI, or flag suspicious | | 2 | [`copilot_review_stale`](#copilot_review_stale) | `stale_copilot_review` | `draft` (include the specific Copilot thread URL in the violation body) | Unaddressed Copilot review ≥ 7 days old — convert to draft | | 3 | Viewer comment containing the triage marker exists, posted after last commit, age < 7 days, sub-state `waiting` | `already_triaged` | `skip` | Already triaged M days ago — still waiting on author | @@ -430,6 +431,40 @@ Count of PRs by the same author seen on the **current page** that already matched a `deterministic_flag` row. Per-page only — does not persist across sessions. +### `first_time_stale_abandoned` + +All of: + +- `authorAssociation == FIRST_TIME_CONTRIBUTOR` or + `FIRST_TIMER`. +- A viewer triage marker exists in `comments(last:10)` (i.e. + a comment by the viewer containing the literal string + `Pull Request quality criteria`). +- The PR's `commits(last:1).committedDate` is **at or before** + the triage marker's `createdAt` — author has not pushed + since the maintainer's feedback. +- ` - committedDate >= 30 days`. + +Order matters: this precondition is evaluated by **row 0**, +which fires *before* row 1 (`pending_workflow_approval`). +Without it, an abandoned first-time PR that has sat for months +since being triaged keeps surfacing in the +`approve-workflow` group every sweep, where the maintainer +either re-approves CI on dead code or has to skip it manually. + +A row-0 hit routes to `skip` — the [stale-sweep +flow](stale-sweeps.md) is the right place to retire the PR +(via sweep 1b's "untriaged draft >= 14 days" trigger, since +the row-0 marker test does *not* set the draft state). The +classifier's job here is just to keep the PR out of the +workflow-approval group. + +The 30-day threshold is intentionally longer than the +[grace periods](#grace-periods) (24h / 96h) and the F5a +cooldown (72h). It captures *abandonment*, not slow response — +a contributor who replies within a week and then stalls for +another week is not abandoned, just busy. + ### `first_time_no_real_ci` Author is `FIRST_TIME_CONTRIBUTOR` or `FIRST_TIMER` AND one of: