feat(pr-management-triage): fetch all PRs upfront, classify in batch #346

Merged
potiuk merged 1 commit into apache:main from potiuk:feat-pr-triage-full-queue-fetch
May 27, 2026
Conversation

potiuk commented May 27, 2026

Member

Summary

  • Restructure PR triage from per-page interactive loop to fetch-all → classify-once → present-groups. Step 1 walks every page until hasNextPage=false without prompting; Step 2 classifies the full set; Step 3 forms groups that span the entire queue.
  • A mark-ready group can carry 30+ PRs across what was previously six pages — one screen, one decision.
  • Add Bash(gh api graphql *) to permissions.allow (in .claude/settings.json, tools/sandbox-lint/expected.json baseline, and the isolation-setup template at docs/setup/secure-agent-setup.md) so the read-only fetch loop bypasses the -F/-f ask rules. Mutations still hit ask.

Why

The per-page loop forced context-switching between action classes (mark-ready on page 1, draft on page 2, back to mark-ready on page 3, …) and required maintainer attention throughout the long fetch phase. Running the skill against the Apache Airflow queue (200+ open PRs) made the cost visible — maintainer attention was the bottleneck, not GraphQL budget.

Fetching everything up front lets the maintainer walk away during the long fetch, then come back to one batched decision per action class.

Out of scope

issue-triage and security-issue-triage would benefit from the same refactor; left untouched here to keep this PR's scope tight.

Test plan

  • skill-and-tool-validate exits 0 (verified locally on the touched files; pre-existing soft warnings in unrelated skills are not introduced here).
  • pytest (sandbox-lint) passes — expected.json baseline updated in lockstep with .claude/settings.json.
  • Run a triage pass on a small queue (e.g. triage label:area:scheduler) to confirm the fetch loop ends cleanly and groups present across the result set.
  • Confirm gh api graphql * calls during the fetch loop no longer prompt under the new allow rule.

Restructure the triage flow from the per-page interactive loop
to fetch-all → classify-once → present-groups.

== Why ==

The per-page model forced the maintainer to context-switch
between action classes (mark-ready on page 1, draft on page 2,
back to mark-ready on page 3) and required attention throughout
the long fetch phase — the loop paused for input after every
page. Running the skill against the Apache Airflow queue
(200+ open PRs) showed maintainer attention was the bottleneck,
not GraphQL budget.

== Flow changes ==

Step 1 walks every page serially until `hasNextPage=false`,
accumulating PRs into a single in-memory list. The fetch phase
is uninterrupted; the maintainer can step away. One progress
line per page lands so the loop is visibly advancing.
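
The Step 1 loop described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the `fetch_page` callable and the page shape (`nodes` plus `pageInfo` with `hasNextPage` and `endCursor`) are assumptions modelled on GitHub's GraphQL connection format.

```python
from typing import Callable, Optional


def fetch_all_prs(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Walk every page serially until hasNextPage is false.

    `fetch_page` is a hypothetical callable that takes a cursor (None for
    the first page) and returns a dict shaped like
    {"nodes": [...], "pageInfo": {"hasNextPage": bool, "endCursor": str}}.
    """
    all_prs: list = []
    cursor: Optional[str] = None
    pages_fetched = 0
    while True:
        page = fetch_page(cursor)
        all_prs.extend(page["nodes"])
        pages_fetched += 1
        # One progress line per page so the loop is visibly advancing.
        print(f"page {pages_fetched}: {len(page['nodes'])} PRs, {len(all_prs)} total")
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            return all_prs
        cursor = info["endCursor"]
```

Passing the fetcher in as a callable keeps the loop free of prompts and makes it easy to exercise against canned pages before pointing it at a real queue.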

Step 2 classifies the full set in one pass. Pre-filters,
decision table, and Real-CI guard all run once.

Step 3 groups by `(classification, action)` across the entire
queue. A `mark-ready` group can carry 30+ PRs across what was
previously six pages — one screen, one decision.
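
As an illustration of the Step 3 grouping, a sketch along these lines would collect PR numbers under each `(classification, action)` pair. The record fields (`number`, `classification`, `action`) and the sample values are hypothetical; the real classifier output may be shaped differently.

```python
from collections import defaultdict


def group_by_action(classified: list) -> dict:
    """Group classified PRs by (classification, action) across the whole queue.

    Each item is assumed (hypothetically) to be a dict with the keys
    "number", "classification", and "action".
    """
    groups = defaultdict(list)
    for pr in classified:
        key = (pr["classification"], pr["action"])
        groups[key].append(pr["number"])
    return dict(groups)


# Sample records with made-up classification names.
sample = [
    {"number": 101, "classification": "passing", "action": "mark-ready"},
    {"number": 102, "classification": "failing-ci", "action": "draft"},
    {"number": 103, "classification": "passing", "action": "mark-ready"},
]
groups = group_by_action(sample)
```

Because the input is the full fetched set rather than one page, PRs 101 and 103 land in the same group regardless of which page they originally came from.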

Step 5 becomes "stale sweeps only" — pagination is finished by
the time Step 5 runs. Each stale sweep that needs a different
candidate set uses the same full-pagination pattern as Step 1.

Golden rule 4 rewritten from "prefetch and pre-classify while
the maintainer is reading" to "fetch all pages up front, then
classify once, then present." The prefetch + pre-classification
machinery in `fetch-and-batch.md` and `interaction-loop.md` is
removed. Lazy per-PR drill-in fetches remain (only fired when
the maintainer pulls a PR out of a group).

Session cache schema: `prefetched_pages.<n>` → `fetched_prs`
(selector, fetched_at, pages_fetched, total_prs, all_prs[],
classified[]).
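
For orientation, the new `fetched_prs` cache entry might be modelled as below. Only the field names come from the schema above; the field types, the ISO-8601 timestamp, and the selector string are assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FetchedPrs:
    """Hypothetical model of the `fetched_prs` session-cache entry.

    Field names follow the commit message; the types are assumed.
    """
    selector: str
    fetched_at: str          # assumed to be an ISO-8601 timestamp
    pages_fetched: int
    total_prs: int
    all_prs: list = field(default_factory=list)
    classified: list = field(default_factory=list)


entry = FetchedPrs(
    selector="label:area:scheduler",
    fetched_at="2026-05-27T12:00:00Z",
    pages_fetched=1,
    total_prs=2,
    all_prs=[{"number": 1}, {"number": 2}],
)
serialized = json.dumps({"fetched_prs": asdict(entry)})
```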

== Settings ==

Add a project-level `permissions.allow` rule for
`Bash(gh api graphql *)` so the read-only fetch loop bypasses
the `gh api * -F *` / `-f *` ask rules. The pattern is more
specific than the wildcard ask, so it short-circuits. Mutations
via REST or `gh api -X POST` still hit ask. The same allow
rule lands in the isolation-setup template at
`docs/setup/secure-agent-setup.md` and the sandbox-lint
baseline at `tools/sandbox-lint/expected.json` so new adopters
get it out of the box and the baseline stays in lockstep.
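
To see why the extra allow rule is needed, the sketch below checks which of the quoted patterns a command matches. It uses Python's `fnmatch` as a stand-in for glob matching; this is an assumption for illustration only and does not reproduce Claude Code's actual matcher or its precedence logic. The sample commands are made up.

```python
from fnmatch import fnmatchcase

# Patterns quoted from the commit message, with the Bash(...) wrapper removed.
RULES = {
    "allow": ["gh api graphql *"],
    "ask": ["gh api * -F *", "gh api * -f *"],
}


def matching_rules(command: str) -> list:
    """Return every (kind, pattern) pair whose glob matches the command.

    fnmatchcase is an approximation chosen for this sketch, not the
    matcher the permission system really uses.
    """
    return [
        (kind, pattern)
        for kind, patterns in RULES.items()
        for pattern in patterns
        if fnmatchcase(command, pattern)
    ]


graphql_read = matching_rules("gh api graphql -f query=q")
rest_write = matching_rules("gh api -X POST repos/o/r/issues -f title=hi")
```

Under this approximation the GraphQL fetch matches both the allow pattern and the `-f` ask pattern, which is the overlap the new rule is meant to resolve, while the REST `-X POST` call matches only the ask pattern.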

== Verification ==

`skill-and-tool-validate` exits 0: only pre-existing soft warnings
in unrelated skills, and zero hard violations on the touched files.

Generated-by: Claude Code (Opus 4.7)

potiuk commented May 27, 2026

Member Author

Nice UX improvement for triage: instead of paging through results, we fetch all PRs into memory and pre-classify them before we ask for the triager's attention. This should all happen with very little interaction, so by the time the triager's attention is grabbed, there should already be a summary of things to do and groups of changes to act on. This is a very nice improvement that makes the triage process far less "attention disrupting".


potiuk commented May 27, 2026

Member Author

cc: @paulk-asert: I am also applying this to "security-issue-triage"; a similar approach could possibly be used for "issue-triage".

potiuk merged commit 164c2e0 into apache:main May 27, 2026
16 checks passed
potiuk added a commit that referenced this pull request May 27, 2026
…gue) (#347)

Apply the same flow discipline `pr-management-triage` adopted in
PR #346 to the security tracker triage skill: fetch every
candidate up front, classify uninterrupted, then surface a
single batched confirm screen.

== What changes ==

Add Golden rule 7: Steps 1–4 run without a human checkpoint;
Step 5 is the single decision point. Explicit cross-reference
to `pr-management-triage`'s Golden rule 4.

Step 1: bump the `gh issue list` cap from `--limit 100` to
`--limit 1000` (security backlogs don't approach four-digit
needs-triage counts in practice, so one call is the full set).
A backlog that *does* exceed 1000 is the signal to escalate,
not silently page through. The list-echo becomes informational
only — the maintainer no longer has to answer a confirm prompt
before Step 2 fires. Three narrow cases still stop and ask
(empty result, CVE selector matching multiple trackers,
`--retriage` on 50+ trackers); outside those, proceed.
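
The Step 1 decision described above could be sketched like this. The function name, its parameters, and the returned strings are hypothetical, and treating a result that fills the `--limit` cap as "backlog exceeds the cap" is an assumption of this sketch rather than something the commit message specifies.

```python
def step1_gate(tracker_count: int, *, limit: int = 1000,
               cve_selector: bool = False, retriage: bool = False) -> str:
    """Decide whether Step 1 proceeds, stops to ask, or escalates.

    Hypothetical helper: names and return values are illustrative only.
    """
    if tracker_count == 0:
        return "ask: empty result"
    if tracker_count >= limit:
        # Assumption: a list that fills the cap may be truncated, so
        # escalate instead of silently paging through.
        return "escalate: backlog reaches the limit"
    if cve_selector and tracker_count > 1:
        return "ask: CVE selector matched multiple trackers"
    if retriage and tracker_count >= 50:
        return "ask: --retriage on 50+ trackers"
    return "proceed"
```

Everything outside the three narrow cases and the escalation path falls through to `"proceed"`, which mirrors the intent that Step 2 fires without a confirm prompt.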

Step 2: framing now states "fires immediately after Step 1, no
human checkpoint in between."

Step 5: framing now states "the single human checkpoint" so
the maintainer knows Steps 6–7 will run sequentially without
re-prompting.

== Why ==

The security-issue-triage skill was already closer to the
batch pattern than pr-management-triage was (parallel
per-tracker enrichment via subagent fanout, full-list confirm
in Step 5), but it carried a redundant human checkpoint
between Step 1 and Step 2 — the "echo list and confirm before
gathering state" prompt. That checkpoint cost an attention
context-switch for a result that the Step 5 confirm screen
already covers. Removing it lets the maintainer run the skill
on a queue and walk away during the enrichment phase, same as
pr-management-triage.

== Verification ==

`skill-and-tool-validate` exits 0; pre-existing soft warnings
in unrelated rules (`gh-list-no-limit` on a `gh pr list` call
in this skill, plus three others) are not introduced here.

Generated-by: Claude Code (Opus 4.7)
potiuk added a commit to apache/airflow that referenced this pull request May 28, 2026
* Update apache-steward snapshot to 5c211a4

Bumps the local apache-steward snapshot from 339d3eb to 5c211a4 (22
upstream commits). The only committed change in this PR is a
1-line frontmatter addition (capability: capability:setup) to
.github/skills/setup-steward/SKILL.md, propagated from the new
framework version via /setup-steward upgrade. Everything else
lives in the gitignored .apache-steward/ snapshot.

Highlights from upstream (apache/airflow-steward):

- pr-management-triage: session-history gist persistence Step 6b
  (apache/magpie#343), four classifier heuristic fixes
  (apache/magpie#344), fetch-all-upfront pattern
  (apache/magpie#346)
- security-issue-triage: fetch-all-upfront analogue
  (apache/magpie#347)
- Framework labels + capability taxonomy (apache/magpie#340) —
  the source of the frontmatter line in this PR
- New skill pairing-self-review and tool spec-status-index
- claude-code pin 2.1.141 -> 2.1.150

/setup-steward upgrade ran cleanly locally: snapshot refreshed,
symlinks resolve, post-checkout hook in sync,
sandbox-add-project-root reconciled across 3 worktrees.
.apache-steward.local.lock updated to fetched_commit 5c211a4.
All .apache-steward-overrides/ files unchanged.

* Gitignore .apache-steward.session-state.json

Adds the per-machine session-state file to .gitignore. The file is
written by steward skills that maintain adopter-local persistence
anchors — currently pr-management-triage Step 6b's session-history
gist URL (apache/magpie#343), but the structure is
deliberately shared so other skills can add their own keys later.

The file is per-user, per-machine state; it should never be
committed even when a contributor stages everything with `git add -A`.
choo121600 pushed a commit to apache/airflow that referenced this pull request May 29, 2026
(cherry picked from commit c521078)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
vatsrahul1001 pushed a commit to apache/airflow that referenced this pull request May 29, 2026
(cherry picked from commit c521078)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
potiuk added a commit that referenced this pull request May 30, 2026
…erns from session manual cleanups (#402)

Based on direct observations from the airflow-s 2026-05-29/30 bulk
sync, two recurring title-noise patterns had to be cleaned up manually
because the existing cascade did not catch them:

1. Trailing prior-CVE-relationship parentheticals — the cross-CVE
   relationship is structurally captured by the Gate #3 cross-CVE
   clause in the public summary; embedding the relationship in the
   title is noise to downstream advisory consumers. Catches every
   shape observed in this session:
   - `(CVE-YYYY-NNNNN)`
   - `(possible CVE-YYYY-NNNNN variant)` — from #345
   - `(incomplete fix for CVE-YYYY-NNNNN)` — from #351
   - `(fix-bypass of CVE-YYYY-NNNNN)` — from #352
   - and any other `(... CVE-YYYY-NNNNN ...)` shape

2. Trailing reporter-name attribution parentheticals — reporter
   attribution lives in the credits field, never in the public
   title. Pattern matches `(<name> follow-up)` where `<name>`
   matches name-like tokens (word chars, dots, hyphens, single
   inline spaces) to avoid over-stripping substantive technical
   content. Catches:
   - `(Evan Ricafort follow-up)` — from #346

Substantive technical parentheticals stay intact — e.g. the operator-
name list `(GCSToSFTPOperator + GCSTimeSpanFileTransformOperator)` on
the GCS path-traversal tracker is NOT stripped (it lacks a CVE ID
and doesn't end in `follow-up`).
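
The two patterns described above can be approximated with a pair of regular expressions. This is a sketch under stated assumptions, not the cascade's actual code: the regexes, the function name, the CVE-ID shape (`CVE-` followed by a four-digit year and four or more digits), and the sample titles are all illustrative.

```python
import re

# Trailing parenthetical that contains a CVE ID anywhere inside it.
CVE_PAREN = re.compile(r"\s*\([^()]*\bCVE-\d{4}-\d{4,}\b[^()]*\)\s*$")
# Trailing "(<name> follow-up)" where <name> is made of name-like tokens
# (word chars, dots, hyphens) separated by single spaces.
REPORTER_PAREN = re.compile(r"\s*\([\w.\-]+(?: [\w.\-]+)* follow-up\)\s*$")


def strip_title_noise(title: str) -> str:
    """Strip trailing CVE-relationship and reporter-attribution parentheticals."""
    previous = None
    while previous != title:  # repeat in case both kinds are stacked
        previous = title
        title = CVE_PAREN.sub("", title)
        title = REPORTER_PAREN.sub("", title)
    return title.strip()


cleaned = [
    strip_title_noise("Example RCE (incomplete fix for CVE-2024-12345)"),
    strip_title_noise("Example RCE (Evan Ricafort follow-up)"),
    strip_title_noise(
        "Path traversal (GCSToSFTPOperator + GCSTimeSpanFileTransformOperator)"
    ),
]
```

The operator-name parenthetical survives because it contains neither a CVE ID nor a trailing `follow-up`, and the `+` falls outside the name-like token set.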

The matching Step 1d signal row in security-issue-sync now enumerates
the two new patterns so the proposal-time detector and the pre-push
Gate #4 stay in lock-step with the cascade.

Validated against 9 cases: 4 session-derived fixes (all pass), 3
synthetic CVE-relationship variants (all pass), 1 substantive
technical parenthetical (preserved correctly), 1 "<word> follow-up"
edge case (stripped as designed — narrow scope acceptable since
"follow-up" titles in airflow-s are exclusively reporter-attribution).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>