feat(security-issue-sync): pre-flight no-op classifier for bulk mode by potiuk · Pull Request #414 · apache/magpie

potiuk · 2026-05-31T11:45:16Z

Summary

Bulk sync (sync all, sync announced, etc.) currently dispatches a full subagent per resolved tracker — ~50 KB of transcript apiece — even when the tracker is in steady state and the subagent's whole report is empty.
This change inserts a Step 1b pre-flight classifier between selector-resolution and subagent dispatch. One batched gh api graphql round-trip fetches state / closedAt / updatedAt / labels / last-comment for every resolved issue at once (aliased multi-field query: ~3 KB request, ~6 KB response for 30 issues). A conservative rule table classifies each as dispatch / dispatch-urgent / skip-noop; only the non-skipped ones get subagents.
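A minimal sketch of how such an aliased query might be assembled, written in Python purely for illustration (the PR adds no Python tool; the orchestrator builds the query inline). The names `build_preflight_query` and `fetch_preflight`, the `i<N>` alias scheme, and the `first: 20` label page size are hypothetical. The selection set is an assumption based on the fields listed above.

```python
import json
import subprocess

# Fields named in the PR summary; the exact selection-set shape is an assumption.
ISSUE_FIELDS = """
      state
      closedAt
      updatedAt
      labels(first: 20) { nodes { name } }
      comments(last: 1) { nodes { author { login } createdAt } }
"""


def build_preflight_query(owner: str, repo: str, numbers: list[int]) -> str:
    """Return one GraphQL document that fetches every issue under its own alias."""
    aliased = "\n".join(
        f"    i{n}: issue(number: {n}) {{{ISSUE_FIELDS}    }}" for n in numbers
    )
    # json.dumps yields a double-quoted string literal for owner and repo.
    header = (
        f"query {{\n  repository(owner: {json.dumps(owner)}, "
        f"name: {json.dumps(repo)}) {{\n"
    )
    return f"{header}{aliased}\n  }}\n}}"


def fetch_preflight(owner: str, repo: str, numbers: list[int]) -> dict[int, dict]:
    """Run the batched query through `gh api graphql` and key results by issue number."""
    query = build_preflight_query(owner, repo, numbers)
    out = subprocess.run(
        ["gh", "api", "graphql", "-f", f"query={query}"],
        check=True, capture_output=True, text=True,
    ).stdout
    repo_data = json.loads(out)["data"]["repository"]
    return {n: repo_data[f"i{n}"] for n in numbers}
```

Because each issue sits under its own alias, the whole set resolves in a single round-trip regardless of how many trackers the selector matched.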
Expected savings on a 20-tracker bulk sweep where ~30-50% are idle: 6-10 fewer subagents × ~50 KB = ~300-500 KB of context per sweep.
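The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: `gross_savings_kb` is a hypothetical helper, the ~50 KB per-subagent figure is the approximation from the summary, and the small cost of the pre-flight query itself is not subtracted.

```python
SUBAGENT_KB = 50  # approximate transcript size per subagent, from the summary


def gross_savings_kb(trackers: int, idle_percent: int) -> float:
    """Context saved when idle trackers are skipped instead of dispatched."""
    skipped = trackers * idle_percent / 100
    return skipped * SUBAGENT_KB


low = gross_savings_kb(20, 30)   # 6 skipped subagents
high = gross_savings_kb(20, 50)  # 10 skipped subagents
print(f"~{low:.0f}-{high:.0f} KB saved per sweep")
```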

Safety

Conservative rules — skip-noop fires only when multiple signals align (closed AND age AND label set AND inactive comment AND bot last commenter).
7-day updatedAt override — never skip a tracker with recent activity, regardless of other signals.
Set-resolving selectors only — sync #232, #233 (explicit numbers) never skips. Pre-flight applies to sync all / sync announced / label / title selectors.
Never silent — every skip appears in the proposal's "Pre-flight skipped" group with the rule that fired. The user can force-sync <N> any of them at confirmation.
--no-preflight opts out entirely.
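The safety properties above can be sketched as a small classifier. This is an illustration under stated assumptions, not the skill's actual rule table: `Tracker`, `classify`, and `partition` are hypothetical names, only one skip rule is modelled (closed, older than 30 days, labelled `announced`, idle bot comment), the 14-day comment-idle gate is an assumed threshold, and `dispatch-urgent` is left out because its criteria are not stated here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RECENT_ACTIVITY = timedelta(days=7)  # absolute override window from the Safety list
CLOSED_MIN_AGE = timedelta(days=30)  # assumed age gate for closed trackers
COMMENT_IDLE = timedelta(days=14)    # assumed idle gate for the last comment


@dataclass
class Tracker:
    number: int
    state: str                       # "OPEN" or "CLOSED"
    closed_at: Optional[datetime]
    updated_at: datetime
    labels: frozenset
    last_comment_login: Optional[str]
    last_comment_at: Optional[datetime]


def classify(t: Tracker, now: datetime, *, explicit_selector: bool = False,
             no_preflight: bool = False) -> tuple:
    """Return (decision, rule) so that every skip can be reported with its rule."""
    if no_preflight:
        return "dispatch", "--no-preflight"
    if explicit_selector:
        return "dispatch", "explicit number selector"
    if now - t.updated_at <= RECENT_ACTIVITY:
        return "dispatch", "updatedAt within 7 days"
    signals = (
        t.state == "CLOSED",
        t.closed_at is not None and now - t.closed_at > CLOSED_MIN_AGE,
        "announced" in t.labels,
        t.last_comment_at is not None and now - t.last_comment_at > COMMENT_IDLE,
        t.last_comment_login is not None and t.last_comment_login.endswith("[bot]"),
    )
    # Skip only when every signal aligns; any single miss falls back to dispatch.
    if all(signals):
        return "skip-noop", "closed > 30d, announced, idle bot comment"
    return "dispatch", "no skip rule matched"


def partition(trackers, now, **flags):
    """Split trackers into those to dispatch and a 'Pre-flight skipped' group."""
    dispatch, skipped = [], []
    for t in trackers:
        decision, rule = classify(t, now, **flags)
        if decision == "skip-noop":
            skipped.append((t.number, rule))
        else:
            dispatch.append(t.number)
    return dispatch, skipped
```

Checking the opt-out, the explicit selector, and the 7-day override before any skip rule means a skip can only result from the final all-signals check, which keeps the default outcome at `dispatch`.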

What this PR does NOT do

It does not decide what action a tracker needs — still the subagent's job. Pre-flight only decides whether spawning the subagent is worth it.
No Python tool added; the orchestrator builds the GraphQL query inline per the documented recipe. Rules can be tuned by editing the table.

Test plan

lychee on the edited file — clean
skill-and-tool-validate — no new violations
prek (markdownlint, typos, format, trailing whitespace) — green
CI lychee + tests-ok on this PR
Try a real sync all after merge to see the classifier in practice; tune rules if needed

🤖 Generated with Claude Code

…idle trackers in bulk mode

Bulk sync (sync all, sync announced, etc.) currently dispatches one full subagent per resolved tracker. Each subagent loads the skill + does a `gh issue view` + reads comments + reads mail + returns a structured report: ~50 KB per subagent transcript. On bulk sweeps where 30–50% of trackers are in steady state (closed > 30d with `announced`, or open with the full cve-allocated + pr-merged + announced label set and no recent activity), the subagent's full work is a no-op that produces an empty proposal, which is pure waste.

This change inserts a Step 1b pre-flight classifier between selector-resolution and subagent dispatch. One batched `gh api graphql` round-trip fetches `state`, `closedAt`, `updatedAt`, `labels`, and the last comment's author+timestamp for every resolved issue at once (aliased multi-field query, ~3 KB request, ~6 KB response for 30 issues). A conservative rule table classifies each as `dispatch` / `dispatch-urgent` / `skip-noop`; only the non-skipped ones get subagents.

Safety:

* Conservative: `skip-noop` fires only when multiple signals align (closed AND age AND label set AND inactive last comment AND bot last commenter).
* `updatedAt` within last 7 days is an absolute override; never skip a tracker with recent activity regardless of other signals.
* Pre-flight only applies to set-resolving selectors (`sync all`, `sync announced`, label/title selectors). An explicit number selector like `sync #232, #233` never skips.
* Every skip appears in the proposal's "Pre-flight skipped" group with the rule that fired, never silent. The user can `force-sync <N>` any of them at confirmation.
* `--no-preflight` opts out entirely.

This is a skill-instruction change; no Python tool added. The orchestrator builds the GraphQL query directly. Rules can be iterated quickly by editing the table; if real-world results show the classifier is too aggressive or too timid, the patches are one-line edits to the rule table.

…detection + relaxed rules (#416)

A dry-run of #414's pre-flight against a real adopter tracker revealed the original rules misfired in two ways:

- The "last comment author is a bot" check was structurally unreachable on single-operator private trackers where the sync skill writes rollup updates as the operator's personal GitHub user, not as a *[bot] account.
- The 7-day updatedAt safety override caught most trackers because every tracker had been touched by the recent sync itself (rollup-comment writes, label flips), conflating skill activity with substantive activity.

Skip rate measured ~5% in this setup vs the predicted 30-50%. This tunes the classifier with two changes:

1. Skill-or-bot detection. Treat a comment as bot-equivalent when its body starts with the skill marker `<!-- apache-steward: ` (matches every status-rollup, release-manager hand-off, and wrap-up comment the framework writes). Falls back to the original `*[bot]` login check, plus an override-file hook for adopters with personal-account bots. Requires fetching body on the last comment, which bumps query response size moderately (still cheaper than one subagent transcript), and the body field is what enables the skill-marker detection that drives most of the real-world skip rate.
2. Relaxed lifecycle skip rules. The original "idle > 14d" gates were a safety net for the broken bot-detection. With skill-or-bot detection working, the "all phases done; awaiting release" / "fix released; awaiting advisory" patterns are skip-eligible regardless of comment age: the skill marker itself is the "nothing new since last sync" signal.

Re-running the dry-run on the same setup: skip rate ~5% → ~30%, and the skipped trackers were all correctly steady-state ones.

Adds a new "fix released; awaiting advisory propagation" skip rule for the `cve allocated + fix released` label set, the single largest contributor to the new skip count.
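The skill-or-bot check described in #416 can be sketched as follows. This is a hedged illustration: `is_skill_or_bot` and its `extra_bot_logins` parameter are hypothetical names, and the latter merely stands in for the override-file hook, whose real format is not described here.

```python
from typing import Optional

# Marker prefix quoted in the commit message, including its trailing space.
SKILL_MARKER = "<!-- apache-steward: "


def is_skill_or_bot(login: Optional[str], body: Optional[str],
                    extra_bot_logins: frozenset = frozenset()) -> bool:
    """Treat the last comment as bot-equivalent if the skill or a bot wrote it."""
    # Primary signal: the comment body starts with the skill marker, which
    # holds even when the skill posts as the operator's personal account.
    if body is not None and body.startswith(SKILL_MARKER):
        return True
    if login is None:
        return False
    # Fallbacks: the original *[bot] login check, then adopter-supplied logins.
    return login.endswith("[bot]") or login in extra_bot_logins
```

Checking the body first is what makes the rule reachable on single-operator trackers, where the login alone never looks like a bot.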

…ity-suite refactor patterns (#427)

Adds `optimize-skill` (capability:setup), the refactoring sibling of `write-skill`. It takes an existing framework skill (or sweeps a set) and applies the five restructuring patterns proven on the security suite, as behavior-preserving proposals gated by the validator (green-before / green-after):

- split: slim an oversized SKILL.md into linked siblings (the #410 pattern; addresses the PRINCIPLES.md P14 cap)
- config-lift: move concrete values into <project-config> (#386/#387/#388)
- out-of-context: read/PATCH one field without loading the body (#412 github-body-field, #424 github-rollup)
- fetch-upfront: batch per-item round-trips (#347)
- preflight-classifier: skip obvious no-ops before LLM passes (#414/#416)

SKILL.md is 297 lines; the pass catalogue (smell / exemplar PR / mechanics / behavior-preservation guarantee / validation) lives in the patterns.md sibling. Reads only framework-internal files, so no injection-guard / Privacy-LLM callouts. Ships a step-diagnose eval (5 auto-comparable cases incl. an injection-resistance case) so the skill is not released without an eval (P8). Wires the skill into the capability->skill map and the eval index.

Generated-by: Claude Code (Opus 4.8)

potiuk merged commit 09f4288 into apache:main May 31, 2026
16 checks passed

potiuk mentioned this pull request May 31, 2026

feat(security-issue-sync): pre-flight v2 — skill-marker detection #416

Merged

potiuk mentioned this pull request Jun 1, 2026

feat(optimize-skill): skill to optimize existing skills via the security-suite refactor patterns #427

Merged

potiuk merged 1 commit into apache:main from potiuk:feat-bulk-mode-preflight-noop-skip
