feat(security-issue-sync): pre-flight no-op classifier for bulk mode #414
Merged
…idle trackers in bulk mode

Bulk sync (`sync all`, `sync announced`, etc.) currently dispatches one full subagent per resolved tracker. Each subagent loads the skill, runs a `gh issue view`, reads comments, reads mail, and returns a structured report: ~50 KB per subagent transcript. On bulk sweeps where 30–50% of trackers are in steady state (closed > 30d with `announced`, or open with the full `cve-allocated` + `pr-merged` + `announced` label set and no recent activity), the subagent's full work is a no-op that produces an empty proposal, which is pure waste.

This change inserts a Step 1b pre-flight classifier between selector resolution and subagent dispatch. One batched `gh api graphql` round-trip fetches `state`, `closedAt`, `updatedAt`, `labels`, and the last comment's author and timestamp for every resolved issue at once (aliased multi-field query, ~3 KB request, ~6 KB response for 30 issues). A conservative rule table classifies each issue as `dispatch` / `dispatch-urgent` / `skip-noop`; only the non-skipped ones get subagents.

Safety:

* Conservative: `skip-noop` fires only when multiple signals align (closed AND age AND label set AND inactive last comment AND bot last commenter).
* `updatedAt` within the last 7 days is an absolute override; never skip a tracker with recent activity, regardless of other signals.
* Pre-flight only applies to set-resolving selectors (`sync all`, `sync announced`, label/title selectors). An explicit number selector like `sync apache#232, apache#233` never skips.
* Every skip appears in the proposal's "Pre-flight skipped" group with the rule that fired; it is never silent. The user can `force-sync <N>` any of them at confirmation.
* `--no-preflight` opts out entirely.

This is a skill-instruction change; no Python tool added. The orchestrator builds the GraphQL query directly. Rules can be iterated quickly by editing the table; if real-world results show the classifier is too aggressive or too timid, the patches are one-line edits to the rule table.
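As an illustration only (the PR adds skill instructions, not a Python tool), the batched query and one conservative skip rule might be sketched as below. The field selection, the `i<N>` alias scheme, the function names, and the response shape are all assumptions, not the PR's actual text. The 30-day and 7-day thresholds come from the description above; the 14-day idle gate is taken from the follow-up commit's mention of the original "idle > 14d" gates. The `[bot]` suffix test mirrors the description's wording and is not verified against the login format the API actually returns. The source does not state what triggers `dispatch-urgent`, so that outcome is left out.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field selection; the PR's actual query text is not in the source.
ISSUE_FIELDS = (
    "state closedAt updatedAt "
    "labels(first: 20) { nodes { name } } "
    "comments(last: 1) { nodes { createdAt author { login } } }"
)


def build_query(owner: str, repo: str, numbers: list[int]) -> str:
    """Build one aliased query so every issue arrives in a single round-trip."""
    aliases = "\n".join(
        f"    i{n}: issue(number: {n}) {{ {ISSUE_FIELDS} }}" for n in numbers
    )
    return (
        f'query {{\n  repository(owner: "{owner}", name: "{repo}") {{\n'
        f"{aliases}\n  }}\n}}"
    )


def parse_ts(value: str) -> datetime:
    # Timestamps are assumed to be ISO 8601 with a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def classify(issue: dict, now: datetime) -> tuple[str, str]:
    """Return (decision, rule) for one issue node; skip only when all signals align."""
    # Absolute override: recent activity always dispatches.
    if now - parse_ts(issue["updatedAt"]) < timedelta(days=7):
        return "dispatch", "updatedAt within 7 days"

    labels = {node["name"] for node in issue["labels"]["nodes"]}
    comments = issue["comments"]["nodes"]
    last = comments[0] if comments else None
    author = (last or {}).get("author") or {}
    bot_last = author.get("login", "").endswith("[bot]")  # assumed login format
    idle_last = bool(last) and now - parse_ts(last["createdAt"]) > timedelta(days=14)

    closed_long_ago = (
        issue["state"] == "CLOSED"
        and issue["closedAt"] is not None
        and now - parse_ts(issue["closedAt"]) > timedelta(days=30)
    )
    if closed_long_ago and "announced" in labels and idle_last and bot_last:
        return "skip-noop", "closed > 30d with announced"
    return "dispatch", "no skip rule matched"
```

The string from `build_query` could be passed to `gh api graphql -f query=...`, and each aliased node under `data.repository` fed to `classify`. Returning the rule alongside the decision is what would let a proposal list every skip with the rule that fired.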
5 tasks
potiuk added a commit that referenced this pull request on May 31, 2026

…detection + relaxed rules (#416)

A dry-run of #414's pre-flight against a real adopter tracker revealed the original rules misfired in two ways:

- The "last comment author is a bot" check was structurally unreachable on single-operator private trackers where the sync skill writes rollup updates as the operator's personal GitHub user, not as a `*[bot]` account.
- The 7-day `updatedAt` safety override caught most trackers because every tracker had been touched by the recent sync itself (rollup-comment writes, label flips), conflating skill activity with substantive activity.

Skip rate measured ~5% in this setup vs the predicted 30-50%.

This tunes the classifier with two changes:

1. Skill-or-bot detection. Treat a comment as bot-equivalent when its body starts with the skill marker `<!-- apache-steward: ` (matches every status-rollup, release-manager hand-off, and wrap-up comment the framework writes). Falls back to the original `*[bot]` login check, plus an override-file hook for adopters with personal-account bots. Requires fetching `body` on the last comment, which bumps query response size moderately (still cheaper than one subagent transcript); the body field is what enables the skill-marker detection that drives most of the real-world skip rate.
2. Relaxed lifecycle skip rules. The original "idle > 14d" gates were a safety net for the broken bot-detection. With skill-or-bot detection working, the "all phases done; awaiting release" / "fix released; awaiting advisory" patterns are skip-eligible regardless of comment age: the skill marker itself is the "nothing new since last sync" signal.

Re-running the dry-run on the same setup: skip rate ~5% → ~30%, and the skipped trackers were all correctly steady-state ones.

Adds a new "fix released; awaiting advisory propagation" skip rule for the `cve allocated + fix released` label set, the single largest contributor to the new skip count.
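The detection order described in change 1 could look roughly like the following. This is a minimal sketch, not the framework's code: the function name, the comment dictionary shape, and the `extra_bot_logins` parameter (standing in for the override-file hook) are hypothetical. Only the marker string and the `[bot]` fallback come from the commit message.

```python
# Marker prefix quoted in the commit message; everything else here is illustrative.
SKILL_MARKER = "<!-- apache-steward: "


def is_skill_or_bot(
    comment: dict, extra_bot_logins: frozenset[str] = frozenset()
) -> bool:
    """Treat a comment as bot-equivalent if the skill wrote it or a bot posted it."""
    # 1. Skill marker at the start of the body, regardless of which account posted it.
    body = comment.get("body") or ""
    if body.startswith(SKILL_MARKER):
        return True

    # 2. Fallback: the original "*[bot]" login check.
    login = (comment.get("author") or {}).get("login") or ""
    if login.endswith("[bot]"):
        return True

    # 3. Hypothetical stand-in for the override-file hook: logins an adopter
    #    declares to be personal-account bots.
    return login in extra_bot_logins
```

Checking the marker first is what makes the rule reachable on single-operator trackers, where the author login is a personal account and the login check alone would never match.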
potiuk added a commit to potiuk/magpie that referenced this pull request on Jun 1, 2026

…ity-suite refactor patterns

Adds `optimize-skill` (capability:setup), the refactoring sibling of `write-skill`. It takes an existing framework skill (or sweeps a set) and applies the five restructuring patterns proven on the security suite, as behavior-preserving proposals gated by the validator (green-before / green-after):

- split: slim an oversized SKILL.md into linked siblings (the apache#410 pattern; addresses the PRINCIPLES.md P14 cap)
- config-lift: move concrete values into <project-config> (apache#386/apache#387/apache#388)
- out-of-context: read/PATCH one field without loading the body (apache#412 github-body-field, apache#424 github-rollup)
- fetch-upfront: batch per-item round-trips (apache#347)
- preflight-classifier: skip obvious no-ops before LLM passes (apache#414/apache#416)

SKILL.md is 297 lines; the pass catalogue (smell / exemplar PR / mechanics / behavior-preservation guarantee / validation) lives in the patterns.md sibling. Reads only framework-internal files, so no injection-guard / Privacy-LLM callouts. Ships a step-diagnose eval (5 auto-comparable cases incl. an injection-resistance case) so the skill is not released without an eval (P8). Wires the skill into the capability->skill map and the eval index.

Generated-by: Claude Code (Opus 4.8)
potiuk added a commit that referenced this pull request on Jun 1, 2026

…ity-suite refactor patterns (#427)

Adds `optimize-skill` (capability:setup), the refactoring sibling of `write-skill`. It takes an existing framework skill (or sweeps a set) and applies the five restructuring patterns proven on the security suite, as behavior-preserving proposals gated by the validator (green-before / green-after):

- split: slim an oversized SKILL.md into linked siblings (the #410 pattern; addresses the PRINCIPLES.md P14 cap)
- config-lift: move concrete values into <project-config> (#386/#387/#388)
- out-of-context: read/PATCH one field without loading the body (#412 github-body-field, #424 github-rollup)
- fetch-upfront: batch per-item round-trips (#347)
- preflight-classifier: skip obvious no-ops before LLM passes (#414/#416)

SKILL.md is 297 lines; the pass catalogue (smell / exemplar PR / mechanics / behavior-preservation guarantee / validation) lives in the patterns.md sibling. Reads only framework-internal files, so no injection-guard / Privacy-LLM callouts. Ships a step-diagnose eval (5 auto-comparable cases incl. an injection-resistance case) so the skill is not released without an eval (P8). Wires the skill into the capability->skill map and the eval index.

Generated-by: Claude Code (Opus 4.8)
Summary
- Bulk sync (`sync all`, `sync announced`, etc.) currently dispatches a full subagent per resolved tracker (~50 KB of transcript apiece), even when the tracker is in steady state and the subagent's whole report is empty.
- One batched `gh api graphql` round-trip fetches `state` / `closedAt` / `updatedAt` / `labels` / last-comment for every resolved issue at once (aliased multi-field query: ~3 KB request, ~6 KB response for 30 issues). A conservative rule table classifies each as `dispatch` / `dispatch-urgent` / `skip-noop`; only the non-skipped ones get subagents.

Safety

- `skip-noop` fires only when multiple signals align (closed AND age AND label set AND inactive comment AND bot last commenter).
- `updatedAt` override: never skip a tracker with recent activity, regardless of other signals.
- `sync #232, #233` (explicit numbers) never skips. Pre-flight applies to `sync all` / `sync announced` / label / title selectors.
- Every skip appears in the proposal's "Pre-flight skipped" group with the rule that fired; the user can `force-sync <N>` any of them at confirmation.
- `--no-preflight` opts out entirely.

What this PR does NOT do

Test plan

- `lychee` on the edited file: clean
- `skill-and-tool-validate`: no new violations
- `prek` (markdownlint, typos, format, trailing whitespace): green
- `sync all` after merge to see the classifier in practice; tune rules if needed

🤖 Generated with Claude Code