feat(security-issue-sync): pre-flight no-op classifier for bulk mode #414

Merged
potiuk merged 1 commit into apache:main from potiuk:feat-bulk-mode-preflight-noop-skip
May 31, 2026

@potiuk potiuk commented May 31, 2026

Member

Summary

  • Bulk sync (sync all, sync announced, etc.) currently dispatches a full subagent per resolved tracker — ~50 KB of transcript apiece — even when the tracker is in steady state and the subagent's whole report is empty.
  • This change inserts a Step 1b pre-flight classifier between selector-resolution and subagent dispatch. One batched gh api graphql round-trip fetches state / closedAt / updatedAt / labels / last-comment for every resolved issue at once (aliased multi-field query: ~3 KB request, ~6 KB response for 30 issues). A conservative rule table classifies each as dispatch / dispatch-urgent / skip-noop; only the non-skipped ones get subagents.
  • Expected savings on a 20-tracker bulk sweep where ~30-50% are idle: 6-10 fewer subagents × ~50 KB = ~300-500 KB of context per sweep.
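The savings estimate above is plain arithmetic. A minimal sketch that reproduces it, assuming the ~50 KB per-subagent figure and the 30-50% idle share quoted in the summary (the function name is illustrative, not part of the skill):

```python
def sweep_savings_kb(trackers: int, idle_percent: int, kb_per_subagent: int = 50) -> float:
    """Context saved (KB) by not dispatching subagents for idle trackers."""
    skipped = trackers * idle_percent / 100
    return skipped * kb_per_subagent


low, high = sweep_savings_kb(20, 30), sweep_savings_kb(20, 50)
print(f"~{low:.0f}-{high:.0f} KB saved per sweep")  # ~300-500 KB saved per sweep
```

The pre-flight query itself costs roughly 3 KB + 6 KB for 30 issues per the summary, which is small next to a single skipped subagent.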

Safety

  • Conservative rules: skip-noop fires only when multiple signals align (closed AND age AND label set AND inactive comment AND bot last commenter).
  • 7-day updatedAt override — never skip a tracker with recent activity, regardless of other signals.
  • Set-resolving selectors only: sync #232, #233 (explicit numbers) never skips. Pre-flight applies to sync all / sync announced / label / title selectors.
  • Never silent — every skip appears in the proposal's "Pre-flight skipped" group with the rule that fired. The user can force-sync <N> any of them at confirmation.
  • --no-preflight opts out entirely.
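A rough sketch of how the conservative rules and the 7-day override could fit together. This is not the skill's actual rule table: the `Snapshot` field names, the 14-day comment-idle window, and the exact label strings are assumptions made for illustration, and `dispatch-urgent` is left out because its trigger is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    """Per-tracker fields from the pre-flight query (field names are illustrative)."""
    state: str                      # "OPEN" or "CLOSED"
    closed_at: datetime | None
    updated_at: datetime
    labels: frozenset[str]
    last_comment_at: datetime | None
    last_comment_by_bot: bool


RECENT_ACTIVITY = timedelta(days=7)   # absolute override from the Safety list
CLOSED_AGE = timedelta(days=30)       # "closed > 30d" from the commit message
COMMENT_IDLE = timedelta(days=14)     # assumed idle window; tune as needed
FULL_LABEL_SET = frozenset({"cve-allocated", "pr-merged", "announced"})  # assumed names


def classify(s: Snapshot, now: datetime) -> tuple[str, str]:
    """Return (verdict, rule); anything not provably idle is dispatched."""
    if now - s.updated_at <= RECENT_ACTIVITY:
        return "dispatch", "updatedAt within 7 days"
    quiet_bot_comment = (
        s.last_comment_by_bot
        and s.last_comment_at is not None
        and now - s.last_comment_at > COMMENT_IDLE
    )
    if not quiet_bot_comment:
        return "dispatch", "last comment is recent or human-authored"
    if (
        s.state == "CLOSED"
        and s.closed_at is not None
        and now - s.closed_at > CLOSED_AGE
        and "announced" in s.labels
    ):
        return "skip-noop", "closed > 30d with announced"
    if s.state == "OPEN" and FULL_LABEL_SET <= s.labels:
        return "skip-noop", "open with full label set, no recent activity"
    return "dispatch", "no skip rule matched"


now = datetime(2026, 5, 31, tzinfo=timezone.utc)
idle = Snapshot("CLOSED", now - timedelta(days=60), now - timedelta(days=45),
                frozenset({"announced"}), now - timedelta(days=45), True)
print(classify(idle, now))  # ('skip-noop', 'closed > 30d with announced')
```

Every branch that is not an explicit skip falls through to `dispatch`, so a missing or unexpected field costs one subagent rather than a missed update.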

What this PR does NOT do

  • It does not decide what action a tracker needs — still the subagent's job. Pre-flight only decides whether spawning the subagent is worth it.
  • No Python tool added; the orchestrator builds the GraphQL query inline per the documented recipe. Rules can be tuned by editing the table.

Test plan

  • lychee on the edited file — clean
  • skill-and-tool-validate — no new violations
  • prek (markdownlint, typos, format, trailing whitespace) — green
  • CI lychee + tests-ok on this PR
  • Try a real sync all after merge to see the classifier in practice; tune rules if needed

🤖 Generated with Claude Code

…idle trackers in bulk mode

Bulk sync (sync all, sync announced, etc.) currently dispatches
one full subagent per resolved tracker. Each subagent loads the
skill + does a `gh issue view` + reads comments + reads mail +
returns a structured report — ~50 KB per subagent transcript.
On bulk sweeps where 30–50% of trackers are in steady state
(closed > 30d with `announced`, or open with the full
cve-allocated + pr-merged + announced label set and no recent
activity), the subagent's full work is a no-op that produces an
empty proposal — pure waste.

This change inserts a Step 1b pre-flight classifier between
selector-resolution and subagent dispatch. One batched
`gh api graphql` round-trip fetches `state`, `closedAt`,
`updatedAt`, `labels`, and the last comment's author+timestamp
for every resolved issue at once (aliased multi-field query,
~3 KB request, ~6 KB response for 30 issues). A conservative
rule table classifies each as `dispatch` / `dispatch-urgent` /
`skip-noop`; only the non-skipped ones get subagents.
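For illustration, the aliased multi-field query described above might be assembled like this. The skill builds the query inline rather than through a tool, so treat this only as a sketch: the `i<N>` alias scheme, the helper names, and the `labels(first: 20)` page size are assumptions.

```python
import json
import subprocess

# Fields named in the description: state, closedAt, updatedAt, labels,
# and the last comment's author + timestamp.
ISSUE_FIELDS = """
      state
      closedAt
      updatedAt
      labels(first: 20) { nodes { name } }
      comments(last: 1) { nodes { author { login } createdAt } }
"""


def build_preflight_query(numbers: list[int]) -> str:
    """Build one GraphQL document with an aliased issue(...) field per tracker."""
    aliases = "\n".join(
        f"    i{n}: issue(number: {n}) {{{ISSUE_FIELDS}    }}" for n in numbers
    )
    return (
        "query($owner: String!, $name: String!) {\n"
        "  repository(owner: $owner, name: $name) {\n"
        f"{aliases}\n"
        "  }\n"
        "}"
    )


def run_preflight(owner: str, name: str, numbers: list[int]) -> dict:
    """Run the batched query through `gh api graphql` in a single round-trip."""
    result = subprocess.run(
        ["gh", "api", "graphql",
         "-f", f"query={build_preflight_query(numbers)}",
         "-f", f"owner={owner}",
         "-f", f"name={name}"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["data"]["repository"]
```

Each alias comes back as its own key under `repository`, so the caller can map `i232` back to tracker 232 without a second request.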

Safety:

* Conservative — `skip-noop` fires only when multiple signals
  align (closed AND age AND label set AND inactive last comment
  AND bot last commenter).
* `updatedAt` within last 7 days is an absolute override; never
  skip a tracker with recent activity regardless of other
  signals.
* Pre-flight only applies to set-resolving selectors
  (`sync all`, `sync announced`, label/title selectors). An
  explicit number selector like `sync #232, #233` never skips.
* Every skip appears in the proposal's "Pre-flight skipped"
  group with the rule that fired — never silent. The user can
  `force-sync <N>` any of them at confirmation.
* `--no-preflight` opts out entirely.
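The selector gate and the "never silent" guarantee can be sketched as two small helpers. The selector grammar and the proposal layout below are assumptions for illustration; only the group title "Pre-flight skipped", `force-sync <N>`, and `--no-preflight` come from this change.

```python
import re

# Assumed shape of an explicit-number selector: "#232, #233" or "232,233".
EXPLICIT_NUMBERS = re.compile(r"^#?\d+(\s*,\s*#?\d+)*$")


def preflight_applies(selector: str, no_preflight: bool = False) -> bool:
    """Pre-flight runs only for set-resolving selectors and when not opted out."""
    if no_preflight:
        return False
    return EXPLICIT_NUMBERS.match(selector.strip()) is None


def render_skipped(skips: dict[int, str]) -> str:
    """List every skipped tracker with the rule that fired (hypothetical layout)."""
    lines = ["Pre-flight skipped"]
    for number, rule in sorted(skips.items()):
        lines.append(f"  #{number}: {rule} (force-sync {number} to override)")
    return "\n".join(lines)


print(preflight_applies("#232, #233"))  # False
print(preflight_applies("announced"))   # True
print(render_skipped({233: "closed > 30d with announced"}))
```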

This is a skill-instruction change; no Python tool added. The
orchestrator builds the GraphQL query directly. Rules can be
iterated quickly by editing the table; if real-world results
show the classifier is too aggressive or too timid, the patches
are one-line edits to the rule table.
@potiuk potiuk merged commit 09f4288 into apache:main May 31, 2026
16 checks passed
potiuk added a commit that referenced this pull request May 31, 2026
…detection + relaxed rules (#416)

A dry-run of #414's pre-flight against a real adopter tracker
revealed the original rules misfired in two ways:

- The "last comment author is a bot" check was structurally
  unreachable on single-operator private trackers where the sync
  skill writes rollup updates as the operator's personal GitHub
  user, not as a *[bot] account.
- The 7-day updatedAt safety override caught most trackers
  because every tracker had been touched by the recent sync
  itself (rollup-comment writes, label flips) — conflating
  skill activity with substantive activity. Skip rate measured
  ~5% in this setup vs the predicted 30-50%.

This tunes the classifier with two changes:

1. Skill-or-bot detection. Treat a comment as bot-equivalent
   when its body starts with the skill marker
   `<!-- apache-steward: ` (matches every status-rollup,
   release-manager hand-off, and wrap-up comment the framework
   writes). Falls back to the original `*[bot]` login check, plus
   an override-file hook for adopters with personal-account bots.
   Requires fetching body on the last comment — bumps query
   response size moderately (still cheaper than one subagent
   transcript), and the body field is what enables the
   skill-marker detection that drives most of the real-world
   skip rate.

2. Relaxed lifecycle skip rules. The original "idle > 14d"
   gates were a safety net for the broken bot-detection. With
   skill-or-bot detection working, the "all phases done; awaiting
   release" / "fix released; awaiting advisory" patterns are
   skip-eligible regardless of comment age — the skill marker
   itself is the "nothing new since last sync" signal.
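The skill-or-bot check in item 1 could look roughly like the following. The marker string and the `*[bot]` fallback are taken from the description above; the function name and the `extra_bot_logins` parameter are hypothetical stand-ins for the override-file hook, whose real format is not shown here.

```python
SKILL_MARKER = "<!-- apache-steward: "


def is_skill_or_bot(
    login: str,
    body: str,
    extra_bot_logins: frozenset[str] = frozenset(),
) -> bool:
    """Treat the last comment as bot-equivalent if any of the three checks match."""
    if body.startswith(SKILL_MARKER):   # comment written by the skill itself
        return True
    if login.endswith("[bot]"):         # original *[bot] login check
        return True
    return login in extra_bot_logins    # adopter-supplied personal-account bots


print(is_skill_or_bot("potiuk", "<!-- apache-steward: x -->\nrollup"))  # True
print(is_skill_or_bot("alice", "Any news on the release?"))             # False
```

Checking the body first means a single-operator tracker, where the skill posts under a personal account, is still recognised without any per-adopter configuration.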

Re-running the dry-run on the same setup: skip rate ~5% → ~30%,
and the skipped trackers were all correctly steady-state ones.
Adds a new "fix released; awaiting advisory propagation" skip
rule for the `cve allocated + fix released` label set — the
single largest contributor to the new skip count.
potiuk added a commit to potiuk/magpie that referenced this pull request Jun 1, 2026
…ity-suite refactor patterns

Adds `optimize-skill` (capability:setup) — the refactoring sibling of
`write-skill`. It takes an existing framework skill (or sweeps a set)
and applies the five restructuring patterns proven on the security
suite, as behavior-preserving proposals gated by the validator
(green-before / green-after):

- split — slim an oversized SKILL.md into linked siblings (the apache#410
  pattern; addresses the PRINCIPLES.md P14 cap)
- config-lift — move concrete values into <project-config> (apache#386/apache#387/apache#388)
- out-of-context — read/PATCH one field without loading the body
  (apache#412 github-body-field, apache#424 github-rollup)
- fetch-upfront — batch per-item round-trips (apache#347)
- preflight-classifier — skip obvious no-ops before LLM passes (apache#414/apache#416)

SKILL.md is 297 lines; the pass catalogue (smell / exemplar PR /
mechanics / behavior-preservation guarantee / validation) lives in
the patterns.md sibling. Reads only framework-internal files, so no
injection-guard / Privacy-LLM callouts.

Ships a step-diagnose eval (5 auto-comparable cases incl. an
injection-resistance case) so the skill is not released without an
eval (P8). Wires the skill into the capability->skill map and the
eval index.

Generated-by: Claude Code (Opus 4.8)
potiuk added a commit that referenced this pull request Jun 1, 2026
…ity-suite refactor patterns (#427)

Adds `optimize-skill` (capability:setup) — the refactoring sibling of
`write-skill`. It takes an existing framework skill (or sweeps a set)
and applies the five restructuring patterns proven on the security
suite, as behavior-preserving proposals gated by the validator
(green-before / green-after):

- split — slim an oversized SKILL.md into linked siblings (the #410
  pattern; addresses the PRINCIPLES.md P14 cap)
- config-lift — move concrete values into <project-config> (#386/#387/#388)
- out-of-context — read/PATCH one field without loading the body
  (#412 github-body-field, #424 github-rollup)
- fetch-upfront — batch per-item round-trips (#347)
- preflight-classifier — skip obvious no-ops before LLM passes (#414/#416)

SKILL.md is 297 lines; the pass catalogue (smell / exemplar PR /
mechanics / behavior-preservation guarantee / validation) lives in
the patterns.md sibling. Reads only framework-internal files, so no
injection-guard / Privacy-LLM callouts.

Ships a step-diagnose eval (5 auto-comparable cases incl. an
injection-resistance case) so the skill is not released without an
eval (P8). Wires the skill into the capability->skill map and the
eval index.

Generated-by: Claude Code (Opus 4.8)