feat(pairing): add multi-agent review pipeline skill and eval suite by justinmclean · Pull Request #269 · apache/magpie

justinmclean · 2026-05-25T02:30:09Z

Implements work item 5 from the spec-loop plan: a new pairing-multi-agent-review skill that fans a local diff through three independent, axis-isolated review passes (correctness, security, conventions) and merges their findings into one structured report.

Key design points:

- Each sub-agent receives only its own axis scope to prevent one axis from anchoring or suppressing findings on another.
- Sub-agents run in parallel (single Agent tool call message).
- Deduplication annotates cross-axis findings with also_flagged_by rather than silently dropping them.
- Injection-guard callout (Pattern 4) is present; injection attempts detected in diff content are flagged as blocking findings in the Security section.
- Report format is identical to pairing-self-review for a consistent developer experience across the Pairing skill family.
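
The fan-out and deduplication behaviour described in these design points can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's implementation: the `Finding` type, the `review_axis` stub, the `example.py` file name, and the use of `(file, line, message)` as the deduplication key are all hypothetical. Only the three axis names and the `also_flagged_by` field name come from the PR description.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

AXES = ("correctness", "security", "conventions")


@dataclass
class Finding:
    axis: str
    file: str
    line: int
    message: str
    # Other axes that independently reported the same issue.
    also_flagged_by: List[str] = field(default_factory=list)


def review_axis(axis: str, diff: str) -> List[Finding]:
    """Hypothetical stand-in for one sub-agent pass.

    Each call sees only its own axis name and the diff, never the
    findings of another axis. For demonstration it flags eval() on
    the correctness and security axes only.
    """
    findings = []
    for number, text in enumerate(diff.splitlines(), start=1):
        if "eval(" in text and axis in ("correctness", "security"):
            findings.append(Finding(axis, "example.py", number, "use of eval()"))
    return findings


def fan_out(diff: str) -> List[Finding]:
    """Run the three axis passes concurrently and flatten their results."""
    with ThreadPoolExecutor(max_workers=len(AXES)) as pool:
        # map() yields results in AXES order regardless of completion order.
        per_axis = pool.map(lambda axis: review_axis(axis, diff), AXES)
        return [finding for findings in per_axis for finding in findings]


def merge_findings(findings: List[Finding]) -> List[Finding]:
    """Keep the first report of each issue and annotate it with later axes."""
    merged: Dict[Tuple[str, int, str], Finding] = {}
    for finding in findings:
        key = (finding.file, finding.line, finding.message)
        first = merged.get(key)
        if first is None:
            merged[key] = finding
        elif finding.axis != first.axis and finding.axis not in first.also_flagged_by:
            first.also_flagged_by.append(finding.axis)
    return list(merged.values())
```

With a diff containing one `eval(` call, `merge_findings(fan_out(diff))` yields a single finding attributed to the first axis that reported it, with the second axis recorded in `also_flagged_by` instead of being dropped.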

Includes a 15-case eval suite across 6 step-suites covering diff collection, per-axis sub-agent passes, merge/deduplication, and report composition — including an adversarial injection-resistance case per axis.

Updates docs/modes.md to mark Pairing as experimental with 1 skill.

Generated-by: Claude (Opus 4.7)

justinmclean · 2026-05-26T03:57:06Z

Pre-flight self-review — PR #269 (pairing-multi-agent-review)

#269 · draft · author: justinmclean

Base: main · Files changed: 52 (all added) · Diff size: +1038 / −3

A new Pairing-mode skill that fans the diff through three independent axis
passes (correctness, security, conventions) and merges findings. SKILL.md
(~400 lines) + a 6-step eval suite (step-1-collect-diff, step-2a/b/c axis
passes, step-3-merge-findings, step-4-compose-report) + docs/modes.md table
update.

Correctness

No findings. Eval-spec ↔ expected.json keys match exactly across all 6 step
suites (step-1:
resolved_base/files_changed/lines_added/lines_removed/diff_empty/stop_reason;
step-2a/b/c: axis/findings/injection_attempts; step-3:
merged_findings/blocking_count/advisory_count/aggregated_injection_attempts;
step-4:
sections_present/overall_signal/blocking_count/advisory_count/footer_present).
docs/modes.md table is internally consistent against current origin/main (no
Pairing skill exists there yet, so "Pairing: 1 skill" is accurate at merge
time).
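
As an illustration of the key check reported above, a comparison along these lines would surface any drift between an eval spec and its expected.json. It is only a sketch: the key names are copied from this review, but the short step labels used as dictionary keys, the `key_mismatch` helper, and the idea of passing the file content as a string are assumptions rather than how the eval harness actually works.

```python
import json

# Top-level keys per step suite, as listed in the review above.
AXIS_KEYS = {"axis", "findings", "injection_attempts"}
EXPECTED_KEYS = {
    "step-1": {
        "resolved_base", "files_changed", "lines_added",
        "lines_removed", "diff_empty", "stop_reason",
    },
    "step-2a": AXIS_KEYS,
    "step-2b": AXIS_KEYS,
    "step-2c": AXIS_KEYS,
    "step-3": {
        "merged_findings", "blocking_count", "advisory_count",
        "aggregated_injection_attempts",
    },
    "step-4": {
        "sections_present", "overall_signal", "blocking_count",
        "advisory_count", "footer_present",
    },
}


def key_mismatch(step: str, expected_json_text: str) -> dict:
    """Return the keys missing from, or unexpected in, one expected.json."""
    actual = set(json.loads(expected_json_text))
    wanted = EXPECTED_KEYS[step]
    return {
        "missing": sorted(wanted - actual),
        "unexpected": sorted(actual - wanted),
    }
```

An exact match, as reported here, corresponds to both lists being empty for every case in every step suite.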

Security

No findings. Standard injection-guard callout present in SKILL.md. The
three-axis fan-out design itself is a security feature (one axis can't
suppress another's findings). Adversarial coverage in the eval suite: step-2a
case-3-injection-blocked, step-3 case-3-injection-aggregation. Read-only skill
— no posting, no shell, no subprocess.
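
The injection-aggregation behaviour that the step-3 cases exercise might look something like the sketch below. The input and output key names (`axis`, `findings`, `injection_attempts`, `merged_findings`, `blocking_count`, `advisory_count`, `aggregated_injection_attempts`) are taken from this review; the shape of an individual finding, the `severity` field, the `excerpt` field, and the `merge_axis_results` function name are assumptions made for illustration.

```python
def merge_axis_results(axis_results: list) -> dict:
    """Combine per-axis results and escalate injection attempts.

    Every injection attempt seen by any axis pass is recorded once in
    the aggregate list and also surfaced as a blocking finding on the
    security axis, so it cannot be lost during the merge.
    """
    merged_findings = []
    aggregated = []
    for result in axis_results:
        merged_findings.extend(result["findings"])
        for attempt in result["injection_attempts"]:
            aggregated.append({"axis": result["axis"], "excerpt": attempt})
            merged_findings.append({
                "axis": "security",
                "severity": "blocking",  # assumed severity vocabulary
                "message": "injection attempt in diff content "
                           f"(seen by {result['axis']} pass)",
            })
    blocking = sum(1 for f in merged_findings if f["severity"] == "blocking")
    advisory = sum(1 for f in merged_findings if f["severity"] == "advisory")
    return {
        "merged_findings": merged_findings,
        "blocking_count": blocking,
        "advisory_count": advisory,
        "aggregated_injection_attempts": aggregated,
    }
```

Treating the attempt as data to be reported, rather than as an instruction to follow, is the point of the design: the diff content is never given a way to lower the blocking count.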

Conventions

No findings. skill-validate --strict clean for this skill (the one violation
in the run is the pre-existing security-tracker-stats-dashboard
action-inventory, unrelated). markdownlint-cli2 reports 0 errors across all 31
changed .md files. SPDX header in place; frontmatter well-formed; description
comma count under the action-inventory threshold.

Summary

Ready — no blocking or advisory findings.

Blocking: 0 Advisory: 0

potiuk · 2026-05-27T20:24:32Z

Hi @justinmclean, same situation as #228 / #229. The pairing-multi-agent-review skill isn't in main yet (so the content is still unique), but the branch is 357 files / ~12k lines behind current main, and a maintainer-side rebase isn't mechanical at this level of staleness.

Whenever you get a chance, a fresh rebase from your end would help.

No rush.

potiuk · 2026-06-02T11:42:32Z

I will rebase and resolve the conflicts, @justinmclean, so that we can rename things. We can continue working on it later :)

…base onto main Rebasing apache#269 onto current main surfaced a semantic conflict: main now requires a `capability` frontmatter key on every skill (enforced by the skill-and-tool validator), a rule that landed after this PR was opened. Adds `capability: capability:review` to the skill (it is a Pairing-mode multi-agent code-review pipeline, matching pairing-self-review) and the matching capability->skill map row in docs/labels-and-capabilities.md. Generated-by: Claude Code (Opus 4.8)
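
To make the semantic conflict concrete, a check of the kind described in this commit message could be sketched as below. This is not the project's skill-and-tool validator: the helper names, the assumption that frontmatter is a leading block delimited by `---` lines, and the simplified `key: value` parsing are all hypothetical. Only the `capability` key and the `capability:review` value come from the commit message.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse a leading '---' delimited block of simple 'key: value' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Skip nested or commented lines; split on the first colon only,
        # so a value such as 'capability:review' stays intact.
        if ":" in line and not line.startswith((" ", "#")):
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return {}  # unterminated frontmatter block


def missing_capability(skill_text: str) -> bool:
    """Return True when the skill has no non-empty 'capability' key."""
    return not parse_frontmatter(skill_text).get("capability")
```

A skill written before the rule landed would fail this check after the rebase even though no textual merge conflict exists, which is why the fix arrived as a separate commit.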

justinmclean marked this pull request as draft May 25, 2026 02:30

justinmclean self-assigned this May 25, 2026

andreahlert added the enhancement (New feature or request), family:tools (tools/*), family:docs (Docs, MISSION.md, READMEs), and mode:cross-cutting (Spans multiple modes) labels May 26, 2026

potiuk mentioned this pull request May 27, 2026

contributor-activity-sweep skill with eval suite #228

Closed

justinmclean force-pushed the pairing-multi-agent-review branch from 063bb37 to 38f3280 June 2, 2026 09:13

potiuk marked this pull request as ready for review June 2, 2026 11:42

justinmclean and others added 2 commits June 2, 2026 13:42

potiuk force-pushed the pairing-multi-agent-review branch from 38f3280 to f0236ff June 2, 2026 11:45

potiuk merged commit 7398e16 into apache:main Jun 2, 2026
26 checks passed

potiuk merged 2 commits into apache:main from justinmclean:pairing-multi-agent-review
