feat(pairing): add multi-agent review pipeline skill and eval suite #269

Merged
potiuk merged 2 commits into apache:main from justinmclean:pairing-multi-agent-review on Jun 2, 2026

Conversation

@justinmclean

Member

Implements work item 5 from the spec-loop plan: a new pairing-multi-agent-review skill that fans a local diff through three independent, axis-isolated review passes (correctness, security, conventions) and merges their findings into one structured report.

Key design points:

  • Each sub-agent receives only its own axis scope to prevent one axis from anchoring or suppressing findings on another.
  • Sub-agents run in parallel (single Agent tool call message).
  • Deduplication annotates cross-axis findings with also_flagged_by rather than silently dropping them.
  • Injection-guard callout (Pattern 4) is present; injection attempts detected in diff content are flagged as blocking findings in the Security section.
  • Report format is identical to pairing-self-review for a consistent developer experience across the Pairing skill family.
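A minimal sketch of the fan-out and merge described in the design points above, not the skill's actual implementation. The `Finding` fields other than `also_flagged_by`, the `(file, line)` deduplication key, the severity escalation rule, and the toy `review_axis` function are all assumptions made for illustration; the real skill dispatches sub-agents rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field, replace

AXES = ("correctness", "security", "conventions")


@dataclass
class Finding:
    axis: str
    file: str
    line: int
    severity: str  # "blocking" or "advisory" (assumed vocabulary)
    message: str
    also_flagged_by: list[str] = field(default_factory=list)


def review_axis(axis: str, diff: str) -> list[Finding]:
    """Hypothetical stand-in for one axis-isolated pass.

    Each call sees only the diff and its own axis name, never the
    findings of another axis.
    """
    findings = []
    for number, text in enumerate(diff.splitlines(), start=1):
        if "eval(" not in text:
            continue
        if axis == "security":
            findings.append(Finding(axis, "example.py", number, "blocking", "eval on input"))
        elif axis == "correctness":
            findings.append(Finding(axis, "example.py", number, "advisory", "eval result unchecked"))
    return findings


def merge_findings(per_axis: dict[str, list[Finding]]) -> list[Finding]:
    """Deduplicate by (file, line); annotate duplicates instead of dropping them."""
    merged: dict[tuple[str, int], Finding] = {}
    for axis in AXES:
        for finding in per_axis.get(axis, []):
            key = (finding.file, finding.line)
            if key not in merged:
                # Copy so the per-axis results are left untouched.
                merged[key] = replace(finding, also_flagged_by=list(finding.also_flagged_by))
                continue
            kept = merged[key]
            if axis != kept.axis and axis not in kept.also_flagged_by:
                kept.also_flagged_by.append(axis)
            # Assumption: a blocking duplicate escalates the kept finding.
            if finding.severity == "blocking":
                kept.severity = "blocking"
    return list(merged.values())


def run_pipeline(diff: str) -> list[Finding]:
    """Run the three axis passes concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=len(AXES)) as pool:
        futures = {axis: pool.submit(review_axis, axis, diff) for axis in AXES}
        per_axis = {axis: future.result() for axis, future in futures.items()}
    return merge_findings(per_axis)
```

Keeping the duplicate's axis in `also_flagged_by` preserves the signal that two independent passes agreed, which silent dropping would lose.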

Includes a 15-case eval suite across 6 step-suites covering diff collection, per-axis sub-agent passes, merge/deduplication, and report composition — including an adversarial injection-resistance case per axis.

Updates docs/modes.md to mark Pairing as experimental with 1 skill.

Generated-by: Claude (Opus 4.7)

@justinmclean justinmclean marked this pull request as draft May 25, 2026 02:30
@justinmclean justinmclean self-assigned this May 25, 2026
@justinmclean

Member Author

Pre-flight self-review — PR #269 (pairing-multi-agent-review)

#269 · draft · author: justinmclean

Base: main · Files changed: 52 (all added) · Diff size: +1038 / −3

A new Pairing-mode skill that fans the diff through three independent axis
passes (correctness, security, conventions) and merges findings. SKILL.md
(~400 lines) + a 6-step eval suite (step-1-collect-diff, step-2a/b/c axis
passes, step-3-merge-findings, step-4-compose-report) + docs/modes.md table
update.

Correctness

No findings. Eval-spec ↔ expected.json keys match exactly across all 6 step
suites:

  • step-1: resolved_base, files_changed, lines_added, lines_removed, diff_empty, stop_reason
  • step-2a/b/c: axis, findings, injection_attempts
  • step-3: merged_findings, blocking_count, advisory_count, aggregated_injection_attempts
  • step-4: sections_present, overall_signal, blocking_count, advisory_count, footer_present

docs/modes.md table is internally consistent against current origin/main (no
Pairing skill exists there yet, so "Pairing: 1 skill" is accurate at merge
time).
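The key comparison reported above could be reproduced with a check along these lines. This is a hedged sketch: the key names come from this review, but the short step labels, the `key_mismatches` helper, and the idea of passing in an already-parsed expected.json dict are assumptions, not part of the eval suite.

```python
# Top-level keys each step's expected.json should carry, as listed in the review.
AXIS_KEYS = {"axis", "findings", "injection_attempts"}
EXPECTED_KEYS = {
    "step-1": {
        "resolved_base", "files_changed", "lines_added",
        "lines_removed", "diff_empty", "stop_reason",
    },
    "step-2a": AXIS_KEYS,
    "step-2b": AXIS_KEYS,
    "step-2c": AXIS_KEYS,
    "step-3": {
        "merged_findings", "blocking_count",
        "advisory_count", "aggregated_injection_attempts",
    },
    "step-4": {
        "sections_present", "overall_signal", "blocking_count",
        "advisory_count", "footer_present",
    },
}


def key_mismatches(step: str, expected_json: dict) -> dict[str, list[str]]:
    """Return keys missing from, or unexpected in, a parsed expected.json."""
    spec = EXPECTED_KEYS[step]
    actual = set(expected_json)
    return {
        "missing": sorted(spec - actual),
        "unexpected": sorted(actual - spec),
    }
```

An exact match corresponds to both lists being empty for every case in a step suite.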

Security

No findings. Standard injection-guard callout present in SKILL.md. The
three-axis fan-out design itself is a security feature (one axis can't
suppress another's findings). Adversarial coverage in the eval suite: step-2a
case-3-injection-blocked, step-3 case-3-injection-aggregation. Read-only skill
— no posting, no shell, no subprocess.

Conventions

No findings. skill-validate --strict clean for this skill (the one violation
in the run is the pre-existing security-tracker-stats-dashboard
action-inventory, unrelated). markdownlint-cli2 reports 0 errors across all 31
changed .md files. SPDX header in place; frontmatter well-formed; description
comma count under the action-inventory threshold.

Summary

Ready — no blocking or advisory findings.

Blocking: 0 Advisory: 0

@andreahlert andreahlert added the enhancement, family:tools, family:docs, and mode:cross-cutting labels May 26, 2026
@potiuk

potiuk commented May 27, 2026

Member

Hi @justinmclean, same situation as #228 / #229. The pairing-multi-agent-review skill isn't in main yet (so the content is still unique), but the branch is 357 files / ~12k lines behind current main, and a maintainer-side rebase isn't mechanical at this level of staleness.

Whenever you get a chance, a fresh rebase from your end would help.

No rush.

@justinmclean justinmclean force-pushed the pairing-multi-agent-review branch from 063bb37 to 38f3280 Compare June 2, 2026 09:13
@potiuk

potiuk commented Jun 2, 2026

Member

I will rebase and resolve the conflicts, @justinmclean, so that we can rename things; we can continue working on it later :)

@potiuk potiuk marked this pull request as ready for review June 2, 2026 11:42
justinmclean and others added 2 commits June 2, 2026 13:42
Implements work item 5 from the spec-loop plan: a new
pairing-multi-agent-review skill that fans a local diff through three
independent, axis-isolated review passes (correctness, security,
conventions) and merges their findings into one structured report.

Key design points:
- Each sub-agent receives only its own axis scope to prevent one axis
  from anchoring or suppressing findings on another.
- Sub-agents run in parallel (single Agent tool call message).
- Deduplication annotates cross-axis findings with also_flagged_by
  rather than silently dropping them.
- Injection-guard callout (Pattern 4) is present; injection attempts
  detected in diff content are flagged as blocking findings in the
  Security section.
- Report format is identical to pairing-self-review for a consistent
  developer experience across the Pairing skill family.

Includes a 15-case eval suite across 6 step-suites covering diff
collection, per-axis sub-agent passes, merge/deduplication, and report
composition — including an adversarial injection-resistance case per axis.

Updates docs/modes.md to mark Pairing as experimental with 1 skill.

Generated-by: Claude (Opus 4.7)
…base onto main

Rebasing apache#269 onto current main surfaced a semantic conflict: main now
requires a `capability` frontmatter key on every skill (enforced by the
skill-and-tool validator), a rule that landed after this PR was opened.
Adds `capability: capability:review` to the skill (it is a Pairing-mode
multi-agent code-review pipeline, matching pairing-self-review) and the
matching capability->skill map row in docs/labels-and-capabilities.md.
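To illustrate the rule this commit satisfies, here is a rough sketch of a frontmatter check. It is not the project's skill-and-tool validator: the `---`-delimited frontmatter layout, the `name` key in the usage example, and both helper functions are assumptions; only the `capability` key and the `capability:review` value come from the commit message.

```python
def frontmatter_keys(text: str) -> dict[str, str]:
    """Parse top-level `key: value` pairs between the first two '---' lines.

    A deliberately small parser for illustration; it ignores nested
    YAML structures and comments.
    """
    lines = text.splitlines()
    try:
        start = lines.index("---")
        end = lines.index("---", start + 1)
    except ValueError:
        return {}  # no frontmatter block found
    pairs = {}
    for line in lines[start + 1:end]:
        if ":" in line and not line.startswith((" ", "#")):
            # Split on the first colon only, so values such as
            # "capability:review" stay intact.
            key, _, value = line.partition(":")
            pairs[key.strip()] = value.strip()
    return pairs


def missing_capability(text: str) -> bool:
    """True when a skill file lacks the required `capability` key."""
    return "capability" not in frontmatter_keys(text)
```

For example, a file starting with `---`, `name: pairing-multi-agent-review`, `capability: capability:review`, `---` would pass, while the same file without the `capability` line would be reported.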

Generated-by: Claude Code (Opus 4.8)
@potiuk potiuk force-pushed the pairing-multi-agent-review branch from 38f3280 to f0236ff Compare June 2, 2026 11:45
@potiuk potiuk merged commit 7398e16 into apache:main Jun 2, 2026
26 checks passed

3 participants