feat(mentoring): add pr-management-mentor intervention eval suite; mark Mentoring experimental by justinmclean · Pull Request #252 · apache/magpie

justinmclean · 2026-05-24T08:40:19Z

Generated by the spec-driven build loop. This eval suite and the docs
update were produced by an autonomous run of tools/spec-loop (./loop.sh —
one work item, one branch, one PR). Authored by Claude (see the Generated-by
commit trailer) and reviewed + tested by a human before submission.

What

Adds the missing intervention eval suite (8 cases) to the existing
pr-management-mentor skill's eval tree, covering the intervention-selection
decision: the out-of-scope and maintainer-engaged checks, the four intervention
templates, the multi-trigger (ask) path, the no-trigger (silent) path, and
the hand-off triggers.

Also syncs docs/modes.md: the Mentoring row moves from proposed / 0 skills
to experimental / 1 skill, and the section points at the shipped skill rather
than a forward reference.
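As a rough illustration of the decision the suite pins down, the paths listed above can be sketched as a single function. This is a minimal sketch, not the skill's implementation: the `PrState` fields, the `"silent"` result for the two gate checks, and the precedence of hand-off over drafting are all assumptions made for the example. Only the result labels (`handoff`, `draft / N`, ask, silent) come from this description.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PrState:
    """Hypothetical snapshot of a PR thread; every field name is illustrative."""
    out_of_scope: bool = False
    maintainer_engaged: bool = False
    handoff_triggers: set[int] = field(default_factory=set)   # fired hand-off triggers (1..4)
    template_triggers: set[int] = field(default_factory=set)  # fired intervention templates (1..4)


def select_intervention(state: PrState) -> str:
    # Assumption: either gate check means the agent stays out of the thread.
    if state.out_of_scope or state.maintainer_engaged:
        return "silent"
    # Assumption: a fired hand-off trigger wins over drafting a template.
    if state.handoff_triggers:
        return "handoff"
    # No-trigger path.
    if not state.template_triggers:
        return "silent"
    # Multi-trigger path: more than one template matched, so ask.
    if len(state.template_triggers) > 1:
        return "ask"
    (template,) = state.template_triggers
    return f"draft / {template}"
```

Under these assumptions, a state with one matched template returns `draft / N`, and the same state with any hand-off trigger fired returns `handoff` instead.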

Why

pr-management-mentor shipped without a matching eval suite for its
intervention-selection step, and the framework treats a skill without evals as
incomplete. This back-fills that coverage so the skill's decision logic is
pinned by fixtures.

Changes

tools/skill-evals/evals/pr-management-mentor/intervention/ — 8-case eval
suite (system prompt, user-prompt template, case fixtures).
docs/modes.md — Mentoring row + skill table.

Testing — and an issue the loop did not detect

The suite assembles cleanly and, after the fix below, an independent agent run
matches all 8 cases against their ground truth.

On the first test pass, case-4 (why-pushback) failed. Its expected.json
is handoff — which is correct per the skill: the contributor argues after
the agent already answered the "why" once, firing the skill's hand-off trigger
2 ("answer the why once, don't argue", defined in hand-off.md). But the
eval's own system-prompt.md never encoded the hand-off triggers, so a model
following the prompt returned draft / 4 instead. The build loop generated an
eval whose ground truth assumed skill logic its own prompt left out — a
self-inconsistency the loop did not catch. Fixed here by adding the four
hand-off triggers (documented 4 → 3 → 1 → 2 order) to system-prompt.md;
re-running the independent check then passed 8/8.
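The case-4 failure can be reproduced in miniature. The sketch below assumes the documented 4 → 3 → 1 → 2 order is an evaluation precedence, and it models "the prompt does not encode hand-off triggers" as a flag; `first_handoff_trigger`, `decide`, and the flag are hypothetical names for illustration, not part of the skill or the eval harness.

```python
from __future__ import annotations

from typing import Optional

# Order in which the four hand-off triggers are documented (4 -> 3 -> 1 -> 2).
# Assumption: this order is the precedence used when several fire at once.
HANDOFF_ORDER = (4, 3, 1, 2)


def first_handoff_trigger(fired: set[int]) -> Optional[int]:
    """Return the first fired hand-off trigger in documented order, if any."""
    for trigger in HANDOFF_ORDER:
        if trigger in fired:
            return trigger
    return None


def decide(fired_handoff: set[int], template: int, prompt_has_handoff: bool) -> str:
    """Hypothetical decision for a thread where exactly one template matched."""
    if prompt_has_handoff and first_handoff_trigger(fired_handoff) is not None:
        return "handoff"
    return f"draft / {template}"
```

With trigger 2 fired and template 4 matched, `decide({2}, 4, prompt_has_handoff=False)` gives `draft / 4` (the first test pass), while `prompt_has_handoff=True` gives `handoff` (the ground truth in expected.json).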

Notes

Eval + docs only; no skill behaviour (SKILL.md) is changed.
The loop-detection gap is called out deliberately: it's a concrete data point
that the build loop needs a self-consistency check between an eval's fixtures
and the prompt it ships.
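One possible shape for such a check, as a deliberately crude sketch: collect every decision label that appears in the fixtures and flag any label the prompt text never mentions. The `"decision"` key and the function names are assumptions (the real expected.json schema is not shown here), and a keyword match is only a first-line guard, not proof that the prompt encodes the logic.

```python
from __future__ import annotations

import re


def _normalize(text: str) -> str:
    """Lowercase and drop non-alphanumerics so 'hand-off' matches 'handoff'."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def missing_decisions(prompt_text: str, expected_cases: list[dict]) -> set[str]:
    """Return fixture decision labels that the prompt text never mentions.

    Assumption: each fixture is a dict with a "decision" key (hypothetical schema).
    """
    prompt = _normalize(prompt_text)
    missing = set()
    for case in expected_cases:
        decision = case["decision"]
        if _normalize(decision) not in prompt:
            missing.add(decision)
    return missing
```

A build loop could fail the work item when the returned set is non-empty. In this PR's situation, a prompt that lists only draft, ask, and silent would be flagged for a fixture expecting `handoff`.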

potiuk · 2026-05-25T05:38:59Z

Hi @justinmclean — heads-up first: main was just refreshed with pinned-action SHA bumps for actions/cache, github/codeql-action, zizmorcore/zizmor-action, and astral-sh/setup-uv. Those bumps had been blocked by a schema bug in .github/dependabot.yml (the github-actions ecosystem rejected semver-{major,minor,patch}-days cooldown keys), which is now fixed — see #257. A rebase will pull the new SHAs into your branch as a side effect; that's expected.

Now the actual reason for this ping: small conflict in docs/modes.md after the latest main:

Triage row count is now 13 and includes contributor-nomination in the experimental list (your branch still has 12 and the older list).
Mentoring row: your branch sets it to experimental | 1; main has proposed | 0. Your PR's whole point is to flip Mentoring to experimental with the new eval suite — so keep your value, but please double-check the skill-count math is right after picking up contributor-nomination on the Triage row.

Could you rebase or merge main in and resolve? Happy to push the fix myself if you'd rather not — let me know.

justinmclean · 2026-05-25T05:46:48Z

Yep I can rebase for you

…erimental

Adds the missing `intervention` eval suite (8 cases) to the `pr-management-mentor` eval tree, covering steps 3–5 of the runtime loop: out-of-scope check, maintainer-engaged check, and trigger matching for all four templates plus the multi-trigger and no-trigger paths.

Updates `docs/modes.md` to reflect the prototype skill that already shipped: Mentoring row moves from `proposed / 0 skills` to `experimental / 1 skill`, and the section body is rewritten to point at the live skill rather than the "lands in a follow-up PR" forward reference.

Validation:
test -f docs/mentoring/spec.md ✓
uv run --project tools/skill-validator skill-validate ✓ (no violations)

Generated-by: Claude (Opus 4.7)

justinmclean · 2026-05-25T06:14:44Z

@potiuk should be good once the CI finishes

…ng row

Address Justin's two open points on `process.md`:

- Flowchart: S12, S13, and S14 are independent terminals from S11, not 12→13 / 14→13. Step 13's inputs (planning issue, [VOTE] and [RESULT] URLs, voter list, artefact list, promotion revision, [ANNOUNCE] URL) come from everything through Step 11; the archive sweep (12) and post-release snapshot bump (14) feed nothing into the audit log.
- stateDiagram-v2: previously ended `announced → archived → [*]`, dropping 13 and 14 entirely. Added parallel branches `announced → audited` (Step 13) and `announced → bumped` (Step 14), each terminating at `[*]`, matching the flowchart.

Also sync the README.md mentoring row with current `docs/modes.md` (experimental, 1 skill shipping) instead of the stale "proposed — not yet formally adopted" wording carried over from before apache#252.

justinmclean self-assigned this May 24, 2026

justinmclean added 2 commits May 25, 2026 16:10

fix bug

87a43fa

justinmclean force-pushed the mentoring-prototype branch from 79fd114 to 87a43fa Compare May 25, 2026 06:14

justinmclean marked this pull request as ready for review May 25, 2026 06:14

potiuk approved these changes May 25, 2026


potiuk merged commit 347239e into apache:main May 25, 2026
13 checks passed

andreahlert mentioned this pull request May 28, 2026

feat(release-management): propose family with 14-step lifecycle and non-ASF backend abstraction #163

Merged

8 tasks

potiuk merged 2 commits into apache:main from justinmclean:mentoring-prototype
