feat: unified agent-evaluation lifecycle skill (combine eval-orchestrator + optimizer) by christso · Pull Request #583 · EntityProcess/agentv

christso · 2026-03-14T04:42:33Z

Closes #573

Changes

Expanded agentv-optimizer from 5-phase to 8-phase unified lifecycle skill
Deprecated eval-orchestrator (redirects to unified skill)
Added migration reference for skill-creator users

8 Phases

1. Discovery: optimizer-discovery agent analyzes the eval, challenges assumptions, triages failures
2. Run Baseline: absorbed from eval-orchestrator (workspace eval, multi-provider, multi-turn, code judges, all formats, agent + CLI modes)
3. Grade: enhanced eval-judge with per-assertion evidence, claims extraction, self-critique (feat: adopt skill-creator grading patterns in eval-judge (claims extraction, eval critique, evidence format) #570)
4. Compare: blind N-way comparison with dynamic rubrics, post-comparison analysis (feat: blind A/B comparison with dynamic rubrics and post-comparison analysis #571)
5. Analyze: SIMBA/GEPA + deterministic-upgrade suggestions, weak assertion detection, benchmark patterns (feat: eval analyzer pass for weak assertions and flaky scenarios #567)
6. Review: human review checkpoint with structured feedback.json (feat: human review checkpoint and feedback artifact for skill iteration #568), skippable in CI
7. Optimize: curator surgical edits + polish generalization, variant tracking
8. Re-run + Iterate: loop with exit conditions (target pass rate, human approval, stagnation)
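
The exit conditions for the Re-run + Iterate loop could be sketched roughly as follows. This is an illustration only, not the skill's implementation: `LoopConfig`, `exit_reason`, the default thresholds, and the definition of stagnation (no pass-rate gain above a minimum across the last N rounds) are all assumptions made for the example. The PR itself only names the three conditions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoopConfig:
    # All defaults are hypothetical; the PR does not specify thresholds.
    target_pass_rate: float = 0.9
    stagnation_rounds: int = 2      # consecutive rounds without meaningful gain
    min_improvement: float = 0.01   # gain below this counts as "no progress"


def exit_reason(pass_rates: List[float], human_approved: bool,
                cfg: LoopConfig) -> Optional[str]:
    """Return why the loop should stop, or None to keep iterating.

    pass_rates holds one pass rate per completed run, oldest first.
    """
    if not pass_rates:
        return None  # nothing has run yet, so there is nothing to judge
    if pass_rates[-1] >= cfg.target_pass_rate:
        return "target_pass_rate"
    if human_approved:
        return "human_approval"
    # Stagnation: every one of the last N round-over-round gains is too small.
    window = pass_rates[-(cfg.stagnation_rounds + 1):]
    if len(window) == cfg.stagnation_rounds + 1:
        gains = [later - earlier for earlier, later in zip(window, window[1:])]
        if all(gain < cfg.min_improvement for gain in gains):
            return "stagnation"
    return None
```

A caller would append the pass rate after each re-run and stop as soon as `exit_reason(...)` returns a non-`None` value. In CI, where the Review phase is skipped, `human_approved` would presumably stay `False`, leaving the target and stagnation checks to end the loop.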

Key Capabilities Preserved

All EVAL.yaml workspace evaluation capabilities:

Workspace isolation (clone repos, setup/teardown scripts)
Multi-provider targets (Claude, GPT, Copilot, Gemini, custom CLI)
Multi-turn conversation evaluation
Code judges (Python/TypeScript via defineCodeJudge())
Tool trajectory scoring
Workspace file change tracking
All eval formats (EVAL.yaml, evals.json, JSONL)
Agent-mode + CLI-mode

New Features

Mid-lifecycle entry (start at any phase with existing data)
Companion artifacts: grading.json, benchmark.json, feedback.json (feat: skill-eval companion artifacts (grading, timing, benchmark) #565)
Mode auto-detection from input format
9 specialized agents dispatched on-demand
Trigger description disambiguated from skill-creator (fix: disambiguate agentv eval skill triggers from skill-creator #572)
Migration reference for skill-creator users
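
As a rough sketch of what detection from input format might look like, the snippet below classifies an eval file by its extension into the three formats listed above (EVAL.yaml, evals.json, JSONL). The function name `detect_eval_format` and the returned labels are hypothetical, and the PR does not state how a detected format maps to agent-mode or CLI-mode, so that step is deliberately left out.

```python
from pathlib import Path


def detect_eval_format(path: str) -> str:
    """Classify an eval input file by extension (hypothetical helper).

    Returns one of "eval-yaml", "evals-json", or "jsonl". These labels are
    assumptions for illustration, not identifiers used by agentv.
    """
    suffix = Path(path).suffix.lower()
    if suffix in (".yaml", ".yml"):
        return "eval-yaml"
    if suffix == ".jsonl":
        return "jsonl"
    if suffix == ".json":
        return "evals-json"
    raise ValueError(f"unrecognized eval format: {path}")
```

A real implementation would likely also inspect file contents, since an extension alone cannot distinguish an eval definition from any other YAML or JSON file.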

Files Changed

plugins/agentv-dev/skills/agentv-optimizer/SKILL.md — rewritten (159→346 lines)
plugins/agentv-dev/skills/agentv-eval-orchestrator/SKILL.md — deprecated (77→25 lines)
plugins/agentv-dev/skills/agentv-optimizer/references/migrating-from-skill-creator.md — new (96 lines)

Expand agentv-optimizer into an 8-phase lifecycle skill covering the full evaluation improvement loop: discover → run → grade → compare → analyze → review → optimize → re-run. Absorb eval-orchestrator into Phase 2. Reference all enhanced agents from Wave 1+2. Preserve all EVAL.yaml capabilities: workspace isolation, multi-provider, multi-turn, code judges, tool trajectory, workspace file tracking.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

cloudflare-workers-and-pages · 2026-03-14T04:43:17Z

Deploying agentv with Cloudflare Pages

Latest commit:	`ba856de`
Status:	⚡️ Build in progress...


christso marked this pull request as ready for review March 14, 2026 05:36

Merge remote-tracking branch 'origin/main' into feat/573-unified-lifecycle-skill (ba856de)

# Conflicts:
#	plugins/agentv-dev/skills/agentv-eval-orchestrator/SKILL.md
#	plugins/agentv-dev/skills/agentv-optimizer/SKILL.md

christso merged commit f7b35d3 into main Mar 14, 2026
1 check was pending

christso deleted the feat/573-unified-lifecycle-skill branch March 14, 2026 05:41

christso merged 2 commits into main from feat/573-unified-lifecycle-skill
