feat: eval analyzer pass for weak assertions and flaky scenarios by christso · Pull Request #582 · EntityProcess/agentv

christso · 2026-03-14T04:35:26Z

Closes #567

Changes

New agent: eval-analyzer.md — standalone eval-quality analysis agent
New skill: agentv-eval-analyzer/SKILL.md — skill for invoking the analyzer

Capabilities

Deterministic-upgrade suggestions: Identifies LLM-judge evaluators doing work a deterministic assertion could handle (contains, regex, is-json, etc.)
Weak assertion detection: Flags vague, tautological, and compound assertions with specific improvement suggestions
Cost/quality flagging: Surfaces always-pass, always-fail, expensive binary checks, and redundant evaluators
Multi-provider variance: Flags evaluators with high score variance across targets
Works with all evaluator types: code-judge, tool-trajectory, llm-judge, agent-judge, rubrics, composite, and all deterministic types
EVAL.yaml aware: Reads both JSONL results and EVAL.yaml config for full-context analysis
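
The cost/quality and multi-provider variance checks listed above can be illustrated with a minimal sketch. This is not the analyzer's implementation (the analyzer is an agent prompt, not a script), and the JSONL field names used here (`evaluator`, `target`, `score`), the sample records, and the `variance_threshold` value are all assumptions made for illustration rather than the actual agentv result schema.

```python
import json
from collections import defaultdict
from statistics import pstdev

# Hypothetical JSONL results: one evaluator score per line.
# The field names and values are illustrative assumptions,
# not the real agentv result format.
SAMPLE_JSONL = """\
{"evaluator": "greets-user", "target": "model-a", "score": 1.0}
{"evaluator": "greets-user", "target": "model-b", "score": 1.0}
{"evaluator": "cites-source", "target": "model-a", "score": 0.0}
{"evaluator": "cites-source", "target": "model-b", "score": 0.0}
{"evaluator": "tone-check", "target": "model-a", "score": 0.9}
{"evaluator": "tone-check", "target": "model-b", "score": 0.2}
"""


def analyze(jsonl_text, variance_threshold=0.25):
    """Return a mapping of evaluator name to a list of quality flags."""
    # Group scores as evaluator -> target -> [scores].
    scores = defaultdict(lambda: defaultdict(list))
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        scores[record["evaluator"]][record["target"]].append(record["score"])

    findings = {}
    for name, by_target in scores.items():
        all_scores = [s for values in by_target.values() for s in values]
        target_means = [sum(v) / len(v) for v in by_target.values()]
        flags = []
        # An evaluator that never discriminates carries no signal.
        if all(s >= 1.0 for s in all_scores):
            flags.append("always-pass")
        elif all(s <= 0.0 for s in all_scores):
            flags.append("always-fail")
        # A large spread of per-target means suggests the evaluator
        # behaves inconsistently across providers.
        if len(target_means) > 1 and pstdev(target_means) > variance_threshold:
            flags.append("high-variance")
        if flags:
            findings[name] = flags
    return findings


if __name__ == "__main__":
    for evaluator, flags in analyze(SAMPLE_JSONL).items():
        print(f"{evaluator}: {', '.join(flags)}")
```

The sketch compares per-target means instead of raw scores, so repeated runs against one target do not inflate the cross-target spread. The threshold is arbitrary and would need tuning against real results.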

Architecture

External-first: new agent in plugins/agentv-dev/agents/ and skill in plugins/agentv-dev/skills/, following existing patterns (eval-judge, agentv-trace-analyst).

…gestions (#567)

Add eval-analyzer agent that identifies LLM-judge evaluations replaceable with deterministic assertions, flags weak/vague assertions, and surfaces cost/quality improvement opportunities from JSONL results.

New files:
- agents/eval-analyzer.md: standalone analysis agent
- skills/agentv-eval-analyzer/SKILL.md: skill for invoking the analyzer

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

cloudflare-workers-and-pages · 2026-03-14T04:36:17Z

Deploying agentv with Cloudflare Pages

Latest commit:	`5767436`
Status:	⚡️ Build in progress...


Fix schema accuracy: llm-judge uses prompt, not criteria; remove non-existent types (icontains, starts-with, ends-with, contains-all); use correct alternatives (contains, regex).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

christso and others added 2 commits March 14, 2026 05:35

Merge remote-tracking branch 'origin/main' into feat/567-eval-analyzer

5767436

christso marked this pull request as ready for review March 14, 2026 05:42

christso merged commit 976a000 into main Mar 14, 2026
1 check was pending

christso deleted the feat/567-eval-analyzer branch March 14, 2026 05:42

christso merged 3 commits into main from feat/567-eval-analyzer