
feat: adopt skill-creator grading patterns in eval-judge (claims extraction, eval critique, evidence format)#578

Merged
christso merged 3 commits into main from feat/570-eval-judge-enhancement
Mar 14, 2026

Conversation

@christso
Collaborator

Closes #570

Changes

Enhanced eval-judge agent prompt with five capabilities from Anthropic's skill-creator grader:

  1. Claims extraction and verification — Extracts and verifies implicit claims beyond predefined assertions
  2. Eval self-critique — Critiques assertion quality inline, flags weak/trivial assertions
  3. Surface vs substance guards — Distinguishes genuine task completion from superficial compliance
  4. Structured evidence format — Per-assertion {text, passed, evidence} compatible with grading.json
  5. User notes integration — Reads executor notes when available
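The structured evidence format above (capability 4) can be sketched as follows. This is an illustrative shape only: the field names beyond `{text, passed, evidence}` and the example payload are assumptions, not the project's actual types.

```typescript
// Hypothetical per-assertion evidence record, as described in the PR;
// only {text, passed, evidence} comes from the PR description.
interface AssertionEvidence {
  text: string;     // the assertion being checked
  passed: boolean;  // whether the output satisfied it
  evidence: string; // excerpt from the output supporting the verdict
}

// Example grading.json-style payload built from judged assertions.
const assertions: AssertionEvidence[] = [
  {
    text: "Response cites the config file path",
    passed: true,
    evidence: "updated settings in config/eval.yaml",
  },
  {
    text: "Response avoids fabricated APIs",
    passed: false,
    evidence: "mentions a nonexistent helper function",
  },
];

console.log(JSON.stringify({ assertions }, null, 2));
```

One record per assertion keeps verdicts auditable: a reviewer can check each `evidence` excerpt against the judged output directly.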

What's preserved

  • `agentv prompt eval judge` command integration
  • Weighted average scoring
  • JSONL append workflow
  • Deterministic + prompt_ready evaluator dispatch
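The preserved weighted-average scoring might look like the following minimal sketch; the `Score` shape and weight semantics here are assumptions, not the project's real implementation.

```typescript
// Assumed per-assertion score shape (not the project's actual type).
interface Score {
  name: string;
  value: number;  // per-assertion score in [0, 1]
  weight: number; // relative importance
}

// Weighted average: sum of value*weight divided by total weight.
function weightedAverage(scores: Score[]): number {
  const totalWeight = scores.reduce((sum, s) => sum + s.weight, 0);
  if (totalWeight === 0) return 0; // guard against an empty or zero-weight list
  return scores.reduce((sum, s) => sum + s.value * s.weight, 0) / totalWeight;
}

const overall = weightedAverage([
  { name: "correctness", value: 1, weight: 2 },
  { name: "style", value: 0.5, weight: 1 },
]);
console.log(overall); // (1*2 + 0.5*1) / 3 ≈ 0.833
```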

Enhance eval-judge with claims extraction, eval self-critique,
surface/substance guards, per-assertion evidence format, and user
notes integration from Anthropic's skill-creator grader patterns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Mar 14, 2026

Deploying agentv with Cloudflare Pages

Latest commit: d744fd3
Status: ⚡️ Build in progress...

View logs

christso and others added 2 commits March 14, 2026 05:41
Restructure enhanced output fields to use existing schema fields
(reasoning, scores[].reasoning, scores[].details) and extensions
pattern for new data.

- Per-assertion evidence → scores[].reasoning + scores[].details
- Verified claims → structured section in top-level reasoning
- User notes → structured section in top-level reasoning
- Eval feedback, claims, user notes summary → extensions object

Core output shape (score, hits, misses, reasoning, answer, mode,
scores[]) remains unchanged. New structured data is additive via
the extensions pattern, which the JSONL writer serializes via
toSnakeCaseDeep().
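The serialization step can be illustrated with a sketch of a deep snake_case key transform like the `toSnakeCaseDeep()` the commit mentions. This is a plausible reimplementation, not the project's actual code, and the `extensions` payload below is a made-up example.

```typescript
// Illustrative deep camelCase -> snake_case key transform; the real
// toSnakeCaseDeep() in the codebase may differ in edge cases.
function toSnakeCaseDeep(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(toSnakeCaseDeep);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [
        k.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase(),
        toSnakeCaseDeep(v),
      ]),
    );
  }
  return value;
}

// Additive data rides in an extensions object; core fields are untouched.
const result = {
  score: 0.9,
  extensions: {
    evalFeedback: "assertion 2 is trivially satisfiable",
    verifiedClaims: [{ claimText: "output parses as JSON", verified: true }],
  },
};

console.log(JSON.stringify(toSnakeCaseDeep(result)));
// keys become: score, extensions, eval_feedback, verified_claims, claim_text
```

Because new fields live under `extensions` and are snake_cased only at the JSONL boundary, existing consumers of the core shape (`score`, `hits`, `misses`, ...) are unaffected.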

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@christso christso marked this pull request as ready for review March 14, 2026 05:42
@christso christso merged commit ef99a1f into main Mar 14, 2026
1 check was pending
@christso christso deleted the feat/570-eval-judge-enhancement branch March 14, 2026 05:42