feat: skill-eval companion artifacts (grading, timing, benchmark) by christso · Pull Request #579 · EntityProcess/agentv

christso · 2026-03-14T04:26:45Z

Closes #565

Changes

New ArtifactWriter module following existing JsonlWriter pattern
Produces grading/<test-id>.json, timing.json, benchmark.json from JSONL
--artifacts <dir> CLI flag on agentv eval run
JSONL parser handles snake_case keys from existing output files
29 tests for artifact generation, schema compatibility, and file I/O
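
The PR does not say how the JSONL parser handles snake_case keys. A minimal sketch of one plausible approach follows, assuming the in-memory result objects use camelCase; the function names and the sample record are hypothetical and are not taken from the AgentV codebase.

```python
import json
import re


def snake_to_camel(key: str) -> str:
    """Convert snake_case to camelCase; keys without underscores pass through."""
    return re.sub(r"_([a-z0-9])", lambda m: m.group(1).upper(), key)


def normalize_keys(value):
    """Recursively rename dict keys; lists are walked, scalars returned as-is."""
    if isinstance(value, dict):
        return {snake_to_camel(k): normalize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_keys(v) for v in value]
    return value


def parse_jsonl(text: str) -> list:
    """Parse one JSON object per non-empty line and normalize its keys."""
    return [
        normalize_keys(json.loads(line))
        for line in text.splitlines()
        if line.strip()
    ]


# Hypothetical record shape, for illustration only.
sample = '{"test_id": "t1", "duration_ms": 1200, "token_usage": {"input": 10, "output": 5}}\n'
records = parse_jsonl(sample)
```

Normalizing at the parse boundary would keep the rest of the writer independent of the on-disk key style.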

Artifact Schemas

grading/<test-id>.json — Per-test grading with expectations (text/passed/evidence), summary, execution_metrics, and AgentV extensions (evaluators, workspace_changes, conversation)
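
As a rough illustration of the grading shape, the sketch below maps per-evaluator hits and misses to `expectations` entries. The input shape (`name`, `hits`, `misses`), the evidence strings, and the inner fields of `summary` are assumptions made for this example; the PR names only the top-level fields.

```python
import json


def build_grading(evaluators: list) -> dict:
    """Turn evaluator hits/misses into expectations plus a summary."""
    expectations = []
    for ev in evaluators:
        for text in ev.get("hits", []):
            expectations.append(
                {"text": text, "passed": True, "evidence": f"hit reported by {ev['name']}"}
            )
        for text in ev.get("misses", []):
            expectations.append(
                {"text": text, "passed": False, "evidence": f"miss reported by {ev['name']}"}
            )
    passed = sum(1 for e in expectations if e["passed"])
    total = len(expectations)
    return {
        "expectations": expectations,
        # Assumed summary fields; not confirmed by the PR text.
        "summary": {
            "passed": passed,
            "failed": total - passed,
            "total": total,
            "pass_rate": passed / total if total else 0.0,
        },
        # AgentV extension: keep the raw evaluator results alongside.
        "evaluators": evaluators,
    }


grading = build_grading(
    [{"name": "rubric", "hits": ["mentions retry"], "misses": ["cites source"]}]
)
print(json.dumps(grading["summary"]))
```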

timing.json — Aggregate duration_ms, total_tokens, and token_usage (input/output)
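
The aggregation could look roughly like the sketch below. It assumes each result carries `duration_ms` and a `token_usage` object, that durations are summed rather than measured as wall-clock time, and that `total_tokens` is input plus output; none of these details is stated in the PR.

```python
def build_timing(results: list) -> dict:
    """Aggregate duration and token usage across all results."""
    total_input = sum(r.get("token_usage", {}).get("input", 0) for r in results)
    total_output = sum(r.get("token_usage", {}).get("output", 0) for r in results)
    return {
        # Assumption: per-test durations are summed.
        "duration_ms": sum(r.get("duration_ms", 0) for r in results),
        # Assumption: total_tokens is simply input + output.
        "total_tokens": total_input + total_output,
        "token_usage": {"input": total_input, "output": total_output},
    }


# Hypothetical results, for illustration only.
results = [
    {"duration_ms": 1200, "token_usage": {"input": 100, "output": 20}},
    {"duration_ms": 800, "token_usage": {"input": 50, "output": 30}},
]
timing = build_timing(results)
```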

benchmark.json — Cross-test statistics per target with mean/stddev for pass_rate, time_seconds, tokens, tool_calls, cost_usd
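
A sketch of the per-target statistics, under stated assumptions: each row is assumed to carry `target`, `pass_rate`, `duration_ms`, `total_tokens`, `tool_calls`, and `cost_usd`, and the standard deviation is taken as the population form. The PR does not say whether sample or population standard deviation is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize(values: list) -> dict:
    """Mean and population standard deviation of a non-empty list."""
    return {"mean": mean(values), "stddev": pstdev(values)}


def build_benchmark(rows: list) -> dict:
    """Group rows by target and summarize each metric."""
    by_target = defaultdict(list)
    for row in rows:
        by_target[row["target"]].append(row)
    return {
        target: {
            "pass_rate": summarize([r["pass_rate"] for r in group]),
            "time_seconds": summarize([r["duration_ms"] / 1000 for r in group]),
            "tokens": summarize([r["total_tokens"] for r in group]),
            "tool_calls": summarize([r["tool_calls"] for r in group]),
            "cost_usd": summarize([r["cost_usd"] for r in group]),
        }
        for target, group in by_target.items()
    }


# Hypothetical rows, for illustration only.
rows = [
    {"target": "a", "pass_rate": 1.0, "duration_ms": 1000, "total_tokens": 100, "tool_calls": 2, "cost_usd": 0.01},
    {"target": "a", "pass_rate": 0.5, "duration_ms": 3000, "total_tokens": 300, "tool_calls": 4, "cost_usd": 0.03},
    {"target": "b", "pass_rate": 0.0, "duration_ms": 2000, "total_tokens": 200, "tool_calls": 1, "cost_usd": 0.02},
]
benchmark = build_benchmark(rows)
```

`pstdev` is used here because it is defined for a single data point, so a target with only one test still gets a `stddev` of zero instead of raising an error.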

Interoperability

Shared fields (expectations[].text/passed/evidence, summary, run_summary) use the same names and types as Anthropic's skill-creator. AgentV-specific fields (evaluators, workspace_changes, conversation, per_evaluator_summary) are purely additive.
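
One way to check the additive property is to validate only the shared fields and ignore everything else. The sketch below does this for a grading document; the expected types (`str`, `bool`, `str`) and the function name are assumptions for illustration, not the PR's actual tests.

```python
# Assumed types for the shared expectation fields.
REQUIRED_EXPECTATION_KEYS = {"text": str, "passed": bool, "evidence": str}


def check_grading_compat(grading: dict) -> list:
    """Return a list of problems with the shared fields; extra fields are ignored."""
    problems = []
    if "summary" not in grading:
        problems.append("missing summary")
    for i, exp in enumerate(grading.get("expectations", [])):
        for key, expected_type in REQUIRED_EXPECTATION_KEYS.items():
            if not isinstance(exp.get(key), expected_type):
                problems.append(f"expectations[{i}].{key} is not {expected_type.__name__}")
    return problems


# A document with AgentV extensions still passes, because extras are not inspected.
ok_doc = {
    "expectations": [{"text": "mentions retry", "passed": True, "evidence": "line 3"}],
    "summary": {},
    "evaluators": [],
    "workspace_changes": [],
}
bad_doc = {"expectations": [{"text": "mentions retry", "passed": "yes", "evidence": "line 3"}]}
```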

Add ArtifactWriter module that produces grading/<test>.json, timing.json, and benchmark.json from existing JSONL eval results. Includes --artifacts CLI flag for eval run command.

- Grading artifacts map per-evaluator hits/misses to skill-creator's expectations/evidence format with AgentV extensions (evaluators, workspace_changes, conversation)
- Timing artifact aggregates duration and token usage across all results
- Benchmark artifact computes per-target statistics (mean/stddev) for pass_rate, time, tokens, tool_calls, and cost
- JSONL parser handles snake_case keys from existing output files
- 29 tests covering artifact generation, schema compatibility, and I/O
- Schemas are supersets of Anthropic skill-creator conventions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

cloudflare-workers-and-pages · 2026-03-14T04:27:51Z

Deploying agentv with Cloudflare Pages

Latest commit:	`391e4fd`
Status:	✅ Deploy successful!
Preview URL:	https://2f6d16ef.agentv.pages.dev
Branch Preview URL:	https://feat-565-companion-artifacts.agentv.pages.dev


christso marked this pull request as ready for review March 14, 2026 05:36

christso merged commit bf74717 into main Mar 14, 2026
1 check passed

christso deleted the feat/565-companion-artifacts branch March 14, 2026 05:40

christso merged 1 commit into main from feat/565-companion-artifacts
