Fix eval yaml parser by christso · Pull Request #4 · EntityProcess/agentv

christso · 2025-11-11T03:12:59Z

No description provided.


…y tracking (#442)

* feat(eval): add retry-errors, fail_on_error tolerance, and error retry tracking (#433, #434, #435)

  Implements three follow-up features from #431 execution status classification:

  - --retry-errors <jsonl>: re-run only execution_error test cases from a previous output
  - execution.fail_on_error config: true (halt on first), false (never halt), or 0.0-1.0 threshold
  - errorRetries field on EvaluationResult to track transient errors retried during provider invocation

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add unit tests for extractFailOnError config parser

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add retry-errors, fail_on_error, and errorRetries documentation

  Updates eval-schema.json, SKILL.md, running-evals.mdx, and eval-files.mdx with documentation for the three new features from #433, #434, #435.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add fail_on_error to Zod schema and regenerate eval-schema.json

  The eval-schema-sync test requires the Zod schema to be the source of truth. Adds FailOnErrorSchema to ExecutionSchema and regenerates the JSON schema to match.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: format eval-schema.json with Biome

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review feedback for fail_on_error implementation

  - Rewrite threshold test to exercise actual ratio math (succeed → succeed → fail → fail → fail triggers halt at 3/5=0.60 > 0.5)
  - Fix docs range notation from 0.0-1.0 to >0.0-1.0 (exclusive of 0)
  - Add concurrency best-effort note to docs
  - Add comment explaining why 0 is excluded from numeric thresholds
  - Add lightweight validation (testId + score) in loadNonErrorResults

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: simplify fail_on_error to boolean-only (remove ratio threshold)

  Align with industry standards (promptfoo, braintrust) by keeping fail_on_error as a simple true/false toggle. The numeric ratio threshold (0.0-1.0) was YAGNI; post-hoc analysis of JSONL output is sufficient for error ratio decisions.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix Biome formatting in config-loader

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: remove errorRetries tracking (YAGNI) and add YAGNI principle to CLAUDE.md

  Remove ErrorRetry interface, errorRetries field on EvaluationResult, and retry tracking code: no industry precedent, and retry count can be added later if needed. Add YAGNI as design principle #4.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix trailing blank lines

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
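The commit message describes --retry-errors as re-running only execution_error test cases from a previous JSONL output, with lightweight validation (testId + score) on the results that are kept. The actual code is not shown on this page, so the following is only a rough sketch of that idea. The `executionStatus` field name, the `ResultLine` shape, and the `partitionForRetry` helper are all assumptions made for illustration, not the project's real API.

```typescript
// Hypothetical result shape: only testId and score are named in the commit
// message; executionStatus is an assumed field name.
interface ResultLine {
  testId: string;
  score?: number;
  executionStatus?: string;
}

interface RetryPlan {
  keep: ResultLine[]; // non-error results carried over unchanged
  retryIds: string[]; // test cases to run again
}

// Hypothetical helper: split a previous JSONL output into results to keep
// and test ids to re-run.
function partitionForRetry(jsonl: string): RetryPlan {
  const plan: RetryPlan = { keep: [], retryIds: [] };
  for (const raw of jsonl.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;

    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch (_err) {
      continue; // skip lines that are not valid JSON
    }
    if (typeof parsed !== "object" || parsed === null) continue;

    const record = parsed as Record<string, unknown>;
    if (typeof record.testId !== "string") continue;

    if (record.executionStatus === "execution_error") {
      plan.retryIds.push(record.testId);
    } else if (typeof record.score === "number") {
      // Lightweight validation: keep only results with testId and score.
      plan.keep.push(record as unknown as ResultLine);
    }
  }
  return plan;
}

const previousOutput = [
  '{"testId":"a","score":1,"executionStatus":"ok"}',
  '{"testId":"b","score":0,"executionStatus":"execution_error"}',
  "not json",
  '{"testId":"c"}',
].join("\n");

const retryPlan = partitionForRetry(previousOutput);
```

In this sketch, malformed lines and non-error results without a score are silently dropped; a real implementation might prefer to report them.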

add convenience script and fix yaml parser for windows line endings

c44efe3
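The diff for this commit is not shown on this page. One common way to make a line-based parser tolerant of Windows line endings is to normalize CRLF to LF before splitting; a minimal sketch under that assumption follows. The helper names `normalizeLineEndings` and `splitYamlLines` and the sample YAML are hypothetical, not taken from the repository.

```typescript
// Hypothetical helper: convert Windows (CRLF) and bare CR line endings to LF.
function normalizeLineEndings(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}

// Hypothetical helper: split YAML text into lines with no trailing "\r".
function splitYamlLines(text: string): string[] {
  return normalizeLineEndings(text).split("\n");
}

// Sample input as it might be saved by a Windows editor.
const crlfYaml = "description: demo\r\ncases:\r\n  - id: case-1\r\n";
const yamlLines = splitYamlLines(crlfYaml);
```

Without the normalization step, splitting on "\n" alone would leave a trailing "\r" on every line, which can break key matching or indentation checks in hand-written parsing code.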

christso changed the title ~~Fix eval parser~~ Fix eval yaml parser Nov 11, 2025

christso added 3 commits November 11, 2025 14:20

update AGENTS.md

5c7c8af

fix: bundle @agentevo/core for npm distribution

603f0dd

- Add noExternal config to tsup to bundle workspace dependency
- Bump version to 0.1.4
- Fixes ERR_MODULE_NOT_FOUND when installing via npm
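For context, a tsup config that bundles a workspace dependency might look roughly like the sketch below. Only `noExternal` and the package name `@agentevo/core` come from the commit message; the entry path and output format are assumptions, and the real config file is not shown here. tsup normally leaves declared dependencies as external imports, and `noExternal` asks it to inline the listed packages into the output instead.

```typescript
// tsup.config.ts (sketch; entry and format are assumed values)
const config = {
  entry: ["src/index.ts"],
  format: ["esm"],
  // Inline the workspace package into the bundle rather than leaving it as an
  // import that has to be resolved after `npm install`.
  noExternal: ["@agentevo/core"],
};

export default config;
```

With the dependency bundled, the published package presumably no longer needs `@agentevo/core` to be resolvable at runtime, which would be consistent with the ERR_MODULE_NOT_FOUND fix the commit describes.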

update AGENTS.md

9556c01

christso merged commit 2937f92 into main Nov 11, 2025

christso deleted the fix/eval-parsing branch November 11, 2025 03:45

christso mentioned this pull request Mar 9, 2026

Support env interpolation consistently across all config string fields #506

Closed

christso merged 4 commits into main from fix/eval-parsing