Merged
- Add noExternal config to tsup to bundle workspace dependency
- Bump version to 0.1.4
- Fixes ERR_MODULE_NOT_FOUND when installing via npm
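As a rough illustration of the first bullet, a tsup config that inlines a workspace package could look like the sketch below. The package name `@example/core`, the entry path, and the output format are assumptions for illustration only; the PR does not name them. `noExternal` is the tsup option that forces the listed dependencies into the bundle instead of leaving them as runtime imports, which is presumably why the published package stopped failing with ERR_MODULE_NOT_FOUND.

```typescript
// tsup.config.ts (sketch). tsup accepts a plain object as the default export.
// "@example/core" is a hypothetical workspace package name, not taken from the PR.
const config = {
  entry: ["src/index.ts"], // assumed entry point
  format: ["esm"], // assumed output format
  // Bundle the workspace dependency into the output so that consumers
  // installing from npm do not need to resolve it at runtime.
  noExternal: ["@example/core"],
};

export default config;
```

Without `noExternal`, tsup leaves packages listed under `dependencies` as external imports, so a workspace-only package would have to be resolvable after `npm install`.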
christso added a commit that referenced this pull request on Mar 6, 2026
…e to CLAUDE.md

Remove ErrorRetry interface, errorRetries field on EvaluationResult, and retry tracking code: no industry precedent, and retry count can be added later if needed. Add YAGNI as design principle #4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
christso added a commit that referenced this pull request on Mar 6, 2026
…y tracking (#442)

* feat(eval): add retry-errors, fail_on_error tolerance, and error retry tracking (#433, #434, #435)

  Implements three follow-up features from #431 execution status classification:
  - --retry-errors <jsonl>: re-run only execution_error test cases from a previous output
  - execution.fail_on_error config: true (halt on first), false (never halt), or 0.0-1.0 threshold
  - errorRetries field on EvaluationResult to track transient errors retried during provider invocation

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add unit tests for extractFailOnError config parser

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add retry-errors, fail_on_error, and errorRetries documentation

  Updates eval-schema.json, SKILL.md, running-evals.mdx, and eval-files.mdx with documentation for the three new features from #433, #434, #435.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add fail_on_error to Zod schema and regenerate eval-schema.json

  The eval-schema-sync test requires the Zod schema to be the source of truth. Adds FailOnErrorSchema to ExecutionSchema and regenerates the JSON schema to match.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: format eval-schema.json with Biome

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review feedback for fail_on_error implementation

  - Rewrite threshold test to exercise actual ratio math (succeed → succeed → fail → fail → fail triggers halt at 3/5=0.60 > 0.5)
  - Fix docs range notation from 0.0-1.0 to >0.0-1.0 (exclusive of 0)
  - Add concurrency best-effort note to docs
  - Add comment explaining why 0 is excluded from numeric thresholds
  - Add lightweight validation (testId + score) in loadNonErrorResults

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: simplify fail_on_error to boolean-only (remove ratio threshold)

  Align with industry standards (promptfoo, braintrust) by keeping fail_on_error as a simple true/false toggle. The numeric ratio threshold (0.0-1.0) was YAGNI; post-hoc analysis of JSONL output is sufficient for error ratio decisions.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix Biome formatting in config-loader

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: remove errorRetries tracking (YAGNI) and add YAGNI principle to CLAUDE.md

  Remove ErrorRetry interface, errorRetries field on EvaluationResult, and retry tracking code: no industry precedent, and retry count can be added later if needed. Add YAGNI as design principle #4.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix trailing blank lines

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
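The end state described by the commits above (re-running only `execution_error` cases from an earlier JSONL output, with `fail_on_error` reduced to a boolean) can be sketched roughly as follows. This is not the repository's code: `partitionResults`, `shouldHalt`, the `executionStatus` field name, and the `"ok"` status value are all hypothetical. Only `testId`, `score`, and the `execution_error` label appear in the commit messages.

```typescript
// Hypothetical row shape. `testId` and `score` are named in the commit log;
// `executionStatus` is an assumed field name for the status classification.
interface ResultRow {
  testId: string;
  score: number;
  executionStatus?: string;
}

// Split a previous JSONL output into rows to keep and test IDs to re-run.
function partitionResults(jsonl: string): { kept: ResultRow[]; retryIds: string[] } {
  const kept: ResultRow[] = [];
  const retryIds: string[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let row: unknown;
    try {
      row = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (typeof row !== "object" || row === null) continue;
    const r = row as Partial<ResultRow>;
    if (typeof r.testId !== "string") continue; // lightweight validation
    if (r.executionStatus === "execution_error") {
      retryIds.push(r.testId); // candidate for re-running
      continue;
    }
    if (typeof r.score !== "number") continue; // lightweight validation
    kept.push(r as ResultRow);
  }
  return { kept, retryIds };
}

// Boolean-only fail_on_error: true halts on the first execution error, false never halts.
function shouldHalt(failOnError: boolean, sawExecutionError: boolean): boolean {
  return failOnError && sawExecutionError;
}

const sample = [
  '{"testId":"a","score":1,"executionStatus":"ok"}',
  '{"testId":"b","score":0,"executionStatus":"execution_error"}',
  "not json",
].join("\n");
const { kept, retryIds } = partitionResults(sample);
```

Here `kept` holds the result for test `a` and `retryIds` holds `b`; the malformed line is skipped. Error ratios, which the removed numeric threshold used to handle, would be computed after the run from the same JSONL output.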