
feat(eval): add retry-errors, fail_on_error tolerance, and error retry tracking #442

Merged
christso merged 10 commits into main from feat/follow-up-431-error-tracking-tolerance-retry on Mar 6, 2026

@christso
Collaborator

christso commented Mar 6, 2026

Summary

Implements three follow-up features from #431 (execution status classification):

Closes #433, closes #434, closes #435

Test plan

  • Unit tests for loadErrorTestIds / loadNonErrorResults (5 tests)
  • Unit tests for extractFailOnError config parser (9 tests)
  • Integration tests for fail_on_error orchestrator behavior (3 tests: true, threshold, false)
  • Integration tests for errorRetries tracking (2 tests: with/without retries)
  • Zod schema sync test passes (eval-schema.json matches generated schema)
  • Full test suite: 1,080 tests, 0 failures
  • Build, typecheck, lint all pass
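The errorRetries tracking covered by the integration tests above was dropped before merge, but the idea can be shown with a minimal TypeScript sketch. Everything in it is hypothetical: `invokeWithRetries`, the `ErrorRetry` fields, and the `maxAttempts` parameter are assumptions rather than the PR's code, and the call is synchronous for brevity where a real provider invocation would be async.

```typescript
// Hypothetical record of one failed attempt that was retried; the PR's actual
// ErrorRetry fields are not shown, so these are assumptions.
interface ErrorRetry {
  attempt: number; // 1-based number of the attempt that failed
  message: string;
}

interface InvokeResult<T> {
  value: T;
  errorRetries: ErrorRetry[]; // empty when the first attempt succeeded
}

// Calls `fn` up to `maxAttempts` times. Each failure that is followed by another
// attempt is recorded; the final failure is rethrown rather than recorded.
function invokeWithRetries<T>(fn: () => T, maxAttempts: number): InvokeResult<T> {
  const errorRetries: ErrorRetry[] = [];
  for (let attempt = 1; ; attempt++) {
    try {
      return { value: fn(), errorRetries };
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      errorRetries.push({
        attempt,
        message: err instanceof Error ? err.message : String(err),
      });
    }
  }
}

// Two transient failures, then success on the third attempt.
let attempts = 0;
const result = invokeWithRetries(() => {
  attempts++;
  if (attempts < 3) throw new Error(`transient failure ${attempts}`);
  return "response";
}, 5);
console.log(result.value, result.errorRetries.length); // response 2
```

The "with/without retries" split in the test plan maps onto a non-empty versus an empty `errorRetries` array in this sketch.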

🤖 Generated with Claude Code

christso and others added 6 commits March 6, 2026 04:01
…y tracking (#433, #434, #435)

Implements three follow-up features from #431 execution status classification:
- --retry-errors <jsonl>: re-run only execution_error test cases from a previous output
- execution.fail_on_error config: true (halt on first), false (never halt), or 0.0-1.0 threshold
- errorRetries field on EvaluationResult to track transient errors retried during provider invocation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
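The --retry-errors flow has to split a previous run's JSONL output into test IDs to re-run and results to keep. A minimal sketch, assuming each line carries `testId`, `score`, and a hypothetical `executionStatus` field; `errorTestIds` and `nonErrorResults` only mirror the roles of `loadErrorTestIds` / `loadNonErrorResults` from the test plan and are not their real signatures. It takes JSONL text instead of a file path to stay self-contained.

```typescript
// Hypothetical shape of one line in a previous run's JSONL output.
// `executionStatus` is an assumed field name; agentv's real schema may differ.
interface PreviousResult {
  testId: string;
  score?: number;
  executionStatus?: string;
}

// Parse JSONL text into values, skipping blank lines.
function parseJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line): unknown => JSON.parse(line));
}

// Minimal guard: a record must at least have a string testId.
function hasTestId(value: unknown): value is PreviousResult {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { testId?: unknown }).testId === "string"
  );
}

// IDs of test cases that ended in an execution error; these are re-run.
function errorTestIds(jsonl: string): string[] {
  return parseJsonl(jsonl)
    .filter(hasTestId)
    .filter((r) => r.executionStatus === "execution_error")
    .map((r) => r.testId);
}

// Non-error results carried over unchanged; lightweight validation also
// requires a numeric score.
function nonErrorResults(jsonl: string): PreviousResult[] {
  return parseJsonl(jsonl)
    .filter(hasTestId)
    .filter((r) => r.executionStatus !== "execution_error")
    .filter((r) => typeof r.score === "number");
}

const previous = [
  '{"testId":"a","score":1,"executionStatus":"ok"}',
  '{"testId":"b","score":0,"executionStatus":"execution_error"}',
  '{"testId":"c","score":0.5}',
].join("\n");
console.log(errorTestIds(previous)); // IDs to re-run
console.log(nonErrorResults(previous).map((r) => r.testId)); // IDs kept as-is
```

Merging the kept results with the re-run results then yields a complete result set without re-executing passing cases.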
Updates eval-schema.json, SKILL.md, running-evals.mdx, and eval-files.mdx
with documentation for the three new features from #433, #434, #435.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The eval-schema-sync test requires the Zod schema to be the source of
truth. Adds FailOnErrorSchema to ExecutionSchema and regenerates the
JSON schema to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
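At this stage of the PR, `execution.fail_on_error` accepted `true`, `false`, or a number in (0, 1]. The PR validates it with a Zod `FailOnErrorSchema`; the sketch below is a dependency-free stand-in, and `parseFailOnError` and the `FailOnError` union are hypothetical names. The stated reason for rejecting 0 is an assumption.

```typescript
// Hypothetical normalized form of `execution.fail_on_error`.
type FailOnError =
  | { mode: "never" } // false: never halt
  | { mode: "first" } // true: halt on the first execution error
  | { mode: "threshold"; ratio: number }; // halt once the error ratio exceeds `ratio`

// Accepts true, false, or a number in (0, 1]; returns undefined when unset.
// 0 is rejected (assumed rationale): with a strict ">" check any single error
// exceeds 0, which duplicates what `true` already means.
function parseFailOnError(raw: unknown): FailOnError | undefined {
  if (raw === undefined) return undefined; // caller applies its own default
  if (raw === true) return { mode: "first" };
  if (raw === false) return { mode: "never" };
  if (typeof raw === "number" && raw > 0 && raw <= 1) {
    return { mode: "threshold", ratio: raw };
  }
  throw new Error(
    `fail_on_error must be true, false, or a number in (0, 1]; got ${JSON.stringify(raw)}`,
  );
}

console.log(parseFailOnError(0.5)); // threshold mode with ratio 0.5
```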
- Rewrite threshold test to exercise actual ratio math (succeed →
  succeed → fail → fail → fail triggers halt at 3/5=0.60 > 0.5)
- Fix docs range notation from 0.0-1.0 to >0.0-1.0 (exclusive of 0)
- Add concurrency best-effort note to docs
- Add comment explaining why 0 is excluded from numeric thresholds
- Add lightweight validation (testId + score) in loadNonErrorResults

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
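The threshold test's ratio math can be replayed with a small simulation. This is a sketch, assuming the ratio is errors divided by completed tests and that the halt condition is a strict `>`; `firstHaltAt` is a hypothetical helper, not the orchestrator's code.

```typescript
// After each completed test, halt when errors / completed > threshold.
// Returns how many tests had completed at the halt, or undefined if the
// threshold is never exceeded.
function firstHaltAt(succeeded: boolean[], threshold: number): number | undefined {
  let errors = 0;
  for (let i = 0; i < succeeded.length; i++) {
    if (!succeeded[i]) errors++;
    const completed = i + 1;
    if (errors / completed > threshold) return completed;
  }
  return undefined;
}

// succeed → succeed → fail → fail → fail: ratios 0/1, 0/2, 1/3, 2/4, 3/5
console.log(firstHaltAt([true, true, false, false, false], 0.5)); // 5
```

The fourth test gives exactly 2/4 = 0.50, which does not exceed 0.5, so the halt comes on the fifth at 3/5 = 0.60. This sequential model ignores concurrency, where, as the PR's docs note, halting is best-effort.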
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Mar 6, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 4ea4508
Status: ✅  Deploy successful!
Preview URL: https://42e50489.agentv.pages.dev
Branch Preview URL: https://feat-follow-up-431-error-tra.agentv.pages.dev


christso closed this Mar 6, 2026
christso deleted the feat/follow-up-431-error-tracking-tolerance-retry branch March 6, 2026 05:28
christso restored the feat/follow-up-431-error-tracking-tolerance-retry branch March 6, 2026 05:33
christso reopened this Mar 6, 2026
christso and others added 4 commits March 6, 2026 06:09
…old)

Align with industry standards (promptfoo, Braintrust) by keeping
fail_on_error as a simple true/false toggle. The numeric ratio
threshold (0.0-1.0) was YAGNI: post-hoc analysis of the JSONL output
is sufficient for error-ratio decisions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
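With the numeric threshold gone, an error-ratio policy moves to post-hoc analysis of the JSONL output. A minimal sketch, assuming each record has a hypothetical `executionStatus` field where `"execution_error"` marks an error; `errorRatio` and the 0.25 cutoff are illustrative only.

```typescript
// Share of records in a finished run's JSONL output that are execution errors.
// Returns 0 for empty output.
function errorRatio(jsonl: string): number {
  const records = jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as { executionStatus?: string });
  if (records.length === 0) return 0;
  const errors = records.filter((r) => r.executionStatus === "execution_error").length;
  return errors / records.length;
}

const output = [
  '{"testId":"a","executionStatus":"ok"}',
  '{"testId":"b","executionStatus":"execution_error"}',
  '{"testId":"c","executionStatus":"ok"}',
  '{"testId":"d","executionStatus":"execution_error"}',
].join("\n");
const ratio = errorRatio(output);
console.log(ratio > 0.25 ? "too many execution errors" : "ok");
```

Keeping this check outside the evaluator lets each team pick its own cutoff while `fail_on_error` stays a plain boolean.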
…e to CLAUDE.md

Remove ErrorRetry interface, errorRetries field on EvaluationResult,
and retry tracking code — no industry precedent, and retry count
can be added later if needed. Add YAGNI as design principle #4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
christso merged commit 6f64eb4 into main Mar 6, 2026
1 check passed
christso deleted the feat/follow-up-431-error-tracking-tolerance-retry branch March 6, 2026 06:28