Add fail_on_error tolerance config for eval runs #434
Closed
Description
Context
Follow-up from #431 (execution status classification).
Request
Add execution.fail_on_error config to eval YAML files:
- true: halt eval on first execution error
- false: never halt, record all errors (default)
- >0.0–1.0: halt if execution error proportion exceeds threshold
Why
For large eval suites, a single infrastructure error shouldn't necessarily halt the entire run. Users need configurable tolerance. Defaulting to false is less disruptive and matches how most eval frameworks behave.
Design
Add to eval YAML schema under execution:
execution:
  fail_on_error: false # or true, or 0.3 (30% threshold)

In the orchestrator, after each test completes, check the error proportion against the threshold. If it is exceeded, set a budgetExhausted-style flag to skip the remaining tests. With concurrency > 1, threshold tracking is best-effort (a few additional tests may complete before halting takes effect).
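The orchestrator check described above could look roughly like the sketch below. It is illustrative only: `should_halt` and `run_suite` are hypothetical names, the run is sequential, and the error proportion is assumed to be errors divided by tests completed so far (the issue does not say whether the denominator is completed tests or total tests).

```python
from typing import List, Tuple, Union

# true / false / numeric threshold, as described in the request.
FailOnError = Union[bool, float]


def should_halt(fail_on_error: FailOnError, errors: int, completed: int) -> bool:
    """Decide whether the remaining tests should be skipped."""
    if fail_on_error is True:
        # Halt on the first execution error.
        return errors > 0
    if fail_on_error is False:
        # Never halt; errors are only recorded.
        return False
    if completed == 0:
        return False
    # Numeric threshold: halt once the proportion exceeds it.
    # Assumption: denominator is the number of tests completed so far.
    return errors / completed > fail_on_error


def run_suite(outcomes: List[bool], fail_on_error: FailOnError) -> Tuple[int, int, bool]:
    """Simulate a sequential run.

    `outcomes` holds one bool per test (True = execution error).
    Returns (completed, errors, halted).
    """
    completed = 0
    errors = 0
    halted = False  # plays the role of the budgetExhausted-style flag
    for is_error in outcomes:
        if halted:
            break  # skip remaining tests
        completed += 1
        errors += int(is_error)
        halted = should_halt(fail_on_error, errors, completed)
    return completed, errors, halted
```

Under this assumption, an error on the very first test gives a proportion of 1.0 and trips any threshold below 1.0. Dividing by the total number of tests instead would avoid that, at the cost of halting later. A threshold of exactly 1.0 never halts, since the proportion cannot exceed it.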
Numeric threshold range: exclusive of 0 (use true instead), inclusive of 1.0.
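As a rough sketch of how the accepted values might be validated when the YAML is loaded: `parse_fail_on_error` is a hypothetical helper, and raising `ValueError` on out-of-range input is an assumption, not something the issue specifies.

```python
from typing import Any, Union


def parse_fail_on_error(value: Any = None) -> Union[bool, float]:
    """Normalize a raw execution.fail_on_error value.

    Accepts true, false, or a number in (0, 1]. A missing value
    falls back to the default, false.
    """
    if value is None:
        return False  # default: never halt
    # Check bool before numbers: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Exclusive of 0 (use true instead), inclusive of 1.0.
        if 0 < value <= 1:
            return float(value)
        raise ValueError(
            f"fail_on_error threshold must be in (0, 1], got {value!r}"
        )
    raise ValueError(
        f"fail_on_error must be true, false, or a number, got {value!r}"
    )
```

For example, `parse_fail_on_error(0.3)` returns `0.3`, while `parse_fail_on_error(0)` is rejected because a zero threshold is expressed as `true`.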
Related
- Add explicit tooling/execution-failure status separate from model score #431 — execution status classification (prerequisite, merged)