Skip to content

Add fail_on_error tolerance config for eval runs #434

@christso

Description

@christso

Context

Follow-up from #431 (execution status classification).

Request

Add execution.fail_on_error config to eval YAML files:

  • true — halt eval on first execution error
  • false — never halt, record all errors (default)
  • >0.0–1.0 — halt if execution error proportion exceeds threshold

Why

For large eval suites, a single infrastructure error shouldn't necessarily halt the entire run. Users need configurable tolerance. Defaulting to false is less disruptive and matches how most eval frameworks behave.

Design

Add to eval YAML schema under execution:

execution:
  fail_on_error: false  # or true, or 0.3 (30% threshold)

In the orchestrator, after each test completes, check the error proportion against the threshold. If exceeded, set budgetExhausted-style flag to skip remaining tests. With concurrency > 1, threshold tracking is best-effort (a few additional tests may complete before halting takes effect).

Numeric threshold range: exclusive of 0 (use true instead), inclusive of 1.0.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions