Add fail_on_error tolerance config for eval runs

## Context
Follow-up from #431 (execution status classification).

## Request
Add `execution.fail_on_error` config to eval YAML files:
- `true` — halt eval on first execution error
- `false` — never halt, record all errors (default)
- `>0.0–1.0` — halt if execution error proportion exceeds threshold

## Why
For large eval suites, a single infrastructure error shouldn't necessarily halt the entire run. Users need configurable tolerance. Defaulting to `false` is less disruptive and matches how most eval frameworks behave.

## Design
Add to eval YAML schema under `execution`:
```yaml
execution:
  fail_on_error: false  # or true, or 0.3 (30% threshold)
```

In the orchestrator, after each test completes, check the error proportion against the threshold. If exceeded, set `budgetExhausted`-style flag to skip remaining tests. With concurrency > 1, threshold tracking is best-effort (a few additional tests may complete before halting takes effect).

Numeric threshold range: exclusive of 0 (use `true` instead), inclusive of 1.0.

## Related
- #431 — execution status classification (prerequisite, merged)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add fail_on_error tolerance config for eval runs #434

Context

Request

Why

Design

Related

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add fail_on_error tolerance config for eval runs #434

Description

Context

Request

Why

Design

Related

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions