-
Notifications
You must be signed in to change notification settings - Fork 0
Add --retry-errors CLI flag to re-run only execution errors #433
Copy link
Copy link
Closed
Description
Context
Follow-up from #431 (execution status classification).
Request
Add agentv run --retry-errors <previous-output.jsonl> that reads a previous eval output, filters for results with execution_status === 'execution_error', and re-runs only those test cases.
Why
When infrastructure issues (missing runtime, network timeout) cause execution errors, users shouldn't have to re-run the entire eval suite. They should be able to fix the environment and re-run just the failed cases.
Design
- Read JSONL output file
- Filter for
execution_status === 'execution_error' - Extract test IDs
- Re-run eval with
--filterset to those test IDs - Merge results into a new output file
Related
- Add explicit tooling/execution-failure status separate from model score #431 — execution status classification (prerequisite, merged)
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels