feat(eval): composable quality gates with auto-remediation triggers

## Summary

Add severity levels and auto-remediation triggers to AgentV's existing quality gate primitives.

## What already exists in AgentV

- **Required evaluators**: `required: boolean | number` on any evaluator — makes it a pass/fail gate
- **Required rubric items**: `required_min_score: number` on rubric items — minimum 0-10 score to pass
- **Score ranges**: `score_ranges` on rubrics for banded scoring (0-10 scale)
- **Negation**: `negate: boolean` to invert scores
- **Composite evaluators**: Combine multiple evaluators with weighted aggregation

## What's still missing

AgentV has binary pass/fail gating but lacks:

1. **Severity levels**: Distinguish `error` (blocks the eval), `warning` (reported but non-blocking), and `info` (logged only). Currently every gate is pass/fail.
2. **Auto-remediation hooks**: When a gate fails, trigger a follow-up action (for example, re-run with a fix prompt or add a remediation step). Currently a failed gate fails the eval and nothing else happens.
3. **Reusable gate library**: Shareable, composable quality gate definitions (scaffold detection, duplicate code, hardcoded config, README accuracy).

## What this looks like in AgentV

```yaml
assert:
  - name: no_scaffold_defaults
    type: code_judge
    script: ./gates/scaffold-check.py
    severity: warning             # NEW: warning|error|info (default: error)

  - name: no_duplicate_blocks
    type: code_judge
    script: ./gates/duplicate-check.py
    severity: error               # blocks eval

  - name: readme_accuracy
    type: code_judge
    script: ./gates/readme-verify.py
    severity: info                # logged only
```
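The gate scripts referenced above are not included in this issue. As a rough illustration only, the core check inside something like `./gates/scaffold-check.py` might look like the sketch below. The marker list, the function names, and the 1.0/0.0 score convention are all assumptions, and how `code_judge` passes input to the script and reads its result is deliberately left out.

```python
import re

# Hypothetical placeholder markers; a real gate would tune this list per project.
SCAFFOLD_PATTERNS = [
    r"\bTODO: implement\b",
    r"\bLorem ipsum\b",
    r"\byour-project-name\b",
]


def find_scaffold_defaults(text: str) -> list[str]:
    """Return the patterns that match anywhere in text (case-insensitive)."""
    return [p for p in SCAFFOLD_PATTERNS if re.search(p, text, re.IGNORECASE)]


def score(text: str) -> float:
    """Assumed convention: 1.0 when the text is clean, 0.0 when any marker is found."""
    return 0.0 if find_scaffold_defaults(text) else 1.0
```

Because the check is deterministic, the same script could back a gate at any severity; only the `severity` field in the YAML decides whether a failure blocks.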

## Architecture alignment

- `severity` is an optional field on any evaluator config (non-breaking, default: `error` preserves current behavior)
- Severity maps to result JSONL: errors affect the verdict, warnings appear in output but don't fail the eval, and info results are logged only
- Auto-remediation is a post-eval hook (separate concern from eval itself)
- Reusable gates are just code_judge scripts — could ship as an `agentv-gates` package
- Extends existing `required` field semantics: `severity: warning` + `required: true` = required but non-blocking
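
To make the severity-to-verdict mapping concrete, here is a minimal sketch of how failed gates could be folded into a single verdict. `GateResult`, `summarize`, and the keys of the returned dictionary are hypothetical names used for illustration, not AgentV's actual result schema.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    ERROR = "error"
    WARNING = "warning"
    INFO = "info"


@dataclass
class GateResult:
    name: str
    passed: bool
    # Defaulting to ERROR keeps today's pass/fail behavior for existing configs.
    severity: Severity = Severity.ERROR


def summarize(results: list[GateResult]) -> dict:
    """Only failed gates with severity ERROR flip the verdict to 'fail'."""
    failed = [r for r in results if not r.passed]
    errors = [r.name for r in failed if r.severity is Severity.ERROR]
    warnings = [r.name for r in failed if r.severity is Severity.WARNING]
    info = [r.name for r in failed if r.severity is Severity.INFO]
    return {
        "verdict": "fail" if errors else "pass",
        "errors": errors,
        "warnings": warnings,
        "info": info,
    }


if __name__ == "__main__":
    results = [
        GateResult("no_scaffold_defaults", passed=False, severity=Severity.WARNING),
        GateResult("no_duplicate_blocks", passed=True),
        GateResult("readme_accuracy", passed=False, severity=Severity.INFO),
    ]
    print(summarize(results)["verdict"])  # prints "pass": no error-level gate failed
```

Keeping the aggregation in one function means the default (`error`) path behaves exactly like the current binary gating, which is what makes the new field non-breaking.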

## Research source

- copilot-swarm-orchestrator: `quality-gates.yaml` with scaffoldDefaults, duplicateBlocks, hardcodedConfig, readmeClaims, testIsolation; `gracefulDegradation: true`
- ralph-orchestrator: backpressure philosophy — deterministic checks as first-class evaluation

---

## AgentV Studio Surface (2026-03-27)

This issue now includes a dashboard management surface as part of AgentV Studio (platform: #563, tracking epic: #788).

### Objective (clarified)

1. **Core engine**: Add severity levels (`error`/`warning`/`info`) and auto-remediation triggers to evaluator configs
2. **Studio UI**: Dashboard-driven gate configuration, visual threshold editor, alert routing, one-click remediation

### Design Latitude

- Auto-remediation hook format (shell command, eval rerun, webhook)
- Gate definition storage (YAML config vs. Studio-managed JSON)
- Whether gate library ships as built-in gates or a separate `agentv-gates` package
- Visual threshold editor implementation (slider vs. numeric input with histogram overlay)
- Alert routing destinations (Studio feed, webhook, email)
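
Since the hook format is still open, the following is only one possible shape for a post-eval dispatcher: each failed gate maps to a `(kind, target)` pair, and a handler per kind does the work. The function name `run_hooks`, the kind strings, and the handler signature are assumptions made for this sketch. The handlers are injected so that shell, rerun, or webhook behavior can be swapped without touching the dispatch logic.

```python
from typing import Callable

# Hypothetical handler signature: (gate_name, target) -> log line.
Handler = Callable[[str, str], str]


def run_hooks(
    failed_gates: list[str],
    hooks: dict[str, tuple[str, str]],
    handlers: dict[str, Handler],
) -> list[str]:
    """Run the configured hook for each failed gate, after the eval has finished."""
    log = []
    for gate in failed_gates:
        if gate not in hooks:
            continue  # no remediation configured for this gate
        kind, target = hooks[gate]
        handler = handlers.get(kind)
        if handler is None:
            log.append(f"{gate}: unknown hook kind {kind!r}, skipped")
            continue
        log.append(handler(gate, target))
    return log


if __name__ == "__main__":
    # Stub handlers that only describe what they would do.
    handlers = {
        "shell": lambda gate, cmd: f"{gate}: would run shell command {cmd!r}",
        "rerun": lambda gate, prompt: f"{gate}: would rerun eval with prompt {prompt!r}",
    }
    hooks = {"no_duplicate_blocks": ("rerun", "Remove the duplicated blocks.")}
    for line in run_hooks(["no_duplicate_blocks"], hooks, handlers):
        print(line)
```

Running hooks strictly after the eval keeps remediation a separate concern from scoring, as described under Architecture alignment.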

### Acceptance Signals

- [ ] `severity: warning|error|info` field works on any evaluator config
- [ ] Warnings appear in output but do not fail the eval
- [ ] Auto-remediation hooks trigger on gate failure
- [ ] Studio UI: gates are listable, creatable, editable, deletable
- [ ] Studio UI: threshold editor shows historical score distribution
- [ ] Studio UI: one-click remediation triggers rerun or mutation from gate failure alert
- [ ] Gate compliance history visible in Studio

### Non-Goals

- Real-time gate evaluation during streaming (gates evaluate after run completes)
- Gate marketplace or sharing across organizations
- Replacing YAML-based gate configuration (Studio is an additional surface, not a replacement)

### Dependencies

- #563 (AgentV Studio platform) — provides the dashboard surface
- #788 — AgentV Studio tracking epic


