feat(eval): composable quality gates with auto-remediation triggers #334
Open
Labels
enhancement: New feature or request
Description
Summary
Add severity levels and auto-remediation triggers to AgentV's existing quality gate primitives.
What already exists in AgentV
- Required evaluators: `required: boolean | number` on any evaluator; makes it a pass/fail gate
- Required rubric items: `required_min_score: number` on rubric items; minimum 0-10 score to pass
- Score ranges: `score_ranges` on rubrics for banded scoring (0-10 scale)
- Negation: `negate: boolean` to invert scores
- Composite evaluators: Combine multiple evaluators with weighted aggregation
What's still missing
AgentV has binary pass/fail gating but lacks:
- Severity levels: Distinguish `error` (blocks) from `warning` (informational) from `info` (logged only). Currently everything is pass/fail.
- Auto-remediation hooks: When a gate fails, trigger a follow-up action (re-run with fix prompt, add a remediation step). Currently a failure just fails the eval.
- Reusable gate library: Shareable, composable quality gate definitions (scaffold detection, duplicate code, hardcoded config, README accuracy).
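To make the "reusable gate library" idea concrete, here is a minimal sketch of what a duplicate-code gate could compute. It is only an illustration: the function names, the sliding-window approach, and the `{"score": ..., "duplicates": ...}` result shape are assumptions, not AgentV's actual `code_judge` contract.

```python
def find_duplicate_blocks(source: str, window: int = 3) -> list[tuple[int, int]]:
    """Return (first_line, repeat_line) pairs for repeated runs of `window` lines.

    Line numbers are 1-based. Blank lines are skipped so that spacing alone
    never counts as duplication.
    """
    # Keep (line_number, stripped_text) for non-blank lines only.
    lines = [
        (number, text.strip())
        for number, text in enumerate(source.splitlines(), 1)
        if text.strip()
    ]
    seen: dict[tuple[str, ...], int] = {}
    duplicates: list[tuple[int, int]] = []
    for start in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[start:start + window])
        line_no = lines[start][0]
        if chunk in seen:
            duplicates.append((seen[chunk], line_no))
        else:
            seen[chunk] = line_no
    return duplicates


def gate_result(source: str, window: int = 3) -> dict:
    """Hypothetical result shape: score 1.0 when clean, 0.0 when duplicates exist."""
    duplicates = find_duplicate_blocks(source, window)
    return {"score": 0.0 if duplicates else 1.0, "duplicates": duplicates}
```

A real gate would still need to read the candidate output and report its result in whatever format `code_judge` expects; that wiring is deliberately left out here.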
What this looks like in AgentV
    assert:
      - name: no_scaffold_defaults
        type: code_judge
        script: ./gates/scaffold-check.py
        severity: warning  # NEW: warning|error|info (default: error)
      - name: no_duplicate_blocks
        type: code_judge
        script: ./gates/duplicate-check.py
        severity: error    # blocks eval
      - name: readme_accuracy
        type: code_judge
        script: ./gates/readme-verify.py
        severity: info     # logged only

Architecture alignment
- `severity` is an optional field on any evaluator config (non-breaking; the default `error` preserves current behavior)
- Severity maps to result JSONL: errors affect the verdict, warnings appear in output but don't fail
- Auto-remediation is a post-eval hook (separate concern from eval itself)
- Reusable gates are just code_judge scripts; they could ship as an `agentv-gates` package
- Extends existing `required` field semantics: `severity: warning` + `required: true` = required but non-blocking
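The severity-to-verdict mapping described above could be sketched as follows. This is a hedged illustration, not the engine's implementation: `GateResult`, `summarize`, and the returned field names are hypothetical, and the real result JSONL schema may differ.

```python
from dataclasses import dataclass

SEVERITIES = ("error", "warning", "info")


@dataclass
class GateResult:
    name: str
    passed: bool
    severity: str = "error"  # assumed default, matching today's pass/fail behavior


def summarize(results: list[GateResult]) -> dict:
    """Only failed `error` gates change the verdict; the rest are reported."""
    for result in results:
        if result.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {result.severity!r}")
    failed = [r for r in results if not r.passed]
    errors = [r.name for r in failed if r.severity == "error"]
    warnings = [r.name for r in failed if r.severity == "warning"]
    info = [r.name for r in failed if r.severity == "info"]
    return {
        "verdict": "fail" if errors else "pass",
        "errors": errors,
        "warnings": warnings,
        "info": info,
    }
```

With this shape, a failing `warning` gate is still visible in the output, but an eval fails only when at least one `error` gate fails.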
Research source
- copilot-swarm-orchestrator: `quality-gates.yaml` with scaffoldDefaults, duplicateBlocks, hardcodedConfig, readmeClaims, testIsolation; `gracefulDegradation: true`
- ralph-orchestrator: backpressure philosophy, with deterministic checks as first-class evaluation
AgentV Studio Surface (2026-03-27)
This issue now includes a dashboard management surface as part of the AgentV Studio platform (#788).
Objective (clarified)
- Core engine: Add severity levels (`error`/`warning`/`info`) and auto-remediation triggers to evaluator configs
- Studio UI: Dashboard-driven gate configuration, visual threshold editor, alert routing, one-click remediation
Design Latitude
- Auto-remediation hook format (shell command, eval rerun, webhook)
- Gate definition storage (YAML config vs. Studio-managed JSON)
- Whether gate library ships as built-in gates or a separate `agentv-gates` package
- Visual threshold editor implementation (slider vs. numeric input with histogram overlay)
- Alert routing destinations (Studio feed, webhook, email)
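Since the hook format is still open, the sketch below only shows one possible shape for it: a post-eval step that turns failed gates into planned actions without executing anything. `Hook`, `plan_remediation`, and the three `kind` values (taken from the options listed above) are hypothetical names for illustration.

```python
from dataclasses import dataclass

HOOK_KINDS = ("shell", "rerun", "webhook")


@dataclass(frozen=True)
class Hook:
    kind: str    # one of HOOK_KINDS
    target: str  # command line, fix prompt, or URL, depending on kind


def plan_remediation(failed_gates: list[str], hooks: dict[str, Hook]) -> list[dict]:
    """Return one planned action per failed gate that has a hook configured."""
    actions = []
    for gate in failed_gates:
        hook = hooks.get(gate)
        if hook is None:
            continue  # no hook configured: the failure simply stands
        if hook.kind not in HOOK_KINDS:
            raise ValueError(f"unknown hook kind: {hook.kind!r}")
        actions.append({"gate": gate, "kind": hook.kind, "target": hook.target})
    return actions
```

Keeping planning separate from execution is one way to honor the "post-eval hook" separation: the eval produces failures, and a later step decides whether and how to act on them.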
Acceptance Signals
- `severity: warning|error|info` field works on any evaluator config
- Warnings appear in output but do not fail the eval
- Auto-remediation hooks trigger on gate failure
- Studio UI: gates are listable, creatable, editable, deletable
- Studio UI: threshold editor shows historical score distribution
- Studio UI: one-click remediation triggers rerun or mutation from gate failure alert
- Gate compliance history visible in Studio
Non-Goals
- Real-time gate evaluation during streaming (gates evaluate after run completes)
- Gate marketplace or sharing across organizations
- Replacing YAML-based gate configuration (Studio is an additional surface, not a replacement)
Dependencies
- feat: AgentV Studio — eval management platform with historical trends, quality gates, and orchestration #563 (AgentV Studio platform) — provides the dashboard surface
- project: AgentV Studio — eval management platform with quality gates, orchestration, and analysis #788 — AgentV Studio tracking epic
Projects
Status
Backlog