
docs: separate execution quality from trigger quality in eval guidance #566

@christso

Description


Objective

Explicitly document the distinction between execution-quality evaluation ("does the skill help when it is loaded?") and trigger-quality evaluation ("does the system load the skill when it should?") in AgentV's docs and roadmap. Naming the two concerns prevents users from overloading evaluation configs and clarifies scope for future work.

Architecture Boundary

docs-examples

Context

Both Anthropic's skill-creator and Tessl treat trigger optimization as a separate evaluation track from task execution quality. Anthropic's skill-creator has dedicated tooling for trigger evaluation: repeated trigger trials, train/test splits, held-out model selection, and description optimization.
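To make the separation concrete, the trigger-evaluation workflow described above (repeated trials, a train/test split, and description selection on held-out prompts) could be sketched roughly as below. This is an illustrative Python sketch only: `would_trigger`, `trigger_rate`, and all prompt data are hypothetical stand-ins, not APIs from AgentV or skill-creator.

```python
import random

def would_trigger(skill_description: str, prompt: str) -> bool:
    """Stand-in for asking a model whether `prompt` should load the skill.
    A real implementation would query a model (and so be noisy); this fake
    is deterministic keyword matching, for illustration only."""
    return any(word in prompt.lower() for word in skill_description.lower().split())

def trigger_rate(skill_description: str, prompts: list[str], trials: int = 5) -> float:
    """Fraction of (prompt, trial) pairs in which the skill triggers.
    Repeated trials matter in practice because real trigger decisions vary
    run to run; with the deterministic fake above they do not."""
    hits = 0
    for prompt in prompts:
        for _ in range(trials):
            if would_trigger(skill_description, prompt):
                hits += 1
    return hits / (len(prompts) * trials)

# Train/test split: tune the description on `train`, report on `heldout`.
prompts = ["convert this csv to json", "summarize the meeting notes",
           "parse the csv export", "draft a release announcement"]
random.seed(0)
random.shuffle(prompts)
split = len(prompts) // 2
train, heldout = prompts[:split], prompts[split:]

# Pick the candidate description with the best trigger rate on the
# training prompts, then report its rate on the held-out prompts.
candidates = ["csv json conversion", "file format conversion csv"]
best = max(candidates, key=lambda d: trigger_rate(d, train))
print(f"held-out trigger rate for {best!r}: {trigger_rate(best, heldout):.2f}")
```

Note how none of this resembles an execution-quality harness: the metric is a trigger rate over many cheap trials, and the thing being optimized is the skill's description, not its instructions.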

AgentV's current eval guidance naturally emphasizes execution quality. Without explicitly calling out trigger quality as a separate concern, users may try to overload execution eval configs with trigger-detection logic, or expect AgentV to handle skill discovery optimization that belongs in a different product surface.

Design Latitude

  • Choose where this distinction is documented (roadmap doc, architecture doc, existing eval guide, or new conceptual guide)
  • May be as simple as a "Concepts" or "Evaluation Types" section in the docs
  • Choose how to frame trigger quality as a future direction without over-promising
  • May reference the skill-creator's trigger evaluation approach as industry context

Acceptance Signals

  • AgentV docs explicitly name "execution quality" and "trigger quality" as distinct evaluation concerns
  • The docs explain why they are different problems (noisy vs. deterministic measurement, different optimization surfaces)
  • Current AgentV eval tooling is positioned as execution-quality evaluation
  • Trigger-quality evaluation is framed as a future direction, not a current gap
  • The docs include a clear statement that execution eval configs should not be used for trigger evaluation
  • The agentv-eval-builder skill reference card is updated to reflect this distinction (per CLAUDE.md Documentation Updates guidelines)

Non-Goals

  • Building trigger-evaluation CLI commands or runtime features
  • Adding trigger trial tooling, train/test splits, or description optimizers
  • Creating a skill marketplace or discovery system
  • Changing the current eval schema or config format

Research Basis
