Merged
Commits
40 commits
1026e00
feat(evals): set default targets so all evals work out of the box
christso Apr 1, 2026
ebe1688
feat(evals): set default targets so all evals work out of the box
christso Apr 1, 2026
f74fb09
feat(evals): make default target env-var-driven for out-of-box evals
christso Apr 1, 2026
d2102dc
fix(ci): use explicit include patterns instead of negated globs
christso Apr 1, 2026
37a526c
feat(cli): support negation patterns (!glob) in eval path resolution
christso Apr 1, 2026
71d77a5
fix(ci): remove --targets override so per-example targets auto-discover
christso Apr 1, 2026
df3a765
fix: remove deprecated workspace_template from mock target configs
christso Apr 1, 2026
1191250
fix(ci): add Gemini credentials to workflow .env
christso Apr 1, 2026
03f5503
feat(evals): add llm target and classify all evals as llm or agent
christso Apr 1, 2026
b2c6a78
fix(evals): use default (copilot) instead of pi-cli for agent evals
christso Apr 1, 2026
0b04cf9
chore(ci): increase eval workers from 1 to 3
christso Apr 1, 2026
5c53635
fix(ci): exclude evals with local script providers from CI
christso Apr 1, 2026
f3870d6
fix(ci): add missing echo provider and install uv for local script evals
christso Apr 1, 2026
d081bd6
fix(evals): make LLM eval assertions pass with generic models
christso Apr 1, 2026
f8d8e94
fix(evals): switch llm and grader targets to OpenRouter
christso Apr 1, 2026
2a9f1c3
fix(evals): switch per-example grader targets from azure to root grader
christso Apr 1, 2026
2185c65
feat(core): add target alias support for single-env-var provider swit…
christso Apr 1, 2026
6438c23
feat(core): add use_target for target delegation
christso Apr 1, 2026
6936380
refactor(targets): use use_target for llm and grader targets
christso Apr 1, 2026
a076d4e
refactor(core): make provider optional when use_target is set
christso Apr 1, 2026
fddd943
fix(core): allow provider to be omitted when use_target is set
christso Apr 1, 2026
3c39f70
fix(core): allow use_target in targets-file.ts parser
christso Apr 1, 2026
7650b51
fix(ci): exclude copilot-log-eval from CI
christso Apr 1, 2026
3441f91
fix(cli): catch before_all failures per eval file instead of aborting
christso Apr 1, 2026
0dd936a
fix(core): resolve use_target chains in orchestrator for grader targets
christso Apr 1, 2026
50eef93
fix(evals): restore workspace.template for mock agent evals
christso Apr 1, 2026
595fc16
fix(ci): exclude evals with pre-existing workspace/batch bugs
christso Apr 1, 2026
41d1fad
fix(evals): fix remaining CI failures
christso Apr 1, 2026
64f9b40
fix(ci): remove --verbose to reduce log size, make JUnit step non-fatal
christso Apr 1, 2026
2cf1004
fix(ci): use --output instead of -o for JUnit path
christso Apr 1, 2026
2844421
feat(ci): add eval results summary to GitHub Actions step summary
christso Apr 1, 2026
29ea7c1
fix: remove unused grader targets from offline-grader-benchmark
christso Apr 2, 2026
a852e0b
fix(ci): use npm package for copilot CLI instead of curl installer
christso Apr 2, 2026
d8c9f8d
fix(ci): add Node 22 for copilot CLI compatibility
christso Apr 2, 2026
17431c2
debug(ci): remove tee pipe and limit to 2 eval sets for debugging
christso Apr 2, 2026
99c2f33
fix(evals): fix csv-analyzer rubrics criteria format
christso Apr 2, 2026
7e60324
fix(evals): keep skill-trigger assertions required, tag for exclusion
christso Apr 2, 2026
8e91cdf
fix(evals): add csv-analyzer skill to workspace and set workspace tem…
christso Apr 2, 2026
61c1b74
fix(ci): include copilot logs in artifacts for debugging
christso Apr 2, 2026
707761b
fix(evals): make csv-analyzer skill essential with proprietary formula
christso Apr 2, 2026
fix(evals): switch llm and grader targets to OpenRouter
GH Models rate limits (429) were failing most LLM evals. OpenRouter
has higher rate limits and built-in provider fallback.

Also excluded code-grader-sdk from CI (needs Azure keys in its
per-example targets.yaml).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
christso and claude committed Apr 1, 2026
commit f8d8e94f000adad8685edf657065f8c44c04960a
14 changes: 6 additions & 8 deletions .agentv/targets.yaml
@@ -19,17 +19,15 @@ targets:

   # ── LLM target (text generation, no agent binary needed) ────────────
   - name: llm
-    provider: openai
-    base_url: https://models.github.ai/inference/v1
-    api_key: ${{ GH_MODELS_TOKEN }}
-    model: ${{ GH_MODELS_MODEL }}
+    provider: openrouter
+    api_key: ${{ OPENROUTER_API_KEY }}
+    model: ${{ OPENROUTER_MODEL }}

   # ── Grader (LLM-as-judge) ──────────────────────────────────────────
   - name: grader
-    provider: openai
-    base_url: https://models.github.ai/inference/v1
-    api_key: ${{ GH_MODELS_TOKEN }}
-    model: ${{ GH_MODELS_MODEL }}
+    provider: openrouter
+    api_key: ${{ OPENROUTER_API_KEY }}
+    model: ${{ OPENROUTER_MODEL }}

   # ── Named agent targets ───────────────────────────────────────────
   - name: copilot-cli
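
The `llm` and `grader` targets take their key and model from `${{ NAME }}` placeholders. The diff does not show how the tool resolves them, so the following is only a minimal Python sketch of that kind of substitution, assuming each placeholder maps directly to an environment variable of the same name. The function name `resolve_placeholders` and the fail-on-missing behaviour are hypothetical, not taken from the project.

```python
import os
import re
from typing import Mapping, Optional

# Matches placeholders written as ${{ NAME }}, the syntax used in targets.yaml.
PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def resolve_placeholders(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${{ NAME }} in value with env[NAME] (hypothetical helper).

    Raises KeyError when a referenced variable is missing, so a misconfigured
    target fails early instead of sending an empty API key.
    """
    source = os.environ if env is None else env

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in source:
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return PLACEHOLDER.sub(substitute, value)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    demo_env = {"OPENROUTER_API_KEY": "sk-demo", "OPENROUTER_MODEL": "demo/model"}
    print(resolve_placeholders("${{ OPENROUTER_MODEL }}", demo_env))
```

Under this assumption, moving both targets to OpenRouter only requires `OPENROUTER_API_KEY` and `OPENROUTER_MODEL` to be present in the environment that runs the evals.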
2 changes: 1 addition & 1 deletion .github/workflows/evals.yml
@@ -60,7 +60,7 @@ jobs:
           DEFAULT_PATTERNS: "evals/**/*.eval.yaml,examples/**/*.eval.yaml,examples/**/*.EVAL.yaml,examples/**/EVAL.yaml"
           # Exclude evals that need local scripts or multiple agent targets.
           # Negation patterns (!glob) are supported by the CLI.
-          EXCLUDE_PATTERNS: "!examples/showcase/multi-model-benchmark/**"
+          EXCLUDE_PATTERNS: "!examples/showcase/multi-model-benchmark/**,!examples/features/code-grader-sdk/**"
         run: |
           PATTERNS="${{ github.event.inputs.suite_filter || vars.EVAL_PATTERNS || env.DEFAULT_PATTERNS }}"
           EXCLUDES="${{ vars.EVAL_EXCLUDE_PATTERNS || env.EXCLUDE_PATTERNS }}"
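
The workflow passes comma-separated include globs plus `!`-prefixed negation globs to the CLI. The CLI's actual matcher is not part of this diff, so here is a rough Python sketch of how such a list could select eval files, assuming includes and excludes end up in one comma-separated string. `split_patterns` and `select_paths` are hypothetical names, and `fnmatchcase` stands in for the real glob engine (unlike a true globstar matcher, its `*` also crosses `/`).

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple


def split_patterns(spec: str) -> Tuple[List[str], List[str]]:
    """Split a comma-separated spec into (include globs, exclude globs)."""
    includes: List[str] = []
    excludes: List[str] = []
    for raw in spec.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        if pattern.startswith("!"):
            excludes.append(pattern[1:])  # drop the leading "!"
        else:
            includes.append(pattern)
    return includes, excludes


def select_paths(paths: Iterable[str], spec: str) -> List[str]:
    """Keep paths matching any include glob and no exclude glob."""
    includes, excludes = split_patterns(spec)
    return [
        path
        for path in paths
        if any(fnmatchcase(path, inc) for inc in includes)
        and not any(fnmatchcase(path, exc) for exc in excludes)
    ]


if __name__ == "__main__":
    spec = (
        "evals/**/*.eval.yaml,examples/**/*.eval.yaml,"
        "!examples/showcase/multi-model-benchmark/**,"
        "!examples/features/code-grader-sdk/**"
    )
    # Hypothetical file names, used only to exercise the filter.
    candidates = [
        "evals/core/a.eval.yaml",
        "examples/features/code-grader-sdk/evals/b.eval.yaml",
    ]
    print(select_paths(candidates, spec))
```

With this reading, adding `!examples/features/code-grader-sdk/**` drops every eval file under that directory while leaving the include patterns untouched.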