feat(providers,evaluators): claude-cli provider + trigger-judge evaluator by christso · Pull Request #597 · EntityProcess/agentv

christso · 2026-03-14T12:26:34Z

Summary

Closes #593

- New `claude-cli` provider (`ClaudeCliProvider`): spawns `claude -p` as a subprocess with `--output-format stream-json --include-partial-messages`, parses the streaming events to extract tool calls, token usage, and cost, and strips the `CLAUDECODE` env var to allow nested sessions
- `claude` is now an alias for `claude-cli` (the new default); both resolve to `ClaudeCliProvider`
- `claude-sdk` provider (`ClaudeSdkProvider`): the existing `ClaudeProvider`, renamed and made explicitly opt-in; its kind changed to `'claude-sdk'` and its id prefix was updated accordingly
- New `trigger-judge` evaluator (`TriggerJudgeEvaluator`): checks post hoc whether the agent invoked a named skill by scanning `response.toolCalls` for `Skill` tool invocations (matching `args.skill`) or `Read` calls that load files from `.claude/commands/` or `.claude/skills/`
- `ProviderKind`, `AGENT_PROVIDER_KINDS`, `KNOWN_PROVIDERS`, and `ResolvedTarget` updated with `claude-cli` and `claude-sdk`
- `TriggerJudgeEvaluatorConfig` added to the `EvaluatorConfig` union and `EVALUATOR_KIND_VALUES`
- `eval-schema.json` regenerated to include the new `trigger-judge` schema
- Unit tests: 13 for the trigger-judge evaluator logic, 4 for provider alias resolution
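The detection rule described for the trigger-judge evaluator can be illustrated with a short TypeScript sketch. It is not the PR's code: the `ToolCall` shape with `name`/`args` fields, the assumption that `Read` receives its path as `file_path`, and the helper names `skillWasTriggered` and `readLoadsSkill` are all hypothetical. It also assumes a `Read` counts only when the first path segment under the skill directory names the requested skill; the PR text does not say how the real check matches names.

```typescript
// Hypothetical shape of a recorded tool call; agentv's real type may differ.
interface ToolCall {
  name: string;
  args?: Record<string, unknown>;
}

// Directories whose files count as loading a skill or command (per the PR text).
const SKILL_DIRS = [".claude/commands/", ".claude/skills/"];

// True if `path` points into a skill/command directory for `skill`, e.g.
// ".claude/skills/<skill>/SKILL.md" or ".claude/commands/<skill>.md".
function readLoadsSkill(path: string, skill: string): boolean {
  const normalized = path.replace(/\\/g, "/"); // tolerate Windows separators
  for (const dir of SKILL_DIRS) {
    const idx = normalized.indexOf(dir);
    if (idx === -1) continue;
    // First path segment after the skill/command directory.
    const segment = normalized.slice(idx + dir.length).split("/")[0];
    if (segment === skill || segment === `${skill}.md`) return true;
  }
  return false;
}

// True if any tool call shows the agent invoking the named skill.
function skillWasTriggered(toolCalls: readonly ToolCall[], skill: string): boolean {
  return toolCalls.some((call) => {
    if (call.name === "Skill") {
      // Explicit Skill tool invocation: compare args.skill with the target.
      return call.args?.skill === skill;
    }
    if (call.name === "Read") {
      // `file_path` is assumed to be the Read tool's path argument.
      const filePath = call.args?.file_path;
      return typeof filePath === "string" && readLoadsSkill(filePath, skill);
    }
    return false;
  });
}
```

Because the check runs only over recorded tool calls, it stays post hoc: it needs nothing from the provider beyond the `toolCalls` it already returns.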

Risk

Low. The change is additive except for default routing: existing `claude` targets continue to work but now route to the subprocess provider instead of the SDK provider. The old SDK provider remains available as `claude-sdk`. No existing YAML or API changes are required.
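The routing described in the Risk note can be pictured as a small alias table. This is a minimal sketch, not agentv's registry code: `ClaudeProviderKind`, `CLAUDE_ALIASES`, and `resolveClaudeKind` are hypothetical names, and the real resolver may handle unknown providers differently.

```typescript
// Canonical Claude provider kinds named in this PR.
type ClaudeProviderKind = "claude-cli" | "claude-sdk";

// Hypothetical alias table: "claude" and "claude-cli" both select the subprocess
// provider; only an explicit "claude-sdk" selects the Agent SDK provider.
const CLAUDE_ALIASES: ReadonlyMap<string, ClaudeProviderKind> = new Map<string, ClaudeProviderKind>([
  ["claude", "claude-cli"],
  ["claude-cli", "claude-cli"],
  ["claude-sdk", "claude-sdk"],
]);

// Returns the canonical kind, or undefined for providers outside this table.
function resolveClaudeKind(provider: string): ClaudeProviderKind | undefined {
  return CLAUDE_ALIASES.get(provider);
}
```

A `Map` is used rather than a plain object so that inherited keys such as `toString` can never resolve to a provider.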

cloudflare-workers-and-pages · 2026-03-14T12:27:11Z

Deploying agentv with Cloudflare Pages

Latest commit:	`d42610c`
Status:	✅ Deploy successful!
Preview URL:	https://2360774d.agentv.pages.dev
Branch Preview URL:	https://feat-593-skill-trigger-eval.agentv.pages.dev


… evaluator (#593)

- Add ClaudeCliProvider that spawns `claude -p` as a subprocess, streams output via --output-format stream-json --include-partial-messages, and extracts tool calls, token usage, and cost from stream events
- Rename existing SDK provider class to ClaudeSdkProvider (claude-sdk.ts) with kind 'claude-sdk' for explicit opt-in to the Agent SDK path
- Register 'claude' and 'claude-cli' as aliases for ClaudeCliProvider; 'claude-sdk' maps to ClaudeSdkProvider
- Add 'claude-cli' and 'claude-sdk' to ProviderKind, AGENT_PROVIDER_KINDS, KNOWN_PROVIDERS, and ResolvedTarget union
- Add TriggerJudgeEvaluator that checks whether the agent invoked a named skill by scanning tool calls for Skill invocations (args.skill match) or skill file reads (.claude/commands/, .claude/skills/)
- Register trigger-judge in evaluator parser, schema, builtin registry, and EvaluatorConfig union
- Regenerate eval-schema.json to include trigger-judge schema
- Add unit tests for trigger-judge evaluator and claude provider aliases
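Extracting tool calls, token usage, and cost from the stream could look roughly like the parser below. This is a sketch under assumptions: it treats stdout as newline-delimited JSON, takes tool calls from `tool_use` blocks inside `assistant` events, and reads `usage` and `total_cost_usd` from the final `result` event. Those event and field names are assumptions about the Claude Code CLI's stream-json output and should be checked against real output; `parseStreamJson`, `StreamEvent`, and `ParsedRun` are hypothetical names.

```typescript
// Assumed subset of stream-json event shapes; verify against real CLI output.
interface ContentBlock {
  type: string;
  name?: string;
  input?: unknown;
}
interface StreamEvent {
  type: string;
  message?: { content?: ContentBlock[] };
  usage?: { input_tokens?: number; output_tokens?: number };
  total_cost_usd?: number;
}

interface ParsedRun {
  toolCalls: { tool: string; input: unknown }[];
  inputTokens: number;
  outputTokens: number;
  costUsd: number | undefined;
}

// Parses newline-delimited JSON events from the CLI's stdout.
function parseStreamJson(stdout: string): ParsedRun {
  const run: ParsedRun = { toolCalls: [], inputTokens: 0, outputTokens: 0, costUsd: undefined };
  for (const line of stdout.split("\n")) {
    if (!line.trim()) continue;
    let event: StreamEvent;
    try {
      event = JSON.parse(line) as StreamEvent;
    } catch {
      continue; // skip any non-JSON line
    }
    if (event.type === "assistant") {
      // Collect tool_use blocks from complete assistant messages.
      for (const block of event.message?.content ?? []) {
        if (block.type === "tool_use" && block.name) {
          run.toolCalls.push({ tool: block.name, input: block.input });
        }
      }
    } else if (event.type === "result") {
      // The final result event is assumed to carry run totals.
      run.inputTokens = event.usage?.input_tokens ?? 0;
      run.outputTokens = event.usage?.output_tokens ?? 0;
      run.costUsd = event.total_cost_usd;
    }
  }
  return run;
}
```

Partial-message events produced by `--include-partial-messages` fall through untouched in this sketch, so tool calls come only from complete assistant messages; a real provider might use the partial events for live progress.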

…ider

--output-format stream-json requires --verbose when using -p (--print) mode. Without it the CLI exits with code 1 immediately. Also adds E2E tests validating output, tokenUsage, durationMs, and log file emission parity between claude-cli and claude-sdk providers.
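The flag requirement and the `CLAUDECODE` handling from the PR description can be captured in two small helpers. This is a hedged sketch: `buildClaudeArgs` and `buildChildEnv` are hypothetical names, and the real `ClaudeCliProvider` may order or extend the arguments differently. Only the flags named in this PR are used.

```typescript
// Hypothetical helper: argument list for `claude -p` with streaming JSON output.
function buildClaudeArgs(prompt: string): string[] {
  return [
    "-p", prompt,
    "--output-format", "stream-json",
    // Per this commit: without --verbose, -p mode with stream-json exits with code 1.
    "--verbose",
    "--include-partial-messages",
  ];
}

// Hypothetical helper: copy the parent environment minus CLAUDECODE, which the
// PR strips to allow nested sessions. The parent object is left unchanged.
function buildChildEnv(
  parent: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const env: Record<string, string | undefined> = { ...parent };
  delete env.CLAUDECODE;
  return env;
}
```

In Node, the two helpers could feed `spawn("claude", buildClaudeArgs(prompt), { env: buildChildEnv(process.env) })` from `node:child_process`.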

…ges/ example

Removes TriggerJudgeEvaluator from core built-ins (violates Principles 1 & 2: Claude-Code-specific, expressible as a code-judge script) and adds:

- packages/core/src/evaluation/registry/judge-discovery.ts: new discoverJudges() function, mirroring discoverAssertions() but scans .agentv/judges/
- Wired discoverJudges into orchestrator alongside discoverAssertions
- Exported discoverJudges from core public API and registry/index.ts
- examples/features/agent-skills-evals/.agentv/judges/trigger-judge.ts: reference implementation as a code-judge script using defineCodeJudge
- Regenerated eval-schema.json (trigger-judge removed from EvaluatorSchema union)
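A directory-scan convention like the one `discoverJudges()` introduces can be sketched as a pure function over file paths. Everything here is an assumption for illustration: the name `discoverJudgesFromPaths`, the accepted script extensions, deriving the judge name from the file name (so `trigger-judge.ts` becomes `trigger-judge`), and ignoring nested folders. The real function, like `discoverAssertions()`, reads the filesystem and may apply different rules.

```typescript
// Hypothetical conventions for this sketch only.
const JUDGE_DIR = ".agentv/judges/";
const JUDGE_EXTENSIONS = [".ts", ".js", ".mts", ".mjs"];

// Maps judge names to script paths located directly under .agentv/judges/.
function discoverJudgesFromPaths(paths: readonly string[]): Map<string, string> {
  const judges = new Map<string, string>();
  for (const raw of paths) {
    const path = raw.replace(/\\/g, "/"); // tolerate Windows separators
    const idx = path.lastIndexOf(JUDGE_DIR);
    if (idx === -1) continue;
    const fileName = path.slice(idx + JUDGE_DIR.length);
    if (fileName.includes("/")) continue; // direct children only
    const ext = JUDGE_EXTENSIONS.find((e) => fileName.endsWith(e));
    if (!ext) continue; // not a script file
    const name = fileName.slice(0, -ext.length);
    if (!judges.has(name)) judges.set(name, raw); // first match wins
  }
  return judges;
}
```

Keeping discovery file-name based means a project-local judge such as the trigger-judge example needs no entry in the core built-in registry.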


christso marked this pull request as ready for review March 14, 2026 12:26

christso force-pushed the feat/593-skill-trigger-eval branch from 5f3fc45 to c4cbdea March 15, 2026 03:07

christso mentioned this pull request Mar 15, 2026

feat: create EVAL.yaml to evals.json transpiler #598

Closed

christso added 8 commits March 15, 2026 03:57

fix(providers): guard stdio access for null safety in claude-cli provider

d028fde

style: format eval-schema.json with biome

3660b9d

docs: use assert instead of evaluators in trigger-judge example comment

5058999

refactor(judges): align trigger-judge detection with skill-creator run_eval.py

481ab33

docs(judges): update trigger-judge example comment assert: → assertions:

5008ce5

Align with the global rename from PR #604.

christso force-pushed the feat/593-skill-trigger-eval branch from d42610c to 5008ce5 March 15, 2026 04:00

christso merged commit ffc8457 into main Mar 15, 2026

christso deleted the feat/593-skill-trigger-eval branch March 15, 2026 04:01

christso mentioned this pull request Mar 15, 2026

Feat: support custom judges in transpile to evals.json #610

Closed


christso merged 8 commits into main from feat/593-skill-trigger-eval