feat(providers,evaluators): claude-cli provider + trigger-judge evaluator#597

Merged
christso merged 8 commits into main from feat/593-skill-trigger-eval on Mar 15, 2026
@christso
Collaborator

Summary

Closes #593

  • New claude-cli provider (ClaudeCliProvider): spawns claude -p as a subprocess with --output-format stream-json --include-partial-messages, parses streaming events to extract tool calls, token usage, and cost, and strips CLAUDECODE env var to allow nested sessions
  • claude is now an alias for claude-cli (the new default); both resolve to ClaudeCliProvider
  • claude-sdk provider (ClaudeSdkProvider): the existing ClaudeProvider renamed and made explicitly opt-in; kind changed to 'claude-sdk', id prefix updated accordingly
  • New trigger-judge evaluator (TriggerJudgeEvaluator): checks post-hoc whether the agent invoked a named skill by scanning response.toolCalls for Skill tool invocations (matching args.skill) or Read calls loading files from .claude/commands/ or .claude/skills/
  • ProviderKind, AGENT_PROVIDER_KINDS, KNOWN_PROVIDERS, and ResolvedTarget updated with claude-cli and claude-sdk
  • TriggerJudgeEvaluatorConfig added to EvaluatorConfig union and EVALUATOR_KIND_VALUES
  • eval-schema.json regenerated to include the new trigger-judge schema
  • Unit tests: 13 tests for trigger-judge evaluator logic, 4 tests for provider alias resolution
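
The trigger check described above can be sketched roughly as follows. This is a minimal illustration, not the evaluator's actual code: the `ToolCall` shape, the `file_path` argument name for Read calls, and the rule that the skill name must appear in the file path are all assumptions; only the `Skill`/`Read` tool names, `args.skill`, and the two directories come from the summary.

```typescript
// Hypothetical shape of a recorded tool call; only `args.skill` is named in the PR.
interface ToolCall {
  name: string;
  args?: Record<string, unknown>;
}

// Directories named in the PR summary as locations of skill files.
const SKILL_DIRS = [".claude/commands/", ".claude/skills/"];

// Returns true if any tool call looks like an invocation of the named skill.
function skillTriggered(toolCalls: ToolCall[], skill: string): boolean {
  return toolCalls.some((call) => {
    if (call.name === "Skill") {
      // Direct invocation: the Skill tool was called with a matching args.skill.
      return call.args?.["skill"] === skill;
    }
    if (call.name === "Read") {
      // Indirect invocation: a skill file was loaded from a known directory.
      // `file_path` is an assumed argument name.
      const filePath = call.args?.["file_path"];
      if (typeof filePath !== "string") return false;
      const normalized = filePath.replace(/\\/g, "/");
      return (
        SKILL_DIRS.some((dir) => normalized.includes(dir)) &&
        normalized.includes(skill)
      );
    }
    return false;
  });
}
```

Substring matching on the path is deliberately loose here; a stricter check might compare path segments instead.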

Risk

Low. The change is additive at the configuration level: existing claude targets continue to work, but they now route to the subprocess provider instead of the SDK provider. The old SDK provider remains available as claude-sdk. No changes to existing YAML or API usage are required.
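
To make the routing change concrete, alias resolution could look something like the sketch below. The names `PROVIDER_ALIASES` and `resolveProviderKind` are hypothetical and not taken from the codebase; only the mapping itself (claude and claude-cli to the subprocess provider, claude-sdk to the SDK provider) comes from this PR.

```typescript
// The two provider kinds introduced or renamed in this PR.
type ClaudeProviderKind = "claude-cli" | "claude-sdk";

// Hypothetical alias table: `claude` now resolves to the subprocess provider.
const PROVIDER_ALIASES = new Map<string, ClaudeProviderKind>([
  ["claude", "claude-cli"],
  ["claude-cli", "claude-cli"],
  ["claude-sdk", "claude-sdk"],
]);

// Resolves a target's provider name to a concrete kind, or throws if unknown.
function resolveProviderKind(name: string): ClaudeProviderKind {
  const kind = PROVIDER_ALIASES.get(name.trim().toLowerCase());
  if (kind === undefined) {
    throw new Error(`Unknown provider: ${name}`);
  }
  return kind;
}
```

Under this scheme an existing target with `provider: claude` needs no edits, but it runs through the CLI subprocess path; switching back to the previous behavior means naming `claude-sdk` explicitly.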

@christso christso marked this pull request as ready for review March 14, 2026 12:26
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Mar 14, 2026

Deploying agentv with Cloudflare Pages

Latest commit: d42610c
Status: ✅  Deploy successful!
Preview URL: https://2360774d.agentv.pages.dev
Branch Preview URL: https://feat-593-skill-trigger-eval.agentv.pages.dev

… evaluator (#593)

- Add ClaudeCliProvider that spawns `claude -p` as a subprocess, streams
  output via --output-format stream-json --include-partial-messages, and
  extracts tool calls, token usage, and cost from stream events
- Rename existing SDK provider class to ClaudeSdkProvider (claude-sdk.ts)
  with kind 'claude-sdk' for explicit opt-in to the Agent SDK path
- Register 'claude' and 'claude-cli' as aliases for ClaudeCliProvider;
  'claude-sdk' maps to ClaudeSdkProvider
- Add 'claude-cli' and 'claude-sdk' to ProviderKind, AGENT_PROVIDER_KINDS,
  KNOWN_PROVIDERS, and ResolvedTarget union
- Add TriggerJudgeEvaluator that checks whether the agent invoked a named
  skill by scanning tool calls for Skill invocations (args.skill match) or
  skill file reads (.claude/commands/, .claude/skills/)
- Register trigger-judge in evaluator parser, schema, builtin registry,
  and EvaluatorConfig union
- Regenerate eval-schema.json to include trigger-judge schema
- Add unit tests for trigger-judge evaluator and claude provider aliases
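
As a rough illustration of the streaming step, the sketch below buffers subprocess output and parses it as newline-delimited JSON, which is the general technique for a stream of one JSON event per line. The `NdjsonBuffer` class is hypothetical, and the shape of the individual events (tool calls, usage, cost) is deliberately left as `unknown` because it is not described in this PR.

```typescript
// Accumulates stdout chunks and yields one parsed JSON value per complete line.
class NdjsonBuffer {
  private pending = "";

  // Feed a chunk of text; returns the events completed by this chunk.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line (or "") and is kept for later.
    this.pending = lines.pop() ?? "";

    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        events.push(JSON.parse(trimmed));
      } catch {
        // Non-JSON lines are skipped in this sketch rather than treated as fatal.
      }
    }
    return events;
  }
}
```

A provider built this way would call `push` from the child process's stdout `data` handler and inspect each returned event for whatever fields it needs.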

--output-format stream-json requires --verbose when using -p (--print) mode.
Without it the CLI exits with code 1 immediately.
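
Putting the flags from this PR together, the argument list and child environment might be assembled as in the sketch below. The helper names `buildClaudeArgs` and `childEnv` are hypothetical, and passing the prompt as the value following `-p` is an assumption; the flags themselves and the removal of `CLAUDECODE` are taken from the PR text.

```typescript
// Builds the argument list for a non-interactive, streaming `claude` run.
function buildClaudeArgs(prompt: string): string[] {
  return [
    "-p",
    prompt,
    "--output-format",
    "stream-json",
    "--include-partial-messages",
    // Required together with stream-json in -p mode; otherwise the CLI exits with code 1.
    "--verbose",
  ];
}

// Copies the parent environment without CLAUDECODE so a nested session is allowed.
function childEnv(
  parent: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const env = { ...parent };
  delete env["CLAUDECODE"];
  return env;
}
```

The results would then be handed to something like Node's `child_process.spawn("claude", buildClaudeArgs(prompt), { env: childEnv(process.env) })`.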

Also adds E2E tests validating output, tokenUsage, durationMs, and log
file emission parity between claude-cli and claude-sdk providers.
…ges/ example

Removes TriggerJudgeEvaluator from core built-ins (violates Principles 1 & 2:
Claude-Code-specific, expressible as a code-judge script) and adds:

- packages/core/src/evaluation/registry/judge-discovery.ts: new discoverJudges()
  function, mirroring discoverAssertions() but scans .agentv/judges/
- Wired discoverJudges into orchestrator alongside discoverAssertions
- Exported discoverJudges from core public API and registry/index.ts
- examples/features/agent-skills-evals/.agentv/judges/trigger-judge.ts:
  reference implementation as a code-judge script using defineCodeJudge
- Regenerated eval-schema.json (trigger-judge removed from EvaluatorSchema union)
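
For orientation only, the name-to-file mapping that a directory scan like this implies could be sketched as below. This is not the real `discoverJudges()`; its signature is not shown in this PR. The `judgeEntries` helper, the accepted extensions, and the convention that the file's base name becomes the judge name are all assumptions, and the directory listing itself is left to the caller.

```typescript
// Assumed set of script extensions that count as judge files.
const JUDGE_EXTENSIONS = [".ts", ".js", ".mjs"];

// Maps judge names (file base names) to their paths under the judges directory.
function judgeEntries(judgesDir: string, fileNames: string[]): Map<string, string> {
  const entries = new Map<string, string>();
  for (const fileName of fileNames) {
    // Type declaration files are not runnable judges.
    if (fileName.endsWith(".d.ts")) continue;
    const ext = JUDGE_EXTENSIONS.find((e) => fileName.endsWith(e));
    if (ext === undefined) continue;
    const name = fileName.slice(0, -ext.length);
    entries.set(name, `${judgesDir}/${fileName}`);
  }
  return entries;
}
```

With the example layout from this commit, a listing containing `trigger-judge.ts` would yield a single `trigger-judge` entry pointing into `.agentv/judges/`.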
@christso christso force-pushed the feat/593-skill-trigger-eval branch from d42610c to 5008ce5 Compare March 15, 2026 04:00
@christso christso merged commit ffc8457 into main Mar 15, 2026
@christso christso deleted the feat/593-skill-trigger-eval branch March 15, 2026 04:01

Development

Successfully merging this pull request may close these issues.

feat: skill-trigger evaluation + claude-cli provider (subprocess-based)