refactor: route evolution judges through Agent SDK subprocess #49
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 64f036ae82
```typescript
  options: JudgeQueryOptions<T>,
): Promise<JudgeQueryResult<T>> {
  const startTime = Date.now();
  const resolvedModel = options.model ?? config.judge_model ?? config.model;
```
Honor `judge_model` override in judge selection
`runJudgeQuery` resolves the model as `options.model ?? config.judge_model ?? config.model`, but every judge wrapper still passes a hard-coded model into `callJudge` (for example, the Sonnet/Haiku constants in the judge modules), so `config.judge_model` is never reached in practice. Operators who set `judge_model` expecting to shift judge traffic to a cheaper/faster tier will see no behavior change and will continue paying for the hard-coded models.
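A minimal sketch of the precedence problem and one possible fix. Everything here is hypothetical: `JudgeConfig`, `currentWrapper`, `fixedWrapper`, and the model ID strings stand in for the real judge modules and constants, which this thread doesn't show. It assumes the fix should keep each wrapper's hard-coded tier as the default when `judge_model` is unset, so existing deployments behave as before.

```typescript
// Hypothetical shapes mirroring the fields named in the review comment.
interface JudgeConfig {
  model: string;
  judge_model?: string;
}

interface JudgeOptions {
  model?: string;
}

// Precedence as written in runJudgeQuery:
// per-call option, then config.judge_model, then config.model.
function resolveJudgeModel(options: JudgeOptions, config: JudgeConfig): string {
  return options.model ?? config.judge_model ?? config.model;
}

// Placeholder ID standing in for a wrapper's hard-coded tier constant.
const WRAPPER_DEFAULT_MODEL = "wrapper-default-model";

// What the review describes: the wrapper always fills options.model,
// so the `??` chain stops there and judge_model is never consulted.
function currentWrapper(config: JudgeConfig): string {
  return resolveJudgeModel({ model: WRAPPER_DEFAULT_MODEL }, config);
}

// One possible fix: the operator's judge_model outranks the wrapper
// default, while deployments without judge_model keep the old tier.
function fixedWrapper(config: JudgeConfig, explicit?: string): string {
  return resolveJudgeModel(
    { model: explicit ?? config.judge_model ?? WRAPPER_DEFAULT_MODEL },
    config,
  );
}

const withOverride: JudgeConfig = { model: "main-model", judge_model: "cheap-judge" };
const withoutOverride: JudgeConfig = { model: "main-model" };

console.log(currentWrapper(withOverride)); // wrapper-default-model
console.log(fixedWrapper(withOverride)); // cheap-judge
console.log(fixedWrapper(withoutOverride)); // wrapper-default-model
```

Having wrappers pass no model at all would also reach `config.judge_model`, but with `judge_model` unset it would fall through to `config.model` and silently change the default judge tier, which is why the sketch keeps the wrapper constant as the last fallback.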
```typescript
    maxTurns: 1,
    effort: "low",
    persistSession: false,
```
Apply `maxTokens` when issuing judge subprocess queries
The new judge path keeps `maxTokens` in the public options shape and forwards it from `callJudge`, but `runJudgeQuery` never includes that value in the SDK `query()` options. This silently drops token caps that previously bounded judge responses, which can increase latency/cost or make long judge outputs fail unpredictably when callers rely on that limit.
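A sketch of forwarding the cap, assuming a hypothetical `buildJudgeQueryOptions` helper that assembles the object `runJudgeQuery` hands to `query()`. The `maxTurns`, `effort`, and `persistSession` values are copied from the diff; the PR doesn't say which Agent SDK option, if any, carries an output-token cap, so the `maxTokens` key below is a placeholder to be mapped onto whatever the SDK actually accepts.

```typescript
// Hypothetical subset of the public judge options discussed in the review.
interface JudgeQueryOptionsLike {
  model?: string;
  maxTokens?: number;
}

// Placeholder shape for the options object handed to the SDK query().
// The SDK-side name for an output cap is an assumption, not a known API.
interface SubprocessQueryOptions {
  model: string;
  maxTurns: number;
  effort: "low";
  persistSession: boolean;
  maxTokens?: number;
}

function buildJudgeQueryOptions(
  options: JudgeQueryOptionsLike,
  resolvedModel: string,
): SubprocessQueryOptions {
  return {
    model: resolvedModel,
    // Fixed settings taken from the reviewed diff.
    maxTurns: 1,
    effort: "low",
    persistSession: false,
    // Forward the caller's cap only when one was provided, so an
    // undefined value never overrides an SDK-side default.
    ...(options.maxTokens !== undefined ? { maxTokens: options.maxTokens } : {}),
  };
}

const capped = buildJudgeQueryOptions({ maxTokens: 2048 }, "judge-model");
const uncapped = buildJudgeQueryOptions({}, "judge-model");

console.log(capped.maxTokens); // 2048
console.log("maxTokens" in uncapped); // false
```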
Summary
- Routes evolution judges through the same Agent SDK `query()` subprocess as the main agent, via a new `runtime.judgeQuery()` method
- Removes the direct `@anthropic-ai/sdk` dependency. The Agent SDK (`@anthropic-ai/claude-agent-sdk`) is unchanged
- Switches structured output from `messages.parse()` to prompt instruction + `JSON.parse()` + Zod validation, with tolerant recovery for raw JSON, fenced JSON, and JSON wrapped in prose
- Adds an optional `judge_model` config field for operators who want a different model tier for judges

Unifies authentication: `ANTHROPIC_API_KEY`, `ANTHROPIC_BASE_URL`, and any Claude Code credentials now apply to both the main agent and every evolution judge. Judge voting logic, prompts, schemas, and the 5-gate validation pipeline are unchanged. Existing deployments continue to work without configuration changes.

Judges previously imported `Anthropic` and `zodOutputFormat` from `@anthropic-ai/sdk` and held their own singleton client. They now delegate to `runtime.judgeQuery()`, which reuses the Agent SDK subprocess, so a single code path and a single credential store drive both tiers.

Test plan
- `bun test`: 838 pass, 0 fail (up from 825, with 13 new parser tests in `src/agent/__tests__/judge-query.test.ts`)
- `bun run typecheck` clean
- `bun run lint` clean
- No file under `src/` imports `@anthropic-ai/sdk` anymore
- Judge voting logic (`minority_veto`, `majority`, `unanimous`) byte-for-byte unchanged
- `parseJsonFromResponse` handles raw JSON, `` ```json `` fences, plain `` ``` `` fences, and prose-wrapped JSON, and throws clear errors on empty, non-JSON, malformed, or schema-violating output
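The tolerant recovery order listed in the summary (raw JSON, then fenced JSON, then JSON embedded in prose, followed by schema validation) could look roughly like the sketch below. It is not the PR's `parseJsonFromResponse`: `parseJudgeJson`, `Validator`, and the `Verdict` shape are hypothetical, and a plain validator function stands in for Zod so the example has no dependencies. With Zod, passing `(v) => schema.parse(v)` as the validator would give equivalent behavior, since `parse` throws on schema violations.

```typescript
// Validator signature; a Zod schema's parse method fits this shape.
type Validator<T> = (value: unknown) => T;

// Hypothetical tolerant parser for judge output.
function parseJudgeJson<T>(text: string, validate: Validator<T>): T {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    throw new Error("Judge returned an empty response");
  }

  // Candidate strings, tried in order: raw text, fenced body, brace span.
  const candidates: string[] = [trimmed];

  // Matches ```json ... ``` as well as plain ``` ... ``` fences.
  const fence = trimmed.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (fence) {
    candidates.push((fence[1] ?? "").trim());
  }

  // JSON object embedded in prose: take the outermost {...} span.
  const start = trimmed.indexOf("{");
  const end = trimmed.lastIndexOf("}");
  if (start !== -1 && end > start) {
    candidates.push(trimmed.slice(start, end + 1));
  }

  let lastError: unknown;
  for (const candidate of candidates) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(candidate);
    } catch (err) {
      lastError = err;
      continue;
    }
    // A candidate that parses is validated immediately; schema
    // violations propagate instead of trying the next candidate.
    return validate(parsed);
  }
  throw new Error(`Judge response did not contain valid JSON: ${String(lastError)}`);
}

interface Verdict {
  pass: boolean;
  reason: string;
}

// Hand-written validator standing in for a Zod schema.
const validateVerdict: Validator<Verdict> = (value) => {
  if (typeof value !== "object" || value === null) {
    throw new Error("Judge output does not match the verdict schema");
  }
  const { pass, reason } = value as { pass?: unknown; reason?: unknown };
  if (typeof pass !== "boolean" || typeof reason !== "string") {
    throw new Error("Judge output does not match the verdict schema");
  }
  return { pass, reason };
};

const fenced = "```json\n{\"pass\": true, \"reason\": \"ok\"}\n```";
const prose = 'Verdict follows: {"pass": false, "reason": "regression"} Thanks.';

console.log(parseJudgeJson(fenced, validateVerdict).pass); // true
console.log(parseJudgeJson(prose, validateVerdict).reason); // regression
```

Validating the first candidate that parses, rather than falling through to the next extraction strategy, means a well-formed but wrong-shaped response is reported as a schema violation instead of a misleading "no JSON found" error.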