
refactor: route evolution judges through Agent SDK subprocess#49

Merged
mcheemaa merged 2 commits into main from phase1/judge-subprocess
Apr 12, 2026

Conversation

@mcheemaa (Member)

Summary

  • LLM evolution judges now route through the same Agent SDK query() subprocess as the main agent, via a new runtime.judgeQuery() method
  • Removes the raw @anthropic-ai/sdk dependency. The Agent SDK (@anthropic-ai/claude-agent-sdk) is unchanged
  • Structured output moves from messages.parse() to prompt instruction + JSON.parse() + Zod validation, with tolerant recovery for raw JSON, fenced JSON, and JSON wrapped in prose
  • Adds an optional judge_model config field for operators who want a different model tier for judges
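
The tolerant recovery described above could look roughly like the following. This is a hypothetical sketch, not the repo's actual `parseJsonFromResponse`: the recovery order (raw JSON, then fenced block, then outermost brace span in prose) matches the description, but the exact implementation is an assumption, and schema validation (Zod in the real code) is left to the caller.

```ts
// Sketch of tolerant JSON recovery: try the raw text, then a ```json or
// plain ``` fence, then the outermost {...} span embedded in prose.
// Names and exact behavior are assumptions, not the repo's implementation.
function parseJsonFromResponse(text: string): unknown {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    throw new Error("empty judge response");
  }
  const candidates: string[] = [trimmed];
  // ```json ... ``` or plain ``` ... ``` fences
  const fence = trimmed.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (fence) candidates.push(fence[1].trim());
  // JSON object wrapped in prose: take the outermost brace span
  const first = trimmed.indexOf("{");
  const last = trimmed.lastIndexOf("}");
  if (first !== -1 && last > first) {
    candidates.push(trimmed.slice(first, last + 1));
  }
  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate);
    } catch {
      // fall through to the next recovery strategy
    }
  }
  throw new Error("judge response contained no parseable JSON");
}
```

The parsed value would then be handed to a Zod schema's `parse()` for the validation step the summary mentions.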

Unifies authentication: ANTHROPIC_API_KEY, ANTHROPIC_BASE_URL, and any Claude Code credentials now apply to both the main agent and every evolution judge. Judge voting logic, prompts, schemas, and the 5-gate validation pipeline are unchanged. Existing deployments continue to work without configuration changes.

Judges previously imported Anthropic and zodOutputFormat from @anthropic-ai/sdk and held their own singleton client. They now delegate to runtime.judgeQuery(), which reuses the Agent SDK subprocess, so a single code path and a single credential store drive both tiers.
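
The delegation can be sketched as follows. This is an illustrative shape only: `runSubprocessQuery` stands in for the Agent SDK `query()` call, and the method names besides `judgeQuery` are assumptions, not the repo's real API.

```ts
// Sketch: judges funnel through one runtime method instead of each
// holding a private Anthropic client. `runSubprocessQuery` is a
// placeholder for the shared Agent SDK subprocess call.
type Validator<T> = (raw: unknown) => T;

class Runtime {
  constructor(
    private runSubprocessQuery: (prompt: string) => Promise<string>,
  ) {}

  // Every judge goes through here, so credentials and transport are
  // configured exactly once, alongside the main agent.
  async judgeQuery<T>(prompt: string, validate: Validator<T>): Promise<T> {
    const text = await this.runSubprocessQuery(prompt);
    return validate(JSON.parse(text));
  }
}
```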

Test plan

  • bun test: 838 pass, 0 fail (up from 825 with 13 new parser tests in src/agent/__tests__/judge-query.test.ts)
  • bun run typecheck clean
  • bun run lint clean
  • No file in src/ imports @anthropic-ai/sdk anymore
  • Judge voting logic (minority_veto, majority, unanimous) byte-for-byte unchanged
  • parseJsonFromResponse handles raw JSON, ```json fences, plain ``` fences, prose-wrapped JSON, and throws clear errors on empty / non-JSON / malformed / schema-violating output

@mcheemaa mcheemaa merged commit 7b455a7 into main Apr 12, 2026
1 check passed
@mcheemaa mcheemaa deleted the phase1/judge-subprocess branch April 12, 2026 04:30

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64f036ae82


```ts
  options: JudgeQueryOptions<T>,
): Promise<JudgeQueryResult<T>> {
  const startTime = Date.now();
  const resolvedModel = options.model ?? config.judge_model ?? config.model;
```

P2: Honor judge_model override in judge selection

runJudgeQuery resolves the model as options.model ?? config.judge_model ?? config.model, but every judge wrapper still passes a hard-coded model into callJudge (for example, the Sonnet/Haiku constants in the judge modules), so config.judge_model is never reached in practice. Operators who set judge_model expecting to shift judge traffic to a cheaper/faster tier will see no behavior change and continue paying for the hard-coded models.
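
The resolution order itself is correct; the fix is for judge wrappers to leave `model` unset unless a caller explicitly overrides it, so `config.judge_model` can take effect. A minimal sketch, with types assumed:

```ts
// Sketch of the resolution order the review describes. The fix: judge
// wrappers pass `model: undefined` instead of a hard-coded constant,
// letting the operator's judge_model tier win. Types are assumptions.
interface JudgeConfig {
  model: string;
  judge_model?: string;
}
interface JudgeQueryOptions {
  model?: string;
}

function resolveJudgeModel(options: JudgeQueryOptions, config: JudgeConfig): string {
  // Explicit per-call override wins, then the operator's judge tier,
  // then the main agent's model.
  return options.model ?? config.judge_model ?? config.model;
}
```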


Comment on lines +121 to +123

```ts
  maxTurns: 1,
  effort: "low",
  persistSession: false,
```

P2: Apply maxTokens when issuing judge subprocess queries

The new judge path keeps maxTokens in the public options shape and forwards it from callJudge, but runJudgeQuery never includes that value in the SDK query() options. This silently drops token caps that previously bounded judge responses, which can increase latency/cost or make long judge outputs fail unpredictably when callers rely on that limit.
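
One way to address this is to attach the cap only when a caller supplies it, so subprocess defaults apply otherwise. A sketch under the assumption that the query options accept a `maxTokens` field (the field name and option shape here are illustrative, not confirmed from the repo):

```ts
// Sketch: forward a caller-supplied maxTokens into the subprocess query
// options instead of silently dropping it. Field names are assumptions.
interface SubprocessOptions {
  maxTurns: number;
  effort: string;
  persistSession: boolean;
  maxTokens?: number;
}

function buildQueryOptions(maxTokens?: number): SubprocessOptions {
  const opts: SubprocessOptions = {
    maxTurns: 1,
    effort: "low",
    persistSession: false,
  };
  // Only attach the cap when the caller provided one, so subprocess
  // defaults apply otherwise.
  if (maxTokens !== undefined) opts.maxTokens = maxTokens;
  return opts;
}
```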

