refactor: unify the LLM retry classifier across client + judge retry#74
Conversation
Merge B of the agent-eval consolidation. callLlm and withJudgeRetry each carried a private retry-classification surface — two backoff functions, two retryable-status sets, two divergent transient-error pattern lists. Collapse to one exported primitive in llm-client.ts: - isTransientLlmError(err) — THE retry classifier. Inspects an error's name/message/code, honors a numeric HTTP status (LlmCallError or a foreign SDK error), and recurses into error.cause (undici nests the real socket fault under .cause). withJudgeRetry's default predicate and callLlm's catch path both route through it. - backoffMs(attempt) — exported; withJudgeRetry's default backoff. judge-retry.ts drops its ABORT_PATTERNS, RETRYABLE_HTTP_STATUS, DEFAULT_BACKOFF, and defaultIsRetryable in favor of the shared pair. Bug class fixed: neither old classifier matched undici HTTP/2 transport faults (`terminated`, NGHTTP2_INTERNAL_ERROR, UND_ERR_*, `other side closed`). An HTTP/2 connection dropping mid-response escaped the retry loop — surfacing as an uncaught rejection in the HTTP client and a silent non-retried trial failure in TCloud-backed judges. The unified pattern list covers them and the cause chain is followed. Regression tests: callLlm retries an HTTP/2 fault to recovery; withJudgeRetry does the same via the shared classifier; isTransientLlmError unit-tested across HTTP/2, network, abort, status, deterministic-failure, and self-referential-cause inputs. typecheck + 1273 tests + biome + build all green.
✅ No Blockers —
|
tangletools
left a comment
There was a problem hiding this comment.
✅ Clean — 7351f0fd
I read every changed file in full, traced callers/callees, ran the full test suite (1272 tests passed), and verified the build. The PR cleanly deduplicates retry logic between llm-client and judge-retry into a single exported classifier, adds HTTP/2 transport fault detection with cause-chain traversal, and covers the new paths with regression tests. No runtime defects were found.
Full findings and scores: review summary
tangletools · 2026-05-21T21:16:07Z · trace
Summary
Merge B of the agent-eval consolidation.
callLlmandwithJudgeRetryeach carried a private retry-classification surface — two backoff functions, two retryable-status sets, two divergent transient-error pattern lists. One concern, three copies.Collapse to one exported primitive in
llm-client.ts:isTransientLlmError(err)— THE retry classifier. Inspects an error's name/message/code, honors a numeric HTTP status (LlmCallErroror a foreign SDK error that duck-types.status), and recurses intoerror.cause— undici nests the real socket fault under.cause.backoffMs(attempt)— exported;withJudgeRetry's default backoff.judge-retry.tsdrops itsABORT_PATTERNS,RETRYABLE_HTTP_STATUS,DEFAULT_BACKOFF, anddefaultIsRetryablein favor of the shared pair.callLlm's catch path routes through the same function.Bug class fixed
Neither old classifier matched undici HTTP/2 transport faults —
terminated,NGHTTP2_INTERNAL_ERROR,UND_ERR_*,other side closed.llm-client's regex was onlyfetch failed|ECONNRESET|ETIMEDOUT|EAI_AGAIN;judge-retry's list was broader but still HTTP/1-shaped. An HTTP/2 keep-alive connection dropping mid-response escaped the retry loop — surfacing as an uncaught rejection in the HTTP client and a silently non-retried trial failure in TCloud-backed judges. The unified pattern list covers them and the.causechain is followed to the real fault.Test plan
callLlmretries an HTTP/2terminated/NGHTTP2_INTERNAL_ERRORfault to recoverywithJudgeRetrydoes the same via the shared classifierisTransientLlmErrorunit-tested: HTTP/2, undici.code,.causechain, network/abort, retryable vs non-retryable status, deterministic failures (JSON parse, schema reject), self-referential cause chainpnpm typecheck— cleanpnpm test— 1273 passed (131 files)pnpm exec biome check src— 0 errorspnpm build— green