fix(gateway): respect requested max_tokens for Anthropic #2289

steebchen-bot wants to merge 2 commits into
Conversation
When the caller omitted `max_tokens`, the Anthropic and AWS Bedrock adapters in `prepareRequestBody` fell back to `Math.max(1024, ...)` — Anthropic's historical 1024 default for Claude 2. This silently truncated large responses and cut tool calls mid-emission, causing SDK parses to fail and agent loops to exit. Fall back to the model's advertised `maxOutput` (from `packages/models`) instead, e.g. 128000 for `claude-opus-4-7`. Caller-supplied values continue to flow through untouched.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
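The fallback change can be sketched as follows. This is an illustrative stand-in, not the adapter's real code: `resolveAnthropicMaxTokens`, its parameters, and the inline defaults are assumptions for the example.

```typescript
// Hypothetical stand-in for the adapter's max_tokens fallback; the real
// logic lives in prepareRequestBody and uses different names.
function resolveAnthropicMaxTokens(
	requested: number | undefined,
	modelMaxOutput: number | undefined,
	thinkingBudget = 0,
): number {
	// Caller-supplied values flow through untouched.
	if (requested !== undefined) return requested;
	// Before the fix: Math.max(1024, thinkingBudget + 1000), i.e. the
	// Claude 2-era 1024 default. After: prefer the model's advertised
	// maxOutput from packages/models, keeping the old expression as a
	// last resort when no maxOutput is known.
	return modelMaxOutput ?? Math.max(1024, thinkingBudget + 1000);
}

console.log(resolveAnthropicMaxTokens(32000, 128000)); // 32000: forwarded verbatim
console.log(resolveAnthropicMaxTokens(undefined, 128000)); // 128000, not 1024
```

With this shape, a large caller value is never clamped, and an omitted value inherits the model's full budget instead of the legacy cap.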
Walkthrough

This PR updates token budget defaults for Anthropic and AWS Bedrock when reasoning/thinking is enabled. Anthropic now uses the model's declared maximum output instead of a fixed 1024 cap. AWS Bedrock's token validation becomes conditional, respecting both caller-provided limits and reasoning budget requirements. Three new tests verify Anthropic's behavior.

Changes: Token Budget Defaults for Reasoning Modes
Actionable comments posted: 1
🧹 Nitpick comments (1)

packages/actions/src/prepare-request-body.spec.ts (1)

232-299: ⚡ Quick win: Add Bedrock parity regression tests for the new maxTokens path.

These new cases cover Anthropic well, but this PR also changes Bedrock `maxTokens` fallback/clamp behavior. Please add companion tests for: (1) omitted `max_tokens` falling back to model `maxOutput`/reasoning floor, and (2) caller `max_tokens` below the reasoning floor being raised.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `packages/actions/src/prepare-request-body.spec.ts` around lines 232-299: the test suite adds Anthropic `max_tokens` cases but misses Bedrock parity; add tests exercising `prepareRequestBody` for the Bedrock provider checking (1) when the caller omits `max_tokens`, the returned `requestBody.max_tokens` falls back to the model mapping `maxOutput` (and respects any reasoning floor), and (2) when the caller provides `max_tokens` below the reasoning floor and `supportsReasoning`/`reasoning_effort` is set, `prepareRequestBody` raises it to the reasoning floor; create two tests similar to the Anthropic ones that call `prepareRequestBody` with provider "bedrock" and a Bedrock model id, asserting `max_tokens` equals the model's `maxOutput` when omitted and equals the reasoning floor when the caller value is too low.
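A minimal sketch of the two requested Bedrock cases, assuming a stand-in `clampBedrockMaxTokens` helper — hypothetical; the real tests would call `prepareRequestBody` with provider `"bedrock"` and assert on its returned request body:

```typescript
// Stand-in for the Bedrock fallback/clamp under test (hypothetical helper;
// real tests would assert on prepareRequestBody's output instead).
const clampBedrockMaxTokens = (
	requested: number | undefined,
	modelMaxOutput: number,
	reasoningFloor: number,
): number =>
	requested === undefined
		? Math.max(modelMaxOutput, reasoningFloor) // (1) omitted: model maxOutput, never below the floor
		: Math.max(requested, reasoningFloor); // (2) too-small caller value: raised to the floor

// (1) omitted max_tokens falls back to the model's maxOutput
console.assert(clampBedrockMaxTokens(undefined, 128000, 2024) === 128000);
// (2) caller max_tokens below the reasoning floor is raised to it
console.assert(clampBedrockMaxTokens(1000, 128000, 2024) === 2024);
```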
📒 Files selected for processing (2)

packages/actions/src/prepare-request-body.spec.ts
packages/actions/src/prepare-request-body.ts
```ts
if (!inferenceConfig.maxTokens) {
	inferenceConfig.maxTokens =
		max_tokens ??
		Math.max(bedrockModelMaxOutput ?? reasoningFloor, reasoningFloor);
} else if (inferenceConfig.maxTokens < reasoningFloor) {
	inferenceConfig.maxTokens = reasoningFloor;
}
```
Use an explicit undefined check for Bedrock maxTokens initialization.

At line 2101, `if (!inferenceConfig.maxTokens)` treats 0 as "unset". With reasoning enabled, an explicit `max_tokens = 0` can remain 0 (an invalid floor) instead of being raised to `reasoningFloor`.
Suggested fix

```diff
-	if (!inferenceConfig.maxTokens) {
+	if (inferenceConfig.maxTokens === undefined) {
 		inferenceConfig.maxTokens =
 			max_tokens ??
 			Math.max(bedrockModelMaxOutput ?? reasoningFloor, reasoningFloor);
-	} else if (inferenceConfig.maxTokens < reasoningFloor) {
+	}
+	if (inferenceConfig.maxTokens < reasoningFloor) {
 		inferenceConfig.maxTokens = reasoningFloor;
 	}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
if (inferenceConfig.maxTokens === undefined) {
	inferenceConfig.maxTokens =
		max_tokens ??
		Math.max(bedrockModelMaxOutput ?? reasoningFloor, reasoningFloor);
}
if (inferenceConfig.maxTokens < reasoningFloor) {
	inferenceConfig.maxTokens = reasoningFloor;
}
```
🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `packages/actions/src/prepare-request-body.ts` around lines 2101-2106: the current truthy check `if (!inferenceConfig.maxTokens)` treats 0 as unset; change it to an explicit undefined/null check (e.g., `if (inferenceConfig.maxTokens === undefined || inferenceConfig.maxTokens === null)`) so that a deliberate 0 value is preserved and then handled by the subsequent `else if (inferenceConfig.maxTokens < reasoningFloor)` branch; update the logic around `inferenceConfig.maxTokens`, `max_tokens`, `bedrockModelMaxOutput`, and `reasoningFloor` accordingly.
Bug report

Reproduction: calling `claude-opus-4-7` through llmgateway with `max_tokens=32000` (or any large value) returns at ~1024 output tokens; tool calls are cut mid-emission, causing the SDK to fail silently. Calling Anthropic directly with the same model and the same large `max_tokens` works fine — so the cap is being introduced in our gateway/proxy layer.

Root cause

In `packages/actions/src/prepare-request-body.ts`, the Anthropic case: Anthropic's Messages API requires `max_tokens` to be set, so the adapter unconditionally fills it in. When the caller didn't supply one (or it was dropped earlier in the request pipeline), we fell back to `Math.max(1024, thinkingBudget + 1000)` — Anthropic's historical 1024 default for Claude 2. That value then went over the wire to upstream Anthropic, which honored it and truncated mid-emission. The same pattern existed in the AWS Bedrock branch.
Fix

Fall back to the model's own advertised `maxOutput` (e.g. 128000 for `claude-opus-4-7`) rather than a flat 1024. Caller-supplied values continue to flow through verbatim. Applied to both:

- the Anthropic adapter (`case "anthropic"`)
- the AWS Bedrock adapter (`case "aws-bedrock"` reasoning branch)

The Bedrock branch still enforces a floor of `thinkingBudget + 1000` when the caller did supply a `max_tokens` that's too small to fit the thinking budget — that's a separate correctness guarantee, not a default cap.

Test coverage
Added three regression tests in `prepare-request-body.spec.ts`:

- `max_tokens` is forwarded verbatim — sends 32000, asserts the request body carries 32000.
- omitted `max_tokens` falls back to model `maxOutput` — asserts Opus 4.7 gets 128000 (was 1024 before the fix; this test fails on the old code, confirmed locally).
- explicit `max_tokens` with reasoning — ensures enabling `reasoning_effort: high` doesn't override the caller's 32000.

All 47 tests in the file pass; the full actions package suite (127 tests across 4 files) is green.
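The default-vs-floor distinction described under "Fix" can be sketched numerically; `applyReasoningFloor` is a hypothetical name, while the `thinkingBudget + 1000` headroom comes from the description above.

```typescript
// The floor is a correctness guarantee, not a default cap: even a
// caller-supplied max_tokens must leave room for the thinking budget.
function applyReasoningFloor(requestedMaxTokens: number, thinkingBudget: number): number {
	const reasoningFloor = thinkingBudget + 1000; // thinking tokens + headroom
	return Math.max(requestedMaxTokens, reasoningFloor);
}

console.log(applyReasoningFloor(32000, 2000)); // 32000: large caller values untouched
console.log(applyReasoningFloor(1500, 2000)); // 3000: raised to fit the budget
```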