
fix(gateway): respect requested max_tokens for Anthropic #2289

Open

steebchen-bot wants to merge 2 commits into main from fix/anthropic-max-tokens-1024-cap

Conversation

steebchen-bot (Collaborator) commented May 14, 2026

Bug report

Smoking gun in the export — lastStepOutput: 1024. That's the LLM Gateway / proxy's per-step output cap (Anthropic's old default), overriding our requested MAX_OUTPUT_TOKENS = 32_000. Opus on Anthropic-native is fine, but llmgateway/claude-opus-4-7 is enforcing 1024 server-side. The model emits a tool call that gets cut mid-emission → SDK parses nothing → loop exits silently.

Reproduction: calling claude-opus-4-7 through llmgateway with max_tokens=32000 (or any large value) returns at ~1024 output tokens; tool calls are cut mid-emission causing the SDK to fail silently. Calling Anthropic directly with the same model + same large max_tokens works fine — so the cap is being introduced in our gateway/proxy layer.
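As a reproduction sketch (assuming the gateway exposes an OpenAI-compatible chat completions route; the URL handling and usage field names here are illustrative, not confirmed against the gateway's actual API):

// Same request against the gateway vs. Anthropic-native; only the base URL changes.
async function probe(baseUrl: string, apiKey: string) {
	const res = await fetch(`${baseUrl}/v1/chat/completions`, {
		method: "POST",
		headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
		body: JSON.stringify({
			model: "claude-opus-4-7",
			max_tokens: 32000,
			messages: [{ role: "user", content: "Write a very long essay." }],
		}),
	});
	const json = await res.json();
	// Through llmgateway this came back capped near 1024 completion tokens;
	// direct Anthropic calls with the same max_tokens did not.
	console.log(json.usage?.completion_tokens);
}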

Root cause

In packages/actions/src/prepare-request-body.ts, the Anthropic case:

const thinkingBudget = getThinkingBudget(reasoning_effort);
const minMaxTokens = Math.max(1024, thinkingBudget + 1000);
requestBody.max_tokens = max_tokens ?? minMaxTokens;

Anthropic's Messages API requires max_tokens to be set, so the adapter unconditionally fills it in. When the caller didn't supply one (or it was dropped earlier in the request pipeline), we fell back to Math.max(1024, thinkingBudget + 1000), whose 1024 floor is Anthropic's historical default from the Claude 2 era. That value then went over the wire to upstream Anthropic, which honored it and truncated mid-emission. The same pattern existed in the AWS Bedrock branch.
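Concretely, assuming getThinkingBudget returns 0 when no reasoning_effort is set (an assumption about that helper, consistent with the observed cap), the fallback collapses to the flat floor:

const thinkingBudget = getThinkingBudget(undefined); // assumed 0 with no reasoning effort
const minMaxTokens = Math.max(1024, thinkingBudget + 1000); // Math.max(1024, 1000) === 1024
requestBody.max_tokens = undefined ?? minMaxTokens; // caller omitted max_tokens, so 1024 goes upstream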

Fix

Fall back to the model's own advertised maxOutput (e.g. 128000 for claude-opus-4-7) rather than a flat 1024. Caller-supplied values continue to flow through verbatim. Applied to both:

  • Native Anthropic provider (case "anthropic")
  • AWS Bedrock Anthropic provider (case "aws-bedrock" reasoning branch)

The Bedrock branch still enforces a floor of thinkingBudget + 1000 when the caller did supply a max_tokens that's too small to fit the thinking budget — that's a separate correctness guarantee, not a default-cap.
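For illustration, a minimal sketch of the corrected Anthropic fallback, assuming a getModelMaxOutput helper that reads the model's advertised maxOutput from packages/models (the helper name is hypothetical; the PR diff is authoritative):

const thinkingBudget = getThinkingBudget(reasoning_effort);
// Hypothetical lookup of the model's advertised output limit, e.g. 128000 for claude-opus-4-7.
const modelMaxOutput = getModelMaxOutput(model);
// Caller-supplied max_tokens flows through verbatim; the default is now the
// model's own limit, still leaving headroom for the thinking budget.
requestBody.max_tokens =
	max_tokens ?? Math.max(modelMaxOutput, thinkingBudget + 1000);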

Test coverage

Added three regression tests in prepare-request-body.spec.ts:

  1. Caller-supplied max_tokens is forwarded verbatim — sends 32000, asserts request body carries 32000.
  2. Omitted max_tokens falls back to model maxOutput — asserts Opus 4.7 gets 128000 (was 1024 before the fix; this test fails on the old code, confirmed locally).
  3. Reasoning + caller max_tokens — ensures enabling reasoning_effort: high doesn't override caller's 32000.
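As a sketch, test 2 looks roughly like this (assuming a vitest-style suite and this prepareRequestBody call shape; the real spec file is authoritative):

it("falls back to the model's maxOutput when max_tokens is omitted", () => {
	const body = prepareRequestBody({
		model: "claude-opus-4-7",
		provider: "anthropic",
		messages: [{ role: "user", content: "hi" }],
		// max_tokens intentionally omitted
	});
	expect(body.max_tokens).toBe(128000); // was 1024 before the fix
});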

All 47 tests in the file pass; full actions package suite (127 tests across 4 files) green.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved token limit handling for Anthropic and AWS Bedrock providers when reasoning features are enabled, ensuring user-specified token limits are properly respected and default values align with actual model capabilities rather than artificial minimums.
  • Tests

    • Added comprehensive test coverage for token limit behavior and reasoning feature interactions across supported providers.

Review Change Stack

When the caller omitted max_tokens, the Anthropic and AWS Bedrock
adapters in prepareRequestBody fell back to Math.max(1024, ...) —
Anthropic's historical 1024 default for Claude 2. This silently
truncated large responses and cut tool calls mid-emission, causing
SDK parses to fail and agent loops to exit.

Fall back to the model's advertised maxOutput (from packages/models)
instead, e.g. 128000 for claude-opus-4-7. Caller-supplied values
continue to flow through untouched.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
coderabbitai bot (Contributor) commented May 14, 2026

Walkthrough

This PR updates token budget defaults for Anthropic and AWS Bedrock when reasoning/thinking is enabled. Anthropic now uses the model's declared maximum output instead of a fixed 1024 cap. AWS Bedrock's token validation becomes conditional, respecting both caller-provided limits and reasoning budget requirements. Three new tests verify Anthropic's behavior.

Changes

Token Budget Defaults for Reasoning Modes

  • Anthropic token budget defaults and tests (packages/actions/src/prepare-request-body.ts, packages/actions/src/prepare-request-body.spec.ts): Anthropic's fallback max_tokens now uses the model's advertised maxOutput combined with the thinking budget, replacing the fixed 1024 minimum. Three test cases verify that caller-supplied max_tokens is forwarded without capping, that omitted max_tokens defaults to maxOutput, and that reasoning preserves the caller's token budget.
  • AWS Bedrock token budget validation under reasoning (packages/actions/src/prepare-request-body.ts): AWS Bedrock's inferenceConfig.maxTokens validation becomes conditional: unset values use the caller's max_tokens or the greater of the provider maxOutput and reasoningFloor; values below reasoningFloor are raised to meet the minimum.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • theopenco/llmgateway#2034: Both PRs modify AWS Bedrock reasoning/thinking path token budget logic in packages/actions/src/prepare-request-body.ts.
  • theopenco/llmgateway#2079: Both PRs update Anthropic and AWS Bedrock token budgeting behavior for thinking/reasoning modes in prepare-request-body.ts.
🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check: Passed. The title concisely and accurately summarizes the main change: fixing the Anthropic max_tokens handling to respect caller-supplied values instead of enforcing a 1024 cap.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Linked Issues Check: Passed (check skipped; no linked issues were found for this pull request)
  • Out of Scope Changes Check: Passed (check skipped; no linked issues were found for this pull request)


coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/actions/src/prepare-request-body.spec.ts (1)

232-299: ⚡ Quick win

Add Bedrock parity regression tests for the new maxTokens path.

These new cases cover Anthropic well, but this PR also changes Bedrock maxTokens fallback/clamp behavior. Please add companion tests for: (1) omitted max_tokens fallback to model maxOutput/reasoning floor, and (2) caller max_tokens below reasoning floor being raised.
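A possible shape for the first requested parity case (provider string, model id, and expected value are illustrative assumptions, mirroring the Anthropic sketch conventions above):

it("bedrock: omitted max_tokens falls back to model maxOutput", () => {
	const body = prepareRequestBody({
		model: "claude-opus-4-7",
		provider: "aws-bedrock",
		reasoning_effort: "high",
		messages: [{ role: "user", content: "hi" }],
	});
	// Expect the model's advertised limit, not the old 1024 floor,
	// and never below the reasoning floor.
	expect(body.inferenceConfig.maxTokens).toBe(128000);
});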

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/actions/src/prepare-request-body.spec.ts` around lines 232 - 299,
The test suite adds Anthropic max_tokens cases but misses Bedrock parity; add
tests exercising prepareRequestBody for the Bedrock provider checking (1) when
caller omits max_tokens the returned requestBody.max_tokens falls back to the
model mapping maxOutput (and respects any reasoning floor) and (2) when caller
provides max_tokens below the reasoning floor and
supportsReasoning/reasoning_effort is set, prepareRequestBody raises it to the
reasoning floor; create two tests similar to the Anthropic ones that call
prepareRequestBody with provider "bedrock" and a bedrock model id, asserting
max_tokens equals the model's maxOutput when omitted and equals the reasoning
floor when caller value is too low with supportsReasoning true and
reasoning_effort set.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 60849217-9dfe-423b-90e0-edd7c4b3b1e7

📥 Commits

Reviewing files that changed from the base of the PR and between 26270c9 and 8be204c.

📒 Files selected for processing (2)
  • packages/actions/src/prepare-request-body.spec.ts
  • packages/actions/src/prepare-request-body.ts

Comment on lines +2101 to +2106
if (!inferenceConfig.maxTokens) {
inferenceConfig.maxTokens =
max_tokens ??
Math.max(bedrockModelMaxOutput ?? reasoningFloor, reasoningFloor);
} else if (inferenceConfig.maxTokens < reasoningFloor) {
inferenceConfig.maxTokens = reasoningFloor;

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use an explicit undefined check for Bedrock maxTokens initialization.

At Line 2101, if (!inferenceConfig.maxTokens) treats 0 as “unset”. With reasoning enabled, an explicit max_tokens = 0 can remain 0 (invalid floor) instead of being raised to reasoningFloor.

Suggested fix
-					if (!inferenceConfig.maxTokens) {
+					if (inferenceConfig.maxTokens === undefined) {
 						inferenceConfig.maxTokens =
 							max_tokens ??
 							Math.max(bedrockModelMaxOutput ?? reasoningFloor, reasoningFloor);
-					} else if (inferenceConfig.maxTokens < reasoningFloor) {
+					}
+					if (inferenceConfig.maxTokens < reasoningFloor) {
 						inferenceConfig.maxTokens = reasoningFloor;
 					}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (inferenceConfig.maxTokens === undefined) {
	inferenceConfig.maxTokens =
		max_tokens ??
		Math.max(bedrockModelMaxOutput ?? reasoningFloor, reasoningFloor);
}
if (inferenceConfig.maxTokens < reasoningFloor) {
	inferenceConfig.maxTokens = reasoningFloor;
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/actions/src/prepare-request-body.ts` around lines 2101 - 2106, The
current truthy check `if (!inferenceConfig.maxTokens)` treats 0 as unset; change
it to an explicit undefined/null check (e.g., `if (inferenceConfig.maxTokens ===
undefined || inferenceConfig.maxTokens === null)`) so that a deliberate 0 value
is preserved and then handled by the subsequent `else if
(inferenceConfig.maxTokens < reasoningFloor)` branch; update the branch around
inferenceConfig.maxTokens, max_tokens, bedrockModelMaxOutput, and reasoningFloor
accordingly.
