
fix(gateway): respect requested max_tokens for Anthropic #2289

Open

steebchen-bot wants to merge 2 commits into main from fix/anthropic-max-tokens-1024-cap

Conversation

steebchen-bot (Collaborator) commented May 14, 2026

Bug report

Smoking gun in the export — lastStepOutput: 1024. That's the LLM Gateway / proxy's per-step output cap (Anthropic's old default), overriding our requested MAX_OUTPUT_TOKENS = 32_000. Opus on Anthropic-native is fine, but llmgateway/claude-opus-4-7 is enforcing 1024 server-side. The model emits a tool call that gets cut mid-emission → SDK parses nothing → loop exits silently.

Reproduction: calling claude-opus-4-7 through llmgateway with max_tokens=32000 (or any large value) returns at ~1024 output tokens; tool calls are cut mid-emission causing the SDK to fail silently. Calling Anthropic directly with the same model + same large max_tokens works fine — so the cap is being introduced in our gateway/proxy layer.
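As a reproduction sketch (assuming the gateway exposes an OpenAI-compatible chat completions route; the URL handling and usage field names here are illustrative, not confirmed against the gateway's actual API):

// Same request against the gateway vs. Anthropic-native; only the base URL changes.
async function probe(baseUrl: string, apiKey: string) {
	const res = await fetch(`${baseUrl}/v1/chat/completions`, {
		method: "POST",
		headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
		body: JSON.stringify({
			model: "claude-opus-4-7",
			max_tokens: 32000,
			messages: [{ role: "user", content: "Write a very long essay." }],
		}),
	});
	const json = await res.json();
	// Through llmgateway this came back capped near 1024 completion tokens;
	// direct Anthropic calls with the same max_tokens did not.
	console.log(json.usage?.completion_tokens);
}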

Root cause

In packages/actions/src/prepare-request-body.ts, the Anthropic case:

const thinkingBudget = getThinkingBudget(reasoning_effort);
const minMaxTokens = Math.max(1024, thinkingBudget + 1000);
requestBody.max_tokens = max_tokens ?? minMaxTokens;

Anthropic's Messages API requires max_tokens to be set, so the adapter unconditionally fills it in. When the caller didn't supply one (or it was dropped earlier in the request pipeline), we fell back to Math.max(1024, thinkingBudget + 1000), whose 1024 floor is Anthropic's historical default from the Claude 2 era. That value then went over the wire to upstream Anthropic, which honored it and truncated mid-emission. The same pattern existed in the AWS Bedrock branch.
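Concretely, assuming getThinkingBudget returns 0 when no reasoning_effort is set (an assumption about that helper, consistent with the observed cap), the fallback collapses to the flat floor:

const thinkingBudget = getThinkingBudget(undefined); // assumed 0 with no reasoning effort
const minMaxTokens = Math.max(1024, thinkingBudget + 1000); // Math.max(1024, 1000) === 1024
requestBody.max_tokens = undefined ?? minMaxTokens; // caller omitted max_tokens, so 1024 goes upstream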

Fix

Fall back to the model's own advertised maxOutput (e.g. 128000 for claude-opus-4-7) rather than a flat 1024. Caller-supplied values continue to flow through verbatim. Applied to both:

  • Native Anthropic provider (case "anthropic")
  • AWS Bedrock Anthropic provider (case "aws-bedrock" reasoning branch)

The Bedrock branch still enforces a floor of thinkingBudget + 1000 when the caller did supply a max_tokens that's too small to fit the thinking budget — that's a separate correctness guarantee, not a default-cap.
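For illustration, a minimal sketch of the corrected Anthropic fallback, assuming a getModelMaxOutput helper that reads the model's advertised maxOutput from packages/models (the helper name is hypothetical; the PR diff is authoritative):

const thinkingBudget = getThinkingBudget(reasoning_effort);
// Hypothetical lookup of the model's advertised output limit, e.g. 128000 for claude-opus-4-7.
const modelMaxOutput = getModelMaxOutput(model);
// Caller-supplied max_tokens flows through verbatim; the default is now the
// model's own limit, still leaving headroom for the thinking budget.
requestBody.max_tokens =
	max_tokens ?? Math.max(modelMaxOutput, thinkingBudget + 1000);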

Test coverage

Added three regression tests in prepare-request-body.spec.ts:

  1. Caller-supplied max_tokens is forwarded verbatim — sends 32000, asserts request body carries 32000.
  2. Omitted max_tokens falls back to model maxOutput — asserts Opus 4.7 gets 128000 (was 1024 before the fix; this test fails on the old code, confirmed locally).
  3. Reasoning + caller max_tokens — ensures enabling reasoning_effort: high doesn't override caller's 32000.
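As a sketch, test 2 looks roughly like this (assuming a vitest-style suite and this prepareRequestBody call shape; the real spec file is authoritative):

it("falls back to the model's maxOutput when max_tokens is omitted", () => {
	const body = prepareRequestBody({
		model: "claude-opus-4-7",
		provider: "anthropic",
		messages: [{ role: "user", content: "hi" }],
		// max_tokens intentionally omitted
	});
	expect(body.max_tokens).toBe(128000); // was 1024 before the fix
});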

All 47 tests in the file pass; full actions package suite (127 tests across 4 files) green.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved token limit handling for Anthropic and AWS Bedrock providers when reasoning features are enabled, ensuring user-specified token limits are properly respected and default values align with actual model capabilities rather than artificial minimums.
  • Tests

    • Added comprehensive test coverage for token limit behavior and reasoning feature interactions across supported providers.

Review Change Stack

When the caller omitted max_tokens, the Anthropic and AWS Bedrock
adapters in prepareRequestBody fell back to Math.max(1024, ...) —
Anthropic's historical 1024 default for Claude 2. This silently
truncated large responses and cut tool calls mid-emission, causing
SDK parses to fail and agent loops to exit.

Fall back to the model's advertised maxOutput (from packages/models)
instead, e.g. 128000 for claude-opus-4-7. Caller-supplied values
continue to flow through untouched.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
coderabbitai bot (Contributor) commented May 14, 2026

Walkthrough

This PR updates token budget defaults for Anthropic and AWS Bedrock when reasoning/thinking is enabled. Anthropic now uses the model's declared maximum output instead of a fixed 1024 cap. AWS Bedrock's token validation becomes conditional, respecting both caller-provided limits and reasoning budget requirements. Three new tests verify Anthropic's behavior.

Changes

Token Budget Defaults for Reasoning Modes

  • Anthropic token budget defaults and tests (packages/actions/src/prepare-request-body.ts, packages/actions/src/prepare-request-body.spec.ts): Anthropic's fallback max_tokens now uses the model's advertised maxOutput combined with the thinking budget, replacing the fixed 1024 minimum. Three test cases verify that caller-supplied max_tokens is forwarded without capping, that omitted max_tokens defaults to maxOutput, and that reasoning preserves the caller's token budget.
  • AWS Bedrock token budget validation under reasoning (packages/actions/src/prepare-request-body.ts): AWS Bedrock's inferenceConfig.maxTokens validation becomes conditional: unset values use the caller's max_tokens or the greater of the provider maxOutput and reasoningFloor; values below reasoningFloor are raised to meet the minimum.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • theopenco/llmgateway#2034: Both PRs modify AWS Bedrock reasoning/thinking path token budget logic in packages/actions/src/prepare-request-body.ts.
  • theopenco/llmgateway#2079: Both PRs update Anthropic and AWS Bedrock token budgeting behavior for thinking/reasoning modes in prepare-request-body.ts.
🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check: Passed. The title concisely and accurately summarizes the main change: fixing the Anthropic max_tokens handling to respect caller-supplied values instead of enforcing a 1024 cap.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Linked Issues Check: Passed (check skipped; no linked issues were found for this pull request)
  • Out of Scope Changes Check: Passed (check skipped; no linked issues were found for this pull request)


coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/actions/src/prepare-request-body.spec.ts (1)

232-299: ⚡ Quick win

Add Bedrock parity regression tests for the new maxTokens path.

These new cases cover Anthropic well, but this PR also changes Bedrock maxTokens fallback/clamp behavior. Please add companion tests for: (1) omitted max_tokens fallback to model maxOutput/reasoning floor, and (2) caller max_tokens below reasoning floor being raised.
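A possible shape for the first requested parity case (provider string, model id, and expected value are illustrative assumptions, mirroring the Anthropic sketch conventions above):

it("bedrock: omitted max_tokens falls back to model maxOutput", () => {
	const body = prepareRequestBody({
		model: "claude-opus-4-7",
		provider: "aws-bedrock",
		reasoning_effort: "high",
		messages: [{ role: "user", content: "hi" }],
	});
	// Expect the model's advertised limit, not the old 1024 floor,
	// and never below the reasoning floor.
	expect(body.inferenceConfig.maxTokens).toBe(128000);
});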

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/actions/src/prepare-request-body.spec.ts` around lines 232 - 299,
The test suite adds Anthropic max_tokens cases but misses Bedrock parity; add
tests exercising prepareRequestBody for the Bedrock provider checking (1) when
caller omits max_tokens the returned requestBody.max_tokens falls back to the
model mapping maxOutput (and respects any reasoning floor) and (2) when caller
provides max_tokens below the reasoning floor and
supportsReasoning/reasoning_effort is set, prepareRequestBody raises it to the
reasoning floor; create two tests similar to the Anthropic ones that call
prepareRequestBody with provider "bedrock" and a bedrock model id, asserting
max_tokens equals the model's maxOutput when omitted and equals the reasoning
floor when caller value is too low with supportsReasoning true and
reasoning_effort set.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 60849217-9dfe-423b-90e0-edd7c4b3b1e7

📥 Commits

Reviewing files that changed from the base of the PR and between 26270c9 and 8be204c.

📒 Files selected for processing (2)
  • packages/actions/src/prepare-request-body.spec.ts
  • packages/actions/src/prepare-request-body.ts

Comment on lines +2101 to +2106
if (!inferenceConfig.maxTokens) {
inferenceConfig.maxTokens =
max_tokens ??
Math.max(bedrockModelMaxOutput ?? reasoningFloor, reasoningFloor);
} else if (inferenceConfig.maxTokens < reasoningFloor) {
inferenceConfig.maxTokens = reasoningFloor;

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use an explicit undefined check for Bedrock maxTokens initialization.

At Line 2101, if (!inferenceConfig.maxTokens) treats 0 as “unset”. With reasoning enabled, an explicit max_tokens = 0 can remain 0 (invalid floor) instead of being raised to reasoningFloor.

Suggested fix
-					if (!inferenceConfig.maxTokens) {
+					if (inferenceConfig.maxTokens === undefined) {
 						inferenceConfig.maxTokens =
 							max_tokens ??
 							Math.max(bedrockModelMaxOutput ?? reasoningFloor, reasoningFloor);
-					} else if (inferenceConfig.maxTokens < reasoningFloor) {
+					}
+					if (inferenceConfig.maxTokens < reasoningFloor) {
 						inferenceConfig.maxTokens = reasoningFloor;
 					}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (inferenceConfig.maxTokens === undefined) {
	inferenceConfig.maxTokens =
		max_tokens ??
		Math.max(bedrockModelMaxOutput ?? reasoningFloor, reasoningFloor);
}
if (inferenceConfig.maxTokens < reasoningFloor) {
	inferenceConfig.maxTokens = reasoningFloor;
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/actions/src/prepare-request-body.ts` around lines 2101 - 2106, The
current truthy check `if (!inferenceConfig.maxTokens)` treats 0 as unset; change
it to an explicit undefined/null check (e.g., `if (inferenceConfig.maxTokens ===
undefined || inferenceConfig.maxTokens === null)`) so that a deliberate 0 value
is preserved and then handled by the subsequent `else if
(inferenceConfig.maxTokens < reasoningFloor)` branch; update the branch around
inferenceConfig.maxTokens, max_tokens, bedrockModelMaxOutput, and reasoningFloor
accordingly.
