fix: Bedrock extended thinking configuration #1872
Draft
marcofranssen wants to merge 4 commits into
Bedrock only requires thinking blocks for the last assistant message before tool results. Preserving them in all prior turns causes token counts to compound across multi-turn conversations (1.4M+ tokens seen in practice). Find the last assistant turn containing thinking parts and only emit ReasoningContent blocks there; earlier turns have them stripped.
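A minimal sketch of the stripping rule described in this commit, not the PR's actual code. `Part` and `Content` are simplified stand-ins for the genai types, and the `"model"` role name and both function names are assumptions made for illustration.

```go
package main

// Part and Content are simplified stand-ins for the genai types; only the
// fields needed for this illustration are modeled.
type Part struct {
	Text    string
	Thought bool
}

type Content struct {
	Role  string // assumed to be "user" or "model"
	Parts []Part
}

// lastThinkingTurn returns the index of the last model turn that contains at
// least one thought part, or -1 if no such turn exists.
func lastThinkingTurn(history []Content) int {
	for i := len(history) - 1; i >= 0; i-- {
		if history[i].Role != "model" {
			continue
		}
		for _, p := range history[i].Parts {
			if p.Thought {
				return i
			}
		}
	}
	return -1
}

// stripOldThinking returns a copy of history in which thought parts survive
// only in the turn found by lastThinkingTurn. The input is not modified.
func stripOldThinking(history []Content) []Content {
	keep := lastThinkingTurn(history)
	out := make([]Content, 0, len(history))
	for i, c := range history {
		filtered := Content{Role: c.Role}
		for _, p := range c.Parts {
			if p.Thought && i != keep {
				continue // earlier turn: drop the thinking part
			}
			filtered.Parts = append(filtered.Parts, p)
		}
		out = append(out, filtered)
	}
	return out
}
```

In this sketch a turn that held only thinking parts ends up with no parts at all; a real implementation would need to decide whether to drop such a turn entirely.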
Kubernetes tool responses (kubectl output, YAML, logs) can be many KBs each. With no history limit, long sessions accumulate millions of tokens across replayed tool results. Truncate tool responses in all but the most recent user turn to 2000 chars (~500 tokens), keeping full fidelity only where the model actually needs it for the current reasoning step.
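The truncation rule can be sketched roughly as follows. The `Turn` type and both function names are hypothetical, and counting characters as runes rather than bytes is an assumption; the commit message only says "2000 chars".

```go
package main

// maxToolResponseChars is the limit stated in the commit message.
const maxToolResponseChars = 2000

// truncateChars cuts s to at most limit characters. It counts runes rather
// than bytes (an assumption) so a multi-byte UTF-8 sequence is never split.
func truncateChars(s string, limit int) string {
	if limit < 0 {
		limit = 0
	}
	r := []rune(s)
	if len(r) <= limit {
		return s
	}
	return string(r[:limit])
}

// Turn is a hypothetical, simplified history entry that carries only the
// tool responses attached to it.
type Turn struct {
	Role          string
	ToolResponses []string
}

// truncateOldToolResponses returns a copy of history in which every tool
// response is truncated except those in the most recent user turn.
func truncateOldToolResponses(history []Turn, limit int) []Turn {
	lastUser := -1
	for i := len(history) - 1; i >= 0; i-- {
		if history[i].Role == "user" {
			lastUser = i
			break
		}
	}
	out := make([]Turn, len(history))
	for i, t := range history {
		out[i] = Turn{Role: t.Role, ToolResponses: make([]string, len(t.ToolResponses))}
		for j, resp := range t.ToolResponses {
			if i == lastUser {
				out[i].ToolResponses[j] = resp // keep full fidelity
			} else {
				out[i].ToolResponses[j] = truncateChars(resp, limit)
			}
		}
	}
	return out
}
```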
Tool responses (kubectl output, YAML blobs) are serialized verbatim into span attributes by the upstream ADK. A single large response can exceed Tempo's 4MB gRPC message limit. Wrap the exporter with a truncating layer that caps any string attribute at 16KB before forwarding to the collector.
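The wrapping idea can be illustrated with a small decorator. This sketch deliberately avoids the real OpenTelemetry SDK types: `Span`, `Exporter`, `ExporterFunc`, and `NewTruncatingExporter` are hypothetical stand-ins, and treating 16KB as 16 × 1024 bytes is an assumption.

```go
package main

import "unicode/utf8"

// maxAttrBytes is the 16KB cap from the commit message, assumed to mean
// 16 * 1024 bytes.
const maxAttrBytes = 16 * 1024

// capBytes truncates s to at most limit bytes, backing off to the nearest
// rune boundary so the result stays valid UTF-8.
func capBytes(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	if len(s) <= limit {
		return s
	}
	cut := limit
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

// Span and Exporter are hypothetical stand-ins, not the OpenTelemetry API.
type Span struct {
	Name  string
	Attrs map[string]string
}

type Exporter interface {
	Export(spans []Span) error
}

// ExporterFunc adapts a plain function to the Exporter interface.
type ExporterFunc func(spans []Span) error

func (f ExporterFunc) Export(spans []Span) error { return f(spans) }

type truncatingExporter struct {
	next  Exporter
	limit int
}

// NewTruncatingExporter wraps next so that every string attribute is capped
// at limit bytes before being forwarded.
func NewTruncatingExporter(next Exporter, limit int) Exporter {
	return &truncatingExporter{next: next, limit: limit}
}

// Export copies each span with capped attributes and forwards the copies,
// leaving the caller's spans untouched.
func (e *truncatingExporter) Export(spans []Span) error {
	capped := make([]Span, len(spans))
	for i, s := range spans {
		attrs := make(map[string]string, len(s.Attrs))
		for k, v := range s.Attrs {
			attrs[k] = capBytes(v, e.limit)
		}
		capped[i] = Span{Name: s.Name, Attrs: attrs}
	}
	return e.next.Export(capped)
}
```

Copying rather than mutating keeps the wrapper safe when the same spans are handed to more than one exporter.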
Resolves #1871
generateNonStreaming — handle ContentBlockMemberReasoningContent in responses
When Bedrock returns a thinking block, it now gets converted to genai.Part{Thought: true, Text: ..., ThoughtSignature: ...}
and included in the response parts. Both ReasoningText and Redacted variants are handled.
generateStreaming — accumulate reasoning deltas and emit in final response
Added reasoningBlocks map[int32]*streamingReasoningBlock to accumulate text, signature, and redacted deltas. The completed
blocks are prepended to finalParts (before text and tool calls), preserving the order Bedrock expects on round-trips.
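The accumulate-then-prepend step might look roughly like the sketch below. Only the `map[int32]*streamingReasoningBlock` shape comes from the description above; the `reasoningDelta` and `Part` types, the `Redacted` field, and the method names are assumptions, and the real Bedrock stream event types are not modeled.

```go
package main

import (
	"sort"
	"strings"
)

// reasoningDelta is a hypothetical, flattened view of one streamed delta.
type reasoningDelta struct {
	Text      string
	Signature string
	Redacted  []byte
}

// streamingReasoningBlock accumulates the deltas for one content block.
type streamingReasoningBlock struct {
	text      strings.Builder
	signature string
	redacted  []byte
}

// Part is a simplified stand-in for genai.Part. The Redacted field is an
// assumption made for this sketch.
type Part struct {
	Text             string
	Thought          bool
	ThoughtSignature []byte
	Redacted         []byte
}

type reasoningAccumulator struct {
	blocks map[int32]*streamingReasoningBlock
}

// add merges one delta into the block for the given content block index.
func (a *reasoningAccumulator) add(index int32, d reasoningDelta) {
	if a.blocks == nil {
		a.blocks = make(map[int32]*streamingReasoningBlock)
	}
	b, ok := a.blocks[index]
	if !ok {
		b = &streamingReasoningBlock{}
		a.blocks[index] = b
	}
	b.text.WriteString(d.Text)
	b.signature += d.Signature
	b.redacted = append(b.redacted, d.Redacted...)
}

// prependTo returns the accumulated reasoning parts, ordered by block index,
// followed by finalParts (text and tool calls).
func (a *reasoningAccumulator) prependTo(finalParts []Part) []Part {
	idx := make([]int32, 0, len(a.blocks))
	for i := range a.blocks {
		idx = append(idx, i)
	}
	sort.Slice(idx, func(x, y int) bool { return idx[x] < idx[y] })

	out := make([]Part, 0, len(idx)+len(finalParts))
	for _, i := range idx {
		b := a.blocks[i]
		out = append(out, Part{
			Text:             b.text.String(),
			Thought:          true,
			ThoughtSignature: []byte(b.signature),
			Redacted:         b.redacted,
		})
	}
	return append(out, finalParts...)
}
```

Sorting by index makes the output order deterministic, since Go map iteration order is not.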
convertGenaiContentsToBedrockMessages — echo thinking parts back as ContentBlockMemberReasoningContent
When building the messages array for a subsequent request, parts with part.Thought == true are now converted back to the correct
Bedrock ReasoningContentBlock type. This is the critical path: without it, thinking blocks are silently dropped when the ADK
builds the tool-result turn, so Bedrock sees an assistant turn with a toolUse block but no preceding thinking block and rejects
the request with ValidationException: toolUse.input is empty.
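As an illustration of the round-trip, the sketch below maps the parts of one assistant turn to Bedrock-style blocks while preserving their order. `Part`, `Block`, `BlockKind`, and `toBlocks` are hypothetical stand-ins; the real code uses the AWS SDK's content block types, which are not reproduced here.

```go
package main

// Part is a simplified stand-in for genai.Part. ToolName is a hypothetical
// field marking a function-call part.
type Part struct {
	Text             string
	Thought          bool
	ThoughtSignature []byte
	ToolName         string
}

// BlockKind and Block are hypothetical stand-ins for Bedrock content blocks.
type BlockKind int

const (
	BlockText BlockKind = iota
	BlockReasoning
	BlockToolUse
)

type Block struct {
	Kind      BlockKind
	Text      string
	Signature string
	ToolName  string
}

// toBlocks converts parts to blocks in order. Thought parts become reasoning
// blocks instead of being dropped, so a toolUse block keeps the thinking
// block that preceded it.
func toBlocks(parts []Part) []Block {
	out := make([]Block, 0, len(parts))
	for _, p := range parts {
		switch {
		case p.Thought:
			out = append(out, Block{
				Kind:      BlockReasoning,
				Text:      p.Text,
				Signature: string(p.ThoughtSignature),
			})
		case p.ToolName != "":
			out = append(out, Block{Kind: BlockToolUse, ToolName: p.ToolName})
		case p.Text != "":
			out = append(out, Block{Kind: BlockText, Text: p.Text})
		}
	}
	return out
}
```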