
fix: Bedrock extended thinking configuration #1872

Draft: marcofranssen wants to merge 4 commits into kagent-dev:main from marcofranssen:fix-bedrock-thinking

@marcofranssen (Contributor)

Resolves #1871

  1. generateNonStreaming — handle ContentBlockMemberReasoningContent in responses
    When Bedrock returns a thinking block, it now gets converted to genai.Part{Thought: true, Text: ..., ThoughtSignature: ...}
    and included in the response parts. Both ReasoningText and Redacted variants are handled.

  2. generateStreaming — accumulate reasoning deltas and emit in final response
    Added reasoningBlocks map[int32]*streamingReasoningBlock to accumulate text, signature, and redacted deltas. The completed
    blocks are prepended to finalParts (before text and tool calls), preserving the order Bedrock expects on round-trips.

  3. convertGenaiContentsToBedrockMessages — echo thinking parts back as ContentBlockMemberReasoningContent
    When building the messages array for a subsequent request, parts with part.Thought == true are now converted back to the
    Bedrock ReasoningContentBlock type. This is the critical path. Without it, thinking blocks are silently dropped when the ADK
    builds the tool-result turn, so Bedrock receives an assistant turn that contains a toolUse block with no preceding thinking
    block and rejects the request with ValidationException: toolUse.input is empty.
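The accumulation, ordering, and round-trip rules in items 2 and 3 can be sketched in Go as follows. This is not the PR's code: Part, reasoningDelta, and reasoningBlock are hypothetical, simplified stand-ins for genai.Part and the Bedrock SDK types, and the Redacted and ToolName fields are invented for illustration (the PR does not say how redacted reasoning is stored on genai.Part). The detail worth noting is the explicit sort: Go map iteration order is random, so blocks kept in a map keyed by content-block index must be sorted before they are prepended.

```go
package main

import (
	"sort"
	"strings"
)

// Part is a hypothetical, simplified stand-in for genai.Part.
type Part struct {
	Text             string
	Thought          bool
	ThoughtSignature []byte
	Redacted         []byte // hypothetical: redacted reasoning payload
	ToolName         string // hypothetical: non-empty marks a tool-call part
}

// reasoningDelta is a hypothetical flattened Bedrock reasoning delta for the
// content block at Index; normally only one of the payload fields is set.
type reasoningDelta struct {
	Index     int32
	Text      string
	Signature string
	Redacted  []byte
}

// streamingReasoningBlock accumulates every delta seen for one block index.
type streamingReasoningBlock struct {
	text      strings.Builder
	signature string
	redacted  []byte
}

// addReasoningDelta appends d to the block at d.Index, creating it on first use.
func addReasoningDelta(blocks map[int32]*streamingReasoningBlock, d reasoningDelta) {
	b, ok := blocks[d.Index]
	if !ok {
		b = &streamingReasoningBlock{}
		blocks[d.Index] = b
	}
	b.text.WriteString(d.Text)
	b.signature += d.Signature
	b.redacted = append(b.redacted, d.Redacted...)
}

// finalizeParts emits the completed thinking blocks first, in content-block
// index order, followed by the text and tool-call parts in rest.
func finalizeParts(blocks map[int32]*streamingReasoningBlock, rest []Part) []Part {
	indices := make([]int32, 0, len(blocks))
	for i := range blocks {
		indices = append(indices, i)
	}
	// Map iteration order is random in Go, so sort to keep Bedrock's order.
	sort.Slice(indices, func(a, b int) bool { return indices[a] < indices[b] })

	parts := make([]Part, 0, len(indices)+len(rest))
	for _, i := range indices {
		b := blocks[i]
		parts = append(parts, Part{
			Text:             b.text.String(),
			Thought:          true,
			ThoughtSignature: []byte(b.signature),
			Redacted:         b.redacted,
		})
	}
	return append(parts, rest...)
}

// reasoningBlock is a hypothetical stand-in for the Bedrock reasoning content
// block echoed back in the next request's assistant message.
type reasoningBlock struct {
	Text      string
	Signature string
	Redacted  []byte
}

// thoughtToReasoning converts a Thought part back into a reasoning block; ok is
// false for ordinary parts, which follow the normal text/tool-call path.
func thoughtToReasoning(p Part) (rb reasoningBlock, ok bool) {
	if !p.Thought {
		return reasoningBlock{}, false
	}
	return reasoningBlock{Text: p.Text, Signature: string(p.ThoughtSignature), Redacted: p.Redacted}, true
}
```

In a streaming loop, addReasoningDelta would be called once per reasoning delta event and finalizeParts once the stream ends; thoughtToReasoning is the per-part check applied while building the next request's messages.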

github-actions (bot) added the bug label (Something isn't working) on May 15, 2026.
marcofranssen force-pushed the fix-bedrock-thinking branch from b64e992 to bba6217 on May 15, 2026, 14:18.

Bedrock only requires thinking blocks for the last assistant message
before tool results. Preserving them in all earlier turns causes token
counts to compound across multi-turn conversations (more than 1.4M
tokens observed in practice).

Find the last assistant turn containing thinking parts and emit
ReasoningContent blocks only there; thinking parts are stripped from
all earlier turns.
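A minimal sketch of this stripping rule, assuming genai-style roles ("user" and "model") and hypothetical, simplified Content and Part types rather than the real genai structs. It returns a copy so the stored session history is left untouched; whether the PR copies or mutates in place is not stated.

```go
package main

// Part and Content are hypothetical, simplified stand-ins for genai.Part and
// genai.Content.
type Part struct {
	Text    string
	Thought bool
}

type Content struct {
	Role  string // "user" or "model"
	Parts []Part
}

// lastThinkingTurn returns the index of the last model turn that contains a
// Thought part, or -1 if there is none.
func lastThinkingTurn(history []Content) int {
	for i := len(history) - 1; i >= 0; i-- {
		if history[i].Role != "model" {
			continue
		}
		for _, p := range history[i].Parts {
			if p.Thought {
				return i
			}
		}
	}
	return -1
}

// stripEarlierThoughts returns a copy of history in which Thought parts
// survive only in the turn chosen by lastThinkingTurn.
func stripEarlierThoughts(history []Content) []Content {
	keep := lastThinkingTurn(history)
	out := make([]Content, len(history))
	for i, c := range history {
		out[i] = Content{Role: c.Role}
		for _, p := range c.Parts {
			if p.Thought && i != keep {
				continue // drop reasoning from earlier turns
			}
			out[i].Parts = append(out[i].Parts, p)
		}
	}
	return out
}
```

This sketch does not handle a model turn that is left with no parts after stripping.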

Kubernetes tool responses (kubectl output, YAML, logs) can each be many
kilobytes. With no history limit, long sessions accumulate millions of
tokens across replayed tool results. Truncate tool responses in all but
the most recent user turn to 2000 characters (~500 tokens), keeping full
fidelity only where the model needs it for the current reasoning step.
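The truncation rule might look like this, under similar assumptions: Content, Part, and the ToolResponse field are hypothetical stand-ins for genai function-response parts, and "characters" is read as Unicode code points so multi-byte UTF-8 text is never cut mid-character.

```go
package main

// maxToolResponseChars mirrors the 2000-character cap described above.
const maxToolResponseChars = 2000

// Part and Content are hypothetical, simplified stand-ins for genai types.
type Part struct {
	Text         string
	ToolResponse string // hypothetical: serialized tool output, e.g. kubectl stdout
}

type Content struct {
	Role  string // "user" or "model"
	Parts []Part
}

// truncateRunes cuts s to at most n runes, never splitting a UTF-8 character.
func truncateRunes(s string, n int) string {
	count := 0
	for i := range s { // i is the byte offset of each rune
		if count == n {
			return s[:i]
		}
		count++
	}
	return s
}

// truncateOldToolResponses caps tool responses in every user turn except the
// most recent one, which keeps full fidelity. The input is not modified.
func truncateOldToolResponses(history []Content) []Content {
	lastUser := -1
	for i := len(history) - 1; i >= 0; i-- {
		if history[i].Role == "user" {
			lastUser = i
			break
		}
	}
	out := make([]Content, len(history))
	for i, c := range history {
		parts := make([]Part, len(c.Parts))
		copy(parts, c.Parts)
		if c.Role == "user" && i != lastUser {
			for j := range parts {
				parts[j].ToolResponse = truncateRunes(parts[j].ToolResponse, maxToolResponseChars)
			}
		}
		out[i] = Content{Role: c.Role, Parts: parts}
	}
	return out
}
```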

The upstream ADK serializes tool responses (kubectl output, YAML blobs)
verbatim into span attributes, so a single large response can exceed
Tempo's 4 MB gRPC message limit. Wrap the exporter with a truncating
layer that caps every string attribute at 16 KB before forwarding spans
to the collector.
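A sketch of the truncating layer, assuming a simplified exporter modeled as a function over per-span attribute lists. With the OpenTelemetry Go SDK, the real wrapper would instead implement sdktrace.SpanExporter (ExportSpans and Shutdown) and delegate to the wrapped exporter; Attribute, ExportFunc, and withTruncation here are hypothetical. The cut backs off to a UTF-8 rune boundary so a capped value stays valid UTF-8.

```go
package main

import "unicode/utf8"

// maxAttrBytes mirrors the 16 KB cap described above.
const maxAttrBytes = 16 * 1024

// Attribute is a hypothetical stand-in for an OpenTelemetry attribute.KeyValue.
type Attribute struct {
	Key   string
	Value any
}

// ExportFunc is a hypothetical, simplified exporter that receives one
// attribute list per span.
type ExportFunc func(spans [][]Attribute) error

// truncateUTF8 cuts s to at most n bytes, backing off to a rune boundary so
// the result stays valid UTF-8 (assuming s was valid to begin with).
func truncateUTF8(s string, n int) string {
	if len(s) <= n {
		return s
	}
	cut := n
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

// capStringAttributes returns a copy of attrs with every string value capped
// at maxAttrBytes; non-string values pass through unchanged.
func capStringAttributes(attrs []Attribute) []Attribute {
	out := make([]Attribute, len(attrs))
	for i, a := range attrs {
		if s, ok := a.Value.(string); ok {
			a.Value = truncateUTF8(s, maxAttrBytes)
		}
		out[i] = a
	}
	return out
}

// withTruncation wraps next so every span's string attributes are capped
// before being forwarded to the collector.
func withTruncation(next ExportFunc) ExportFunc {
	return func(spans [][]Attribute) error {
		capped := make([][]Attribute, len(spans))
		for i, attrs := range spans {
			capped[i] = capStringAttributes(attrs)
		}
		return next(capped)
	}
}
```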

Labels: bug (Something isn't working)


Development: successfully merging this pull request may close [FEATURE] Bedrock prompt caching.
