fix: Bedrock extended thinking configuration by marcofranssen · Pull Request #1872 · kagent-dev/kagent

marcofranssen · 2026-05-15T14:18:03Z

Resolves #1871

generateNonStreaming: handle ContentBlockMemberReasoningContent in responses
When Bedrock returns a thinking block, it is now converted to genai.Part{Thought: true, Text: ..., ThoughtSignature: ...} and included in the response parts. Both the ReasoningText and Redacted variants are handled.
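
A minimal sketch of the conversion described above, not the code in this PR. Part and ReasoningBlock are local stand-ins for genai.Part and Bedrock's reasoning content block, and carrying the redacted payload in ThoughtSignature is an assumption made for illustration.

```go
package main

// Part is a stand-in for genai.Part, reduced to the fields named above.
type Part struct {
	Text             string
	Thought          bool
	ThoughtSignature []byte
}

// ReasoningBlock is a hypothetical flattened view of a Bedrock reasoning
// content block. Exactly one variant is expected to be populated.
type ReasoningBlock struct {
	Text      string // ReasoningText variant: the thinking text
	Signature string // ReasoningText variant: opaque signature
	Redacted  []byte // Redacted variant: opaque payload
}

// reasoningToPart converts one reasoning block into a thought part.
func reasoningToPart(b ReasoningBlock) Part {
	if len(b.Redacted) > 0 {
		// Assumption: a redacted block has no readable text, so only the
		// opaque payload is kept for the round-trip.
		return Part{Thought: true, ThoughtSignature: b.Redacted}
	}
	return Part{
		Thought:          true,
		Text:             b.Text,
		ThoughtSignature: []byte(b.Signature),
	}
}
```

Calling reasoningToPart(ReasoningBlock{Text: "plan", Signature: "sig"}) yields a part with Thought set, the text preserved, and the signature stored as bytes.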
generateStreaming: accumulate reasoning deltas and emit them in the final response
Added reasoningBlocks map[int32]*streamingReasoningBlock to accumulate text, signature, and redacted deltas. The completed blocks are prepended to finalParts (before text and tool calls), preserving the order Bedrock expects on round-trips.
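
The accumulation step could look roughly like the sketch below. Only the names reasoningBlocks and streamingReasoningBlock come from the description; the struct fields, the reasoningDelta event type, and the accumulate and finalize helpers are hypothetical, and the signature is assumed to arrive in a single delta.

```go
package main

import (
	"sort"
	"strings"
)

// Part is a stand-in for genai.Part.
type Part struct {
	Text             string
	Thought          bool
	ThoughtSignature []byte
}

// streamingReasoningBlock collects the deltas of one reasoning block.
// The fields are assumptions; only the type name is from the description.
type streamingReasoningBlock struct {
	text      strings.Builder
	signature string
	redacted  []byte
}

// reasoningDelta is a hypothetical, simplified stream event.
type reasoningDelta struct {
	Index     int32 // content block index within the response
	Text      string
	Signature string
	Redacted  []byte
}

// accumulate folds one delta into the block at its content block index.
func accumulate(blocks map[int32]*streamingReasoningBlock, d reasoningDelta) {
	b, ok := blocks[d.Index]
	if !ok {
		b = &streamingReasoningBlock{}
		blocks[d.Index] = b
	}
	b.text.WriteString(d.Text)
	if d.Signature != "" {
		b.signature = d.Signature
	}
	b.redacted = append(b.redacted, d.Redacted...)
}

// finalize emits the completed blocks in index order, ahead of the
// remaining text and tool-call parts.
func finalize(blocks map[int32]*streamingReasoningBlock, rest []Part) []Part {
	idx := make([]int32, 0, len(blocks))
	for i := range blocks {
		idx = append(idx, i)
	}
	sort.Slice(idx, func(a, b int) bool { return idx[a] < idx[b] })

	out := make([]Part, 0, len(blocks)+len(rest))
	for _, i := range idx {
		b := blocks[i]
		sig := []byte(b.signature)
		if len(b.redacted) > 0 {
			sig = b.redacted // assumption: redacted payload replaces the signature
		}
		out = append(out, Part{Thought: true, Text: b.text.String(), ThoughtSignature: sig})
	}
	return append(out, rest...)
}
```

Sorting by block index keeps the output deterministic, since Go map iteration order is randomized.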
convertGenaiContentsToBedrockMessages: echo thinking parts back as ContentBlockMemberReasoningContent
When building the messages array for a subsequent request, parts with part.Thought == true are now converted back to the correct Bedrock ReasoningContentBlock type. This is the critical path: without it, thinking blocks are silently dropped when the ADK builds the tool-result turn, and Bedrock sees an assistant turn with a toolUse block but no preceding thinking block, which causes the ValidationException: toolUse.input is empty.
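
To illustrate the round-trip, here is a simplified sketch of mapping parts back to message blocks so that the thinking block still precedes the toolUse block. Part and Block are stand-ins rather than the genai or Bedrock SDK types, and the ToolName field and partsToBlocks helper are hypothetical.

```go
package main

// Part is a stand-in for genai.Part. ToolName is a hypothetical shortcut
// for a function-call part.
type Part struct {
	Text             string
	Thought          bool
	ThoughtSignature []byte
	ToolName         string
}

// Block is a stand-in for a Bedrock content block.
type Block struct {
	Kind      string // "reasoning", "text", or "toolUse"
	Text      string
	Signature string
	ToolName  string
}

// partsToBlocks converts assistant parts to blocks, keeping their order so
// a thought part stays ahead of the tool call that follows it.
func partsToBlocks(parts []Part) []Block {
	out := make([]Block, 0, len(parts))
	for _, p := range parts {
		switch {
		case p.Thought:
			// Previously this case did not exist, so thought parts were dropped.
			out = append(out, Block{
				Kind:      "reasoning",
				Text:      p.Text,
				Signature: string(p.ThoughtSignature),
			})
		case p.ToolName != "":
			out = append(out, Block{Kind: "toolUse", ToolName: p.ToolName})
		case p.Text != "":
			out = append(out, Block{Kind: "text", Text: p.Text})
		}
	}
	return out
}
```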

Bedrock only requires thinking blocks for the last assistant message before tool results. Preserving them in all prior turns causes token counts to compound across multi-turn conversations (1.4M+ tokens seen in practice). Find the last assistant turn containing thinking parts and only emit ReasoningContent blocks there; earlier turns have them stripped.
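
A sketch of the stripping rule, under the assumption that assistant turns carry the role "model" as in genai contents. Content, Part, lastThinkingTurn, and stripOldThinking are illustrative names, not identifiers from this PR.

```go
package main

// Part and Content are stand-ins for genai.Part and genai.Content.
type Part struct {
	Text    string
	Thought bool
}

type Content struct {
	Role  string // assumption: "model" marks an assistant turn
	Parts []Part
}

// lastThinkingTurn returns the index of the last assistant turn that
// contains a thought part, or -1 if there is none.
func lastThinkingTurn(contents []Content) int {
	for i := len(contents) - 1; i >= 0; i-- {
		if contents[i].Role != "model" {
			continue
		}
		for _, p := range contents[i].Parts {
			if p.Thought {
				return i
			}
		}
	}
	return -1
}

// stripOldThinking returns a copy in which thought parts survive only in
// the turn found by lastThinkingTurn.
func stripOldThinking(contents []Content) []Content {
	keep := lastThinkingTurn(contents)
	out := make([]Content, 0, len(contents))
	for i, c := range contents {
		if i == keep {
			out = append(out, c)
			continue
		}
		kept := make([]Part, 0, len(c.Parts))
		for _, p := range c.Parts {
			if !p.Thought {
				kept = append(kept, p)
			}
		}
		out = append(out, Content{Role: c.Role, Parts: kept})
	}
	return out
}
```

Because earlier thinking is removed on every request, the replayed history grows only with text and tool traffic rather than with each turn's reasoning.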

Kubernetes tool responses (kubectl output, YAML, logs) can be many KBs each. With no history limit, long sessions accumulate millions of tokens across replayed tool results. Truncate tool responses in all but the most recent user turn to 2000 chars (~500 tokens), keeping full fidelity only where the model actually needs it for the current reasoning step.
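
The truncation rule might be sketched as follows. The 2000-character cap is from the description; the Content and Part stand-ins, the ToolResponse field, and the helper names are assumptions, and the cap is applied per rune so multi-byte characters are not split.

```go
package main

// maxToolResponseChars is the cap from the description (~500 tokens at
// roughly 4 characters per token).
const maxToolResponseChars = 2000

// Part and Content are stand-ins. ToolResponse is a hypothetical field
// holding the serialized tool output.
type Part struct {
	Text         string
	ToolResponse string
}

type Content struct {
	Role  string
	Parts []Part
}

// truncateRunes cuts s to at most max runes.
func truncateRunes(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max])
}

// truncateOldToolResponses caps tool responses everywhere except in the
// most recent user turn, returning a copy of the history.
func truncateOldToolResponses(contents []Content) []Content {
	last := -1
	for i := len(contents) - 1; i >= 0; i-- {
		if contents[i].Role == "user" {
			last = i
			break
		}
	}
	out := make([]Content, len(contents))
	for i, c := range contents {
		parts := make([]Part, len(c.Parts))
		copy(parts, c.Parts)
		if i != last {
			for j := range parts {
				parts[j].ToolResponse = truncateRunes(parts[j].ToolResponse, maxToolResponseChars)
			}
		}
		out[i] = Content{Role: c.Role, Parts: parts}
	}
	return out
}
```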

Tool responses (kubectl output, YAML blobs) are serialized verbatim into span attributes by the upstream ADK. A single large response can exceed Tempo's 4MB gRPC message limit. Wrap the exporter with a truncating layer that caps any string attribute at 16KB before forwarding to the collector.
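
A simplified sketch of the truncating layer. Span and Exporter here are local stand-ins, not the OpenTelemetry SDK interfaces, and truncatingExporter, capString, and recordingExporter are hypothetical names; only the 16KB cap comes from the description.

```go
package main

import "unicode/utf8"

// maxAttrBytes is the 16KB cap from the description.
const maxAttrBytes = 16 * 1024

// Span and Exporter are stand-ins for the tracing SDK types.
type Span struct {
	Name       string
	Attributes map[string]any
}

type Exporter interface {
	Export(spans []Span) error
}

// capString cuts s to at most max bytes, backing off to a rune boundary
// so the result stays valid UTF-8.
func capString(s string, max int) string {
	if len(s) <= max {
		return s
	}
	cut := max
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

// truncatingExporter caps string attributes before forwarding spans.
type truncatingExporter struct {
	next Exporter
}

func (t truncatingExporter) Export(spans []Span) error {
	out := make([]Span, len(spans))
	for i, s := range spans {
		attrs := make(map[string]any, len(s.Attributes))
		for k, v := range s.Attributes {
			if str, ok := v.(string); ok {
				v = capString(str, maxAttrBytes)
			}
			attrs[k] = v // non-string attributes pass through unchanged
		}
		out[i] = Span{Name: s.Name, Attributes: attrs}
	}
	return t.next.Export(out)
}

// recordingExporter is a test double that stores what it receives.
type recordingExporter struct {
	got []Span
}

func (r *recordingExporter) Export(spans []Span) error {
	r.got = spans
	return nil
}
```

Wrapping at the exporter boundary leaves the upstream ADK instrumentation untouched while bounding the size of each attribute sent to the collector.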

fix: Bedrock extended thinking configuration

bba6217

github-actions Bot added the bug Something isn't working label May 15, 2026

marcofranssen force-pushed the fix-bedrock-thinking branch from b64e992 to bba6217 Compare May 15, 2026 14:18

marcofranssen added 3 commits May 15, 2026 16:47

marcofranssen mentioned this pull request May 15, 2026

fix(bedrock): preserve thinking blocks in multi-turn tool use #1873

Open

marcofranssen wants to merge 4 commits into kagent-dev:main from marcofranssen:fix-bedrock-thinking
