fix(bedrock): preserve thinking blocks in multi-turn tool use#1873
fix(bedrock): preserve thinking blocks in multi-turn tool use#1873mesutoezdil wants to merge 5 commits into
Conversation
|
@supreme-gg-gg could you test this with extended thinking and tool use? That is the scenario that was broken. |
3eeb362 to
78e2467
Compare
There was a problem hiding this comment.
Pull request overview
This PR addresses Amazon Bedrock (Converse API) extended thinking + tool-use multi-turn behavior by preserving Bedrock “reasoning/thinking” content blocks across turns so subsequent requests validate correctly, and it also adds Helm chart support for exposing controller metrics (service, RBAC, and optional ServiceMonitor).
Changes:
- Bedrock: parse/accumulate reasoning content blocks (non-streaming + streaming) and round-trip them back into subsequent requests; omit
temperature/top_pwhen thinking is enabled. - Helm: add
controller.metrics.*values and templates for a dedicated metrics Service, optional ServiceMonitor, and RBAC to support secure metrics scraping. - Tests: add unit tests for thinking-block round-trip conversion and add Helm unittest coverage for metrics resources.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
helm/kagent/values.yaml |
Adds configurable controller.metrics settings (port, secure mode, ServiceMonitor knobs). |
helm/kagent/templates/controller-deployment.yaml |
Exposes an optional metrics container port when metrics are enabled. |
helm/kagent/templates/controller-configmap.yaml |
Wires metrics bind address + secure flag into controller config when enabled. |
helm/kagent/templates/controller-metrics-service.yaml |
Adds a dedicated ClusterIP Service for metrics. |
helm/kagent/templates/controller-servicemonitor.yaml |
Adds an optional Prometheus Operator ServiceMonitor for controller metrics. |
helm/kagent/templates/rbac/metrics-auth-role.yaml |
Adds ClusterRole/Binding for delegated auth (TokenReview/SAR) when secure metrics enabled. |
helm/kagent/templates/rbac/metrics-reader-role.yaml |
Adds ClusterRole allowing GET /metrics when metrics enabled. |
helm/kagent/tests/controller-deployment_test.yaml |
Ensures metrics port + configmap env are rendered when enabled and not otherwise. |
helm/kagent/tests/controller-service_test.yaml |
Ensures main controller Service does not gain extra ports when metrics enabled. |
helm/kagent/tests/controller-metrics-service_test.yaml |
Tests rendering and shape of the dedicated metrics Service. |
helm/kagent/tests/controller-servicemonitor_test.yaml |
Tests ServiceMonitor conditional rendering and TLS settings. |
helm/kagent/tests/metrics-rbac_test.yaml |
Tests metrics RBAC templates render conditions and rule contents. |
go/adk/pkg/models/bedrock.go |
Implements reasoning/thinking block parsing (streaming + non-streaming), round-trip conversion, and inference-config adjustments under thinking. |
go/adk/pkg/models/bedrock_test.go |
Adds a thinking-block conversion test and a minimal thinking/temperature test case. |
Comments suppressed due to low confidence (1)
go/adk/pkg/models/bedrock.go:383
- In streaming mode the final
Partsordering does not preserve the original Bedrock content-block order: all thinking parts are prepended, then all text is concatenated, then all tool calls are appended. Bedrock requires reasoning/tool blocks to be sent back in the same order as returned; reordering by type can break multi-turn tool-use validation. Consider accumulating completed blocks keyed byContentBlockIndex(including text/tool/thinking) and assemblingfinalPartsby increasing index (or by the exact stop/start sequence) instead of grouping by kind.
// thinking parts first; block order must match what Bedrock returned.
finalParts := []*genai.Part{}
finalParts = append(finalParts, completedThinkingParts...)
text := aggregatedText.String()
if text != "" {
finalParts = append(finalParts, &genai.Part{Text: text})
}
// Add completed tool calls
finalParts = append(finalParts, completedToolCalls...)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func TestThinkingDropsTemperatureAndTopP(t *testing.T) { | ||
| cases := []struct { | ||
| name string | ||
| fields map[string]any | ||
| wantDrop bool | ||
| }{ | ||
| {"no thinking", nil, false}, | ||
| {"thinking key present", map[string]any{"thinking": map[string]any{"type": "enabled", "budget_tokens": 2000}}, true}, | ||
| } | ||
| for _, tc := range cases { | ||
| t.Run(tc.name, func(t *testing.T) { | ||
| _, thinkingEnabled := tc.fields["thinking"] | ||
| if thinkingEnabled != tc.wantDrop { | ||
| t.Errorf("thinkingEnabled = %v, want %v", thinkingEnabled, tc.wantDrop) | ||
| } |
| metrics: | ||
| enabled: false | ||
| port: 8443 | ||
| # -- Serve metrics over HTTPS with authn/authz, matching the controller binary default. | ||
| # Set to false for plain HTTP scraping in environments where TLS is handled externally. | ||
| secure: true | ||
| serviceMonitor: | ||
| enabled: false | ||
| interval: 30s | ||
| scrapeTimeout: 10s | ||
| labels: {} |
| interval: {{ .Values.controller.metrics.serviceMonitor.interval }} | ||
| scrapeTimeout: {{ .Values.controller.metrics.serviceMonitor.scrapeTimeout }} | ||
| {{- if .Values.controller.metrics.secure }} | ||
| scheme: https |
3c9ede6 to
98103aa
Compare
|
I was doing an atempt here as well, but couldn't get it to work just yet, allthough there might be parts useful for this PR. |
0077429 to
942405c
Compare
When extended thinking is active, Bedrock returns thinking content blocks together with tool use blocks. The API requires these blocks to be sent back unmodified in the next request. Without them, Bedrock returns a ValidationException with toolUse.input is empty. Only emit thinking blocks for the last assistant turn before tool results. Sending them in all turns causes token counts to compound across long sessions. Truncate tool results in historical turns to 2000 chars. Older turns do not need full fidelity for large kubectl or YAML outputs. Wrap the OTLP span exporter to cap string attributes at 16KB, preventing large tool responses from exceeding Tempo's 4MB gRPC message limit. Fixes kagent-dev#1870 Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
942405c to
bf0a8e7
Compare
@marcofranssen any chance you can give this PR a smoke test before I review/approve? |
Extract inference config construction into a testable helper so that the temperature/top_p exclusion logic for extended thinking can be exercised directly without mocking the AWS Bedrock client. Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
06a04b0 to
9ef3656
Compare
Fixes #1870
When extended thinking is active, Bedrock returns thinking content blocks together with tool use blocks.
The API requires these blocks to be sent back unmodified in the next request.
Without them, Bedrock returns a ValidationException with toolUse.input is empty.
See the API requirement: https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html
Tests added for thinking block round-trip and temperature/top_p exclusion.
cc @marcofranssen @supreme-gg-gg @EItanya