Python: coalesce code interpreter history chunks #5801

Merged
eavanvalkenburg merged 3 commits into microsoft:main from he-yufeng:fix/code-interpreter-history-chunks
Jun 1, 2026

@he-yufeng
Contributor

Fixes #5793.

Summary

  • Coalesce streamed code_interpreter_tool_call and code_interpreter_tool_result content by call_id during response finalization.
  • When a later done event carries the full code, keep that full value instead of storing both chunk deltas and the complete script.
  • Add a history-provider regression test for the Cosmos-style chunked code interpreter shape.
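
As a rough illustration of the first two bullets, here is a minimal sketch that uses plain dicts instead of the framework's content types. The `coalesce_code_interpreter` helper, the `code` key, and the `is_final` flag marking the done event are hypothetical names chosen for this example; they are not necessarily what `_types.py` uses.

```python
from typing import Any

# Content types that this sketch coalesces; other items pass through untouched.
COALESCED_TYPES = {"code_interpreter_tool_call", "code_interpreter_tool_result"}


def coalesce_code_interpreter(contents: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge streamed chunks that share (type, call_id) into one item each.

    Hypothetical shape: each chunk is a dict with "type", "call_id", "code",
    and an optional "is_final" flag set on the done event.
    """
    merged: dict[tuple[str, str], dict[str, Any]] = {}
    result: list[dict[str, Any]] = []
    for item in contents:
        call_id = item.get("call_id")
        if item.get("type") not in COALESCED_TYPES or call_id is None:
            result.append(item)  # not a coalescable item; keep as-is
            continue
        key = (item["type"], call_id)
        existing = merged.get(key)
        if existing is None:
            existing = dict(item)  # shallow copy so the input is not mutated
            merged[key] = existing
            result.append(existing)  # keep the position of the first chunk
        elif item.get("is_final"):
            # The done event carries the full script: replace the deltas.
            existing["code"] = item.get("code", "")
            existing["is_final"] = True
        elif not existing.get("is_final"):
            # Another delta: append it to what has been accumulated so far.
            existing["code"] = existing.get("code", "") + item.get("code", "")
    return result
```

In this sketch, two deltas followed by a done event for the same `call_id` end up as a single item holding only the full script, and unrelated content keeps its original position.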

To verify

  • uv run pytest packages\core\tests\core\test_sessions.py::TestHistoryProviderBase::test_after_run_stores_coalesced_code_interpreter_chunks -q --basetemp .tmp\pytest
  • uv run pytest packages\core\tests\core\test_sessions.py -q --basetemp .tmp\pytest -p no:cacheprovider
  • uv run pytest packages\core\tests\core\test_types.py -q --basetemp .tmp\pytest -p no:cacheprovider
  • uv run pytest packages\openai\tests\openai\test_openai_chat_client.py -q -k "code_interpreter" --basetemp .tmp\pytest -p no:cacheprovider
  • uv run ruff check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run ruff format --check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run mypy packages\core\agent_framework\_types.py
  • uv run python -m py_compile packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • git diff --check

Copilot AI review requested due to automatic review settings May 13, 2026 06:45
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses streamed code-interpreter history bloat in the Python core by coalescing code_interpreter_tool_call / code_interpreter_tool_result content items with the same call_id (or item_id) during response finalization, ensuring history providers receive a single aggregated item per logical tool call.

Changes:

  • Add response-finalization logic to coalesce code-interpreter tool call/result content by (type, call_id).
  • Implement merge behavior that prefers a later “done” event carrying the full code over keeping both deltas and the complete script.
  • Add a regression test ensuring history providers store the coalesced code-interpreter content shape.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| python/packages/core/agent_framework/_types.py | Adds code-interpreter coalescing/merge helpers and applies them during _finalize_response. |
| python/packages/core/tests/core/test_sessions.py | Adds a history-provider regression test verifying coalesced code-interpreter chunks are stored as a single content item. |

@moonbox3
Contributor

moonbox3 commented May 14, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| packages/core/agent_framework/_types.py | 1187 | 97 | 91% | 59, 68–69, 123, 128, 147, 149, 153, 157, 159, 161, 163, 181, 185, 211, 233, 238, 243, 247, 277, 690–691, 850–851, 1286, 1358, 1393, 1413, 1423, 1475, 1607–1609, 1791, 1894–1899, 1924, 1979, 1984, 1994, 2002, 2009–2013, 2031, 2104, 2112–2114, 2119, 2222, 2245, 2500, 2524, 2623, 2877, 3087, 3146, 3185, 3196, 3198–3202, 3204, 3207–3215, 3225, 3314, 3451, 3456, 3461, 3466, 3470, 3554–3556, 3585, 3673–3677 |
| TOTAL | 37460 | 4360 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 7452 | 34 💤 | 0 ❌ | 0 🔥 | 1m 58s ⏱️ |

@moonbox3
Contributor

@he-yufeng please have a look at the failing code quality checks. Thank you.

@he-yufeng
Contributor Author

Thanks for the heads up. I pushed a small follow-up that narrows the nested content list types before iteration/deepcopy, which addresses the pyright failures from the package check.
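
For readers unfamiliar with this kind of fix, the general idea of narrowing a loosely typed nested list before iterating and deep-copying can be sketched as follows. This is an illustrative example only: `copy_nested_items` is a hypothetical helper, not the code from this PR, and it is not claimed to match what pyright required here.

```python
from copy import deepcopy
from typing import Any


def copy_nested_items(value: object) -> list[dict[str, Any]]:
    """Return deep copies of the dict elements of value, or [] if it is not a list."""
    if not isinstance(value, list):
        return []  # nothing to iterate over
    items: list[dict[str, Any]] = []
    for element in value:
        # The isinstance check narrows each element to dict before it is copied.
        if isinstance(element, dict):
            items.append(deepcopy(element))
    return items
```

The explicit `isinstance` checks give the type checker a concrete type at each step instead of leaving the list elements unknown.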

Validated locally:

  • python -m py_compile agent_framework\_types.py tests\core\test_sessions.py
  • python -m pytest tests\core\test_sessions.py::TestHistoryProviderBase::test_after_run_stores_coalesced_code_interpreter_chunks -q --basetemp .tmp\pytest -p no:cacheprovider
  • python -m ruff check agent_framework\_types.py tests\core\test_sessions.py
  • uv run pyright packages\core\agent_framework\_types.py
  • git diff --check

@he-yufeng
Contributor Author

Thanks, I pushed a follow-up for the mypy redundant-cast failure in packages/core/agent_framework/_types.py.

Validated locally:

  • uv run mypy packages\core\agent_framework\_types.py
  • uv run python scripts\workspace_poe_tasks.py ci-mypy (run with a UTF-8 console environment on Windows)
  • uv run ruff check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run ruff format --check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run python -m py_compile packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run pytest packages\core\tests\core\test_sessions.py::TestHistoryProviderBase::test_after_run_stores_coalesced_code_interpreter_chunks -q --basetemp .tmp\pytest -p no:cacheprovider
  • git diff --check

@he-yufeng he-yufeng force-pushed the fix/code-interpreter-history-chunks branch from 118ecb5 to a4a0a73 Compare May 14, 2026 13:31
@he-yufeng
Contributor Author

I also rebased the branch onto current upstream/main after the mypy follow-up. The branch now carries only the three PR commits on top of main.

Revalidated after the rebase:

  • uv run mypy packages\core\agent_framework\_types.py
  • uv run ruff check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run pytest packages\core\tests\core\test_sessions.py::TestHistoryProviderBase::test_after_run_stores_coalesced_code_interpreter_chunks -q --basetemp .tmp\pytest -p no:cacheprovider
  • git diff --check

@he-yufeng he-yufeng force-pushed the fix/code-interpreter-history-chunks branch from d00c77a to 32298f5 Compare May 15, 2026 05:53
@he-yufeng
Contributor Author

Rebased onto current main and addressed the type-check failure from the previous run.

The fix only narrows the helper casts in _merge_content_item_lists() so Pyright/Mypy no longer infer unknown list elements.

Local validation:

uv run pyright packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
uv run mypy packages\core\agent_framework\_types.py
uv run pytest packages\core\tests\core\test_sessions.py::TestHistoryProviderBase::test_after_run_stores_coalesced_code_interpreter_chunks -q --basetemp .tmp\pytest -p no:cacheprovider
uv run ruff check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
uv run ruff format --check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
git diff --check

Result: pyright/mypy clean; targeted test passed; ruff and diff-check passed.

@eavanvalkenburg eavanvalkenburg force-pushed the fix/code-interpreter-history-chunks branch from 32298f5 to 571a3fc Compare June 1, 2026 13:17
@eavanvalkenburg eavanvalkenburg enabled auto-merge June 1, 2026 13:19
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Jun 1, 2026
Merged via the queue into microsoft:main with commit 78d175a Jun 1, 2026
36 checks passed

Development

Successfully merging this pull request may close these issues.

Python: CosmosHistoryProvider Code interpreter tool calls are saved chunk by chunk

4 participants