
Python: CosmosHistoryProvider Code interpreter tool calls are saved chunk by chunk #5793

@moonbox3

Description


Discussed in #5715

Originally posted by CristinaStn May 8, 2026
When using the code interpreter with streaming, the Cosmos history provider receives multiple code interpreter content items, each with the same call_id but a different sequence_number. Preventing hundreds of redundant chunks from being stored requires custom aggregation logic. In addition, the complete code_interpreter script is stored alongside those chunk-by-chunk items.
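To illustrate how the stored pieces relate, here is a minimal sketch that rebuilds one script from its chunks. It assumes, based on the example below, that each chunk carries a text fragment ordered by sequence_number and that the stream may end with one item holding the complete script. The helper `reassemble_script` is hypothetical and not part of agent-framework, and the snapshot check is a heuristic: if the final snapshot differs from the concatenated deltas (for example in line endings), it will be appended instead of replacing them.

```python
def reassemble_script(chunks: list[tuple[int, str]]) -> str:
    """Rebuild one code interpreter script from (sequence_number, text) pairs.

    Hypothetical helper. Fragments are treated as deltas and appended in
    sequence order, except that a fragment which starts with everything
    accumulated so far is treated as a full snapshot and replaces the buffer.
    """
    buffer = ""
    for _, text in sorted(chunks, key=lambda pair: pair[0]):
        if buffer and text.startswith(buffer):
            # Looks like a complete snapshot (e.g. the final item): replace, don't append.
            buffer = text
        else:
            # Ordinary delta such as "import" or " pandas": append it.
            buffer += text
    return buffer


chunks = [(7, " pandas"), (6, "import"), (8, " as pd"), (9, "import pandas as pd")]
print(reassemble_script(chunks))  # import pandas as pd
```

Without the snapshot check, naive concatenation would store the script twice, which is the duplication described above.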
Here is an example of a session message stored in Cosmos DB:

{
    "id": "...",
    "session_id": "...",
    "sort_key": 1778239085946190300,
    "source_id": "azure_cosmos_history",
    "message": {
        "type": "message",
        "role": "assistant",
        "contents": [
            {
                "type": "text_reasoning",
                "text": "",
                "id": "rs_0f37a82e9edb89710069fdc661205c8190a71ebbc318e9b3b8",
                "additional_properties": {}
            },
            {
                "type": "code_interpreter_tool_result",
                "call_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83",
                "outputs": [],
                "additional_properties": {}
            },
            {
                "type": "code_interpreter_tool_call",
                "call_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83",
                "inputs": [
                    {
                        "type": "text",
                        "text": "import",
                        "additional_properties": {
                            "output_index": 1,
                            "sequence_number": 6,
                            "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                        }
                    }
                ],
                "additional_properties": {
                    "output_index": 1,
                    "sequence_number": 6,
                    "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                }
            },
            {
                "type": "code_interpreter_tool_call",
                "call_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83",
                "inputs": [
                    {
                        "type": "text",
                        "text": " pandas",
                        "additional_properties": {
                            "output_index": 1,
                            "sequence_number": 7,
                            "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                        }
                    }
                ],
                "additional_properties": {
                    "output_index": 1,
                    "sequence_number": 7,
                    "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                }
            },

            ... (5000 more JSON lines) ...
            {
                "type": "code_interpreter_tool_call",
                "call_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83",
                "inputs": [
                    {
                        "type": "text",
                        "text": "import pandas as pd\r\n\r\ndata = [\r\n    {\"Color\": \"Black\", \"Age\": 2, \"name\": \"Luna\"},\r\n    {\"Color\": \"White\", \"Age\": 4, \"name\": \"Snowball\"},\r\n    {\"Color\": \"Calico\", \"Age\": 1, \"name\": \"Patches\"},\r\n    {\"Color\": \"Tabby\", \"Age\": 5, \"name\": \"Tiger\"},\r\n    {\"Color\": \"Gray\", \"Age\": 3, \"name\": \"Smokey\"},\r\n    {\"Color\": \"Orange\", \"Age\": 7, \"name\": \"Marmalade\"},\r\n    {\"Color\": \"Tortoiseshell\", \"Age\": 2, \"name\": \"Pebbles\"},\r\n    {\"Color\": \"Brown\", \"Age\": 6, \"name\": \"Mocha\"},\r\n    {\"Color\": \"Cream\", \"Age\": 8, \"name\": \"Biscuit\"},\r\n    {\"Color\": \"Blue\", \"Age\": 10, \"name\": \"Misty\"},\r\n]\r\n\r\ndf = pd.DataFrame(data, columns=[\"Color\", \"Age\", \"name\"])\r\nfile_path = \"/mnt/data/cats.xlsx\"\r\ndf.to_excel(file_path, index=False)\r\nfile_path",
                        "additional_properties": {
                            "output_index": 1,
                            "sequence_number": 261,
                            "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                        }
                    }
                ],
                "additional_properties": {
                    "output_index": 1,
                    "sequence_number": 261,
                    "item_id": "ci_0f37a82e9edb89710069fdc66853ec8190be1cbda7c3a44e83"
                }
            }
        ]
    }
}
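A quick way to measure the redundancy in a stored document like the one above is to count the code_interpreter_tool_call items per call_id; any count above 1 means the call was saved chunk by chunk. This sketch assumes only the JSON shape shown (a `message.contents` list of dicts with `type` and `call_id` keys), and `count_tool_call_items` is a hypothetical name.

```python
import json
from collections import Counter


def count_tool_call_items(document: dict) -> Counter:
    """Count code_interpreter_tool_call content items per call_id.

    Assumes the stored shape shown above: document["message"]["contents"]
    is a list of dicts with "type" and "call_id" keys.
    """
    contents = document.get("message", {}).get("contents", [])
    return Counter(
        item.get("call_id")
        for item in contents
        if item.get("type") == "code_interpreter_tool_call"
    )


# Trimmed-down document: one tool result plus three tool_call items for one call.
raw = """
{"message": {"contents": [
  {"type": "code_interpreter_tool_result", "call_id": "ci_1", "outputs": []},
  {"type": "code_interpreter_tool_call", "call_id": "ci_1", "inputs": []},
  {"type": "code_interpreter_tool_call", "call_id": "ci_1", "inputs": []},
  {"type": "code_interpreter_tool_call", "call_id": "ci_1", "inputs": []}
]}}
"""
print(count_tool_call_items(json.loads(raw)))  # Counter({'ci_1': 3})
```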

Expected Behaviour:

By the time save_messages() is called on the history provider, each code interpreter tool call/result should appear as a single content item with the complete, aggregated text.
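The expected end state can be sketched as a post-processing pass over the serialized contents list: keep the first code_interpreter_tool_call item per call_id in place, drop the later chunks, and give the kept item the reassembled script as its only input. This is an illustration on the JSON shape above, not agent-framework's actual fix; `collapse_code_interpreter_calls` is a hypothetical name. It applies a delta-or-snapshot heuristic (a fragment that starts with the text accumulated so far is treated as a full snapshot), and it only collapses tool_call items, passing tool_result items through unchanged.

```python
import copy


def _seq(item: dict) -> int:
    # sequence_number lives in additional_properties in the stored JSON above.
    return item.get("additional_properties", {}).get("sequence_number", 0)


def collapse_code_interpreter_calls(contents: list[dict]) -> list[dict]:
    """Return a copy of contents with one code_interpreter_tool_call per call_id.

    Hypothetical sketch. Fragments are applied in sequence_number order; a
    fragment that starts with the text accumulated so far replaces it.
    """
    calls: dict[str, list[dict]] = {}
    for item in contents:
        if item.get("type") == "code_interpreter_tool_call":
            calls.setdefault(item.get("call_id"), []).append(item)

    result: list[dict] = []
    emitted: set = set()
    for item in contents:
        if item.get("type") != "code_interpreter_tool_call":
            result.append(item)  # reasoning, tool results, text, etc. pass through
            continue
        call_id = item.get("call_id")
        if call_id in emitted:
            continue  # later chunks of an already-collapsed call are dropped
        emitted.add(call_id)

        text = ""
        for chunk in sorted(calls[call_id], key=_seq):
            for part in chunk.get("inputs", []):
                fragment = part.get("text", "")
                text = fragment if text and fragment.startswith(text) else text + fragment

        merged = copy.deepcopy(item)  # never mutate the caller's items
        merged["inputs"] = [{"type": "text", "text": text}]
        # The per-chunk sequence number no longer describes the merged item.
        merged.setdefault("additional_properties", {}).pop("sequence_number", None)
        result.append(merged)
    return result


contents = [
    {"type": "text_reasoning", "text": ""},
    {"type": "code_interpreter_tool_call", "call_id": "ci_1",
     "inputs": [{"type": "text", "text": "import"}],
     "additional_properties": {"sequence_number": 6}},
    {"type": "code_interpreter_tool_call", "call_id": "ci_1",
     "inputs": [{"type": "text", "text": " pandas"}],
     "additional_properties": {"sequence_number": 7}},
    {"type": "code_interpreter_tool_call", "call_id": "ci_1",
     "inputs": [{"type": "text", "text": "import pandas"}],
     "additional_properties": {"sequence_number": 8}},
]
collapsed = collapse_code_interpreter_calls(contents)
print(len(collapsed), collapsed[1]["inputs"][0]["text"])  # 2 import pandas
```

Keeping the first chunk's position preserves the original ordering of content items relative to reasoning and tool results, which is why the sketch does not simply append merged calls at the end.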

Workaround:

I have implemented a CustomCosmosHistoryProvider that overrides save_messages and aggregates all code_interpreter_tool_call items sharing the same call_id. However, this approach has significant drawbacks: maintenance debt as agent-framework releases new features, an added testing burden, and the risk of breaking changes.

Code sample:

Use the CosmosHistoryProvider sample with streaming enabled: https://github.com/microsoft/agent-framework/blob/main/python/samples/02-agents/conversations/cosmos_history_provider_conversation_persistence.py

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working), python

    Type

    No fields configured for Bug.

    Projects

    Status
    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
