fix: extract usage from ResponsesAPI streaming events #2573

Open
varad-ahirwadkar wants to merge 1 commit into kubernetes-sigs:main from varad-ahirwadkar:fix-response

Conversation

@varad-ahirwadkar
Contributor

What type of PR is this?
/kind bug

What this PR does / why we need it:
The Responses API returns usage under response.usage, which the streaming parser did not previously handle. This change adds support for that format while remaining compatible with existing ChatCompletions and vLLM streaming responses.
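
For illustration only, here is a minimal Go sketch of reading usage from either location. It is not the code in this PR. The names Usage, event, and extractUsage are hypothetical, and the JSON field names are assumptions based on the public OpenAI conventions (prompt_tokens/completion_tokens at the top level for Chat Completions, input_tokens/output_tokens under response.usage for the Responses API).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Usage is a hypothetical normalized result; the real parser's type may differ.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// event is a minimal view of one streamed JSON payload covering both shapes.
type event struct {
	// Chat Completions style: usage at the top level (assumed field names).
	Usage *struct {
		PromptTokens     int `json:"prompt_tokens"`
		CompletionTokens int `json:"completion_tokens"`
		TotalTokens      int `json:"total_tokens"`
	} `json:"usage"`
	// Responses API style: usage nested under "response" (assumed field names).
	Response *struct {
		Usage *struct {
			InputTokens  int `json:"input_tokens"`
			OutputTokens int `json:"output_tokens"`
			TotalTokens  int `json:"total_tokens"`
		} `json:"usage"`
	} `json:"response"`
}

// extractUsage returns the usage found in payload, or nil if the payload
// carries no usage in either location. Invalid JSON yields an error.
func extractUsage(payload []byte) (*Usage, error) {
	var ev event
	if err := json.Unmarshal(payload, &ev); err != nil {
		return nil, err
	}
	if ev.Usage != nil {
		return &Usage{ev.Usage.PromptTokens, ev.Usage.CompletionTokens, ev.Usage.TotalTokens}, nil
	}
	if ev.Response != nil && ev.Response.Usage != nil {
		u := ev.Response.Usage
		return &Usage{u.InputTokens, u.OutputTokens, u.TotalTokens}, nil
	}
	return nil, nil
}

func main() {
	// Hand-written sample payloads, not captured traffic.
	samples := [][]byte{
		[]byte(`{"choices":[],"usage":{"prompt_tokens":3,"completion_tokens":5,"total_tokens":8}}`),
		[]byte(`{"type":"response.completed","response":{"usage":{"input_tokens":10,"output_tokens":4,"total_tokens":14}}}`),
	}
	for _, p := range samples {
		u, err := extractUsage(p)
		if err != nil || u == nil {
			fmt.Println("no usage:", err)
			continue
		}
		fmt.Printf("%+v\n", *u)
	}
}
```

Checking the top-level field first leaves the existing path untouched for payloads that already worked, which is one way to keep the compatibility this description mentions.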

Which issue(s) this PR fixes:
Fixes #2482

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot added the labels do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.), kind/bug (Categorizes issue or PR as related to a bug.), and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) Mar 13, 2026
@k8s-ci-robot added the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test.) Mar 13, 2026
@k8s-ci-robot
Contributor

Hi @varad-ahirwadkar. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the size/S label (Denotes a PR that changes 10-29 lines, ignoring generated files.) Mar 13, 2026
@netlify

netlify bot commented Mar 13, 2026

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 107d3e4
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69bbb2fc8e230700081834f9
😎 Deploy Preview https://deploy-preview-2573--gateway-api-inference-extension.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

@varad-ahirwadkar force-pushed the fix-response branch 2 times, most recently from ab88f85 to 55d24ce, March 13, 2026 17:52
@varad-ahirwadkar changed the title from "[WIP] fix: extract usage from ResponsesAPI streaming events" to "fix: extract usage from ResponsesAPI streaming events" Mar 13, 2026
@k8s-ci-robot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Mar 13, 2026
@varad-ahirwadkar
Contributor Author

/ok-to-test

@k8s-ci-robot
Contributor

@varad-ahirwadkar: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/ok-to-test


@ahg-g
Contributor

ahg-g commented Mar 14, 2026

/ok-to-test

@k8s-ci-robot added the ok-to-test label (Indicates a non-member PR verified by an org member that is safe to test.) and removed the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test.) Mar 14, 2026
@ahg-g
Contributor

ahg-g commented Mar 17, 2026

Thanks, what kind of testing was done to validate the fix?

@ahg-g
Contributor

ahg-g commented Mar 17, 2026

/assign @zetxqx

Contributor

@zetxqx zetxqx left a comment

Looks good. Can you add a unit test for this, using some real streaming response data?

@varad-ahirwadkar
Contributor Author

what kind of testing was done to validate the fix?

I validated the fix by running the TestOpenAIParser_ParseResponse_Streaming tests:

go test -v ./pkg/epp/framework/plugins/requesthandling/parsers/openai/... -run TestOpenAIParser_ParseResponse_Streaming

=== RUN   TestOpenAIParser_ParseResponse_Streaming
=== RUN   TestOpenAIParser_ParseResponse_Streaming/Single_data_chunk_with_usage
=== RUN   TestOpenAIParser_ParseResponse_Streaming/Usage_and_DONE_in_the_same_multi-line_response
=== RUN   TestOpenAIParser_ParseResponse_Streaming/Chunk_without_usage_returns_ParsedResponse_with_nil_usage
=== RUN   TestOpenAIParser_ParseResponse_Streaming/DONE_message_returns_error
=== RUN   TestOpenAIParser_ParseResponse_Streaming/Malformed_JSON_in_stream_(skipped)
--- PASS: TestOpenAIParser_ParseResponse_Streaming (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/Single_data_chunk_with_usage (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/Usage_and_DONE_in_the_same_multi-line_response (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/Chunk_without_usage_returns_ParsedResponse_with_nil_usage (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/DONE_message_returns_error (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/Malformed_JSON_in_stream_(skipped) (0.00s)
PASS
ok  	sigs.k8s.io/gateway-api-inference-extension/pkg/epp/framework/plugins/requesthandling/parsers/openai	0.027s

I also ran the full test suite for the OpenAI parser package to ensure there were no regressions:

go test -v ./pkg/epp/framework/plugins/requesthandling/parsers/openai/...

However, when I sent a curl request similar to the one in issue #2482, I still got the same results.

@varad-ahirwadkar
Contributor Author

Looks good, can you add a unit test of this, with some real streaming responses data?

Sure, will add a unit test. Thanks

Signed-off-by: Varad <varad.ahirwadkar1@ibm.com>
@k8s-ci-robot added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) and removed the size/S label (Denotes a PR that changes 10-29 lines, ignoring generated files.) Mar 19, 2026
@varad-ahirwadkar
Contributor Author

Hi @zetxqx,
I have added tests, PTAL

# go test -count=1  -v ./pkg/epp/framework/plugins/requesthandling/parsers/openai -run TestOpenAIParser_ParseResponse_Streaming 
=== RUN   TestOpenAIParser_ParseResponse_Streaming
=== RUN   TestOpenAIParser_ParseResponse_Streaming/Single_data_chunk_with_usage
=== RUN   TestOpenAIParser_ParseResponse_Streaming/Usage_and_DONE_in_the_same_multi-line_response
=== RUN   TestOpenAIParser_ParseResponse_Streaming/Chunk_without_usage_returns_ParsedResponse_with_nil_usage
=== RUN   TestOpenAIParser_ParseResponse_Streaming/DONE_message_returns_error
=== RUN   TestOpenAIParser_ParseResponse_Streaming/Malformed_JSON_in_stream_(skipped)
=== RUN   TestOpenAIParser_ParseResponse_Streaming/ResponsesAPI_streaming_with_full_response
=== RUN   TestOpenAIParser_ParseResponse_Streaming/ResponsesAPI_without_response.completed_type_returns_nil
=== RUN   TestOpenAIParser_ParseResponse_Streaming/ResponsesAPI_with_multiple_events_extracts_from_completed
--- PASS: TestOpenAIParser_ParseResponse_Streaming (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/Single_data_chunk_with_usage (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/Usage_and_DONE_in_the_same_multi-line_response (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/Chunk_without_usage_returns_ParsedResponse_with_nil_usage (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/DONE_message_returns_error (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/Malformed_JSON_in_stream_(skipped) (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/ResponsesAPI_streaming_with_full_response (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/ResponsesAPI_without_response.completed_type_returns_nil (0.00s)
    --- PASS: TestOpenAIParser_ParseResponse_Streaming/ResponsesAPI_with_multiple_events_extracts_from_completed (0.00s)
PASS
ok      sigs.k8s.io/gateway-api-inference-extension/pkg/epp/framework/plugins/requesthandling/parsers/openai        0.024s
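
The names of the three ResponsesAPI cases in the run above suggest the behavior under test: usage is taken from a response.completed event, and a stream without one yields nil. A rough sketch of that behavior follows, assuming SSE-style "data:" lines. It is not the test code from this PR: usage, streamEvent, and usageFromStream are hypothetical names, and the sample events are hand-written rather than captured from a real server.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// usage mirrors the token counters assumed to appear under response.usage.
type usage struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens"`
	TotalTokens  int `json:"total_tokens"`
}

// streamEvent is a hypothetical, minimal view of one SSE data payload.
type streamEvent struct {
	Type     string `json:"type"`
	Response *struct {
		Usage *usage `json:"usage"`
	} `json:"response"`
}

// usageFromStream scans an SSE body line by line and returns the usage of
// the first response.completed event, or nil if there is none.
func usageFromStream(body string) *usage {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		data, ok := strings.CutPrefix(strings.TrimSpace(sc.Text()), "data:")
		if !ok {
			continue // "event:" lines and blank separators carry no payload
		}
		data = strings.TrimSpace(data)
		if data == "" || data == "[DONE]" {
			continue
		}
		var ev streamEvent
		if err := json.Unmarshal([]byte(data), &ev); err != nil {
			continue // malformed JSON is skipped, not treated as fatal
		}
		if ev.Type == "response.completed" && ev.Response != nil && ev.Response.Usage != nil {
			return ev.Response.Usage
		}
	}
	return nil
}

func main() {
	// Hand-written sample events, not captured traffic.
	created := `data: {"type":"response.created","response":{"usage":null}}`
	delta := `data: {"type":"response.output_text.delta","delta":"Hi"}`
	completed := `data: {"type":"response.completed","response":{"usage":{"input_tokens":10,"output_tokens":4,"total_tokens":14}}}`

	cases := []struct{ name, body string }{
		{"completed only", completed + "\n\n"},
		{"no completed event", created + "\n\n" + delta + "\n\n"},
		{"multiple events", strings.Join([]string{created, delta, completed}, "\n\n") + "\n\n"},
	}
	for _, c := range cases {
		if u := usageFromStream(c.body); u != nil {
			fmt.Printf("%s: total=%d\n", c.name, u.TotalTokens)
		} else {
			fmt.Printf("%s: no usage\n", c.name)
		}
	}
}
```

Note that bufio.Scanner rejects lines longer than 64 KiB by default, so a real parser handling large events would need a bigger buffer or a different reader.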

@ahg-g
Contributor

ahg-g commented Mar 19, 2026

/approve

I will leave the lgtm to @zetxqx

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, varad-ahirwadkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Mar 19, 2026

Labels

approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
kind/bug (Categorizes issue or PR as related to a bug.)
ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)

Development

Successfully merging this pull request may close these issues.

usage of ResponsesAPI response in streaming mode is not extracted correctly

4 participants