Skip to content

Failing to delete the stream from the activeStreams map leading to REFUSED_STREAM errors #8517

@arjan-bal

Description

@arjan-bal

Original PR with repro: #8504

After upgrading go-grpc, we started see the following error:

stream terminated by RST_STREAM with error code: REFUSED_STREAM

Upon investigation, it appears that in some scenarios, we fail to properly delete streams from the activeStreams map, resulting in this error being wrongly returned.

I added a test case that seems to trigger the problem:

PS: To test the behavior i added a "ActiveStreamTracker" hook -> im not sure this is the best way but i wanted just to show the problem

end2end_test.go:4015: leak streams:  5

The issue seems to have been introduced in the following PR: #8071

Specifically, the check added here:

if oldState == streamDone {
return
}

If this conditional is removed, all streams are correctly cleaned up from the activeStreams map, and the test passes without leaks.

Root cause

  1. The server stream exhausts the stream quota, and starts buffering data frames.
  2. The server attempts to finish the RPC gracefully by sending trailers by calling finishStream.
  3. The write of the header frame gets blocked due to the pending data frames that are awaiting flow control quota.
  4. Once the deadline expires, the server's deadline monitoring timer tries to call closeStream, which is a no-op since the stream state is already marked streamDone by finishStream in point 1.
  5. The client sends and RST stream, the server calls closeStream which is again a no-op.

The simplest fix is to remove the guard block added in closeStream in #8071. We may end up pushing redundant cleanupStream events on to the controlbuf, but it doesn't cause correctness issues since cleanupStreamHandler returns early in such cases.

Metadata

Metadata

Assignees

Labels

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions