Skip to content

Conversation

@kasimov-maxim
Copy link

@kasimov-maxim kasimov-maxim commented Dec 6, 2025

Fixes #142352

Summary

When using StreamWriter.start_tls() to upgrade a connection to TLS mid-stream, any data already buffered in the StreamReader was lost. This commonly occurs when implementing servers that support the PROXY protocol.

Changes

  • Added _transfer_buffered_data_to_ssl() method to BaseEventLoop that transfers buffered data from StreamReader._buffer to SSLProtocol._incoming before the TLS handshake begins
  • Added tests for the PROXY protocol + TLS scenario

Generative AI usage

Some parts of this PR were assisted by a generative AI tool. Specifically:

  • help with drafting English-language comments and documentation
  • suggestions for test refactoring
  • assistance with wording and structuring the PR description

All functional logic and final edits were reviewed, verified, and authored by
me, in accordance with the Python devguide guidelines.

Test plan

  • Added new tests in test_streams.py
  • All asyncio tests pass
  • make patchcheck passes

@python-cla-bot
Copy link

python-cla-bot bot commented Dec 6, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

file.seek(offset + total_sent)
await proto.restore()

def _transfer_buffered_data_to_ssl(self, protocol, ssl_protocol):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this method is called from just one place, I suggest to inline it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept this logic in a small helper because it represents a distinct
conceptual step (bridging StreamReader buffering into the SSLProtocol BIO),
and start_tls() is already quite dense. Even though the block is relatively
short, it introduces non-obvious buffering semantics involving private
StreamReader and SSLProtocol internals.

Keeping it named helps document the behavior required for mid-connection
start_tls() scenarios, avoids cluttering the main flow, and leaves room
for future adjustments if this logic needs to grow.

If we inline this, I think it would be worth keeping a fairly detailed
comment explaining why this buffering transfer is needed.

That said, I'm happy to inline it if you think that's preferable here.

Copy link
Member

@picnixz picnixz Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for a method as it would indirectly be exposed on subclasses. The docstring is also un-necessary, and there is no need to mention custom implementations.

If we inline this, I think it would be worth keeping a fairly detailed
comment explaining why this buffering transfer is needed.

No. Just link an issue (and possibly a short comment). The code base prefers concise comments. It's not meant to be read by everyone and we definitely don't want to have more docstrings than code. If something is meant to be extensively documented it should be done at the RST level.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, thanks. I'll inline it and keep only a short comment with an issue reference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(my $0.02)

no need for a method as it would indirectly be exposed on subclasses

FWIW, I often make use of the technique of naming things. It's nice to have clear semantic names for such abstractions: it'd be more difficult to make sense of ssl_protocol._incoming.write(protocol._stream_reader._buffer) in the context of start_tls(). Plus having these low-level details leak into that layer would mean that future refactorings may result in unrelated lines being added in-between, making their way into a block of otherwise tightly-coupled instructions.

Clearly, having it a member of the class is suboptimal but what's wrong with having a _private helper function around, for example?

Copy link
Member

@picnixz picnixz Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not like proliferation of methods when code can be inlined. Making sense of line is the reason why adding a small comment is enough.

We also should not be worried about the future. However I would recommend having a more compact block (we avoid blank lines in the stdlib and only use them to separate sections in a function).

Now, If we want to make it private, why not also. But I do not think that it is better to leak the method itself to subclasses (because then we need to indicate that the logic is private and neither protected or public; this the unfortunate part of not having a protected convention aside from @override).

Since Kumar is the code owner I think we should respect his intentions on that matter. Though they could have changed after your suggestion.

@kumaraditya303
Copy link
Contributor

@ambv Can you review this? You reviewed the original PR which added this API #91453

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this PR been generated using an LLM? if so, indicate which places were so and read https://devguide.python.org/getting-started/generative-ai/.

self.assertEqual(msg2, b"hello world 2!\n")

def _run_test_start_tls_behind_proxy(self, send_combined):
"""Test start_tls() when TLS ClientHello arrives with PROXY header.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove docstrings from tests. It would be shown on failure. Use regular comments and shorten it. We don't want to restate the entire issue so simply link the issue's number as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, will update the tests accordingly.

test_message = b"hello world\n"
expected_response = reverse_message(test_message)

class TCPProxyServer:
Copy link
Member

@picnixz picnixz Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this class? don't we already have some helpers elsewhere? The test also looks overly complex. Why not just patching existing TCPServer? (if possible; I don't know if there isn't some class that could be used as is with some minimal modifications)

file.seek(offset + total_sent)
await proto.restore()

def _transfer_buffered_data_to_ssl(self, protocol, ssl_protocol):
Copy link
Member

@picnixz picnixz Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for a method as it would indirectly be exposed on subclasses. The docstring is also un-necessary, and there is no need to mention custom implementations.

If we inline this, I think it would be worth keeping a fairly detailed
comment explaining why this buffering transfer is needed.

No. Just link an issue (and possibly a short comment). The code base prefers concise comments. It's not meant to be read by everyone and we definitely don't want to have more docstrings than code. If something is meant to be extensively documented it should be done at the RST level.

@bedevere-app
Copy link

bedevere-app bot commented Jan 6, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@picnixz
Copy link
Member

picnixz commented Jan 6, 2026

Also, I think reading and checking if we're compliant with https://datatracker.ietf.org/doc/html/rfc2817 should be done (even if the RFC is only proposed and not in the standard tracks) or whatever RFC should be considered (I do think it's correct to fix this but I don't know if we should only send the possible hello for tls or generally write whatever was buffered)

@kasimov-maxim
Copy link
Author

Has this PR been generated using an LLM? if so, indicate which places were so

Noted in PR.


# Pause early so that "ssl_protocol.data_received()" doesn't
# have a chance to get called before "ssl_protocol.connection_made()".
transport.pause_reading()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to transfer the buffer after reading is paused rather than before?

Comment on lines +1335 to +1336
ssl_protocol._incoming.write(buffer)
buffer.clear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this could better belong @

self._incoming = ssl.MemoryBIO()
.

Technically, there could be something like

-         self._incoming = ssl.MemoryBIO()
+         self._incoming = ssl.MemoryBIO()
+         try:
+             underlying_raw_stream_buffer = app_protocol._stream_reader._buffer
+         except AttributeError:
+             pass
+         else:
+             self._incoming.write(underlying_raw_stream_buffer)
+             underlying_raw_stream_buffer.clear()

But I don't know how other places that initialize sslproto.SSLProtocol() would be affected. So it's just an idea to check out and see if other instances should have the same buffer transfer logic (looks like it's used in two other modules).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

asyncio: start_tls() loses buffered data from StreamReader

4 participants