
[Python] Drop envoy-data-plane/betterproto dependency from the Python SDK #39213

Open
shahar1 wants to merge 1 commit into apache:master from shahar1:drop-envoy-data-plane-dependency


shahar1 commented Jul 3, 2026


What & why

EnvoyRateLimiter (sdks/python/apache_beam/io/components/rate_limiter.py) depends on envoy-data-plane purely to obtain a handful of protobuf message classes. That package pulls in betterproto==2.0.0b6 — an outdated pre-release of a protobuf reimplementation — plus grpclib and a transitive subtree (h2, hpack, hyperframe, multidict).

This PR removes envoy-data-plane (and therefore betterproto) entirely, replacing it with a minimal, self-contained vendored proto.

Why this is worth doing, and why vendoring beats bumping the dependency:

  • The dependency was already fought, not used. betterproto emits async grpclib stubs that are incompatible with Beam's synchronous grpcio. The code already hand-writes the RateLimitServiceStub bridge to work around that — so the package only ever contributed 6 message dataclasses, and the wire is plain protobuf over grpcio regardless.
  • It resolves a real downstream block. The betterproto==2.0.0b6 pin is the last blocker stopping Apache Airflow from un-suspending its Beam provider (Revert "Suspend Apache Beam Provider due to grpcio limitation (#61926)" airflow#66952). Because Airflow's constraint solver reads Beam's install_requires, removing the dependency here fixes it for all Python versions — no betterproto pin needed on Airflow's side.
  • Bumping to envoy-data-plane>=2.1.0 would only move the problem. That route swaps in betterproto2 (still pre-1.0, 0.9.x), requires porting to the betterproto2 API, and drops Python 3.10 support (2.x requires >=3.11). Vendoring ends the version-chase permanently and lets us delete the existing python_version split in setup.py.
  • This is an established Beam pattern, not a novelty. apache_beam/coders/proto2_coder_test_messages_pb2.py is already a checked-in, hand-regenerated _pb2.py outside the gen_protos.py model pipeline. This change follows it exactly (same lint-exclusion spots, same header style). The vendored proto is a frozen external contract — Envoy RLS v3 field numbers are GA and cannot change without breaking every client — so it is a write-once artifact, not an ongoing maintenance burden.
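As an illustration of the hand-written bridge mentioned in the first bullet, here is a minimal sketch of a synchronous stub in the shape grpcio codegen normally produces. It is not the code in rate_limiter.py. The method path is assumed from Envoy's RLS v3 service definition, and `_EchoMessage` and `_RecordingChannel` are hypothetical stand-ins so the sketch runs without grpcio or a server. A real caller would pass a `grpc.Channel` and the generated message classes.

```python
# Assumed fully-qualified method path for Envoy's RLS v3 service.
SHOULD_RATE_LIMIT = (
    "/envoy.service.ratelimit.v3.RateLimitService/ShouldRateLimit")


class RateLimitServiceStub:
    """Hypothetical synchronous stub, shaped like grpcio-generated code."""

    def __init__(self, channel, request_cls, response_cls):
        # channel must offer unary_unary() the way grpc.Channel does;
        # the message classes must offer SerializeToString / FromString.
        self.ShouldRateLimit = channel.unary_unary(
            SHOULD_RATE_LIMIT,
            request_serializer=request_cls.SerializeToString,
            response_deserializer=response_cls.FromString,
        )


class _EchoMessage:
    """Stand-in for a generated protobuf message class."""

    def __init__(self, payload=b""):
        self.payload = payload

    def SerializeToString(self):
        return self.payload

    @classmethod
    def FromString(cls, data):
        return cls(data)


class _RecordingChannel:
    """Stand-in for a grpc.Channel: records the path and echoes the bytes."""

    def __init__(self):
        self.registered = []

    def unary_unary(
        self, method, request_serializer=None, response_deserializer=None):
        self.registered.append(method)

        def call(request, timeout=None):
            # Serialize, "send", and deserialize without touching a network.
            return response_deserializer(request_serializer(request))

        return call


channel = _RecordingChannel()
stub = RateLimitServiceStub(channel, _EchoMessage, _EchoMessage)
reply = stub.ShouldRateLimit(_EchoMessage(b"ping"))
```

The point of the sketch is that the stub needs only a method path plus a serializer and deserializer, so any message classes with those two methods can back it.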

How it works

  • New sdks/python/apache_beam/io/components/rate_limit.proto: a ~30-line self-contained subset of the Envoy Rate Limit Service. Field numbers match Envoy's rls.proto / ratelimit.proto, so the messages are wire-compatible with a real RLS server (protobuf carries only field numbers and types on the wire — not message or package names).
  • New checked-in rate_limit_pb2.py, generated via grpc_tools.protoc. The generated runtime_version guard is removed so the module stays compatible across Beam's full protobuf>=3.20.3,<7 runtime range (matching the existing test pb2). The regeneration command is documented in both files' headers.
  • rate_limiter.py / rate_limiter_test.py: swapped imports to rate_limit_pb2. The one behavioral adjustment is google.protobuf.Duration handling: the code now calls dur.ToTimedelta().total_seconds(), because betterproto used to auto-map Duration to timedelta.
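To illustrate the wire-compatibility point in the list above, here is a minimal stdlib-only sketch of protobuf's key/value encoding. It is not the vendored module. The helper names are hypothetical, and the field numbers (domain = 1, hits_addend = 3) are assumed to match Envoy's RateLimitRequest and should be checked against rls.proto.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field's key on the wire is (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2 (length-delimited): tag, byte length, UTF-8 payload."""
    payload = text.encode("utf-8")
    return encode_tag(field_number, 2) + encode_varint(len(payload)) + payload


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Wire type 0 (varint): tag followed by the value."""
    return encode_tag(field_number, 0) + encode_varint(value)


# Assumed field numbers: domain = 1 (string), hits_addend = 3 (uint32).
request_bytes = encode_string_field(1, "beam") + encode_uint_field(3, 2)
print(request_bytes.hex())  # 0a046265616d1802
```

No message or package name appears anywhere in the output, which is why a vendored proto under a different package can still talk to a real RLS server as long as the field numbers and types agree.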

Testing

  • 8/8 unit tests pass. The original 5 mock-based tests cover the retry/throttle logic; 3 new wire-format tests pin the exact serialized bytes and enum values against Envoy's field numbers — these fail loudly if the vendored proto ever drifts, which the mock tests cannot catch.
  • Verified rate_limiter imports and constructs with envoy_data_plane and betterproto blocked from sys.modules, confirming the dependency is genuinely gone.
  • yapf, isort (repo CI flags), and ruff all clean.
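The import check described in the list above can be sketched as follows. This is an assumed reconstruction, not the exact procedure used for the PR: `blocked_modules` and `can_import` are hypothetical helpers, and `json` stands in for the rate_limiter module so the sketch runs without Beam installed.

```python
import importlib
import sys
from contextlib import contextmanager
from unittest import mock


@contextmanager
def blocked_modules(*names):
    """Make `import <name>` fail for each name while the block is active.

    A None entry in sys.modules makes the import system raise
    ModuleNotFoundError instead of searching for the module.
    """
    with mock.patch.dict(sys.modules, {name: None for name in names}):
        yield


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


with blocked_modules("envoy_data_plane", "betterproto"):
    # Stand-in target; the real check would import the rate_limiter module.
    target_imports = can_import("json")
    blocked_fails = not can_import("betterproto")
```

Because `mock.patch.dict` restores the original mapping on exit, any module first imported inside the block is dropped from `sys.modules` afterwards, so the check does not leak state into other tests.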

Follow-up

The 12 container *_requirements.txt files have the two direct packages (envoy-data-plane, betterproto) removed so they stay consistent with setup.py. A full ./gradlew :sdks:python:container:generatePythonRequirementsAll regeneration should run in CI/by a committer to also prune the now-orphaned transitives (grpclib, h2, multidict, …) and refresh pins — that step requires interpreters for all supported Python versions plus a full network resolution.



Was AI tooling used to author this PR?

Yes — authored with Claude Code (Opus 4.8, 1M context). All changes were reviewed by the PR author. Generated following the project's gen-AI contribution guidelines.

🤖 Generated with Claude Code

EnvoyRateLimiter only needed a handful of protobuf message classes from
envoy-data-plane, which pulls in the outdated betterproto==2.0.0b6 beta
(a protobuf reimplementation) plus grpclib and transitive deps. That pin
is the last blocker stopping Apache Airflow from un-suspending its Beam
provider (apache/airflow#66952), and it forced a Python-version split in
setup.py.

The dependency was already fought rather than used: the RateLimitServiceStub
bridge exists solely because betterproto emits async grpclib stubs that don't
work with Beam's synchronous grpcio, and the wire is plain protobuf over
grpcio regardless.

Replace it with a minimal, self-contained rate_limit.proto compiled to a
checked-in rate_limit_pb2.py (following the existing
proto2_coder_test_messages_pb2.py precedent). Field numbers match Envoy's
rls.proto/ratelimit.proto, so it stays wire-compatible with a real RLS
server. This removes the conflict permanently for every downstream, deletes
the py<3.11 split, and drops betterproto + grpclib from every container image.

New wire-format tests pin the field numbers/enum values so a renumbering
can't silently break live rate limiting (the mock-based tests would not).

Note: the container base_image_requirements.txt files had the two direct
packages removed to stay consistent with setup.py; a full
`generatePythonRequirementsAll` regeneration should follow in CI to also
prune now-orphaned transitives (grpclib, h2, multidict, ...) and refresh pins.

Part of apache#37854
Unblocks apache/airflow#66952

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gemini-code-assist


Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request removes the 'envoy-data-plane' dependency from the Apache Beam Python SDK. By vendoring a minimal subset of the Envoy Rate Limit Service protocol, the SDK avoids the maintenance burden and version conflicts associated with the 'betterproto' library. This change improves compatibility for downstream projects like Apache Airflow and aligns with existing patterns for handling protobuf definitions within the repository.

Highlights

  • Dependency Removal: Removed the 'envoy-data-plane' and 'betterproto' dependencies from the Python SDK to resolve downstream dependency conflicts and simplify the build environment.
  • Vendored Protobuf: Introduced a minimal, self-contained vendored protobuf definition ('rate_limit.proto') and generated 'rate_limit_pb2.py' to replace the removed dependency, ensuring wire-compatibility with Envoy RLS servers.
  • Codebase Cleanup: Updated 'EnvoyRateLimiter' to use the new vendored protobuf types and adjusted 'setup.py' and container requirements files to reflect the removal of the external packages.

gemini-code-assist Bot left a comment


Code Review

This pull request removes the envoy-data-plane and transitive betterproto dependencies from the Python SDK. It introduces a small, vendored protobuf definition (rate_limit.proto and its generated Python module) to maintain wire-compatibility with the Envoy Rate Limit Service, thereby resolving dependency conflicts for downstream projects. I have no feedback to provide as there are no review comments.



github-actions Bot commented Jul 3, 2026


Assigning reviewers:

R: @damccorm for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).


shahar1 commented Jul 3, 2026


The failed CI job seems unrelated to this change.
