Implemented ML Pipeline Continuous new table rows RunInference #37647
aIbrahiim wants to merge 9 commits into apache:master
Summary of Changes
Hello @aIbrahiim, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances Apache Beam's ML capabilities by introducing a robust example pipeline for performing continuous machine learning inference on structured table data. The new pipeline, built around the …
Checks are failing. Will not request review until checks are succeeding.
Assigning reviewers:
R: @tvalentyn for label python.
The PR bot will only process comments in the main thread (not review comments).
Codecov Report
@@             Coverage Diff              @@
##             master   #37647      +/-   ##
============================================
- Coverage     57.13%   57.01%   -0.13%
  Complexity     3515     3515
============================================
  Files          1228     1225       -3
  Lines        189092   188725     -367
  Branches       3656     3656
============================================
- Hits         108039   107596     -443
- Misses        77637    77713      +76
  Partials       3416     3416
Force-pushed from 3af0279 to 331aa64
Force-pushed from 331aa64 to 077e777
damccorm left a comment:
Thanks - just had some minor feedback
sdks/python/apache_beam/ml/inference/table_row_inference_requirements.txt
/gemini review
Code Review
This pull request introduces a new example pipeline for table row inference with scikit-learn models, which is a valuable addition. The implementation covers both batch and streaming modes, and includes comprehensive tests, benchmarks, and documentation.
My review has identified a few areas for improvement:
- Robustness: There are places where the code could be more robust against missing data or arguments.
- Determinism: The use of Python's built-in `hash()` for generating fallback keys can lead to non-deterministic behavior in a distributed environment.
- Code Structure: The introduction of `table_row_inference_batch.py` alongside `table_row_inference.py` (which also supports batch mode) is a bit confusing. While the former is described as 'simplified', it also has features the latter lacks (like file output). It would be beneficial to either consolidate these into a single, more capable script or clarify their distinct purposes in the documentation.
- Exception Handling: Some utility functions catch overly broad exceptions.
I've left specific comments with suggestions to address these points. Overall, this is a great contribution that expands the ML examples in Beam.
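To illustrate the determinism point: Python salts `hash()` for strings per process unless `PYTHONHASHSEED` is fixed, so two workers can compute different fallback keys for the same row. The sketch below is one possible alternative, not the code in this PR. The function name `stable_row_key` and the sample rows are hypothetical, and it assumes rows are plain dicts whose values can be serialized with `str`.

```python
import hashlib
import json


def stable_row_key(row: dict) -> str:
    """Return a key that is identical for equal rows in every process.

    Hypothetical helper for illustration; not part of the PR.
    """
    # sort_keys makes the serialization independent of dict insertion order;
    # default=str lets values that are not JSON types (e.g. datetimes) through.
    payload = json.dumps(row, sort_keys=True, default=str)
    # A cryptographic digest is not salted per process, unlike hash().
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Two rows with the same content but different key order.
row_a = {"id": None, "feature1": 1.5, "feature2": 2.0}
row_b = {"feature2": 2.0, "feature1": 1.5, "id": None}

key_a = stable_row_key(row_a)
key_b = stable_row_key(row_b)
```

Truncating the digest to 16 hex characters is an arbitrary choice here; a longer prefix lowers the collision risk for large tables.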
sdks/python/apache_beam/examples/inference/table_row_inference.py (outdated)
sdks/python/apache_beam/examples/inference/table_row_inference.py (outdated)
sdks/python/apache_beam/examples/inference/table_row_inference.py (outdated)
sdks/python/apache_beam/examples/inference/table_row_inference_batch.py (outdated)
sdks/python/apache_beam/examples/inference/table_row_inference_batch.py (outdated)
sdks/python/apache_beam/examples/inference/table_row_inference_utils.py (outdated)
Reminder, please take a look at this PR: @tvalentyn @Abacn
Force-pushed from 752712e to 6b1b007
waiting on author
…mprove Pub/Sub utils
Thanks - could you address the 2 gemini comments as well? Otherwise, this looks good
/gemini review
Code Review
This pull request introduces a new 'Table Row Inference' example for Apache Beam, demonstrating how to use RunInference with structured table data and Scikit-learn models. It includes a core pipeline (table_row_inference.py) supporting both batch and streaming modes, a local batch example (table_row_batch_example.py), utility functions (table_row_inference_utils.py), and unit tests (table_row_inference_test.py). The changes also update Looker metrics, add new dependencies, and integrate performance benchmarks for the new pipeline into the testing infrastructure and website documentation. Review comments primarily point out minor docstring indentation issues and a missing newline character in a Markdown file.
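As a rough illustration of the table-row handling described above, the sketch below turns a row dict into an ordered feature list before it would be handed to a model. It is an assumption-laden sketch and not the PR's implementation: the name `row_to_features`, the column names, and the default-value policy are all hypothetical. It also shows the narrower exception handling raised in the earlier review, catching only the two errors that `float()` raises for bad input.

```python
import math
from typing import Any, List, Mapping, Sequence


def row_to_features(
    row: Mapping[str, Any],
    feature_columns: Sequence[str],
    default: float = math.nan,
) -> List[float]:
    """Build a feature list in a fixed column order.

    Hypothetical helper for illustration; not part of the PR.
    """
    features = []
    for column in feature_columns:
        # .get() returns None for a missing column instead of raising KeyError.
        value = row.get(column)
        try:
            features.append(float(value))
        except (TypeError, ValueError):
            # TypeError: value is None or another non-numeric type.
            # ValueError: value is a string that does not parse as a number.
            features.append(default)
    return features


# Column "c" is missing and column "b" is null; both fall back to the default.
example = row_to_features({"a": "1.5", "b": None}, ["a", "b", "c"], default=0.0)
```

Whether a missing value should become a default, `NaN`, or cause the row to be routed to a dead-letter output depends on the model; the default used here is only a placeholder.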
@damccorm resolved