[BEAM-13984] Implement RunInference for PyTorch by yeandy · Pull Request #17196 · apache/beam

yeandy · 2022-03-28T20:19:50Z

PyTorch implementation of RunInference

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

Choose reviewer(s) and mention them in a comment (R: @username).
Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
Update CHANGES.md with noteworthy changes.
If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

See CI.md for more information about GitHub Actions CI.

yeandy · 2022-04-04T19:17:06Z

R: @ryanthompson591 @AnandInguva Please take a look, thanks!

Still working on fixing PyTorch imports in the meantime.

yeandy · 2022-04-04T19:25:49Z

I think I should remove any traces of GPU logic lest we give the impression that it has been fully tested. Or is the minimal GPU logic I have ok?

ryanthompson591

Looks good. Like the tests.

ryanthompson591 · 2022-04-05T19:02:23Z

sdks/python/setup.py

Does this mean all Python SDK users will require PyTorch going forward? Is this a heavy requirement?

If so let's make sure it's fine to require this for everyone.

I can't see the original change now, but I think this was in an extra. It would be a requirement that's added when you pip install apache-beam[ml], so it wouldn't be a hard dependency for all users.

Regardless, I think we should hold off on adding a requirement spec anywhere, unless we need to communicate that there's a restricted version range we support.

For now, to allow the most flexibility, we're going to support installing PyTorch separately: pip install apache-beam torch
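Because PyTorch is left out of Beam's own requirements, code that needs it has to cope with it being absent. A minimal sketch of one way to do that, assuming a hypothetical helper named `require_optional_dependency` (it is not part of Beam), checks for the module with `importlib.util.find_spec` and fails with an install hint instead of a bare import error:

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional_dependency(module_name: str, pip_name: str) -> ModuleType:
  """Imports an optional top-level module or raises ImportError with advice.

  Hypothetical helper for illustration; it is not part of Apache Beam.
  """
  # find_spec returns None when a top-level module cannot be found.
  if importlib.util.find_spec(module_name) is None:
    raise ImportError(
        f'{module_name} is required for this feature but is not installed. '
        f'Install it alongside Beam, e.g. `pip install apache-beam {pip_name}`.')
  return importlib.import_module(module_name)


# Hypothetical usage: torch = require_optional_dependency('torch', 'torch')
```

Keeping the check inside the ML module means users who never touch RunInference for PyTorch are unaffected, which matches the goal of not adding a hard dependency for everyone.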

sdks/python/apache_beam/ml/inference/pytorch_impl.py

sdks/python/apache_beam/ml/inference/pytorch_impl_test.py

kerrydc · 2022-04-06T14:35:56Z

R: @TheNeuralBit for review and merge

TheNeuralBit · 2022-04-06T18:31:37Z

sdks/python/apache_beam/ml/inference/pytorch_impl.py

Shouldn't our Beam abstraction be opinionated about the element type instead of using the native type for each library? Otherwise I'd argue this is just a library of similar-looking extensions for different ML libraries.
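One way to be opinionated about element types without fixing each library's input format is a framework-neutral output type. The commit list for this PR later includes "Add PredictionResult"; the sketch below is a minimal, assumed version of such a type (a `NamedTuple` whose `example` and `inference` field names are an assumption here) plus a hypothetical `wrap_predictions` helper. It is not the code that was merged:

```python
from typing import Any, List, NamedTuple, Sequence


class PredictionResult(NamedTuple):
  """Pairs one input example with the model's inference for it (sketch)."""
  example: Any
  inference: Any


def wrap_predictions(examples: Sequence[Any],
                     inferences: Sequence[Any]) -> List[PredictionResult]:
  """Hypothetical helper: zips a batch of examples with their inferences."""
  if len(examples) != len(inferences):
    raise ValueError(
        f'Got {len(inferences)} inferences for {len(examples)} examples.')
  return [PredictionResult(e, i) for e, i in zip(examples, inferences)]
```

Whatever tensor or array type a framework produces, downstream transforms then see the same two fields, even if the values inside them are still library-native.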

TheNeuralBit · 2022-04-06T18:32:18Z

sdks/python/apache_beam/ml/inference/pytorch_impl.py

Should this use FileSystem to add support for gs:// and s3:// paths? This seems to rely on reading from a local filesystem which is problematic when running on distributed workers.

(admittedly I may be misunderstanding when this is used as I'm still catching up on this work)

Good point, I can look into adding that. My initial work assumed that we are reading only from the local filesystem.
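Beam's `FileSystems` (in `apache_beam.io.filesystems`) picks a filesystem implementation from the path's scheme, which is what lets a single call such as `FileSystems.open(path)` read `gs://` or `s3://` paths on remote workers, where a plain `open()` only sees the local disk. The toy class below only illustrates that scheme-based dispatch with the standard library; `SchemeDispatcher` is hypothetical and is not how Beam implements it:

```python
from typing import BinaryIO, Callable, Dict
from urllib.parse import urlparse

Opener = Callable[[str], BinaryIO]


def _open_local(path: str) -> BinaryIO:
  return open(path, 'rb')


class SchemeDispatcher:
  """Hypothetical registry that chooses a file opener by URL scheme."""

  def __init__(self) -> None:
    # Plain paths such as /tmp/model.pth have an empty scheme.
    self._openers: Dict[str, Opener] = {'': _open_local}

  def register(self, scheme: str, opener: Opener) -> None:
    self._openers[scheme] = opener

  def open(self, path: str) -> BinaryIO:
    scheme = urlparse(path).scheme
    if scheme not in self._openers:
      raise ValueError(f'No filesystem registered for scheme {scheme!r}: {path}')
    return self._openers[scheme](path)
```

A real implementation has more edge cases to handle; for example, `urlparse` reports a Windows path like `C:\model.pth` as having scheme `c`.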

Summary

self._state_dict_path is the path to a file that stores the model's state.

And self._model_class is a PyTorch model class that defines the model structure.

We're basically reading in a dictionary of coefficients/parameters/states that specify how to populate the model's structure (passed in via the argument model_class) with certain values.

The model produced by load_model will then be acquired through a Shared() instance.
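The flow described above (build the structure from `model_class`, populate it from the saved state dict at `state_dict_path`, and load it through a `Shared()` handle) can be sketched without PyTorch or Beam installed. In this dependency-free stand-in, which is an assumption-laden illustration and not the merged implementation, `pickle.load` stands in for `torch.load`, `LinearModel` stands in for a `torch.nn.Module` subclass, and `SharedHandle` is a toy version of Beam's `apache_beam.utils.shared.Shared`, which, unlike this toy, holds the shared object through a weak reference:

```python
import pickle
import threading
from typing import Any, BinaryIO, Callable, Dict, Optional, Type


class LinearModel:
  """Stand-in for a torch.nn.Module subclass computing y = weight * x + bias."""

  def __init__(self) -> None:
    self.weight = 0.0
    self.bias = 0.0

  def load_state_dict(self, state_dict: Dict[str, float]) -> None:
    self.weight = state_dict['weight']
    self.bias = state_dict['bias']

  def __call__(self, x: float) -> float:
    return self.weight * x + self.bias


def _open_local(path: str) -> BinaryIO:
  # In Beam, FileSystems.open(path) would be used instead to reach gs:// etc.
  return open(path, 'rb')


def load_model(model_class: Type, state_dict_path: str,
               opener: Callable[[str], BinaryIO] = _open_local) -> Any:
  """Builds the model structure, then populates it with saved parameters."""
  with opener(state_dict_path) as f:
    state_dict = pickle.load(f)  # torch.load(f) in the PyTorch version.
  model = model_class()
  model.load_state_dict(state_dict)
  return model


class SharedHandle:
  """Toy per-process cache: calls the constructor once, then reuses it."""

  def __init__(self) -> None:
    self._lock = threading.Lock()
    self._value: Optional[Any] = None
    self._loaded = False

  def acquire(self, constructor: Callable[[], Any]) -> Any:
    with self._lock:
      if not self._loaded:
        self._value = constructor()
        self._loaded = True
      return self._value
```

The point of routing loading through a shared handle is that many DoFn instances in one worker process can reuse a single loaded model instead of each reading the state dict again.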

codecov · 2022-04-12T22:51:12Z

Codecov Report

Merging #17196 (91ca931) into master (2aa24dc) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #17196      +/-   ##
==========================================
- Coverage   73.99%   73.99%   -0.01%     
==========================================
  Files         685      685              
  Lines       89727    89735       +8     
==========================================
+ Hits        66395    66399       +4     
- Misses      22172    22176       +4     
  Partials     1160     1160

Flag	Coverage Δ
python	`83.67% <ø> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
sdks/python/apache_beam/io/source_test_utils.py	`88.01% <0.00%> (-1.39%)`	⬇️
...che_beam/runners/interactive/interactive_runner.py	`92.25% <0.00%> (-0.71%)`	⬇️
sdks/python/apache_beam/transforms/combiners.py	`93.03% <0.00%> (-0.39%)`	⬇️
sdks/python/apache_beam/io/iobase.py	`86.18% <0.00%> (-0.04%)`	⬇️
sdks/python/apache_beam/io/textio.py	`96.89% <0.00%> (+0.11%)`	⬆️
sdks/python/apache_beam/coders/coders.py	`88.44% <0.00%> (+0.12%)`	⬆️
...ks/python/apache_beam/runners/worker/sdk_worker.py	`89.06% <0.00%> (+0.15%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 2aa24dc...91ca931.
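The Coverage Diff figures can be checked by hand: Codecov computes coverage as hits divided by all tracked lines (hits + misses + partials). The sketch below reproduces the 73.99% figures; rounding toward negative infinity at two decimals is an assumption, based on what appears to be Codecov's default `round: down` setting, and it also yields the -0.01% shown for the small drop:

```python
import math


def coverage_percent(hits: int, misses: int, partials: int) -> float:
  """Coverage in percent: hits over all tracked lines."""
  return 100.0 * hits / (hits + misses + partials)


def round_down(value: float, precision: int = 2) -> float:
  """Rounds toward negative infinity at `precision` decimal places."""
  factor = 10 ** precision
  return math.floor(value * factor) / factor


# Figures from the Coverage Diff (master vs. #17196).
before = coverage_percent(hits=66395, misses=22172, partials=1160)
after = coverage_percent(hits=66399, misses=22176, partials=1160)
print(round_down(before), round_down(after), round_down(after - before))
```

The 8 added lines split into 4 hits and 4 misses, which is why coverage barely moves.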

ryanthompson591

Looks good. I like it.

sdks/python/apache_beam/ml/inference/pytorch.py

sdks/python/apache_beam/ml/inference/pytorch_test.py

ryanthompson591 · 2022-04-18T17:40:13Z

sdks/python/apache_beam/ml/inference/pytorch_test.py

+                            for example in examples]).reshape(-1, 1))
+      ]
+
+      gs_pth = 'gs://apache-beam-ml/pytorch_lin_reg_model_2x+0.5_state_dict.pth'


I didn't test GCS in my implementation. Does it make sense to break these larger E2E tests out into a separate module, apart from the more unit-test-like tests?

You're probably right about separating out the E2E tests. I was thinking, though, that this test is small enough to verify the usage of the FileSystems module. Perhaps I can break it out when we start adding the E2E testing file.

I think it's preferable not to require GCP credentials for a unit test. I'm actually not clear on this: can someone run this unit test and read this GCS path without setting up GCP credentials?
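One common way to keep such a test in the unit suite without requiring credentials everywhere is to skip it when credentials are not configured. A minimal sketch, assuming that checking the `GOOGLE_APPLICATION_CREDENTIALS` environment variable is a good-enough signal (credentials can also come from other sources, such as `gcloud` user login or the GCE metadata server, so this is only a heuristic); the test class name and body are hypothetical:

```python
import os
import unittest

# Heuristic (assumption): treat a set GOOGLE_APPLICATION_CREDENTIALS variable
# as "GCP credentials are available". Other credential sources are ignored.
HAS_GCP_CREDENTIALS = bool(os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'))


class GcsStateDictTest(unittest.TestCase):
  @unittest.skipUnless(HAS_GCP_CREDENTIALS, 'GCP credentials not configured')
  def test_load_state_dict_from_gcs(self):
    gs_pth = 'gs://apache-beam-ml/pytorch_lin_reg_model_2x+0.5_state_dict.pth'
    # Hypothetical placeholder: a real test would read gs_pth through Beam's
    # FileSystems, load the model and compare predictions with 2x + 0.5.
    self.assertTrue(gs_pth.startswith('gs://'))
```

A skip is reported rather than silently passing, so it stays visible in test output that the GCS path was not exercised.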

yeandy · 2022-04-18T20:57:56Z

R: @TheNeuralBit @tvalentyn

tvalentyn · 2022-04-21T15:27:09Z

@TheNeuralBit, thanks for providing feedback on this change. Have all your comments been addressed?

TheNeuralBit · 2022-04-21T16:36:44Z

sdks/python/apache_beam/ml/inference/pytorch_test.py

I think it's preferable not to require GCP credentials for a unit test. I'm actually not clear on this: can someone run this unit test and read this GCS path without setting up GCP credentials?

TheNeuralBit · 2022-04-21T16:43:55Z

sdks/python/test-suites/tox/py38/build.gradle

+
+toxTask "testPy38pytorch-110", "py38-pytorch-110"
+test.dependsOn "testPy38pytorch-110"
+preCommitPy38.dependsOn "testPy38pytorch-110"


Does PyTorch commonly make changes that will break us between minor versions? We test different minor versions for pandas because our special usage of pandas in the DataFrame API leads to breakages even between minor versions. In the case of pyarrow, every release is a major version, which is meant to communicate that the API can change (https://arrow.apache.org/docs/format/Versioning.html).

How will we keep this up to date as new versions of PyTorch come out?

Neither of these is a blocker, but they are questions we should consider.

PyTorch has a 90-day release cycle, but its changes don't seem to touch the APIs that we use.

Maybe we can test the most recent release (PyTorch 1.11.0), along with the latest patch release of each of the last X minor versions (PyTorch 1.10.2, 1.9.1, 1.8.2, ...). I'll create a ticket to investigate this.
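The selection proposed above (the newest patch release of each of the newest few minor versions) is easy to compute mechanically. A minimal sketch, assuming plain `X.Y.Z` version strings with no pre-release or local suffixes; `select_test_versions` is a hypothetical helper and the release list in the usage line is illustrative:

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def _parse(version: str) -> Version:
  major, minor, patch = (int(part) for part in version.split('.'))
  return major, minor, patch


def select_test_versions(releases: List[str], num_minors: int) -> List[str]:
  """Newest patch release of each of the newest `num_minors` minor versions."""
  latest_patch: Dict[Tuple[int, int], Version] = {}
  for release in releases:
    parsed = _parse(release)
    key = (parsed[0], parsed[1])
    if key not in latest_patch or parsed > latest_patch[key]:
      latest_patch[key] = parsed
  newest_minors = sorted(latest_patch, reverse=True)[:num_minors]
  return ['.'.join(str(p) for p in latest_patch[k]) for k in newest_minors]


releases = ['1.8.0', '1.8.1', '1.8.2', '1.9.0', '1.9.1',
            '1.10.0', '1.10.1', '1.10.2', '1.11.0']
print(select_test_versions(releases, 4))
```

Comparing integer tuples rather than strings matters here: as strings, '1.9.1' sorts after '1.10.2'.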

yeandy · 2022-04-21T19:09:03Z

Just checked: I'm able to read from the path because I have permissions to the gs://apache-beam-ml bucket, even though it's not public. I'm going to remove this unit test (in a quick PR) for now, and then put it with the e2e tests when that suite is released.

TheNeuralBit · 2022-04-21T19:33:13Z

I think it's ok to keep the unit test in for now, it's useful to have it validate that functionality until we have an e2e test that can do it.

github-actions bot added the python label Mar 28, 2022

yeandy marked this pull request as ready for review April 4, 2022 19:16

github-actions bot added the docker label Apr 4, 2022

ryanthompson591 reviewed Apr 5, 2022

TheNeuralBit reviewed Apr 6, 2022

github-actions bot added build and removed build labels Apr 8, 2022

yeandy added 15 commits April 13, 2022 16:22

Initial pytorch implementation

dfbecbf

Clean up pytorch implementation; Works for single example

0850bde

Fix for multiple examples in a batch

8089716

Fix header and documentation

137dfd6

Add multifeature tests; Add numpy/tensor conversion

dc99291

Add torch to setup.py

09857d0

Add ml to tox.ini

4f1fe2a

Remove numpy checks/conversions; Address PR comments

49ba91a

Remove GPU code and test

ad5cb0e

Add Filesystems

797fb2a

Add separate pytorch install and tox test

599e649

Fix typos in gradle and tox files

cf01f45

Add separate tox tests for pytorch; Remove torch setup install

316d9c9

fix import error

1f171ab

Add unittest main()

d0473fd

yeandy force-pushed the runinference_pytorch branch from 3320640 to d0473fd April 13, 2022 21:24

Add PredictionResult; Refactor tests

674812d

ryanthompson591 approved these changes Apr 18, 2022

yeandy added 2 commits April 18, 2022 14:01

Fix docs; Remove keyed test

97a8e0d

Add gcp to tox

91ca931

TheNeuralBit approved these changes Apr 21, 2022

TheNeuralBit merged commit 5954209 into apache:master Apr 21, 2022

5 participants