Skip to content

[BEAM-13984] Implement RunInference for PyTorch#17196

Merged
TheNeuralBit merged 18 commits intoapache:masterfrom
yeandy:runinference_pytorch
Apr 21, 2022
Merged

[BEAM-13984] Implement RunInference for PyTorch#17196
TheNeuralBit merged 18 commits intoapache:masterfrom
yeandy:runinference_pytorch

Conversation

@yeandy
Copy link
Copy Markdown
Contributor

@yeandy yeandy commented Mar 28, 2022

Pytorch implementation of RunInference


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@yeandy yeandy marked this pull request as ready for review April 4, 2022 19:16
@yeandy
Copy link
Copy Markdown
Contributor Author

yeandy commented Apr 4, 2022

R: @ryanthompson591 @AnandInguva Please take a look, thanks!

Still working on fixing pytorch imports in the meantime.

@yeandy
Copy link
Copy Markdown
Contributor Author

yeandy commented Apr 4, 2022

I think I should remove any traces of GPU logic lest we give the impression that it has been fully tested. Or is the minimal GPU logic I have ok?

@github-actions github-actions bot added the docker label Apr 4, 2022
Copy link
Copy Markdown
Contributor

@ryanthompson591 ryanthompson591 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Like the tests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean all python sdks require pytorch going forward. Is this a heavy requirement?

If so let's make sure it's fine to require this for everyone.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see the original change now, but I think this was in an extra. It would be a requirement that's added when you pip install apache-beam[ml], so it wouldn't be a hard dependency for all users.

Regardless I think we should hold off adding a requirement spec anywhere, unless we need to communicate that there's restricted version range we support.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, to allow most flexibility, we're going to support pip install apache-beam pytorch

@kerrydc
Copy link
Copy Markdown
Contributor

kerrydc commented Apr 6, 2022

R: @TheNeuralBit for review and merge

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't our Beam abstraction be opinionated about the element type instead of using the native type for each library? Otherwise I'd argue this is just a library of similar-looking extensions for different ML libraries

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use FileSystem to add support for gs:// and s3:// paths? This seems to rely on reading from a local filesystem which is problematic when running on distributed workers.

(admittedly I may be misunderstanding when this is used as I'm still catching up on this work)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I can look into adding that. My initial work was assuming that that we are reading only from local filesystem.

Summary

  • self._state_dict_path is that path to a file that stores model states.
  • And self._model_class is a Python Pytorch class that defines the model structure.

We're basically reading in a dictionary of coefficients/parameters/states that specify how to populate the model's structure (passed in via the argument model_class) with certain values.

The load_model will then be acquired by a Shared() instance.

@github-actions github-actions bot added build and removed build labels Apr 8, 2022
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 12, 2022

Codecov Report

Merging #17196 (91ca931) into master (2aa24dc) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #17196      +/-   ##
==========================================
- Coverage   73.99%   73.99%   -0.01%     
==========================================
  Files         685      685              
  Lines       89727    89735       +8     
==========================================
+ Hits        66395    66399       +4     
- Misses      22172    22176       +4     
  Partials     1160     1160              
Flag Coverage Δ
python 83.67% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/python/apache_beam/io/source_test_utils.py 88.01% <0.00%> (-1.39%) ⬇️
...che_beam/runners/interactive/interactive_runner.py 92.25% <0.00%> (-0.71%) ⬇️
sdks/python/apache_beam/transforms/combiners.py 93.03% <0.00%> (-0.39%) ⬇️
sdks/python/apache_beam/io/iobase.py 86.18% <0.00%> (-0.04%) ⬇️
sdks/python/apache_beam/io/textio.py 96.89% <0.00%> (+0.11%) ⬆️
sdks/python/apache_beam/coders/coders.py 88.44% <0.00%> (+0.12%) ⬆️
...ks/python/apache_beam/runners/worker/sdk_worker.py 89.06% <0.00%> (+0.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2aa24dc...91ca931. Read the comment docs.

@yeandy yeandy force-pushed the runinference_pytorch branch from 3320640 to d0473fd Compare April 13, 2022 21:24
Copy link
Copy Markdown
Contributor

@ryanthompson591 ryanthompson591 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I like it.

for example in examples]).reshape(-1, 1))
]

gs_pth = 'gs://apache-beam-ml/pytorch_lin_reg_model_2x+0.5_state_dict.pth'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test gcs in my implementation. Does it make sense to break these larger E2E tests into another module instead of the more unit test like tests?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're probably right on separating out the E2E tests. I was thinking that this should be small enough tough to verify the usage of FileSystems module though. Perhaps I can break it out when we start adding the E2E testing file.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's preferable not to require GCP credentials for a unit test, I'm actually not clear on this, can someone run this unit test and read this GCS path without setting up GCP credentials?

@yeandy
Copy link
Copy Markdown
Contributor Author

yeandy commented Apr 18, 2022

R: @TheNeuralBit @tvalentyn

@tvalentyn
Copy link
Copy Markdown
Contributor

@TheNeuralBit thanks for providing feedback on this change. have all yor comments been addressed?

for example in examples]).reshape(-1, 1))
]

gs_pth = 'gs://apache-beam-ml/pytorch_lin_reg_model_2x+0.5_state_dict.pth'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's preferable not to require GCP credentials for a unit test, I'm actually not clear on this, can someone run this unit test and read this GCS path without setting up GCP credentials?


toxTask "testPy38pytorch-110", "py38-pytorch-110"
test.dependsOn "testPy38pytorch-110"
preCommitPy38.dependsOn "testPy38pytorch-110"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does pytorch commonly make changes that will break us between minor versions? We test different minor versions for pandas because our special usage of pandas in the DataFrame API leads to breakages even between minor versions. In the case of pyarrow, every release is a major version, which is meant to communicate that the API can change (https://arrow.apache.org/docs/format/Versioning.html).

How will we keep this up to date as new version of pytorch come out?

Neither of these are blockers, but these are questions we should consider

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pytorch has a 90-day release cycle, but it doesn't seem that their changes really touch on the APIs that we use.

Maybe we can test the most recent release (pytorch 1.11.0), along with the last minor version of the last X major versions (pytorch 1.10.2, pytorch 1.9.1, pytorch 1.8.2, ...). I'll create a ticket to investigate this

@TheNeuralBit TheNeuralBit merged commit 5954209 into apache:master Apr 21, 2022
@yeandy
Copy link
Copy Markdown
Contributor Author

yeandy commented Apr 21, 2022

Just checked: I'm able to read from the path because I have permissions to the gs://apache-beam-ml bucket, even though it's not public. I'm going to remove this unit test (in a quick PR) for now, and then put it with the e2e tests when that suite is released.

@TheNeuralBit
Copy link
Copy Markdown
Member

TheNeuralBit commented Apr 21, 2022

I think it's ok to keep the unit test in for now, it's useful to have it validate that functionality until we have an e2e test that can do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants