[BEAM-1251] Upgrade from buffer to memoryview for Python 3 by cclauss · Pull Request #4820 · apache/beam

cclauss · 2018-03-07T20:52:13Z

buffer was removed in Python 3 in favor of memoryview.

http://portingguide.readthedocs.io/en/latest/etc.html#raw-buffer-protocol-buffer-and-memoryview

cclauss · 2018-03-13T19:09:58Z

@holdenk Your review please?

holdenk · 2018-03-14T19:00:37Z

LGTM, but I think I remember seeing a similar PR. Was that also yours? (Or am I just imagining things?)

cclauss · 2018-03-17T09:09:18Z

This is the only PR that I have that touches this code. I mentioned this issue in #4798 but I did not propose a fix in that PR.

@aaltay Your review please?

aaltay · 2018-03-18T18:54:06Z

      # Compressed data includes a 4-byte CRC32 checksum which we verify.
      # We take care to avoid extra copies of data while slicing large objects
-      # by use of a buffer.
-      result = snappy.decompress(buffer(data)[:-4])


Have you tested this change? When I ran it, it failed with: TypeError: argument 1 must be string or read-only buffer, not memoryview.

This is because a slice of a buffer returns the raw data, whereas a slice of a memoryview returns another memoryview object for that subsection.
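
The slicing behavior described above can be reproduced without snappy. This is a minimal sketch for Python 3; the byte string is made-up sample data, not Beam's actual record format.

```python
# Hypothetical sample: some payload bytes followed by 4 trailing bytes.
data = b"payload-bytes" + b"\x00\x01\x02\x03"

view = memoryview(data)
tail_stripped = view[:-4]

# Slicing a memoryview yields another memoryview over the same memory,
# not a bytes/str object, so an API that insists on a string rejects it.
print(type(tail_stripped).__name__)
print(len(tail_stripped) == len(data) - 4)

# Getting real bytes back requires an explicit conversion, which copies.
print(tail_stripped.tobytes())
```

Any callee that only accepts a string or an old-style read-only buffer will raise a TypeError when handed tail_stripped directly.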

cclauss · 2018-03-18T21:03:45Z

Thanks for catching this. I did not have an effective way to test. Reading through:

https://docs.python.org/2.7/library/stdtypes.html#memoryview
https://docs.python.org/3/library/stdtypes.html#memoryview
I get the sense that the next thing to try would be to wrap it with six.binary_type(), as in six.binary_type(memoryview(data)[:-4]).

memoryview exists in all versions of Python that Beam supports so once we find a memoryview-based solution that works, we should be able to drop buffer altogether.

cclauss · 2018-03-19T20:46:23Z

@aaltay Can you please retry with this update?

aaltay · 2018-03-19T23:43:59Z

No, the changed version does not work either. six.binary_type(memoryview(data)[:-4]) produces a literal string of the form <memory at 0x7f62ee334510>, and decompression then fails with snappy.UncompressError: Error while decompressing: invalid input

Besides, binary_type is just str on Python 2. Even if it worked as expected in this case, it would create a copy of the data, which defeats the purpose.
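
Both problems can be seen in a small sketch. It runs on Python 3 and uses str() directly to stand in for what six.binary_type resolves to on Python 2; the sample bytes are made up for illustration.

```python
data = b"\x01\x02\x03\x04\x05\x06\x07\x08"  # hypothetical sample bytes
view = memoryview(data)[:-4]

# str() of a memoryview gives its repr-style text, not the underlying bytes.
as_text = str(view)
print(as_text.startswith("<memory at "))

# tobytes() (or bytes(view) on Python 3) returns the real data,
# but it does so by allocating a new bytes object, i.e. a copy.
as_bytes = view.tobytes()
print(as_bytes == data[:-4])
```

So a conversion either hands the decompressor the wrong bytes or reintroduces the copy the original code was trying to avoid.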

The real solution here would be to upgrade snappy to accept a memoryview as an argument. If we cannot do that, we can remove the optimization and settle for snappy.decompress(data[:-4]). Or, perhaps better, we can conditionally keep buffer for Python 2 only.
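
One way the "keep buffer for Python 2 only" option might look is sketched below. The helper name strip_trailing_checksum is hypothetical, and zlib stands in for snappy only because zlib.decompress accepts a memoryview on Python 3; whether snappy.decompress does depends on the python-snappy version, as discussed in this thread.

```python
import sys
import zlib


def strip_trailing_checksum(data, checksum_len=4):
    """Return data without its trailing checksum bytes (hypothetical helper)."""
    if sys.version_info[0] == 2:
        # Python 2 only: mirrors the original buffer-based expression.
        return buffer(data)[:-checksum_len]  # noqa: F821
    # Python 3: a memoryview slice avoids copying the payload.
    return memoryview(data)[:-checksum_len]


# Stand-in record: compressed body plus 4 placeholder checksum bytes.
body = zlib.compress(b"hello beam " * 100)
record = body + b"\x00\x00\x00\x00"

result = zlib.decompress(strip_trailing_checksum(record))
print(result == b"hello beam " * 100)
```

The Python 3 branch only helps if the decompressor accepts objects that support the buffer protocol, which is why the upstream snappy change is the cleaner fix.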

CC'ing a few people who might have an idea of the impact of copying data here:
cc: @chamikaramj @katsiapis

cclauss · 2018-03-20T05:45:53Z

Are we using the current python-snappy 0.5.2? Perhaps @martindurant has some ideas for us.

aaltay · 2018-03-20T20:15:29Z

Yes, we depend on the python-snappy package from PyPI. Dataflow has 0.5.1 installed, not the latest 0.5.2, but I do not think there is a change related to this between them. I tested this PR with the latest available version.

cclauss · 2018-03-21T15:16:38Z

intake/python-snappy#65 (comment)

stale · 2018-06-07T04:01:25Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

stale · 2018-06-14T04:53:49Z

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

cclauss · 2018-07-02T14:47:56Z

A fix has been checked into intake/python-snappy#72

aaltay reviewed Mar 18, 2018

[BEAM-1251] Upgrade from buffer to memoryview for Python 3

4e5e1bf

cclauss force-pushed the buffer-to-memoryview branch from 61d4aff to 4e5e1bf Compare March 19, 2018 20:44

cclauss mentioned this pull request Jun 1, 2018

[WIP][BEAM-2784] Run python 2 to 3 migration and fix resulting Python 2 errors #3772

Closed

stale Bot added the wontfix label Jun 7, 2018

stale Bot closed this Jun 14, 2018

kennknowles added stale and removed wontfix labels Jun 25, 2018

cclauss mentioned this pull request Jun 30, 2018

Fix #65: support new buffer API intake/python-snappy#72

Merged

cclauss mentioned this pull request Jul 4, 2018

[BEAM-1251] Upgrade from buffer to memoryview (again) #5887

Merged

2 tasks

cclauss deleted the buffer-to-memoryview branch July 4, 2018 12:02
