
[BEAM-1251] Upgrade from buffer to memoryview for Python 3#4820

Closed
cclauss wants to merge 1 commit into apache:master from cclauss:buffer-to-memoryview

Conversation

cclauss commented Mar 7, 2018

buffer was removed in Python 3 in favor of memoryview.

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand:
    • What the pull request does
    • Why it does it
    • How it does it
    • Why this approach
  • Each commit in the pull request should have a meaningful subject line and body.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

cclauss commented Mar 13, 2018
Author

@holdenk Your review please?

holdenk commented Mar 14, 2018
Contributor

LGTM, but I think I remember seeing a similar PR. Was that also yours? (Or am I just imagining things?)

cclauss commented Mar 17, 2018
Author

This is the only PR that I have that touches this code. I mentioned this issue in #4798 but I did not propose a fix in that PR.

@aaltay Your review please?

# Compressed data includes a 4-byte CRC32 checksum which we verify.
# We take care to avoid extra copies of data while slicing large objects
# by use of a buffer.
result = snappy.decompress(buffer(data)[:-4])

Member

Have you tested this change? When I ran it, it failed with: TypeError: argument 1 must be string or read-only buffer, not memoryview.

This is because slicing a buffer returns the raw data, whereas slicing a memoryview returns another memoryview object for that subsection.
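
To illustrate the slicing difference described above, here is a minimal Python 3 sketch. The `data` value is a made-up placeholder rather than real snappy output, and the variable names are illustrative only.

```python
# Hypothetical payload: 8 bytes standing in for compressed data,
# followed by a 4-byte stand-in for the CRC32 checksum.
data = b"payload!" + b"\x00\x01\x02\x03"

view = memoryview(data)
sliced = view[:-4]  # no copy: still a memoryview over the same bytes

# The slice is a memoryview, not bytes, so an API that insists on
# a string or read-only buffer may reject it.
print(type(sliced).__name__)

# tobytes() yields the raw bytes, at the cost of a copy.
print(sliced.tobytes())
```

Getting raw bytes back out of the slice requires an explicit conversion, which reintroduces the copy that the buffer-based code was trying to avoid.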

cclauss commented Mar 18, 2018
Author

Thanks for catching this. I did not have an effective way to test. Reading through:

memoryview exists in all versions of Python that Beam supports so once we find a memoryview-based solution that works, we should be able to drop buffer altogether.

cclauss force-pushed the buffer-to-memoryview branch from 61d4aff to 4e5e1bf on March 19, 2018 20:44

cclauss commented Mar 19, 2018
Author

@aaltay Can you please retry with this update?

aaltay commented Mar 19, 2018
Member

No, the changed version does not work either. The expression six.binary_type(memoryview(data)[:-4]) produces a literal string of the form <memory at 0x7f62ee334510>, and decompression then fails with snappy.UncompressError: Error while decompressing: invalid input

Besides, binary_type is just str on Python 2, so even if it had worked as expected here it would have created a copy of the data, which defeats the purpose.
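
The behaviour reported above can be reproduced without snappy. The sketch below runs on Python 3, where calling `str()` on a memoryview gives the same kind of placeholder text; the sample bytes are invented for illustration.

```python
data = b"payload!" + b"\x00\x01\x02\x03"  # invented sample, not real snappy data
view = memoryview(data)[:-4]

# str() returns the object's repr-style description, not its contents.
text = str(view)
print(text)

# bytes() does return the contents, but it allocates a new object,
# so the zero-copy benefit of the view is lost.
raw = bytes(view)
print(raw)
```

Either conversion is therefore unsuitable: one passes the wrong data to the decompressor, and the other copies the payload.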

The real solution here would be to upgrade snappy to accept memoryview as an argument. If we cannot do that, we can remove the optimization and settle for snappy.decompress(data[:-4]). Or, perhaps better, we can conditionally keep the buffer for Python 2 only.

CC'ing a few people who might have an idea of the impact of copying data here:
cc: @chamikaramj @katsiapis
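
One way to read the last suggestion in this comment, keeping buffer on Python 2 only, is sketched below. The helper name `strip_checksum` is hypothetical and is not Beam's actual code. The snappy call is left out so that the snippet runs without python-snappy installed, and on Python 3 it falls back to a plain slice, which copies the payload.

```python
import sys


def strip_checksum(data):
    """Return `data` without its trailing 4-byte CRC32 checksum.

    Hypothetical helper for illustration only.
    """
    if sys.version_info[0] == 2:
        # Python 2 only: mirrors the existing buffer-based expression.
        return buffer(data)[:-4]  # noqa: F821 (name exists on Python 2 only)
    # Python 3: a plain slice is simple but copies the payload.
    return data[:-4]


payload = strip_checksum(b"payload!" + b"\x00\x01\x02\x03")
print(payload)
```

The result of `strip_checksum` would then be passed to the decompressor. Whether the extra copy on Python 3 matters depends on payload sizes, which is the question raised with the people cc'd here.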

cclauss commented Mar 20, 2018
Author

Are we using the current python-snappy 0.5.2? Perhaps @martindurant has some ideas for us.

aaltay commented Mar 20, 2018
Member

Yes, we depend on the python-snappy package from PyPI. Dataflow has 0.5.1 installed, not the latest 0.5.2, but I do not think there is a change related to this between them. I tested this PR with the latest available version.

cclauss commented Mar 21, 2018
Author

stale Bot commented Jun 7, 2018

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

stale Bot added the wontfix label Jun 7, 2018

stale Bot commented Jun 14, 2018

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

cclauss commented Jul 2, 2018
Author

A fix has been checked into intake/python-snappy#72

cclauss deleted the buffer-to-memoryview branch on July 4, 2018 12:02