ARROW-12431: [Python] Mask is inverted when creating FixedSizeBinaryArray #10199
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2714,6 +2714,51 @@ def test_array_masked(): | |
| assert arr.type == pa.int64() | ||
|
|
||
|
|
||
| def test_binary_array_masked(): | ||
| # ARROW-12431 | ||
| masked_basic = pa.array([b'\x05'], type=pa.binary(1), | ||
| mask=np.array([False])) | ||
| assert [b'\x05'] == masked_basic.to_pylist() | ||
|
|
||
| # Fixed Length Binary | ||
| masked = pa.array(np.array([b'\x05']), type=pa.binary(1), | ||
| mask=np.array([False])) | ||
| assert [b'\x05'] == masked.to_pylist() | ||
|
|
||
| masked_nulls = pa.array(np.array([b'\x05']), type=pa.binary(1), | ||
| mask=np.array([True])) | ||
| assert [None] == masked_nulls.to_pylist() | ||
|
|
||
| # Variable Length Binary | ||
| masked = pa.array(np.array([b'\x05']), type=pa.binary(), | ||
| mask=np.array([False])) | ||
| assert [b'\x05'] == masked.to_pylist() | ||
|
|
||
| masked_nulls = pa.array(np.array([b'\x05']), type=pa.binary(), | ||
| mask=np.array([True])) | ||
| assert [None] == masked_nulls.to_pylist() | ||
|
|
||
| # Fixed Length Binary, copy | ||
|
Member
I don't understand what this is supposed to test. The fact that a copy is made is just an implementation detail.
Member
Author
It verifies that the behaviour is the same as what we get from variable length binary arrays, which do not reuse the numpy array memory. I don't think it's an implementation detail because it changes the user experience: if the underlying numpy array is shared, the user can't modify the original numpy array without indirectly (and probably unexpectedly) modifying the Arrow array too. That led me to create https://issues.apache.org/jira/browse/ARROW-12666, because in some cases we reuse the numpy memory (all basic types) and in other cases we don't (the string, binary etc. types). The follow up ticket suggests to make that behaviour clear as numpy does by adding a We can discuss further what's the best way to go in that dedicated ticket, here I wanted to make sure we were consistent with what happens when
||
| npa = np.array([b'aaa', b'bbb', b'ccc']*10) | ||
| arrow_array = pa.array(npa, type=pa.binary(3), | ||
| mask=np.array([False, False, False]*10)) | ||
| npa[npa == b"bbb"] = b"XXX" | ||
| assert ([b'aaa', b'bbb', b'ccc']*10) == arrow_array.to_pylist() | ||
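The copy-versus-share distinction discussed in the review comment can be illustrated without Arrow or NumPy at all. The following is only a sketch using the Python standard library: `memoryview` stands in for a zero-copy conversion, `bytes(...)` for a copying one, and the helper names `split_items`, `convert_shared` and `convert_copied` are hypothetical, not pyarrow APIs.

```python
def split_items(buffer, itemsize):
    """Cut a flat byte buffer into fixed-size items (like pa.binary(3) values)."""
    return [bytes(buffer[i:i + itemsize]) for i in range(0, len(buffer), itemsize)]


def convert_shared(source):
    """Hypothetical zero-copy conversion: keeps pointing at the caller's memory."""
    return memoryview(source)


def convert_copied(source):
    """Hypothetical copying conversion: takes a snapshot of the caller's memory."""
    return bytes(source)


source = bytearray(b"aaabbbccc")
shared = convert_shared(source)
copied = convert_copied(source)

# Mutate the original buffer after conversion, as the test does with npa.
source[3:6] = b"XXX"

shared_items = split_items(shared, 3)  # sees the mutation
copied_items = split_items(copied, 3)  # unaffected by the mutation
```

With a shared buffer the later write leaks into the converted data; with a copy it does not, which is the behaviour the test above asserts for fixed length binary.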
|
|
||
|
|
||
| def test_binary_array_strided(): | ||
| # Masked | ||
| nparray = np.array([b"ab", b"cd", b"ef"]) | ||
| arrow_array = pa.array(nparray[::2], pa.binary(2), | ||
| mask=np.array([False, False])) | ||
| assert [b"ab", b"ef"] == arrow_array.to_pylist() | ||
|
|
||
| # Unmasked | ||
| nparray = np.array([b"ab", b"cd", b"ef"]) | ||
| arrow_array = pa.array(nparray[::2], pa.binary(2)) | ||
| assert [b"ab", b"ef"] == arrow_array.to_pylist() | ||
|
|
||
|
|
||
| def test_array_invalid_mask_raises(): | ||
| # ARROW-10742 | ||
| cases = [ | ||
|
|
||
Does this solve the strided conversion case? If so, perhaps you can add a test for it?
Sadly not, I expected it would, but I wrote some tests and it wasn't enough. That's why I made https://issues.apache.org/jira/browse/ARROW-12667 as a follow up issue. So that I can test it for all various types and make sure it works in all cases.
Also added tests and fix for strided binary arrays (with and without mask)
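For readers unfamiliar with the strided case: `nparray[::2]` over 2-byte items is a view whose consecutive items sit 4 bytes apart, so a converter that assumes contiguous data would read `b"ab", b"cd"` instead of `b"ab", b"ef"`. The sketch below shows the idea in plain Python. It is not the pyarrow implementation; `gather_strided` and its parameters are hypothetical names chosen for illustration, and the mask follows the convention used in the tests (`True` means null).

```python
def gather_strided(buffer, itemsize, stride, count, mask=None):
    """Collect `count` fixed-size items that start `stride` bytes apart.

    `mask` is optional; when given, mask[i] == True marks item i as null.
    """
    if mask is not None and len(mask) != count:
        raise ValueError("mask length must match the number of items")
    items = []
    for i in range(count):
        start = i * stride  # step by the stride, not by the item size
        if mask is not None and mask[i]:
            items.append(None)
        else:
            items.append(bytes(buffer[start:start + itemsize]))
    return items


raw = b"abcdef"  # memory behind np.array([b"ab", b"cd", b"ef"])

# Equivalent of nparray[::2]: 2-byte items, 4 bytes apart, 2 items.
unmasked = gather_strided(raw, itemsize=2, stride=4, count=2)
masked = gather_strided(raw, itemsize=2, stride=4, count=2, mask=[False, True])
```

Stepping by `itemsize` instead of `stride` reproduces the wrong result, which is why the strided case needs its own tests both with and without a mask.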