Skip to content

ARROW-7858: [C++][Python] Support casting from ExtensionArray#6633

Closed
kszucs wants to merge 9 commits into
apache:masterfrom
kszucs:ext-cast
Closed

ARROW-7858: [C++][Python] Support casting from ExtensionArray#6633
kszucs wants to merge 9 commits into
apache:masterfrom
kszucs:ext-cast

Conversation

@kszucs

@kszucs kszucs commented Mar 16, 2020

Copy link
Copy Markdown
Member

This add support for casting from ExtensionType. We should probably add support for converting to an ExtensionType, but we can handle it in a follow-up.

@github-actions

Copy link
Copy Markdown

@kszucs

kszucs commented Mar 16, 2020

Copy link
Copy Markdown
Member Author

The test failure is related to #6632

@jorisvandenbossche jorisvandenbossche left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting to its own storage type (so eg not a different integer bit), still returns an ExtensionArray. With the example from the test:

In [3]: arr.cast(pa.int64()) 
Out[3]: 
<pyarrow.lib.ExtensionArray object at 0x7f4b45ca5fa8>
[
  1,
  2,
  3,
  4
]

In [4]: arr.cast(pa.int32()) 
Out[4]: 
<pyarrow.lib.Int32Array object at 0x7f4b45ca5348>
[
  1,
  2,
  3,
  4
]

I am not fully sure that this is the expected behaviour.

@kszucs

kszucs commented Mar 17, 2020

Copy link
Copy Markdown
Member Author

Nice catch, it was caused by the identity shortcut. Added a separate kernel which handles the identity case properly.

@jorisvandenbossche jorisvandenbossche left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now! (but not very familiar with the cast kernel code)

@bkietz bkietz left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few notes

Comment thread python/pyarrow/tests/test_extension_type.py
Comment thread cpp/src/arrow/compute/kernels/cast.cc Outdated
Comment thread cpp/src/arrow/compute/kernels/cast.cc Outdated
@kszucs

kszucs commented Mar 17, 2020

Copy link
Copy Markdown
Member Author

@bkietz please take a look again

Comment thread cpp/src/arrow/compute/kernels/cast.cc Outdated
Comment thread cpp/src/arrow/compute/kernels/cast.cc Outdated
Comment thread cpp/src/arrow/compute/kernels/cast_test.cc
kszucs and others added 2 commits March 17, 2020 21:48
Co-Authored-By: Benjamin Kietzman <bengilgit@gmail.com>

@wesm wesm left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title misled me when I reviews this, it should be "Support casting the storage of an ExtensionArray" or similar

// construct an ArrayData object with the underlying storage type
auto new_input = input.array()->Copy();
new_input->type = storage_type_;
return InvokeWithAllocation(ctx, storage_caster_.get(), new_input, out);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it allocate if the out_type and storage_type are the same?

@kszucs kszucs Mar 18, 2020

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests on both C++ and Python side, seems like no allocation happens.

Comment thread cpp/src/arrow/compute/kernels/cast.cc Outdated
}
return true;
return (other.extension_name() == this->extension_name());
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be the default implementation of ExtensionEquals

@kszucs kszucs Mar 18, 2020

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name and the storage type should be equal, created a follow-up JIRA https://issues.apache.org/jira/browse/ARROW-8143

@wesm

wesm commented Mar 18, 2020

Copy link
Copy Markdown
Member

There was more C++ code in the implementation than I expected, not sure what could be done to make implementing something that "seems easy" (on paper) easier

@kszucs kszucs changed the title ARROW-7858: [C++][Python] Support casting an Extension type to its storage type ARROW-7858: [C++][Python] Support casting from ExtensionArray Mar 18, 2020

@wesm wesm left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Thanks @kszucs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants