GH-40078: [C++] Import/Export ArrowDeviceArrayStream #40807

Merged
zeroshade merged 41 commits into apache:main from zeroshade:import-export-device-stream on May 21, 2024

Conversation

@zeroshade zeroshade commented Mar 26, 2024

Member

Rationale for this change

The original PRs for adding support for importing and exporting the new C Device interface (#36488 / #36489) only added support for the Arrays themselves, not for the stream structure. We should support both.

What changes are included in this PR?

Adding parallel functions for Import/Export of streams that accept ArrowDeviceArrayStream.
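
For readers unfamiliar with the stream structure, here is a minimal, self-contained sketch of how a consumer drains an `ArrowDeviceArrayStream`. The struct layouts mirror the Arrow C Device data interface, but `ArrowSchema` and `ArrowArray` are reduced to stand-ins, and the toy producer (`MakeToyStream`, `ToyGetNext`, `ToyRelease`) and the `DrainStream` helper are hypothetical names invented for illustration. They are not the functions added by this PR.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins: the real ArrowSchema/ArrowArray carry many more fields.
struct ArrowSchema { void (*release)(ArrowSchema*); };
struct ArrowArray { int64_t length; void (*release)(ArrowArray*); };

using ArrowDeviceType = int32_t;
constexpr ArrowDeviceType kDeviceCpu = 1;  // ARROW_DEVICE_CPU in the spec

struct ArrowDeviceArray {
  ArrowArray array;
  int64_t device_id;
  ArrowDeviceType device_type;
  void* sync_event;  // null when no synchronization is needed
  int64_t reserved[3];
};

struct ArrowDeviceArrayStream {
  ArrowDeviceType device_type;  // every array in the stream uses this type
  int (*get_schema)(ArrowDeviceArrayStream*, ArrowSchema* out);
  int (*get_next)(ArrowDeviceArrayStream*, ArrowDeviceArray* out);
  const char* (*get_last_error)(ArrowDeviceArrayStream*);
  void (*release)(ArrowDeviceArrayStream*);
  void* private_data;
};

// Hypothetical toy producer: yields `remaining` arrays of 10 rows each.
struct ToyState { int remaining; };

void ReleaseToyArray(ArrowArray* a) { a->release = nullptr; }

int ToyGetNext(ArrowDeviceArrayStream* s, ArrowDeviceArray* out) {
  auto* state = static_cast<ToyState*>(s->private_data);
  *out = ArrowDeviceArray{};
  if (state->remaining == 0) return 0;  // end of stream: release stays null
  --state->remaining;
  out->array.length = 10;
  out->array.release = ReleaseToyArray;
  out->device_id = -1;  // CPU memory has no device id
  out->device_type = s->device_type;
  return 0;
}

void ToyRelease(ArrowDeviceArrayStream* s) {
  delete static_cast<ToyState*>(s->private_data);
  s->release = nullptr;
}

ArrowDeviceArrayStream MakeToyStream(int num_arrays) {
  ArrowDeviceArrayStream s{};
  s.device_type = kDeviceCpu;
  s.get_next = ToyGetNext;
  s.release = ToyRelease;
  s.private_data = new ToyState{num_arrays};
  return s;
}

// Consumer loop: pull arrays until an error or a released (end-of-stream) array.
int64_t DrainStream(ArrowDeviceArrayStream* s) {
  int64_t rows = 0;
  for (;;) {
    ArrowDeviceArray chunk{};
    if (s->get_next(s, &chunk) != 0) break;      // error
    if (chunk.array.release == nullptr) break;   // end of stream
    assert(chunk.device_type == s->device_type);
    rows += chunk.array.length;
    chunk.array.release(&chunk.array);
  }
  s->release(s);
  return rows;
}
```

Note that the stream carries one `device_type` for all of its arrays, while each `ArrowDeviceArray` still carries its own `sync_event`. That asymmetry is what the review discussion below is about.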

Are these changes tested?

Test writing in progress, wanted to get this up for review while I write tests.

Are there any user-facing changes?

No, only new functions have been added.

@github-actions

⚠️ GitHub issue #40078 has been automatically assigned in GitHub to PR creator.

@paleolimbot paleolimbot left a comment

Member

I'm excited this will work for ChunkedArray as well!

Comment thread cpp/src/arrow/c/bridge.cc Outdated
Comment thread cpp/src/arrow/c/bridge.h Outdated
Comment thread cpp/src/arrow/c/helpers.h Outdated
Comment thread cpp/src/arrow/record_batch.h Outdated
@zeroshade zeroshade force-pushed the import-export-device-stream branch from deb9fa3 to 37e93b0 Compare March 27, 2024 16:43

@jorisvandenbossche jorisvandenbossche left a comment

Member

Thanks for working on this!

Comment thread cpp/src/arrow/record_batch.h Outdated
Comment on lines 260 to 278

Member

I don't fully understand this part. AFAIK a RecordBatch is currently agnostic to the device of its buffers. So, for example, you can have a RecordBatch backed by buffers that live in CUDA memory, but this method is hardcoded to always return NULL, which is not correct in that case?

Member Author

Yea, at the moment I don't have an implementation there. I put a note on the ImportDeviceRecordBatchReader documentation if you look at bridge.h:

/// We are not yet bubbling the sync events from the buffers up to
/// the `GetSyncEvent` method of an imported RecordBatch. This will be added in a future
/// update.

Member

Is the idea that, longer term, there would be a generic implementation here in RecordBatch that checks the sync events of its underlying buffers? In practice we don't subclass RecordBatch in Arrow C++ to add CUDA support, so that method cannot be overridden.

Member Author

It was either that or introducing a subclass, yes.

Member Author

Okay, I've propagated the event and device_type throughout the record batch and the Make functions (defaulting it to nullptr) which should allow us to ensure this is all correct now. I'll update the documentation comments accordingly
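
As a rough illustration of what "propagating the event and device_type through the Make functions" can look like, here is a minimal sketch. `ToyBatch`, `ToySyncEvent` and the enumerator values are hypothetical stand-ins, not the actual Arrow C++ signatures. The point is only that existing callers keep compiling because the new parameters default to CPU and `nullptr`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-ins for the Arrow types discussed in this thread.
enum class DeviceAllocationType : char { kCPU = 1, kCUDA = 2 };
struct ToySyncEvent {};

class ToyBatch {
 public:
  // New trailing parameters default to "CPU, no event", so call sites that
  // predate the change are unaffected.
  static std::shared_ptr<ToyBatch> Make(
      int64_t num_rows,
      DeviceAllocationType device_type = DeviceAllocationType::kCPU,
      std::shared_ptr<ToySyncEvent> sync_event = nullptr) {
    return std::shared_ptr<ToyBatch>(
        new ToyBatch(num_rows, device_type, std::move(sync_event)));
  }

  int64_t num_rows() const { return num_rows_; }
  DeviceAllocationType device_type() const { return device_type_; }
  const std::shared_ptr<ToySyncEvent>& GetSyncEvent() const { return sync_event_; }

 private:
  ToyBatch(int64_t num_rows, DeviceAllocationType device_type,
           std::shared_ptr<ToySyncEvent> sync_event)
      : num_rows_(num_rows),
        device_type_(device_type),
        sync_event_(std::move(sync_event)) {}

  int64_t num_rows_;
  DeviceAllocationType device_type_;
  std::shared_ptr<ToySyncEvent> sync_event_;
};
```

With this shape, `ToyBatch::Make(5)` yields a CPU batch with a null event, while a device-aware producer passes both values explicitly.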

Comment thread cpp/src/arrow/c/bridge.h Outdated

Member

Potentially related to my other comment, but the existing ExportDeviceArray has a sync keyword that the user of this API needs to provide. Is there a reason these methods don't have one?
The returned ArrowDeviceArrayStream itself doesn't have a sync event member, but the ArrowDeviceArrays that it returns still do. Shouldn't the user pass the sync event to set in those arrays?

Member Author

This is why I put GetSyncEvent on the RecordBatch object: there could be a different sync event for each record batch returned by the reader. Right now it is hardcoded to return null, but GetSyncEvent is virtual, so a RecordBatchReader could return a class that inherits from RecordBatch and implements GetSyncEvent to return the correct event object.
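
The subclass approach described in this comment can be sketched as follows. All names (`ToyRecordBatch`, `ToyDeviceRecordBatch`, `ToySyncEvent`) are hypothetical, and this illustrates the idea under discussion rather than the code that was eventually merged.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct ToySyncEvent {};  // hypothetical stand-in for a device sync event

class ToyRecordBatch {
 public:
  virtual ~ToyRecordBatch() = default;
  // Default: no event, which is only right for data that needs no syncing.
  virtual std::shared_ptr<ToySyncEvent> GetSyncEvent() const { return nullptr; }
};

// A device-aware producer would hand out this subclass from its reader.
class ToyDeviceRecordBatch : public ToyRecordBatch {
 public:
  explicit ToyDeviceRecordBatch(std::shared_ptr<ToySyncEvent> event)
      : event_(std::move(event)) {}

  std::shared_ptr<ToySyncEvent> GetSyncEvent() const override { return event_; }

 private:
  std::shared_ptr<ToySyncEvent> event_;
};
```

The weakness raised elsewhere in this review is visible here: any batch that is not created through the subclass silently reports no event, even if its buffers live on a device.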

Member

Would it make sense to add GetSyncEvent() when it actually does something? It seems like this is perhaps guessing at a future API that we don't know will exist yet or that we are not sure will be used?

Member Author

I've made a bunch of changes propagating the sync event and the device type now, so GetSyncEvent now does something :)

Thoughts?

@felipecrv felipecrv left a comment

Contributor

LGTM. Seems to match the structure of the single array x non-stream case.

Comment thread cpp/src/arrow/c/helpers.h Outdated
Comment thread cpp/src/arrow/c/bridge.cc Outdated
Comment thread cpp/src/arrow/record_batch.h Outdated

Member

Won't this be wrong if we import a RecordBatchReader from a non-CPU ArrowDeviceArrayStream? Or is that not yet possible?

Member Author

As of yet, I haven't implemented figuring out what type to send here. Since we aren't storing the device type at the record batch level, this would need to spin through the buffers of its columns which could potentially be from multiple devices if a user did something weird. So I'm not sure what the best solution here is. I'd rather not add a RecordBatch level device_type member, but I don't yet know how to best handle the situation if a consumer does something bad
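
To make the "spin through the buffers of its columns" option concrete, here is a minimal sketch that also shows the mixed-device problem. `ToyDevice`, `ToyBuffer`, `ToyColumn` and `CommonDeviceType` are hypothetical names. Falling back to CPU when there are no buffers, and reporting mixed devices as `std::nullopt`, are assumptions made for this example only.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins: a buffer that only knows its device, and a column
// that is just a list of (possibly null) buffers.
enum class ToyDevice { kCPU = 1, kCUDA = 2 };
struct ToyBuffer { ToyDevice device; };
using ToyColumn = std::vector<const ToyBuffer*>;

// Returns the one device shared by all buffers, kCPU if there are no buffers
// at all, or std::nullopt if buffers from different devices are mixed.
std::optional<ToyDevice> CommonDeviceType(const std::vector<ToyColumn>& columns) {
  std::optional<ToyDevice> seen;
  for (const auto& column : columns) {
    for (const ToyBuffer* buf : column) {
      if (buf == nullptr) continue;  // e.g. an absent validity bitmap
      if (!seen) {
        seen = buf->device;
      } else if (*seen != buf->device) {
        return std::nullopt;  // the "user did something weird" case
      }
    }
  }
  return seen.value_or(ToyDevice::kCPU);
}
```

The scan costs a pass over every buffer on each call, which is one argument for storing the device type once at construction instead.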

Member

It seems like for now this will have to be supplied by the caller on construction? That seems like a better intermediate state than returning a value that could lead to segfaults?

Member

Well, this is an abstract class, so there's no constructor here :-)

Member Author

So the suggestion is we should add a device_type member to the RecordBatch class and have it be provided?

Member Author

I've propagated the device_type throughout the readers now and to the static Make methods for constructing the readers, so now this should be correct.

@jorisvandenbossche

Member

Needs some tests?

Comment thread cpp/src/arrow/c/helpers.h Outdated
Comment thread cpp/src/arrow/record_batch.h Outdated
Comment thread cpp/src/arrow/record_batch.h Outdated
Comment thread cpp/src/arrow/record_batch.h Outdated

Member

Can we instead add the necessary declarations to type_fwd.h?

Member Author

We would have to move the nested Device::SyncEvent outside of the Device class to do that since you can't forward declare an inner class like that. It would also require pushing the entire DeviceAllocationType declaration to type_fwd.h which I'm not sure if we want to do. So I don't think we can do this.

Member

I think at least moving DeviceAllocationType to type_fwd.h would make sense.

As for Device::SyncEvent, that's indeed a problem with nested classes (and a good reason to avoid them :-)). That could be fixed by moving SyncEvent out of Device and adding a compatibility alias.
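
A minimal sketch of that suggestion, using hypothetical names (`DeviceSyncEvent`, `NoEvent`, `IsCpu`) and enumerator values. An enum with a fixed underlying type can be declared opaquely, and a namespace-scope class can be forward-declared, whereas a nested class cannot be declared without the full definition of its enclosing class.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// --- What a type_fwd.h-style header could contain (hypothetical) ---
enum class DeviceAllocationType : char;  // opaque declaration: fixed underlying type
class DeviceSyncEvent;                   // namespace scope, so forward-declarable

// Declarations elsewhere now need only the forward declarations above.
std::shared_ptr<DeviceSyncEvent> NoEvent();
bool IsCpu(DeviceAllocationType type);

// --- Full definitions, as a device.h-style header would provide ---
enum class DeviceAllocationType : char { kCPU = 1, kCUDA = 2 };

class DeviceSyncEvent {
 public:
  virtual ~DeviceSyncEvent() = default;
};

class Device {
 public:
  // Compatibility alias: existing code spelling Device::SyncEvent still compiles.
  using SyncEvent = DeviceSyncEvent;
};

std::shared_ptr<DeviceSyncEvent> NoEvent() { return nullptr; }
bool IsCpu(DeviceAllocationType type) { return type == DeviceAllocationType::kCPU; }

static_assert(std::is_same<Device::SyncEvent, DeviceSyncEvent>::value,
              "the alias must name the moved class");
```

The alias keeps source compatibility for users of the old nested name, at the cost of having two spellings for one type.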

Contributor

> I think at least moving DeviceAllocationType to type_fwd.h would make sense.

I've done that in #43853.

Comment thread cpp/src/arrow/record_batch.h Outdated
Comment thread cpp/src/arrow/record_batch.h Outdated
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8169d6e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them.

int type = 0;
for (const auto& buf : buffers) {
if (!buf) continue;
if (type == 0) {

Member

This condition implies that, conversely, in non-debug mode we could return immediately when we encounter a buffer, instead of continuing to loop over all buffers and children?
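
A minimal sketch of that suggestion, assuming the consistency check is only meant to run in debug builds. `ToyBuffer`, `kToyCpu` and `DeviceTypeOf` are hypothetical names, and treating "no buffers" as CPU is an assumption of this example. Release builds (`NDEBUG` defined) trust the first non-null buffer, while debug builds keep scanning and assert that every buffer agrees.

```cpp
#include <cassert>
#include <vector>

struct ToyBuffer { int device_type; };  // hypothetical; 1 stands for CPU here
constexpr int kToyCpu = 1;

int DeviceTypeOf(const std::vector<const ToyBuffer*>& buffers) {
  int type = 0;  // 0 is a "not yet assigned" sentinel, never a real device type
  for (const ToyBuffer* buf : buffers) {
    if (!buf) continue;
#ifdef NDEBUG
    // Release build: the first buffer decides, no need to visit the rest.
    return buf->device_type;
#else
    // Debug build: remember the first type and verify the others match it.
    if (type == 0) {
      type = buf->device_type;
    } else {
      assert(type == buf->device_type);
    }
#endif
  }
  return type == 0 ? kToyCpu : type;
}
```

Both build modes return the same value for consistent input. They differ only in how much work they do and in whether inconsistent input is detected.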

/// \see GetNullCount
int64_t ComputeLogicalNullCount() const;

/// \brief Returns the device_type of the underlying buffers and children

Member

Nit, but we tend to use infinitives in docstrings ("return", not "returns").
