GH-45263: [MATLAB] Add ability to construct RecordBatchStreamReader from uint8 array #45274

Merged: kevingurney merged 10 commits into apache:main from mathworks:GH-45263 on Jan 17, 2025
@kevingurney (Member) commented Jan 15, 2025
Rationale for this change

To enable more workflows using the IPC Stream format in the MATLAB interface, this pull request adds the ability to construct a RecordBatchStreamReader from a MATLAB uint8 array.

This is helpful, for example, to enable Arrow-over-HTTP workflows in conjunction with the MATLAB webread function (which can return a uint8 array from an HTTP request).

This is a followup issue to #44923.

What changes are included in this PR?

  1. Added a new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes).
  2. Added a new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromFile(filename).
  3. Changed the signature of the arrow.io.ipc.RecordBatchStreamReader constructor so that it no longer directly accepts a filename as an input. Instead, an arrow.io.ipc.RecordBatchStreamReader can now only be directly constructed from a libmexclass.proxy.Proxy instance. This mirrors the design of other MATLAB classes which wrap Proxy instances in the MATLAB interface. To construct RecordBatchStreamReader objects from an Arrow IPC Stream file on disk, users can instead use the new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromFile(filename).

Are these changes tested?

Yes.

  1. Updated tests in arrow/matlab/test/arrow/io/ipc/tRecordBatchStreamReader.m to be parameterized over the fromFile and fromBytes "construction functions".
  2. Added a new test to verify that an appropriate error is thrown if the RecordBatchStreamReader constructor is called directly with an input that is not a libmexclass.proxy.Proxy instance.

Are there any user-facing changes?

Yes.

  1. Users can now create arrow.io.ipc.RecordBatchStreamReader objects from an Arrow IPC Stream file on disk using the new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromFile(filename).
  2. Users can now create arrow.io.ipc.RecordBatchStreamReader objects from an in-memory MATLAB uint8 "bytes" array using the new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes).
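As a rough illustration of the two construction functions above, the sketch below writes a small table to a temporary IPC Stream file and then reads it back both ways. It assumes the arrow.io.ipc.RecordBatchStreamWriter class and the readTable and toMATLAB methods from the existing MATLAB interface behave as shown; the variable names and the temporary filename are made up for the example.

```matlab
% Build a small MATLAB table and write it to a temporary IPC Stream file.
% (Assumes the existing RecordBatchStreamWriter class from the MATLAB interface.)
T = table([1; 2; 3], ["a"; "b"; "c"], 'VariableNames', {'Number', 'Letter'});
filename = string(tempname) + ".arrows";
writer = arrow.io.ipc.RecordBatchStreamWriter(filename);
writer.writeTable(arrow.table(T));
writer.close();

% Option 1: construct the reader from the file on disk.
readerFromFile = arrow.io.ipc.RecordBatchStreamReader.fromFile(filename);
tableFromFile = toMATLAB(readerFromFile.readTable());

% Option 2: load the same file into a uint8 array and construct from bytes.
fid = fopen(filename, "r");
bytes = fread(fid, Inf, "*uint8");   % "*uint8" keeps the data as uint8
fclose(fid);
readerFromBytes = arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes);
tableFromBytes = toMATLAB(readerFromBytes.readTable());

% Both readers should yield the same data.
disp(isequal(tableFromFile, tableFromBytes));

delete(filename);
```

The uint8 array does not have to come from a file. For the Arrow-over-HTTP case, it could instead be the result of webread called with weboptions("ContentType", "binary"), which returns the raw response body as uint8.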

This PR includes breaking changes to public APIs.

This PR changes the signature of the public arrow.io.ipc.RecordBatchStreamReader constructor so that it no longer directly accepts a filename as an input. Instead, an arrow.io.ipc.RecordBatchStreamReader can now only be directly constructed from a libmexclass.proxy.Proxy instance. This mirrors the design of other MATLAB classes which wrap Proxy instances in the MATLAB interface. To construct RecordBatchStreamReader objects from an Arrow IPC Stream file on disk, users can instead use the new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromFile(filename).
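For existing code, the migration might look like the sketch below. The filename "data.arrows" is hypothetical, and the exact error identifier and message are not shown here because they are not stated in this description; the only assumption is that the constructor now throws for a non-Proxy input, as described above.

```matlab
% "data.arrows" is a hypothetical filename. The file does not need to exist,
% because the constructor is expected to reject any non-Proxy input.
filename = "data.arrows";

threwError = false;
try
    % Pre-PR usage: passing a filename straight to the constructor.
    reader = arrow.io.ipc.RecordBatchStreamReader(filename); %#ok<NASGU>
catch err
    % After this PR, this branch is expected to run.
    threwError = true;
    disp(err.message);
end

% Post-PR replacement (requires the file to exist on disk):
% reader = arrow.io.ipc.RecordBatchStreamReader.fromFile(filename);
```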

Future Directions

  1. Use the new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes) in an example to demonstrate how to read an Arrow IPC Stream from an HTTP endpoint as part of apache/arrow-experiments.

Notes

  1. Thank you @sgilmore10 for your help with this pull request!

@kevingurney kevingurney marked this pull request as ready for review January 15, 2025 21:33
Comment thread matlab/src/matlab/+arrow/+io/+ipc/RecordBatchStreamReader.m
@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jan 16, 2025

@kou (Member) left a comment

+1

@github-actions github-actions Bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Jan 17, 2025
@kevingurney (Member, Author) commented

+1

@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Jan 17, 2025
@kevingurney kevingurney merged commit 1fe27fe into apache:main Jan 17, 2025
@kevingurney kevingurney removed the awaiting changes Awaiting changes label Jan 17, 2025
@kevingurney kevingurney deleted the GH-45263 branch January 17, 2025 21:54
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 1fe27fe.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.
