Skip to content

ARROW-16782: [Format] Add REE definitions to FlatBuffers#14176

Merged
zeroshade merged 28 commits into
apache:masterfrom
zagto:rle-flatbuffers
Jan 12, 2023
Merged

ARROW-16782: [Format] Add REE definitions to FlatBuffers#14176
zeroshade merged 28 commits into
apache:masterfrom
zagto:rle-flatbuffers

Conversation

@zagto

@zagto zagto commented Sep 20, 2022

Copy link
Copy Markdown
Contributor

No description provided.

@github-actions

Copy link
Copy Markdown

@zeroshade zeroshade marked this pull request as ready for review December 19, 2022 19:44
@zeroshade zeroshade changed the title ARROW-16782: [Format] Add RLE definitions to FlatBuffers ARROW-16782: [Format] Add REE definitions to FlatBuffers Dec 19, 2022
@zeroshade zeroshade requested review from kou, pitrou and rok December 20, 2022 15:34
@kou kou requested review from alamb and jorgecarleitao December 21, 2022 07:33

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FBS change looks good to me -- thank you @zagto

Is it possible to incorporate the documentation from #13333 into this pR as well so that the new changes are documented?

For anyone following along, see https://lists.apache.org/thread/539scy67qom5t2fkkd1m6fvh5htvwo3s discussion for more context

cc @zeroshade

@pitrou

pitrou commented Jan 3, 2023

Copy link
Copy Markdown
Member

Is it possible to incorporate the documentation from #13333 into this pR as well so that the new changes are documented?

+1 on this. It will also make reviewing easier.

@zeroshade

Copy link
Copy Markdown
Member

@alamb @pitrou I rebased the docs from #13333 into this PR (to maintain the commit history from @zagto) and then updated them with the new name being "Run-End encoded" rather than "Run-Length encoded". Let me know if there's anything else needed to add.

Comment thread docs/source/format/Columnar.rst Outdated
Comment thread docs/source/format/Columnar.rst Outdated
@zeroshade

zeroshade commented Jan 4, 2023

Copy link
Copy Markdown
Member

@github-actions crossbow submit preview-docs

EDIT: this PR was old, I've rebased it now and am going to make this comment again so it should work now.

@github-actions

github-actions Bot commented Jan 4, 2023

Copy link
Copy Markdown
Unable to match any tasks for `preview-docs`
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/3840303309

Comment thread format/Schema.fbs Outdated
Comment thread docs/source/format/Columnar.rst Outdated
@zeroshade

Copy link
Copy Markdown
Member

@westonpace I've updated with a line prescribing the standard names for the children and updated the example layout to add the underscore so that it uses those standard names run_ends and values.

Once I get some explicit approvals on this I'll merge it in.

Comment thread docs/source/format/Columnar.rst Outdated
Comment thread docs/source/format/Columnar.rst Outdated
Comment thread docs/source/format/Columnar.rst Outdated
Comment thread docs/source/format/Columnar.rst Outdated
Comment thread docs/source/format/Columnar.rst Outdated
Comment thread docs/source/format/Columnar.rst Outdated
Comment thread docs/source/format/Columnar.rst Outdated
Comment thread docs/source/format/Columnar.rst Outdated
Comment thread format/Schema.fbs
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Comment thread docs/source/format/Columnar.rst
Comment thread docs/source/format/Columnar.rst Outdated
Comment thread docs/source/format/Columnar.rst Outdated
zeroshade and others added 2 commits January 10, 2023 10:59
Co-authored-by: Weston Pace <weston.pace@gmail.com>
@lidavidm

Copy link
Copy Markdown
Member

General question here: how are we going to handle versioning of the Arrow specification? Are we going to merge this and down the line release an "Arrow specification 1.1.0"/MetadataVersion.V6 once we also have types like StringView, or are we going to bump these versions for each new type?

@zeroshade

Copy link
Copy Markdown
Member

That is a very good question @lidavidm.

What do you think @westonpace @pitrou @alamb @ianmcook ?

@pitrou

pitrou commented Jan 11, 2023

Copy link
Copy Markdown
Member

I think we only increase MetadataVersion when we make incompatible changes to the Flatbuffers definition, which is not the case here.

As for the format version, we should perhaps bump it:

/// Format Version History.
/// Version 1.0 - Forward and backwards compatibility guaranteed.
/// Version 1.1 - Add Decimal256 (No format release).
/// Version 1.2 (Pending)- Add Interval MONTH_DAY_NANO

@zeroshade

Copy link
Copy Markdown
Member

@pitrou I can bump it in the docs in this PR, but do we need to do a vote on the mailing list or anything to make the new version official?

@pitrou

pitrou commented Jan 11, 2023

Copy link
Copy Markdown
Member

I don't think a new vote is needed, though I don't know for sure.

@lidavidm

Copy link
Copy Markdown
Member

I suppose the version bump is implicit in adding a new data type? Looking at the Git history/the MONTH_DAY_NANO PR, the versions were just bumped when we added the type. (...And 1.2 is still "pending", guess we should take that out now and put 1.3 as adding REE.)

@zeroshade

zeroshade commented Jan 11, 2023

Copy link
Copy Markdown
Member

I've updated the Schema.fbs and Columnar.rst files for the version and regenerated files to pick up the new comments.

I've got two approvals here, so unless someone comments otherwise I'll merge this after EOD today, to give people a chance to add any final comments. Thanks everyone!

Comment thread format/Schema.fbs Outdated

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed only the changes to docs/source/format/Columnar.rst and format/Schema.fbs but they look great to me. Thank you to everyone who has been working on this proposal -- it is super exciting to see it so close

Comment thread format/Schema.fbs
/// Version 1.0 - Forward and backwards compatibility guaranteed.
/// Version 1.1 - Add Decimal256 (No format release).
/// Version 1.2 (Pending)- Add Interval MONTH_DAY_NANO
/// Version 1.1 - Add Decimal256.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@ursabot

ursabot commented Jan 13, 2023

Copy link
Copy Markdown

Benchmark runs are scheduled for baseline = d0ffedb and contender = 3146588. 3146588 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.99% ⬆️0.24%] test-mac-arm
[Finished ⬇️0.51% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️1.84% ⬆️0.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3146588a ec2-t3-xlarge-us-east-2
[Finished] 3146588a test-mac-arm
[Finished] 3146588a ursa-i9-9960x
[Finished] 3146588a ursa-thinkcentre-m75q
[Finished] d0ffedbe ec2-t3-xlarge-us-east-2
[Failed] d0ffedbe test-mac-arm
[Finished] d0ffedbe ursa-i9-9960x
[Finished] d0ffedbe ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants