feat: logical plan protobuf representation for range repartitioning#23030

Merged
Dandandan merged 13 commits into apache:main from saadtajwar:saadt/logical-protobuf-serialization-range-repartitioning on Jun 22, 2026
Conversation

@saadtajwar

Contributor

Which issue does this PR close?

Rationale for this change

The range repartitioning scheme for logical plans does not currently have a protobuf representation.

What changes are included in this PR?

A protobuf representation of the RangeRepartition struct was added to datafusion.proto, and the corresponding Rust types were generated from it. This PR also adds logic for serializing to and deserializing from the protobuf representation, plus a roundtrip test!

Are these changes tested?

Yes! Added a test in roundtrip_logical_plan

Are there any user-facing changes?

No. This adds internal protobuf serialization support for an existing logical plan variant.
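
For readers unfamiliar with the pattern, the roundtrip test described above can be sketched roughly as follows. This is a minimal, standalone illustration: every type and function name in it (`RangeRepartition`, `ProtoRangeRepartition`, `to_proto`, `from_proto`, `roundtrip`) is a hypothetical stand-in rather than DataFusion's real API, and the rule that each split point carries one value per column is an assumption of the sketch.

```rust
// Hypothetical stand-ins for illustration; these are not DataFusion's real types.

/// In-memory form: the partitioning columns plus the boundaries between partitions.
#[derive(Debug, Clone, PartialEq)]
pub struct RangeRepartition {
    pub columns: Vec<String>,
    /// Each split point holds one value per column (an assumption of this sketch).
    pub split_points: Vec<Vec<i64>>,
}

/// Wire form of a single split point, shaped like generated protobuf code.
#[derive(Debug, Clone, PartialEq)]
pub struct ProtoSplitPoint {
    pub values: Vec<i64>,
}

/// Wire form of the whole node.
#[derive(Debug, Clone, PartialEq)]
pub struct ProtoRangeRepartition {
    pub columns: Vec<String>,
    pub split_points: Vec<ProtoSplitPoint>,
}

/// Serialization cannot fail here: every in-memory value has a wire form.
pub fn to_proto(plan: &RangeRepartition) -> ProtoRangeRepartition {
    ProtoRangeRepartition {
        columns: plan.columns.clone(),
        split_points: plan
            .split_points
            .iter()
            .map(|values| ProtoSplitPoint { values: values.clone() })
            .collect(),
    }
}

/// Deserialization validates its input, because wire data may be malformed.
pub fn from_proto(proto: &ProtoRangeRepartition) -> Result<RangeRepartition, String> {
    let mut split_points = Vec::with_capacity(proto.split_points.len());
    for (i, point) in proto.split_points.iter().enumerate() {
        if point.values.len() != proto.columns.len() {
            return Err(format!(
                "split point {i} has {} values but there are {} columns",
                point.values.len(),
                proto.columns.len()
            ));
        }
        split_points.push(point.values.clone());
    }
    Ok(RangeRepartition {
        columns: proto.columns.clone(),
        split_points,
    })
}

/// The property a roundtrip test checks: decoding an encoded plan gives it back.
pub fn roundtrip(plan: &RangeRepartition) -> Result<RangeRepartition, String> {
    from_proto(&to_proto(plan))
}
```

A roundtrip test then builds a plan, calls `roundtrip`, and asserts that the result equals the input; a second case can feed `from_proto` a malformed message and assert that it returns an error.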

@github-actions github-actions Bot added the proto Related to proto crate label Jun 19, 2026
Comment thread datafusion/proto-models/proto/datafusion.proto Outdated
Error::General(message.into())
}

pub fn parse_protobuf_range_split_point(

Contributor Author

Copied this and serialize_range_split_point from the physical plan logic by @gene-bordegaray! 😁

Comment thread datafusion/proto/tests/cases/roundtrip_logical_plan.rs Outdated
Merge new MergeInto/DML imports from upstream with range
partitioning additions.
@saadtajwar

Contributor Author

@gene-bordegaray tagging you in here - please let me know your thoughts! Looking forward to any feedback & working together further on this effort! 😁 🎉

@gene-bordegaray gene-bordegaray left a comment

Contributor

Overall looking good 😄, thank you for this work. Left some minor comments 👍

Comment thread datafusion/proto-models/proto/datafusion.proto Outdated
Comment thread datafusion/proto/tests/cases/roundtrip_logical_plan.rs Outdated
Comment thread datafusion/proto/src/logical_plan/from_proto.rs Outdated
Comment thread datafusion/proto-models/proto/datafusion.proto Outdated
Comment thread datafusion/proto/tests/cases/roundtrip_logical_plan.rs Outdated
@saadtajwar

Contributor Author

@gene-bordegaray pushed changes to address your comments - let me know your thoughts! Really appreciate the time taken to review!

@gene-bordegaray gene-bordegaray left a comment

Contributor

This looks good to me now 👍

Thank you!

For the next reviewer: there is some duplication in the helpers parse_protobuf_range_split_point / serialize_range_split_point across the logical and physical plan code, but I think it is OK for now. If another use pops up, we could factor out a common helper for this.

cc: @gabotechs
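
If the duplication noted above ever does get factored out, one possible shape is a shared module that both the logical and physical paths call into. The sketch below is only an illustration under stated assumptions: the module names `common`, `logical`, and `physical`, the wrapper functions, and the string-based wire format are all hypothetical and do not reflect how the DataFusion crates are actually organized. Only the two helper names are borrowed from the discussion.

```rust
// Hypothetical layout; module names and the wire format are assumptions.
pub mod common {
    /// Wire form of one split point: each value encoded as a decimal string.
    #[derive(Debug, Clone, PartialEq)]
    pub struct ProtoSplitPoint {
        pub values: Vec<String>,
    }

    /// Single shared encoder, used by both plan layers.
    pub fn serialize_range_split_point(values: &[i64]) -> ProtoSplitPoint {
        ProtoSplitPoint {
            values: values.iter().map(|v| v.to_string()).collect(),
        }
    }

    /// Single shared decoder; fails on the first value that does not parse.
    pub fn parse_protobuf_range_split_point(
        proto: &ProtoSplitPoint,
    ) -> Result<Vec<i64>, String> {
        proto
            .values
            .iter()
            .map(|s| {
                s.parse::<i64>()
                    .map_err(|e| format!("invalid split value {s:?}: {e}"))
            })
            .collect()
    }
}

pub mod logical {
    use super::common::{self, ProtoSplitPoint};

    /// Logical-plan side delegates to the shared encoder.
    pub fn encode_split_points(points: &[Vec<i64>]) -> Vec<ProtoSplitPoint> {
        points
            .iter()
            .map(|p| common::serialize_range_split_point(p))
            .collect()
    }
}

pub mod physical {
    use super::common::{self, ProtoSplitPoint};

    /// Physical-plan side delegates to the shared decoder.
    pub fn decode_split_points(
        protos: &[ProtoSplitPoint],
    ) -> Result<Vec<Vec<i64>>, String> {
        protos
            .iter()
            .map(common::parse_protobuf_range_split_point)
            .collect()
    }
}
```

The trade-off is the usual one: a shared helper removes the copy, but it couples the two code paths, which is a reasonable argument for deferring the refactor until a third caller appears.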

@alamb alamb left a comment

Contributor

Thank you @saadtajwar and @gene-bordegaray -- looks good to me

Field::new("ts", DataType::Int64, false),
Field::new("region", DataType::Utf8, false),
]));
let table = Arc::new(datafusion::datasource::empty::EmptyTable::new(Arc::clone(

Contributor

As a minor nit, datafusion::datasource::empty::EmptyTable is pretty hard on the eyes. It would be nicer to have a single use datafusion::datasource::empty::EmptyTable; at the top and then refer to EmptyTable down here (and in the other tests).

Contributor Author

Done! Thank you! Is there anything needed on my end to get this in? Appreciate you taking the time to review this!
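
The import nit in this thread comes down to the difference sketched below. To keep the example compilable without the DataFusion crate, it uses a local `datasource::empty::EmptyTable` stand-in with a made-up `schema` field; the real `EmptyTable` has a different constructor and fields, and the function names here are hypothetical.

```rust
// Local stand-in for `datafusion::datasource::empty::EmptyTable`, so the
// sketch compiles on its own. The field and constructor are illustrative only.
pub mod datasource {
    pub mod empty {
        #[derive(Debug, Clone, PartialEq)]
        pub struct EmptyTable {
            pub schema: Vec<String>,
        }

        impl EmptyTable {
            pub fn new(schema: Vec<String>) -> Self {
                Self { schema }
            }
        }
    }
}

// One import at the top of the file...
use datasource::empty::EmptyTable;

// ...keeps every use site short.
pub fn events_table() -> EmptyTable {
    EmptyTable::new(vec!["ts".to_string(), "region".to_string()])
}

// The fully qualified form is equivalent, but noisy when repeated across tests.
pub fn events_table_verbose() -> datasource::empty::EmptyTable {
    datasource::empty::EmptyTable::new(vec!["ts".to_string(), "region".to_string()])
}
```

Both functions build the same value; the only difference is readability at the call site.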

@github-actions

github-actions Bot commented Jun 22, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-proto v54.0.0 (current)
       Built [  57.761s] (current)
     Parsing datafusion-proto v54.0.0 (current)
      Parsed [   0.020s] (current)
    Building datafusion-proto v54.0.0 (baseline)
       Built [  57.413s] (baseline)
     Parsing datafusion-proto v54.0.0 (baseline)
      Parsed [   0.020s] (baseline)
    Checking datafusion-proto v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.379s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 117.803s] datafusion-proto
    Building datafusion-proto-models v54.0.0 (current)
       Built [  23.487s] (current)
     Parsing datafusion-proto-models v54.0.0 (current)
      Parsed [   0.134s] (current)
    Building datafusion-proto-models v54.0.0 (baseline)
       Built [  23.482s] (baseline)
     Parsing datafusion-proto-models v54.0.0 (baseline)
      Parsed [   0.134s] (baseline)
    Checking datafusion-proto-models v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   2.494s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/enum_variant_added.ron

Failed in:
  variant PartitionMethod:Range in /home/runner/work/datafusion/datafusion/datafusion/proto-models/src/generated/prost.rs:225
  variant PartitionMethod:Range in /home/runner/work/datafusion/datafusion/datafusion/proto-models/src/generated/prost.rs:225

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  51.204s] datafusion-proto-models
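
The `enum_variant_added` failure above reflects a general Rust rule: adding a variant to a public enum that lacks `#[non_exhaustive]` can break downstream code that matches on it exhaustively. The sketch below illustrates the rule with stand-in enums; the variant lists and function names are hypothetical and are not the actual prost-generated `PartitionMethod`.

```rust
// Stand-in for a generated enum; not the actual prost output.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PartitionMethod {
    RoundRobin,
    Hash,
    // Newly added variant. Any downstream `match` that listed only the two
    // variants above, with no wildcard arm, stops compiling once this exists.
    Range,
}

/// Exhaustive match: it compiles only because every variant has an arm.
/// Deleting the `Range` arm reproduces the downstream breakage as a
/// "non-exhaustive patterns" compile error.
pub fn describe(method: PartitionMethod) -> &'static str {
    match method {
        PartitionMethod::RoundRobin => "round robin",
        PartitionMethod::Hash => "hash",
        PartitionMethod::Range => "range",
    }
}

/// With `#[non_exhaustive]`, other crates are required to include a wildcard
/// arm, so adding a variant later is no longer a breaking change for them.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FuturePartitionMethod {
    RoundRobin,
    Hash,
    Range,
}

/// What downstream code is forced to look like for a `#[non_exhaustive]` enum.
/// The attribute is only enforced across crate boundaries, so inside this
/// single file the wildcard arm is written by hand to show the shape.
pub fn describe_defensively(method: FuturePartitionMethod) -> &'static str {
    match method {
        FuturePartitionMethod::RoundRobin => "round robin",
        FuturePartitionMethod::Hash => "hash",
        _ => "other",
    }
}
```

This is why the check asks for a new major version even though the crate's behavior for existing variants is unchanged.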

@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 22, 2026
@Dandandan Dandandan added this pull request to the merge queue Jun 22, 2026
Merged via the queue into apache:main with commit a1f56b7 Jun 22, 2026
35 checks passed
@saadtajwar

Contributor Author

@Dandandan & @alamb & @gene-bordegaray - thanks so much again for helping get this in, team! Super excited to continue contributing to DataFusion, and please let me know if there's anything else on this specific initiative I can help out with! 😁 🎉

@gene-bordegaray

Contributor

> @Dandandan & @alamb & @gene-bordegaray - thanks so much again for helping get this in, team! Super excited to continue contributing to DataFusion, and please let me know if there's anything else on this specific initiative I can help out with! 😁 🎉

No problem, I should be thanking you for the great work. I have other related issues up for range partitioning. Feel free to pick up any you want to take on. I am more than happy to review and answer questions.

Labels

auto detected api change: Auto detected API change
proto: Related to proto crate

Development

Successfully merging this pull request may close these issues.

Support logical protobuf serialization for range repartitioning

4 participants