Add Physical `Partitioning::Range` enum variant by gene-bordegaray · Pull Request #22207 · apache/datafusion

gene-bordegaray · 2026-05-15T16:57:19Z

Which issue does this PR close?

First mechanical PR for ExprPartitioning as described in thread: [DISCUSSION] Extending Partitioning to Support More Variants #21992.

Rationale for this change

DataFusion currently cannot truthfully represent range-partitioned physical data. Some sources may be range partitioned, but have to advertise another partitioning shape or fall back to unknown partitioning.

This PR introduces the metadata shape for range partitioning without implementing optimizer or execution behavior yet. The goal is to establish the public representation first, then implement planning, compatibility, and execution behavior incrementally in follow-up PRs.

What changes are included in this PR?

Adds Partitioning::Range(RangePartitioning).
Adds range metadata types:
- RangePartitioning
- RangePartition
- RangeInterval
- RangeBound
Adds proto serialization/deserialization.
Adds not_impl_err! handling for range partitioning at call sites.
Preserves range partitioning through projection only when all partition expressions can be projected; otherwise falls back to UnknownPartitioning.
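The projection rule in the last bullet can be illustrated with a minimal, self-contained sketch. It assumes partition expressions are plain column names and that a projection is a rename map; `SketchPartitioning` and `project` are hypothetical names for illustration and are not the types or functions added by this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a partitioning enum (not DataFusion's `Partitioning`).
#[derive(Debug, Clone, PartialEq)]
pub enum SketchPartitioning {
    /// Range partitioning keyed on column names (assumption: columns only).
    Range { exprs: Vec<String>, partition_count: usize },
    /// Partition count is known, layout is not.
    Unknown(usize),
}

/// Apply a projection given as `input column -> output column`.
/// Range partitioning survives only if every key column is still present.
pub fn project(p: &SketchPartitioning, mapping: &HashMap<&str, &str>) -> SketchPartitioning {
    match p {
        SketchPartitioning::Range { exprs, partition_count } => {
            // Collecting into Option yields None as soon as one column is missing.
            let projected: Option<Vec<String>> = exprs
                .iter()
                .map(|e| mapping.get(e.as_str()).map(|s| s.to_string()))
                .collect();
            match projected {
                Some(exprs) => SketchPartitioning::Range {
                    exprs,
                    partition_count: *partition_count,
                },
                None => SketchPartitioning::Unknown(*partition_count),
            }
        }
        SketchPartitioning::Unknown(n) => SketchPartitioning::Unknown(*n),
    }
}
```

Dropping even one key column degrades the result to the unknown variant, which keeps the advertised metadata truthful at the cost of losing the range information.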

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. This adds a new public physical partitioning API and proto messages for range partitioning.

github-actions · 2026-05-15T17:13:02Z

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details

     Cloning apache/main
    Building datafusion-ffi v53.1.0 (current)
       Built [  65.905s] (current)
     Parsing datafusion-ffi v53.1.0 (current)
      Parsed [   0.066s] (current)
    Building datafusion-ffi v53.1.0 (baseline)
       Built [  59.902s] (baseline)
     Parsing datafusion-ffi v53.1.0 (baseline)
      Parsed [   0.066s] (baseline)
    Checking datafusion-ffi v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.374s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [ 128.730s] datafusion-ffi
    Building datafusion-physical-expr v53.1.0 (current)
       Built [  25.410s] (current)
     Parsing datafusion-physical-expr v53.1.0 (current)
      Parsed [   0.050s] (current)
    Building datafusion-physical-expr v53.1.0 (baseline)
       Built [  25.668s] (baseline)
     Parsing datafusion-physical-expr v53.1.0 (baseline)
      Parsed [   0.049s] (baseline)
    Checking datafusion-physical-expr v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.523s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/enum_variant_added.ron

Failed in:
  variant Partitioning:Range in /home/runner/work/datafusion/datafusion/datafusion/physical-expr/src/partitioning.rs:122

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  53.586s] datafusion-physical-expr
    Building datafusion-physical-plan v53.1.0 (current)
       Built [  33.017s] (current)
     Parsing datafusion-physical-plan v53.1.0 (current)
      Parsed [   0.134s] (current)
    Building datafusion-physical-plan v53.1.0 (baseline)
       Built [  33.151s] (baseline)
     Parsing datafusion-physical-plan v53.1.0 (baseline)
      Parsed [   0.137s] (baseline)
    Checking datafusion-physical-plan v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.906s] 222 checks: 222 pass, 30 skip
     Summary no semver update required
    Finished [  68.895s] datafusion-physical-plan
    Building datafusion-proto v53.1.0 (current)
       Built [  56.498s] (current)
     Parsing datafusion-proto v53.1.0 (current)
      Parsed [   0.149s] (current)
    Building datafusion-proto v53.1.0 (baseline)
       Built [  54.883s] (baseline)
     Parsing datafusion-proto v53.1.0 (baseline)
      Parsed [   0.146s] (baseline)
    Checking datafusion-proto v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   2.600s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/enum_variant_added.ron

Failed in:
  variant PartitionMethod:Range in /home/runner/work/datafusion/datafusion/datafusion/proto/src/generated/prost.rs:2098
  variant PartitionMethod:Range in /home/runner/work/datafusion/datafusion/datafusion/proto/src/generated/prost.rs:2098

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [ 116.416s] datafusion-proto
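For readers unfamiliar with the `enum_variant_added` lint reported above: adding a variant to a public enum that is not marked `#[non_exhaustive]` can break downstream crates whose `match` has no wildcard arm. The sketch below uses a hypothetical `DemoPartitioning` enum (not DataFusion's `Partitioning`) to show the attribute and the wildcard arm it forces on downstream crates. It is only an illustration of the SemVer rule, not a suggestion about how this PR resolved the report.

```rust
/// Hypothetical public enum; the attribute tells downstream crates that
/// more variants may appear in a minor release.
#[derive(Debug, Clone, PartialEq)]
#[non_exhaustive]
pub enum DemoPartitioning {
    RoundRobin(usize),
    Hash(usize),
    /// Imagine this variant was added in a later release.
    Range(usize),
}

/// Written the way a downstream consumer would have to write it: the
/// wildcard arm keeps this compiling when new variants are added.
pub fn describe(p: &DemoPartitioning) -> String {
    match p {
        DemoPartitioning::RoundRobin(n) => format!("round-robin over {n} partitions"),
        DemoPartitioning::Hash(n) => format!("hash over {n} partitions"),
        // Unknown or newer variants fall through here, similar in spirit to
        // returning a not-implemented error at call sites.
        _ => "not implemented".to_string(),
    }
}
```

Without the attribute, a downstream `match` that lists every variant explicitly stops compiling once `Range` is added, which is why the tool asks for a major version bump.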

stuhood

Thanks for iterating on this!

stuhood · 2026-05-18T18:51:36Z

+/// ```text
+/// exprs = [date, city]
+///
+/// partition 0:
+///   date in [2021-01-01, 2022-01-01)
+///   city in [Allston, Boston)
+///
+/// partition 1:
+///   date in [2021-01-01, 2022-01-01)
+///   city in [Boston, NYC)
+/// ```


Are these supposed to be representing compound keys or multi-dimensional partitioning?

If they are compound keys, then I think that it would be clearer to express them as:

[ (2021-01-01, Allston), (2022-01-01, Boston), ... ]

If this is supposed to be multi-dimensional partitioning, then I think that that might be unnecessary, as mentioned on the discussion thread: any particular join only needs to consider 1 dimension (possibly with compound keys).

This is supposed to be multi-dimensional. For example, we are partitioned on independent id and time keys, so this would accurately represent our layout.

I do see what you are saying about the join needing a single key, which will work for our case as well. But maybe this can start as single dimension with compound keys and extend if the use case arises to avoid complexity?

Ok, interesting. Yea, if there are multiple consumers who are interested in multi-dimensional partitioning, and it can still reduce down to a base-case of single-dimension partitioning for consumers who don't need that complexity, then perhaps it could make sense to bake it in here.

I'll be honest though: my largest concern is just that I have no experience with multi: only single. So I have less useful feedback to give.

One thing that could likely be a good exercise in terms of the representation would be figuring out what datastructure you would/could use to efficiently partition in multiple dimensions, and then bias towards a representation which allows you to construct that datastructure. In one dimensional partitioning, that's essentially just a binary-tree/b-tree/sorted structure: hence the desire for non-overlapping contiguous ranges (to avoid needing something more complex like an interval tree). For multi-dimensional partitioning, what structure would you use, and what would the inputs to construct one be? I expect that fully covering the space (contiguous, no-overlap) makes the multi-dimensional datastructure cheaper/simpler as well.

Yeah, maybe look to see how other DBs support this. I know ClickHouse and InfluxDB do

I looked at ClickHouse and InfluxDB. I found that they store physical partitioning metadata, but did not find anything like a “multi-dimensional repartition this row.”

I looked into systems that try to do true multi-dimensional partitioning, and there aren't many that really do it, I think for good reason. It would treat columns like time and city as independent axes, which is great and easy in simple cases, but when things start to overlap or get more nuanced it seems we would need a routing structure like a grid, sparse map, or KDB-tree (these were very complicated).

The closest thing I found was in Sedona where they do spatial partitioning using quadtree and kdbtree:

https://sedona.apache.org/1.7.1/api/rdocs/reference/sedona_apply_spatial_partitioner.html

Quadtree: https://www.geeksforgeeks.org/dsa/quad-tree/

KDBTree: https://en.wikipedia.org/wiki/K-D-B-tree

With compound-key range partitioning it is more clear and still efficient on repartition routing:

1. evaluate `(time, city)` as one ordered key
2. binary-search split points
3. route to a partition.
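The three routing steps can be sketched in a few lines. This is a minimal illustration that assumes an integer `time` and a string `city` compared as a tuple, with ascending order on both columns; `route` is a hypothetical helper, not part of the PR.

```rust
/// Route a compound key `(time, city)` to a partition index.
/// `split_points` must be sorted ascending; N split points give N + 1 partitions.
pub fn route<'a>(split_points: &[(i64, &'a str)], key: (i64, &'a str)) -> usize {
    // Tuples compare lexicographically, so `(time, city)` behaves as one ordered key.
    // `partition_point` binary-searches for the first split point greater than `key`,
    // so a key equal to a split point lands in the partition after that split point.
    split_points.partition_point(|sp| *sp <= key)
}
```

With split points `[(2022, "Boston"), (2023, "Allston")]`, the key `(2022, "NYC")` routes to partition 1. The city only matters once the time component ties with a split point, which is exactly the limitation compared to independent per-axis routing.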

Compound-key range partitioning should cover most join/planner cases, as @stuhood mentioned. We are typically asking "are the two sides of this join compatible" for things like dynamic filters. What it lacks compared to true multi-dimensional partitioning is independent routing. For example, it cannot directly represent “time bucket X and city bucket Y map to partition P”, which is useful when we want to optimize on each axis independently, such as pruning on the individual columns:

WHERE time >= '2022' AND time < '2023' AND city >= 'Boston' AND city < 'NYC'

So I think compound-key range partitioning is the right move. If there is a use for this I would say that this should be its own separate implementation.

I found that they store physical partitioning metadata, but did not find anything like a “multi-dimensional repartition this row.”

I agree -- Influx's model is best modeled as "compound key" (it is not multi-dimensional partitioning)

So I think compound-key range partitioning is the right move. If there is a use for this I would say that this should be its own separate implementation.

I agree

stuhood · 2026-05-18T19:05:40Z

+    lower: Option<RangeBound>,
+    upper: Option<RangeBound>,


The problem with encoding these as intervals as opposed to points (as suggested here) is that in order to use a more efficient re-partitioning strategy based on a sorted representation, you need to start by converting this representation back into the points representation, which involves a bunch of validation that the ranges are not overlapping, sorted, contiguous (so that you can floor), etc.

I don't feel strongly about it, but I think that a point-based representation involves a lot fewer special cases.

A points representation must cover the entire set of valid values (by construction). That doesn't let you use the partitioning strategy to short circuit if the ranges "Partially" cover the valid values (in the sense of being a partial function... e.g. TryFrom vs From). But honestly, I don't think that allowing for partial partitioning is a good idea anyway: for example, the Repartition operator wouldn't actually know what to do with a row which didn't map to any partition: it can't discard rows, because it doesn't know what operator is consuming it... so it would have to error. So I think that in practice, all Range partitioning strategies would need to be complete anyway, and this extra generality is just complexity.
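To make the validation cost concrete, here is a rough sketch of converting an interval list back into split points. It assumes a single `i64` key and half-open `[lower, upper)` intervals where `None` means unbounded; `Interval` and `to_split_points` are hypothetical names and do not reflect the types in this PR.

```rust
/// Hypothetical half-open interval `[lower, upper)`; `None` means unbounded.
#[derive(Debug, Clone, PartialEq)]
pub struct Interval {
    pub lower: Option<i64>,
    pub upper: Option<i64>,
}

/// Convert intervals to split points, rejecting gaps, overlaps, unsorted
/// input, and partial coverage of the key space.
pub fn to_split_points(intervals: &[Interval]) -> Result<Vec<i64>, String> {
    let (first, last) = match (intervals.first(), intervals.last()) {
        (Some(f), Some(l)) => (f, l),
        _ => return Err("at least one interval is required".to_string()),
    };
    // Completeness: the outermost bounds must be open.
    if first.lower.is_some() || last.upper.is_some() {
        return Err("intervals must cover the whole key space".to_string());
    }
    let mut points: Vec<i64> = Vec::with_capacity(intervals.len() - 1);
    for pair in intervals.windows(2) {
        match (pair[0].upper, pair[1].lower) {
            // Contiguity: one interval must end exactly where the next begins.
            (Some(u), Some(l)) if u == l => {
                // Sortedness: split points must be strictly increasing.
                if points.last().is_some_and(|prev| *prev >= u) {
                    return Err(format!("split point {u} is out of order"));
                }
                points.push(u);
            }
            _ => return Err("gap or overlap between adjacent intervals".to_string()),
        }
    }
    Ok(points)
}
```

Every error branch here is a special case that a split-point representation rules out by construction, since a sorted list of N points always describes N + 1 contiguous partitions covering the whole key space.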

The reason I leaned toward this was readability. I think we could make the documentation clear or even provide helpers to abstract this nicely so I am not concerned with this.

I am ok with doing split points as well, as long as other maintainers think this is ok for public API 👍

A points representation must cover the entire set of valid values (by construction).

I also prefer a split points representation for the same reason. Specifically, I think split points ensures that any particular row value is in EXACTLY one partition. We would prevent user errors that could lead to cases where there are rows that don't belong in any partition or in more than one partition.

This also would make the sorting semantics easier

alamb · 2026-05-20T13:01:26Z

Note there is more discussion here

[DISCUSSION] Extending Partitioning to Support More Variants #21992 (comment)

alamb

Thank you @stuhood , @Dandandan and @gene-bordegaray for your work on this

I think this looks good to me. I see a new update just got pushed, so I am submitting my feedback now and will review what just got pushed

Suggestions:

We wait until we branch datafusion 54 (#21080) -- should be today or tomorrow

alamb · 2026-05-20T13:48:02Z

+/// ```text
+/// exprs = [date, city]
+///
+/// partition 0:
+///   date in [2021-01-01, 2022-01-01)
+///   city in [Allston, Boston)
+///
+/// partition 1:
+///   date in [2021-01-01, 2022-01-01)
+///   city in [Boston, NYC)
+/// ```


I found that they store physical partitioning metadata, but did not find anything like a “multi-dimensional repartition this row.”

I agree -- Influx's model is best modeled as "compound key" (it is not multi-dimensional partitioning)

So I think compound-key range partitioning is the right move. If there is a use for this I would say that this should be its own separate implementation.

I agree

alamb · 2026-05-20T13:55:50Z

+    lower: Option<RangeBound>,
+    upper: Option<RangeBound>,


A points representation must cover the entire set of valid values (by construction).

I also prefer a split points representation for the same reason. Specifically, I think split points ensures that any particular row value is in EXACTLY one partition. We would prevent user errors that could lead to cases where there are rows that don't belong in any partition or in more than one partition.

This also would make the sorting semantics easier

alamb

Looking really nice to me. I had a few more small API suggestions, but overall I think the representation is 👌

Let's wait to see what @stuhood thinks too

alamb · 2026-05-20T14:46:12Z

+    sort_exprs: Vec<PhysicalSortExpr>,
+    /// Boundaries between adjacent partitions. `N` split points define `N + 1`
+    /// lower-inclusive, upper-exclusive partitions. Values equal to a split
+    /// point belong to the partition after that split point.


See above for a potentially more formal way of specifying this. I recommend making the docs on RangePartitioning detailed and just leaving a pointer from split_points to the main docs

alamb · 2026-05-20T14:47:26Z

+    /// Ordered partitioning key. Sort options are part of the partitioning
+    /// because `ASC`/`DESC` and null ordering decide which side of a split point
+    /// a row belongs to.
+    sort_exprs: Vec<PhysicalSortExpr>,


Instead of sort_exprs, what do you think about using pre-existing LexOrdering: https://docs.rs/datafusion/latest/datafusion/physical_expr/struct.LexOrdering.html ?

yup much better 👍
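The doc comment's point that sort options decide which side of a split point a row falls on can be shown with a small sketch. It assumes a single nullable `i64` key; `SortOpts` is a hypothetical struct whose field names mirror Arrow's `SortOptions` (`descending`, `nulls_first`), and `route_with_options` is a hypothetical helper rather than anything in this PR.

```rust
use std::cmp::Ordering;

/// Hypothetical sort options for one key column.
#[derive(Debug, Clone, Copy)]
pub struct SortOpts {
    pub descending: bool,
    pub nulls_first: bool,
}

/// Compare two nullable values under the given sort options.
pub fn compare(a: Option<i64>, b: Option<i64>, opts: SortOpts) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if opts.nulls_first { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if opts.nulls_first { Ordering::Greater } else { Ordering::Less }
        }
        (Some(x), Some(y)) => {
            if opts.descending { y.cmp(&x) } else { x.cmp(&y) }
        }
    }
}

/// Partition index: the number of split points that sort at or before `value`.
/// `split_points` must already be sorted according to `opts`.
pub fn route_with_options(split_points: &[Option<i64>], value: Option<i64>, opts: SortOpts) -> usize {
    split_points.partition_point(|sp| compare(*sp, value, opts) != Ordering::Greater)
}
```

With split points `[10, 20]`, a null key routes to the last partition under nulls-last and to the first under nulls-first, so two plans that disagree on sort options would disagree on where the same row lives.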

alamb · 2026-05-20T14:50:09Z

+    /// Boundaries between adjacent partitions. `N` split points define `N + 1`
+    /// lower-inclusive, upper-exclusive partitions. Values equal to a split
+    /// point belong to the partition after that split point.
+    split_points: Vec<Vec<ScalarValue>>,


For future API extensibility I recommend wrapping this Vec in a struct

split_points: Vec<SplitPoint>,

And then

struct SplitPoint { points: Vec<ScalarValue> }

That would both give us a good place to add documentation and things like Display impls, and if we ever wanted to add additional types of split points (like maybe inf or expressions) we wouldn't have to make a bunch of API changes

yes, good point thank you
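A minimal sketch of what such a wrapper could look like, including a `Display` impl. `Scalar` is a hypothetical two-variant enum standing in for `ScalarValue`, and the field and method names are assumptions; the merged `SplitPoint` may differ.

```rust
use std::fmt;

/// Hypothetical scalar type standing in for DataFusion's `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int(i64),
    Text(String),
}

impl fmt::Display for Scalar {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Scalar::Int(v) => write!(f, "{v}"),
            Scalar::Text(v) => write!(f, "{v}"),
        }
    }
}

/// One compound split point. Keeping the field private leaves room to
/// change the internal representation later without breaking callers.
#[derive(Debug, Clone, PartialEq)]
pub struct SplitPoint {
    values: Vec<Scalar>,
}

impl SplitPoint {
    pub fn new(values: Vec<Scalar>) -> Self {
        Self { values }
    }

    /// One value per partitioning key column.
    pub fn values(&self) -> &[Scalar] {
        &self.values
    }
}

impl fmt::Display for SplitPoint {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "(")?;
        for (i, v) in self.values.iter().enumerate() {
            if i > 0 {
                write!(f, ", ")?;
            }
            write!(f, "{v}")?;
        }
        write!(f, ")")
    }
}
```

A split point built from `Scalar::Int(2022)` and `Scalar::Text("Boston")` renders as `(2022, Boston)`, matching the tuple notation used earlier in the review.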

stuhood

This representation looks good to me!

As to the multi-dimensional partitioning decision: I am very fine either way, but I do think that it could be very cool to actually lean in there (especially with 3+ consumers doing multi-dimensional)... it could potentially allow more optimization passes in upstream around pruning partitions.

But it could also be a third variant at some point, so not blocking.

alamb

Thank you @gene-bordegaray and @stuhood -- I had a few small comments but I think this PR now looks (really) good to merge

We have also branched the 54 release so I think we are clear to merge it from a release perspective.

Since this is a fairly substantial / fundamental new feature, before merging I think we should

Send an email to the dev list (and maybe discord channel) saying that we have a proposed new API for a new partitioning and invite anyone else that is interested to review and provide feedback
Leave this open for several more days to allow more time to collect feedback
Add a note in the 55 upgrade guide (to be written) explaining that we are adding new range partitioning

All in all, this is great work -- thanks again

alamb · 2026-05-20T21:24:38Z

+
+    #[test]
+    fn test_multi_partition_range_does_not_satisfy_hash_distribution() -> Result<()> {
+        let schema = Arc::new(Schema::new(vec![


The setup for the schema, creating col_a and col_b, the equivalence properties, and the range partitioning is the same in a bunch of these cases. Maybe it could be moved into a helper function to reduce code repetition, which would make it easier to verify what each test is checking.

For example, if you had something like this

struct TestFixture {
    /// schema with columns a, b
    schema: SchemaRef,
    col: Arc<dyn PhysicalExpr>,
    ...
}

You could write this test with a lot less boilerplate like

#[test]
fn test_multi_partition_range_does_not_satisfy_hash_distribution() -> Result<()> {
    let fixture = TestFixture::new();
    let required = Distribution::HashPartitioned(vec![fixture.col_a, fixture.col_b]);
    assert_eq!(
        range_partitioning.satisfaction(&required, &fixture.eq_properties, false),
        PartitioningSatisfaction::NotSatisfied
    );
    Ok(())
}

added a fixture for all partitioning tests, eliminated lots of boilerplate 😄

alamb · 2026-05-20T21:31:16Z

+    /// The caller is responsible for satisfying the contract documented on
+    /// [`RangePartitioning`].
+    pub fn new(ordering: LexOrdering, split_points: Vec<SplitPoint>) -> Self {
+        Self {


Given there is an invariant that all the values in split_points are in the correct order compared to the ordering, it seems like we should at least offer a RangePartitioning::try_new that validates that invariant and document that new does not check

Good point, added this and use now in the proto

Also added some unit tests covering this 👍
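The `new` / `try_new` split discussed here follows a common Rust pattern, sketched below under simplifying assumptions: a single `i64` key, a `descending` flag in place of a full `LexOrdering`, and strict ordering (no duplicate split points). `DemoRangePartitioning` is a hypothetical type, and the checks in the real `RangePartitioning::try_new` may differ.

```rust
/// Hypothetical, simplified range partitioning over a single i64 key.
#[derive(Debug, Clone, PartialEq)]
pub struct DemoRangePartitioning {
    descending: bool,
    split_points: Vec<i64>,
}

impl DemoRangePartitioning {
    /// Unchecked constructor: the caller promises the split points are ordered.
    pub fn new(descending: bool, split_points: Vec<i64>) -> Self {
        Self { descending, split_points }
    }

    /// Checked constructor: verifies that adjacent split points are strictly
    /// ordered in the requested direction (assumption: duplicates are rejected).
    pub fn try_new(descending: bool, split_points: Vec<i64>) -> Result<Self, String> {
        for pair in split_points.windows(2) {
            let in_order = if descending {
                pair[0] > pair[1]
            } else {
                pair[0] < pair[1]
            };
            if !in_order {
                return Err(format!(
                    "split points {} and {} are not strictly ordered",
                    pair[0], pair[1]
                ));
            }
        }
        Ok(Self::new(descending, split_points))
    }

    /// N split points define N + 1 partitions.
    pub fn partition_count(&self) -> usize {
        self.split_points.len() + 1
    }
}
```

Using the checked constructor at trust boundaries such as proto deserialization, as mentioned in the reply above, means malformed metadata is rejected before any routing logic depends on the ordering.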

gene-bordegaray · 2026-05-26T12:31:25Z

Feel free to ping when we are going to merge this guy and I can rebase it 👍

alamb · 2026-05-26T19:56:51Z

I took the liberty of merging up from main to resolve a conflict. I'll plan to merge this tomorrow unless someone wants more time to review or add comments.

I think we should drop a note to the mailing list (dev@datafusion.apache.org)

I didn't see anything on https://lists.apache.org/list.html?dev@datafusion.apache.org

Maybe just a quick note like this: https://lists.apache.org/thread/mbw6q0ccndn2xq0kq8f28jrj5wppzqdn

I think we can merge this PR in first, and then send the note (as people will have plenty of time to comment before we release DataFusion 55)

alamb · 2026-05-27T17:48:57Z

Here is a note that was sent to the dev mailing list: https://lists.apache.org/thread/14d9fthyoyq76xd3yb89swxclvw91jfp

I put this PR in the merge queue -- thank you @gene-bordegaray -- very excited to see this make progress

alamb · 2026-05-27T17:49:09Z

Thanks again for your help @stuhood and @NGA-TRAN

gene-bordegaray · 2026-05-27T18:24:47Z

Thanks again for your help @stuhood and @NGA-TRAN

@alamb @NGA-TRAN @stuhood thank you everyone for the feedback, very excited for this 😄

github-actions Bot added the physical-expr, proto, ffi, and physical-plan labels May 15, 2026

gene-bordegaray mentioned this pull request May 15, 2026

[DISCUSSION] Extending Partitioning to Support More Variants #21992

Open

Dandandan reviewed May 15, 2026

Comment thread datafusion/physical-expr/src/partitioning.rs Outdated

github-actions Bot added the auto detected api change label May 15, 2026

stuhood reviewed May 15, 2026

Comment thread datafusion/physical-plan/src/repartition/mod.rs Outdated

gene-bordegaray force-pushed the gene.bordegaray/2026/05/expr_partitioning_enum_mechanical branch from 366c4ac to cad5e05 Compare May 18, 2026 15:12

gene-bordegaray changed the title ~~Add expression partitioning enum variant~~ Add range partitioning enum variant May 18, 2026

stuhood reviewed May 18, 2026

alamb reviewed May 20, 2026

alamb changed the title ~~Add range partitioning enum variant~~ Add Physical Partitioning::Range enum variant May 20, 2026

stuhood approved these changes May 20, 2026

alamb approved these changes May 20, 2026

gene-bordegaray added 10 commits May 21, 2026 13:58

Add expression partitioning enum variant

fdfdbff

Add range partitioning metadata

e336867

Model range partitioning with split points

33ae1c8

Refine range partitioning API

e18739e

Use LexOrdering for range partitioning

f6e818e

Link range partitioning follow-up issues

b979109

Dedupe docs a bit

68e732e

Clean up range partitioning docs and tests

fbe536a

Add try_new constructor with validation and explicit matching

b1da6a6

Tighten partitioning tests

11c4c18

gene-bordegaray force-pushed the gene.bordegaray/2026/05/expr_partitioning_enum_mechanical branch from d1aa4c5 to 11c4c18 Compare May 21, 2026 18:07

Merge remote-tracking branch 'apache/main' into gene.bordegaray/2026/…

66e95c9

…05/expr_partitioning_enum_mechanical

alamb added this pull request to the merge queue May 27, 2026

Merged via the queue into apache:main with commit 7a6b062 May 27, 2026
38 checks passed

gabotechs mentioned this pull request May 29, 2026

Add range partitioning sqllogictest fixture #22607

Merged

This was referenced Jun 22, 2026

API: runtime partition extrema for range-aware operators #23089

Open

API: Partitioning::DynamicRange for runtime-discovered split points #23093

Open

feat(physical-expr): add Partitioning::DynamicRange variant #23094

Open
