Add range partitioning sqllogictest fixture by gene-bordegaray · Pull Request #22607 · apache/datafusion

gene-bordegaray · 2026-05-29T01:00:13Z

Which issue does this PR close?

Part of Implement Range Repartitioning #22397.
Discussion: [DISCUSSION] Extending Partitioning to Support More Variants #21992.

Rationale for this change

This adds a focused sqllogictest fixture for source-provided Range partitioning before changing optimizer behavior. It follows the direction discussed in #21992 and gives later planning PRs stable baselines for current behavior.

What changes are included in this PR?

Registers a range_partitioned test table for range_partitioning.slt.
Adds a sqllogictest-only source wrapper that reports Range partitioning when range_key is projected, and UnknownPartitioning when it is not.
Adds baselines for grouping on the range key, grouping on a non-range key, joining on the range key, and UNION ALL over range-partitioned inputs.

Are these changes tested?

Yes.

cargo fmt --all
cargo test -p datafusion-sqllogictest --test sqllogictests range_partitioning
cargo clippy --all-targets --all-features -- -D warnings

Are there any user-facing changes?

No. This is test-only infrastructure and sqllogictest coverage.

gene-bordegaray · 2026-05-29T01:03:12Z

cc: @NGA-TRAN @gabotechs @alamb

gabotechs · 2026-05-29T07:05:12Z

+    }
+}
+
+fn register_range_partitioned_table(ctx: &SessionContext) {


🤔 could it be feasable to have a table function instead? just in case this single table is not capable of satisfying all corner cases in the future.

Just food for thought, if you think a hardcoded table is good then let's stick with it (it's actually simpler)

yes I think its feasbile and would be useful for future PRs. Ditto to my comment about this being as simple as possible

alamb

Thanks @gene-bordegaray -- I think this is a good step and could be merged as is. However, I have some suggestions for follow on work/how to make this simpler and more general. Let me know what you think

Thank you @gabotechs and @asolimando for your reviews

alamb · 2026-05-29T11:07:13Z

+}
+
+impl RangePartitionedTable {
+    fn output_partitioning(


I didn't realize it would take so much ceremony to get partitioned data (aka create an entire DataSource/TableProvider) and I agree with @gabotechs that if we go down this approach it will be hard to generate all the possible cases we would like to test.

SO I would suggest we start by integrate pre-defined partitioning in higher level APIs, starating with FileScanConfig, which I think will be needed eventually to use this feature

I dug around a bit and it seems like it will be straightforward to make FileScanConfig::with_partitioned_by_file_group more general

datafusion/datafusion/datasource/src/file_scan_config/mod.rs

Lines 494 to 505 in 04c01bb

/// Set whether file groups are organized by partition column values.

///

/// When set to true, the output partitioning will be declared as Hash partitioning

/// on the partition columns.

pub fn with_partitioned_by_file_group(

mut self,

partitioned_by_file_group: bool,

) -> Self {

self.partitioned_by_file_group = partitioned_by_file_group;

self

}

What I suggest we do (could be a follow on PR)

Update FileScanConfig so it specifies a predefined output partitioning (output_partitioning: Option<Partitioning>) rather than a bool

Deprecate FileScanConfig::with_partitoned_by_file_groups

Add a new FileScanConfig::with_output_partitioning similar to FileScanConfig::with_output_ordering

Use the DataSourceExec::from(...) for that config

That will save a lot of boiler plate in this setup, and I think you'll need the more general form to take advantage of range partitioning externally (aka it won't be wasted code)

Eventually, I think we should be targeting ListingTable s that declared it was RangePartitioned via ListingOptions (and thus eventually via SQL) .

However, it seems like ListingTable/ListingOptions don't have the output hooks yet and we need to buid there incrementally
https://github.com/apache/datafusion/blob/4c909bafc5c50749884fdd80a06235d7bd72dbde/datafusion/catalog-listing/src/options.rs#L32-L31

I think this would be a good clean up. I would be ok with doing this in this PR but think it would be better on its own as I think it will have some ripppling effects

I poke around and will see if it is feasible to add in here

Ok, I think that the FileScanConfig::with_output_partitioning(...) with the ListingOptions::with_output_partitioning(...) is the right move. It will help us declaring range partitioning on actual file/listing tables so that is great

I think we can leave the FileScanConfig / ListingTable work as a follow-up. There is some things we need to ahndle there I don't think belong in here.

There is a smaller cleanup where MemorySourceConfig could have an output_partitioning builder, which would clean up a lot of the boilerplate, but after looking at it I don’t think it is good because adding it only to MemorySourceConfig feels like a one-off public API just for this fixture.

I would be ok with moving this to its own module now and clean up with the right fix after.

alamb · 2026-05-29T11:11:11Z

    );
 }

+// ==============================================================================


if we really need a few hundred lines of setip, maybe we can put it into a module like datafusion/sqllogictest/src/test_context/range_partitioning.rs

However, I have suggestions below that I think could make this substantially smaller

I ended up doing this with the justification in comment on refactoring. If we are ok with this path I can open an issue 😄

alamb

Looks good to me -- thank you @gene-bordegaray - I think this is a good start and we can iterate on this / clean it up in follow ons (clearly I trust that you will do so 😆 )

Since it is just test infrastructure I'll put it in the merge queue now

gene-bordegaray · 2026-05-30T12:38:58Z

I have made a follow up issue on enhancing the file/listing scan APIs for output_partitioning: #22645

…_partitioning` for pre-defined file partitioning (apache#22657) ## Which issue does this PR close?  - Closes apache#22645. ## Rationale for this change  This follows up on apache#22607 by replacing range-partitioning sqllogictest boilerplate with a general file/listing scan API for declared output partitioning. Related: apache#21992, apache#22607, apache#22607 (comment) ## What changes are included in this PR?  - Add declared `output_partitioning` to file scan and listing table configuration. - Preserve declared partition counts during listing-table file grouping. - Serialize scan `output_partitioning` through physical plan proto. - Refactor `range_partitioning.slt` to use a CSV `ListingTable` instead of a custom test-only `TableProvider` / `DataSource`. Contract: - Declared partitioning expressions are written against the full table schema before scan projection. For example, `Range([range_key@0], [(10), (20)], 3)` remains valid if the scan projects `range_key` and falls back to `UnknownPartitioning(3)` if `range_key` is not projected. - Listing tables create one file group per declared output partition (which can exceed `target_partitions`). It is up to the user to plan their partitioning. For example, a 4-partition range declaration creates four scan file groups, adding empty trailing groups when fewer files are present. - File group index is part of the contract: file group `i` must contain rows for declared output partition `i`. DataFusion does not validate row placement, matching other user-declared properties such as sortedness. ## Are these changes tested?  Yes. ## Are there any user-facing changes?  Yes. This adds public API for declaring file/listing scan output partitioning. No breaking API changes.  --------- Co-authored-by: Gabriel <45515538+gabotechs@users.noreply.github.com>

github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label May 29, 2026

gabotechs reviewed May 29, 2026

View reviewed changes

gene-bordegaray force-pushed the gene.bordegaray/2026/05/range-partitioning-slt-source branch 2 times, most recently from 366e4dc to d0ede58 Compare May 29, 2026 10:43

add range table and initial slt test

ada1564

gene-bordegaray force-pushed the gene.bordegaray/2026/05/range-partitioning-slt-source branch from d0ede58 to ada1564 Compare May 29, 2026 10:43

github-actions Bot added the physical-expr Changes to the physical-expr crates label May 29, 2026

alamb approved these changes May 29, 2026

View reviewed changes

gene-bordegaray added 2 commits May 29, 2026 07:24

Clarify range partitioning SLT fixture comments

91ab2ab

Move range partitioning SLT fixture to module

fd632b2

alamb approved these changes May 29, 2026

View reviewed changes

alamb added this pull request to the merge queue May 29, 2026

Merged via the queue into apache:main with commit b6d4c25 May 29, 2026
39 checks passed

gene-bordegaray mentioned this pull request May 30, 2026

Support declared output partitioning for file/listing scans #22645

Closed

gene-bordegaray mentioned this pull request May 30, 2026

Add ListingOptions::output_partitioning and FileScanConfig::output_partitioning for pre-defined file partitioning #22657

Merged

This was referenced Jun 22, 2026

API: runtime partition extrema for range-aware operators #23089

Open

API: Partitioning::DynamicRange for runtime-discovered split points #23093

Open


	/// Set whether file groups are organized by partition column values.
	///
	/// When set to true, the output partitioning will be declared as Hash partitioning
	/// on the partition columns.
	pub fn with_partitioned_by_file_group(
	mut self,
	partitioned_by_file_group: bool,
	) -> Self {
	self.partitioned_by_file_group = partitioned_by_file_group;
	self
	}

Uh oh!

Conversation

gene-bordegaray commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

gene-bordegaray commented May 29, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gabotechs May 29, 2026

Choose a reason for hiding this comment

Uh oh!

gene-bordegaray May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

alamb May 29, 2026

Choose a reason for hiding this comment

Uh oh!

gene-bordegaray May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gene-bordegaray May 29, 2026

Choose a reason for hiding this comment

Uh oh!

gene-bordegaray May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

alamb May 29, 2026

Choose a reason for hiding this comment

Uh oh!

gene-bordegaray May 29, 2026

Choose a reason for hiding this comment

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gene-bordegaray commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

gene-bordegaray commented May 29, 2026 •

edited

Loading

gene-bordegaray May 29, 2026 •

edited

Loading

gene-bordegaray May 29, 2026 •

edited

Loading