@shashidhar-bm shashidhar-bm commented Dec 19, 2025

Which issue does this PR close?

Rationale for this change

The test suite for ParquetOpener exhibited substantial code duplication across multiple test functions. Each test was constructing ParquetOpener instances with largely identical field values, resulting in verbose and repetitive code that hindered maintainability and obscured the distinguishing characteristics of each test case.

What changes are included in this PR?

  • Added ParquetOpenerBuilder struct in the test module (#[cfg(test)]) with sensible defaults matching the original test code patterns
  • Refactored 8 test functions to use the builder pattern:
    • test_prune_on_statistics
    • test_prune_on_partition_statistics_with_dynamic_expression
    • test_prune_on_partition_values_and_file_statistics
    • test_prune_on_partition_value_and_data_value
    • test_opener_pruning_skipped_on_static_filters
    • test_reverse_scan_row_groups
    • test_reverse_scan_single_row_group
    • test_reverse_scan_with_row_selection
  • Reduced setup code from ~28 lines per test to ~6-8 lines, so each test spells out only the fields that differ from the defaults
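
To illustrate the shape of the change, here is a minimal sketch of a test-only builder with defaults. It is not the code from this PR: `Opener` and `OpenerBuilder` are hypothetical stand-ins, `batch_size` and `limit` and their default values are assumptions, and only `max_predicate_cache_size` and `reverse_row_groups` are field names taken from the diff context in this conversation.

```rust
// Hypothetical stand-in for the struct under test. The real ParquetOpener
// has many more fields; only the last two names come from the PR diff.
#[derive(Debug, Clone, PartialEq)]
struct Opener {
    batch_size: usize,                       // assumed field
    limit: Option<usize>,                    // assumed field
    max_predicate_cache_size: Option<usize>, // named in the diff
    reverse_row_groups: bool,                // named in the diff
}

// Test-only builder: every field starts at a default shared by the tests.
#[derive(Debug, Clone)]
struct OpenerBuilder {
    batch_size: usize,
    limit: Option<usize>,
    max_predicate_cache_size: Option<usize>,
    reverse_row_groups: bool,
}

impl OpenerBuilder {
    // Defaults here are illustrative, not the values used in the PR.
    fn new() -> Self {
        Self {
            batch_size: 1024,
            limit: None,
            max_predicate_cache_size: None,
            reverse_row_groups: false,
        }
    }

    // Each `with_*` method overrides one field and returns the builder,
    // so calls can be chained.
    fn with_limit(mut self, limit: usize) -> Self {
        self.limit = Some(limit);
        self
    }

    fn with_reverse_row_groups(mut self, reverse: bool) -> Self {
        self.reverse_row_groups = reverse;
        self
    }

    // Consumes the builder and produces the value under test.
    fn build(self) -> Opener {
        Opener {
            batch_size: self.batch_size,
            limit: self.limit,
            max_predicate_cache_size: self.max_predicate_cache_size,
            reverse_row_groups: self.reverse_row_groups,
        }
    }
}

// A test's setup now names only what differs from the defaults.
fn reversed_opener() -> Opener {
    OpenerBuilder::new().with_reverse_row_groups(true).build()
}
```

With this shape, a reverse-scan test sets one field and inherits the rest, which is the effect the line counts above describe.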

Are these changes tested?

Yes

Are there any user-facing changes?

No

Copilot AI review requested due to automatic review settings December 19, 2025 10:57
@github-actions github-actions bot added the datasource Changes to the datasource crate label Dec 19, 2025

Copilot AI left a comment

Pull request overview

This PR refactors the ParquetOpener test suite to eliminate substantial code duplication by introducing a builder pattern. The main improvement is the addition of a test-only ParquetOpenerBuilder that provides sensible defaults and fluent methods for configuring ParquetOpener instances, reducing each test's setup from ~28 lines to ~6-8 lines.

  • Added ParquetOpenerBuilder struct with default values matching original test patterns
  • Refactored 8 test functions to use the builder pattern, making test intent clearer
  • Maintained test behavior while significantly reducing code duplication and improving maintainability


@alamb alamb left a comment

Thank you @ShashidharM0118 -- this looks great. There is just one small suggestion, but otherwise it looks great to me.

FYI @xudong963 and @adriangb

max_predicate_cache_size: None,
reverse_row_groups: false,
}
ParquetOpenerBuilder::new()

this is so much nicer


@adriangb adriangb left a comment

Small nit about the schemas but otherwise looks good, thank you! I've wanted this every time I looked at these tests but just never had the bandwidth to do it.

TableSchema already contains file_schema internally, so storing it
separately in the builder was redundant. This simplifies the builder
by removing duplicate state and ensures file_schema is always derived
from table_schema in build().
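
The idea in this commit message can be sketched as follows. This is a simplified illustration and not the PR's code: `Schema`, `TableSchema`, `SketchOpener`, and `SketchOpenerBuilder` are hypothetical stand-ins rather than the real DataFusion or Arrow types, and panicking when no table schema was set is an assumption about how a missing value might be handled.

```rust
use std::sync::Arc;

// Hypothetical stand-ins; not the real DataFusion or Arrow types.
#[derive(Debug, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

// The table schema already carries the file schema internally.
#[derive(Debug)]
struct TableSchema {
    file_schema: Arc<Schema>,
    partition_cols: Vec<String>,
}

#[derive(Debug)]
struct SketchOpener {
    file_schema: Arc<Schema>,
    table_schema: TableSchema,
}

// The builder stores only the table schema, so the two schemas cannot
// drift apart through separate setters.
#[derive(Debug, Default)]
struct SketchOpenerBuilder {
    table_schema: Option<TableSchema>,
}

impl SketchOpenerBuilder {
    fn with_table_schema(mut self, table_schema: TableSchema) -> Self {
        self.table_schema = Some(table_schema);
        self
    }

    fn build(self) -> SketchOpener {
        // Assumption: a missing table schema is a test bug, so panic.
        let table_schema = self
            .table_schema
            .expect("table_schema must be set before build()");
        // Derive the file schema from the table schema instead of
        // keeping a second copy in the builder.
        let file_schema = Arc::clone(&table_schema.file_schema);
        SketchOpener {
            file_schema,
            table_schema,
        }
    }
}
```

Because `file_schema` is cloned from `table_schema` inside `build()`, both fields of the resulting value point at the same schema by construction.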
@shashidhar-bm shashidhar-bm force-pushed the add-parquet-opener-builder branch from b139c4a to e6ec5f9 Compare December 20, 2025 05:01

@adriangb adriangb left a comment

Let's merge this!

@adriangb adriangb added this pull request to the merge queue Dec 20, 2025
Merged via the queue into apache:main with commit d8e68a4 Dec 20, 2025
27 checks passed
@shashidhar-bm shashidhar-bm deleted the add-parquet-opener-builder branch December 20, 2025 15:09
Labels

datasource Changes to the datasource crate

Development

Successfully merging this pull request may close these issues.

Add ParquetOpenerBuilder to make code more clean and readable

3 participants