-
Notifications
You must be signed in to change notification settings - Fork 1.9k
refactor: add ParquetOpenerBuilder to reduce test code duplication #19405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: add ParquetOpenerBuilder to reduce test code duplication #19405
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the ParquetOpener test suite to eliminate substantial code duplication by introducing a builder pattern. The main improvement is the addition of a test-only ParquetOpenerBuilder that provides sensible defaults and fluent methods for configuring ParquetOpener instances, reducing each test's setup from ~28 lines to ~6-8 lines.
- Added
ParquetOpenerBuilderstruct with default values matching original test patterns - Refactored 8 test functions to use the builder pattern, making test intent clearer
- Maintained test behavior while significantly reducing code duplication and improving maintainability
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ShashidharM0118 -- this looks great. there is just one small suggestion, but otherwise it looks great to me
FYI @xudong963 and @adriangb
| max_predicate_cache_size: None, | ||
| reverse_row_groups: false, | ||
| } | ||
| ParquetOpenerBuilder::new() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is so much nicer
adriangb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit about the schemas but otherwise looks good, thank you! I've wanted this every time I looked at these tests but just never had the bandwith to do it.
TableSchema already contains file_schema internally, so storing it separately in the builder was redundant. This simplifies the builder by removing duplicate state and ensures file_schema is always derived from table_schema in build().
b139c4a to
e6ec5f9
Compare
adriangb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge this!
Which issue does this PR close?
Rationale for this change
The test suite for
ParquetOpenerexhibited substantial code duplication across multiple test functions. Each test was constructingParquetOpenerinstances with largely identical field values, resulting in verbose and repetitive code that hindered maintainability and obscured the distinguishing characteristics of each test case.What changes are included in this PR?
ParquetOpenerBuilderstruct in the test module (#[cfg(test)]) with sensible defaults matching the original test code patternstest_prune_on_statisticstest_prune_on_partition_statistics_with_dynamic_expressiontest_prune_on_partition_values_and_file_statisticstest_prune_on_partition_value_and_data_valuetest_opener_pruning_skipped_on_static_filterstest_reverse_scan_row_groupstest_reverse_scan_single_row_grouptest_reverse_scan_with_row_selectionAre these changes tested?
Yes
Are there any user-facing changes?
No