[Improvement] Storage Partition Join #13390
…h execution mode Backport apache#10832. This PR backports source parallelism inference for [FLIP-27](https://jira.pinadmin.com/browse/FLIP-27) sources in batch execution mode. Note: this is not a clean backport. `RowDataConverter` is not present in our forked version, so I had to add it to make the backport work.
@tharvey5 I see this is targeting
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Summary
Currently, Apache Flink does not support storage partition join, which can lead to unnecessary data shuffles in batch mode. We have already implemented the query planner changes in Flink in apache/flink#26715.
This feature **is only applied via the config** `table.optimizer.storage-partition-join-enabled=true`; otherwise there is no impact on current jobs. This PR only supports batch execution mode.
This PR consists of the relevant changes for the Flink Iceberg source. Please note that **these changes are only included for FLIP-27 (the new Source API)**.
NOTE: We migrated to using FLIP-27 and have included that backport in this PR. Backport: #10832
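For reference, a minimal sketch of turning the option on from a Java Table API program. It assumes a Flink build that already contains the planner change from apache/flink#26715 (the option key is copied from this description and is not part of stock Flink releases); the class name is illustrative only.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Illustrative class name; not part of this PR.
public class EnableStoragePartitionJoin {
    public static void main(String[] args) {
        // This PR only supports batch execution mode.
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Key taken from the PR description. Assumption: the running Flink
        // build includes apache/flink#26715; otherwise the key is simply
        // stored and has no effect on planning.
        tableEnv.getConfig()
                .set("table.optimizer.storage-partition-join-enabled", "true");
    }
}
```

Leaving the option unset keeps existing jobs on their current plans, as stated above.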
This PR adds the following support.
- #10832 for Iceberg 1.5.x
- Enhances `IcebergTableSource` to implement the `SupportsPartitioning` interface, which we defined on the [flink side](apache/flink#26715) and which enables Iceberg to report partitioning metadata to the Flink query planner. Done via `outputPartitioning()` returning `KeyGroupedPartitioning` with the table’s partition scheme. It can support various transform types including bucket, identity, month, day, year.
- Improvements to `IcebergSource` to support StoragePartitionJoin
  - Enhances `FlinkSplitPlanner` to include a method to group ScanTasks by groupingKey (partition values), which enables us to ensure that all records within the same partition end up being processed by the same subtask.
  - PartitionAwareSplitAssignment capabilities, including a new `PartitionAwareSplitAssignerFactory` and `PartitionAwareSplitAssigner`, which is responsible for ensuring that records with the same partition are assigned to the same subtask via deterministic assignment
  - Includes a new `SpecTransformToFlinkTransform` to map the various TransformExpressions used to represent the partitions to the Flink system
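To illustrate the grouping and deterministic assignment described in the list above, here is a minimal, self-contained sketch. It is not the code in this PR: `Split`, `groupByPartition`, `subtaskFor`, and `assign` are hypothetical stand-ins, and the hash-modulo rule is only one possible deterministic scheme, assumed here for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionAwareAssignmentSketch {

    /** Hypothetical stand-in for a planned split: a file plus its partition values. */
    static final class Split {
        final String file;
        final List<Object> partitionValues;

        Split(String file, List<Object> partitionValues) {
            this.file = file;
            this.partitionValues = partitionValues;
        }

        @Override
        public String toString() {
            return file;
        }
    }

    /** Groups splits by their partition values (the "grouping key"). */
    static Map<List<Object>, List<Split>> groupByPartition(List<Split> splits) {
        Map<List<Object>, List<Split>> groups = new LinkedHashMap<>();
        for (Split split : splits) {
            groups.computeIfAbsent(split.partitionValues, k -> new ArrayList<>()).add(split);
        }
        return groups;
    }

    /**
     * Picks a subtask from the partition values and the parallelism only.
     * Assumption: the values have stable hash codes (e.g. Integer, String),
     * so every run and every table with equal values gets the same answer.
     */
    static int subtaskFor(List<Object> partitionValues, int parallelism) {
        return Math.floorMod(partitionValues.hashCode(), parallelism);
    }

    /** Assigns each partition group, as a whole, to exactly one subtask. */
    static Map<Integer, List<Split>> assign(List<Split> splits, int parallelism) {
        Map<Integer, List<Split>> assignment = new TreeMap<>();
        for (Map.Entry<List<Object>, List<Split>> group : groupByPartition(splits).entrySet()) {
            int subtask = subtaskFor(group.getKey(), parallelism);
            assignment.computeIfAbsent(subtask, k -> new ArrayList<>()).addAll(group.getValue());
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
                new Split("a.parquet", List.of(1)),
                new Split("b.parquet", List.of(2)),
                new Split("c.parquet", List.of(1)));
        // Both files of partition [1] end up on the same subtask.
        System.out.println(assign(splits, 4));
    }
}
```

Because the subtask depends only on the partition values and the parallelism, two sources that report the same partition values place matching partitions on the same subtask index, which is the property a storage partition join relies on to skip the shuffle.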
Testing
- `TestPartitionAwareSplitAssigner` to verify that splits were deterministically applied to the correct subtasks
- `TestFlinkSplitPlanner` to test the improved functionality to get batch splits based on `ScanGroup`
- `TestStoragePartitionedJoin` to verify that we get the correct metadata
Correctly use SPJ
Correctly cannot use SPJ