Flink: Adjust the configuration precedence for the dynamic sink #13609
Thanks for addressing this TODO @Guosmilesmile!
// TODO: Handle precedence correctly for the write properties coming from
// the sink conf and from the table defaults
Map<String, String> tableWriteProperties =
If we addressed this, we should remove the comment.
- Maps.newHashMap(commonWriteProperties);
- tableWriteProperties.putAll(table.properties());
+ Maps.newHashMap(table.properties());
+ tableWriteProperties.putAll(commonWriteProperties);
I wonder, should we use
here?
The Map<String, String> tableWriteProperties is passed multiple times to other classes within RowDataTaskWriterFactory. Changing it to use FlinkConfParser would involve significant modifications, especially since some places have special logic for tableWriteProperties.
Alternatively, could we add a new method Map<String, String> properties() in FlinkConfParser to handle priority uniformly?
Do you have any better suggestions?
I made a version using FlinkConfParser. Please take a look and see if it’s appropriate.
Sorry, I think the old version was better. The Flink related configs are resolved earlier. At this point, we only have the general Iceberg write properties, for which it doesn't make sense to use the Flink conf parser.
Thank you for your suggestion. I will revert the changes.
8e4d947 to 55c1afc
This reverts commit 55c1afc.
Is this a behavioral change? Do we need to document it? Is there a release already with this feature?
pvary left a comment
+1 so it could be merged to 1.10
Please create a test for this in a follow-up PR.
Merged to main. Please create a test case in a follow-up PR that checks the config precedence.
Yes, but there is no official release of Dynamic Sink yet. +1 for the test case.
Thanks for the review! I will prepare a new PR for the test case soon.
This change results in unintuitive behaviour for parquet compression codec:
This is flink-write-conf:
It defaults to
Now, in my table properties, I have set
in Dynamic Writer:
So as a result of this change, my explicit setting of
@b-rick Thank you very much for pointing this out. The configuration for parquet's default compression codec is somewhat special. Since version 1.4.0, parquet's default compression codec changed from gzip to zstd by explicitly setting defaults in the table properties that apply only to new tables.
iceberg/core/src/main/java/org/apache/iceberg/TableMetadata.java
Lines 94 to 96 in 1bd8d5e
Common sinks can obtain the table information in advance, but dynamic sinks can only get the table information at runtime. So, in common sinks, the table is passed to
In my opinion, instead of passing null to
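The interaction described above can be sketched in isolation. This is a simplified illustration, not the actual Iceberg or Flink code: `sinkPropertiesWithDefaults` is a hypothetical stand-in for a sink-side config that is resolved without access to the table and therefore falls back to a hard-coded default, and plain `java.util.HashMap` is used instead of the project's `Maps` helper. Only the property key `write.parquet.compression-codec` and the `gzip`/`zstd` values come from the discussion.

```java
import java.util.HashMap;
import java.util.Map;

public class CompressionCodecPrecedence {

  static final String PARQUET_COMPRESSION = "write.parquet.compression-codec";

  // Hypothetical stand-in: a sink config resolved without the table always
  // materializes a codec, falling back to a hard-coded default ("gzip" here).
  static Map<String, String> sinkPropertiesWithDefaults(Map<String, String> userOptions) {
    Map<String, String> resolved = new HashMap<>();
    resolved.put(PARQUET_COMPRESSION, userOptions.getOrDefault(PARQUET_COMPRESSION, "gzip"));
    return resolved;
  }

  // Sink properties are merged last, so they overwrite table properties.
  static Map<String, String> sinkWins(
      Map<String, String> tableProperties, Map<String, String> sinkProperties) {
    Map<String, String> merged = new HashMap<>(tableProperties);
    merged.putAll(sinkProperties);
    return merged;
  }

  public static void main(String[] args) {
    // The table explicitly asks for zstd; the user set no sink option at all.
    Map<String, String> tableProperties = new HashMap<>();
    tableProperties.put(PARQUET_COMPRESSION, "zstd");
    Map<String, String> userOptions = new HashMap<>();

    // Merging the defaulted sink config hides the table's explicit setting.
    Map<String, String> withDefaults =
        sinkWins(tableProperties, sinkPropertiesWithDefaults(userOptions));
    System.out.println(withDefaults.get(PARQUET_COMPRESSION)); // prints "gzip"

    // Merging only what the user explicitly set keeps the table's value.
    Map<String, String> explicitOnly = sinkWins(tableProperties, userOptions);
    System.out.println(explicitOnly.get(PARQUET_COMPRESSION)); // prints "zstd"
  }
}
```

The second merge shows one possible direction under these assumptions: let sink options win only when the user actually set them, so a fallback default never masks a table property.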
I think we should be moving configuration code like
I was playing around with getting the config to the DynamicWriter, and found some serialization issues. If we can solve those, then it would be nice to use the same mechanism to resolve the configuration as in the normal sink.
…e#13662)
These changes introduce a bug where flink default configuration (which is incorrect) overrides table configuration, resulting in the parquet compression type becoming gzip instead of zstd.
Revert "Flink: Adjust the configuration precedence for the dynamic sink (apache#13609)"
This reverts commit bc15453.
Revert "Flink: Add test for adjust the configuration precedence in the Dynamic Sink (apache#13662)"
This reverts commit 83da920.
Change-Id: I9ffaca3deed73b3160c2e21e2a632cb56213ba3f
Reviewed-on: https://gerrit.trading.imc.intra/c/data-engineering/iceberg/+/625661
Static-Analysis: Teamcity
Reviewed-by: Alexander Borodin <alex.borodin@imc.com>
Tested-by: Teamcity
Currently, in DynamicWriter, the table configuration has higher priority and overrides the sink configuration. Generally, however, user-provided options should take precedence over table properties.
This PR aims to adjust the configuration precedence for the dynamic sink.
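The precedence comes down to the order in which the two maps are merged, since a later `putAll` overwrites earlier entries. A minimal, self-contained sketch of that idea follows; the class name, the `merge` helper, and the property values are illustrative assumptions, and `java.util.HashMap` is used here instead of the `Maps.newHashMap` helper seen in the diff.

```java
import java.util.HashMap;
import java.util.Map;

public class WritePropertyPrecedence {

  // Entries from highPriority overwrite entries from lowPriority,
  // because putAll replaces the value of any key that already exists.
  static Map<String, String> merge(
      Map<String, String> lowPriority, Map<String, String> highPriority) {
    Map<String, String> merged = new HashMap<>(lowPriority);
    merged.putAll(highPriority);
    return merged;
  }

  public static void main(String[] args) {
    // Illustrative values only.
    Map<String, String> tableProperties = new HashMap<>();
    tableProperties.put("write.format.default", "parquet");
    tableProperties.put("write.target-file-size-bytes", "536870912");

    Map<String, String> sinkWriteProperties = new HashMap<>();
    sinkWriteProperties.put("write.format.default", "orc");

    // Old order: table properties merged last, so the table wins.
    Map<String, String> before = merge(sinkWriteProperties, tableProperties);
    // New order: sink properties merged last, so the sink wins.
    Map<String, String> after = merge(tableProperties, sinkWriteProperties);

    System.out.println("before: " + before.get("write.format.default")); // before: parquet
    System.out.println("after: " + after.get("write.format.default")); // after: orc
  }
}
```

Keys that appear in only one map, such as `write.target-file-size-bytes` above, pass through unchanged in either order; only conflicting keys are affected by the swap.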