Flink: apply row-level delete when reading by chenjunjiedada · Pull Request #1517 · apache/iceberg

chenjunjiedada · 2020-09-27T13:39:09Z

This includes #1497; I will rebase when #1497 gets merged.

chenjunjiedada · 2020-09-27T14:12:46Z

    Avro.ReadBuilder builder = Avro.read(getInputFile(task))
-        .reuseContainers()
-        .project(projectedSchema)
+        .reuseContainers(false)


@JingsongLi, this is used to fix the UT. How can we copy the RowData?

@chenjunjiedada can you please clarify for me what the UT stands for?

I mean the unit tests in TestFlinkScan. getRows puts each row returned by inputFormat.nextRecord(null) into a List, but the row object is reused when the file format is Avro, so the contents of the List are wrong.

I didn't find a simple way to copy the GenericRowData, so I set reuseContainers to false to align with the Parquet and ORC cases. I changed it back in 8526b6d after finding that the converter could be used to copy the row. But that raises a new concern about double copies for Parquet and ORC. @openinx @JingsongLi @rdblue, should we reuse the container for Flink reads?
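The reuse problem described above can be reproduced without Flink. This is a minimal sketch assuming a hypothetical ReusingReader whose nextRecord() overwrites and returns one shared int[] container (standing in for a reused RowData); ReusingReader and ReuseDemo are illustrative names, not Iceberg or Flink APIs. Collecting the returned object directly leaves every list element pointing at the last record, while copying each record before the next call keeps them distinct.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader that hands out the same mutable container on every call,
// similar in spirit to an input format whose nextRecord(...) reuses its row object.
// Assumes all rows have the same width.
final class ReusingReader {
  private final int[][] data;
  private final int[] reused; // the single container returned by every nextRecord() call
  private int pos = 0;

  ReusingReader(int[][] data) {
    this.data = data;
    this.reused = new int[data.length == 0 ? 0 : data[0].length];
  }

  boolean reachedEnd() {
    return pos >= data.length;
  }

  int[] nextRecord() {
    // Overwrite the shared container instead of allocating a new one.
    System.arraycopy(data[pos++], 0, reused, 0, reused.length);
    return reused;
  }
}

final class ReuseDemo {
  private ReuseDemo() {
  }

  // Buggy: stores the reused container itself, so every element is the same object.
  static List<int[]> collectWithoutCopy(ReusingReader reader) {
    List<int[]> rows = new ArrayList<>();
    while (!reader.reachedEnd()) {
      rows.add(reader.nextRecord());
    }
    return rows;
  }

  // Fixed: copy each record before the reader overwrites it on the next call.
  static List<int[]> collectWithCopy(ReusingReader reader) {
    List<int[]> rows = new ArrayList<>();
    while (!reader.reachedEnd()) {
      rows.add(reader.nextRecord().clone());
    }
    return rows;
  }
}
```

In a streaming job this reuse is harmless because each record is consumed before the next call; it only bites when a caller holds on to records, as the test helper does.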

inputFormat.nextRecord returns a reused record; that is OK in Flink.

I think we can create a PR to reuse the Parquet container for Flink and Spark.

@JingsongLi, thanks for your comments. I think it would be better to add an option to control reuse; let me create one.

I am slightly -1 on an option. If there are no side effects, why do we need to provide it? (Testing is not a good reason; we should think about the user-facing interface.)

And even if the reuse flag is false, I think there may still be some risks.
Note that in the Flink and Spark readers, we reuse the binary for StringReader.
These binaries may be chunk buffers that are reused by the Parquet reader (CC: @rdblue), so even if the reuse flag is false, users cannot assume the returned row is safe to hold on to.
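The point here is that turning off container reuse does not by itself make a returned row safe to keep if its string values still point into a buffer the reader reuses. Whether Iceberg's Parquet string readers actually behave this way was left as an open question in the thread; the sketch below only illustrates the general hazard. BufferSlice and SharedBufferReader are hypothetical classes: each read returns a new slice object, yet all slices share one chunk, so an earlier value changes when the next value is read unless it is deep-copied first.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical string value that wraps a slice of a byte buffer without copying it.
final class BufferSlice {
  private final byte[] buffer;
  private final int offset;
  private final int length;

  BufferSlice(byte[] buffer, int offset, int length) {
    this.buffer = buffer;
    this.offset = offset;
    this.length = length;
  }

  @Override
  public String toString() {
    return new String(buffer, offset, length, StandardCharsets.UTF_8);
  }

  // Deep copy: detaches the value from the shared buffer.
  BufferSlice copy() {
    return new BufferSlice(Arrays.copyOfRange(buffer, offset, offset + length), 0, length);
  }
}

// Hypothetical reader: no container reuse (a new BufferSlice per call),
// but every slice points into the same chunk, which the next read overwrites.
// Assumes each value fits in 16 bytes.
final class SharedBufferReader {
  private final byte[] chunk = new byte[16];

  BufferSlice read(String value) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    System.arraycopy(bytes, 0, chunk, 0, bytes.length);
    return new BufferSlice(chunk, 0, bytes.length);
  }
}
```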

@JingsongLi, I created #1522. We could also discuss there.

rdblue · 2020-09-28T17:56:58Z

-  private static StructLikeSet rowSet(Table table) throws IOException {
-    return rowSet(table, "*");
+  private StructLikeSet rowSet(Table tbl) throws IOException {
+    return rowSet(tbl, "*");


Why rename this variable?

Checkstyle reports a hidden-field issue because the parameter has the same name as a class member. I renamed the member table to testTable in the PR and will rebase accordingly.
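For readers unfamiliar with the check: Checkstyle's HiddenField rule flags a parameter or local variable that has the same name as a field, because inside the method the simple name then refers to the parameter rather than the field. The class below is a hypothetical illustration, not the actual test code, of the shadowing and of the rename that avoids it.

```java
// Illustrative only: a parameter named like a field shadows that field.
final class ShadowingExample {
  private final String table = "field";

  // Shadowed: inside this method, "table" resolves to the parameter.
  String shadowed(String table) {
    return table;
  }

  // Renamed parameter: the field and the parameter are both reachable by simple name.
  String renamed(String tbl) {
    return table + "/" + tbl;
  }
}
```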

rdblue · 2020-10-06T21:41:37Z

@chenjunjiedada, now that #1497 is merged, can you rebase this one on top of that? That should remove most of the test changes.

chenjunjiedada · 2020-10-09T14:16:46Z

Just back from holiday, I will update this tomorrow.

rdblue · 2020-10-09T20:24:11Z

Thanks, and welcome back.

rdblue · 2020-10-10T21:24:34Z

+
+    Stream<EncryptedInputFile> encrypted = task.files().stream()
+        .flatMap(fileScanTask -> Stream.concat(Stream.of(fileScanTask.file()), fileScanTask.deletes().stream()))
+        .distinct()


distinct compares files using equals, which is not overridden for data or delete files. This should instead use the approach that Spark uses:

    Map<String, ByteBuffer> keyMetadata = Maps.newHashMap();
    task.files().stream()
        .flatMap(fileScanTask -> Stream.concat(Stream.of(fileScanTask.file()), fileScanTask.deletes().stream()))
        .forEach(file -> keyMetadata.put(file.path().toString(), file.keyMetadata()));
    Stream<EncryptedInputFile> encrypted = keyMetadata.entrySet().stream()
        .map(entry -> EncryptedFiles.encryptedInput(io.newInputFile(entry.getKey()), entry.getValue()));

    // decrypt with the batch call to avoid multiple RPCs to a key server, if possible
    Iterable<InputFile> decryptedFiles = encryptionManager.decrypt(encrypted::iterator);
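The problem with distinct() can be shown in isolation. If the same delete file is referenced by several scan tasks through different objects, it appears more than once after the flatMap, and since the file objects do not override equals or hashCode, distinct() compares them by identity and keeps every copy. This minimal sketch assumes a hypothetical FileRef class in place of Iceberg's data and delete file types; keying a map by path, as in the suggested code, collapses the duplicates.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical file handle that, like the file classes discussed above,
// does not override equals/hashCode, so two objects for one path are "different".
final class FileRef {
  final String path;
  final String keyMetadata;

  FileRef(String path, String keyMetadata) {
    this.path = path;
    this.keyMetadata = keyMetadata;
  }
}

final class DedupDemo {
  private DedupDemo() {
  }

  // distinct() relies on equals/hashCode, which is identity here, so duplicates survive.
  static long countWithDistinct(List<FileRef> files) {
    return files.stream().distinct().count();
  }

  // Keying by path collapses duplicates regardless of object identity.
  static Map<String, String> keyMetadataByPath(List<FileRef> files) {
    Map<String, String> byPath = new LinkedHashMap<>();
    files.forEach(file -> byPath.put(file.path, file.keyMetadata));
    return byPath;
  }
}
```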

I see, updated.

rdblue · 2020-10-10T21:30:51Z

+  private final HadoopTables tables = new HadoopTables(conf);
+  private final FileFormat format;
+
+  private String tableLocation;


If this used a Hive table instead, then it wouldn't be necessary to keep state that isn't passed as method arguments. I think that would be less brittle.

Makes sense to me; updated.

rdblue · 2020-10-10T21:43:01Z

+    FlinkSource.Builder builder = FlinkSource.forRowData().tableLoader(TableLoader.fromHadoopTable(tableLocation));
+    Schema projected = testTable.schema().select(columns);
+    RowType rowType = FlinkSchemaUtil.convert(projected);
+    FlinkInputFormat inputFormat = builder.project(FlinkSchemaUtil.toSchema(rowType)).buildFormat();


Why half-configure the builder above and then finish it here? I think it would be simpler to use this:

    Schema projected = testTable.schema().select(columns);
    RowType rowType = FlinkSchemaUtil.convert(projected);
    FlinkInputFormat inputFormat = FlinkSource.forRowData()
        .tableLoader(TableLoader.fromHadoopTable(tableLocation))
        .project(FlinkSchemaUtil.toSchema(rowType))
        .buildFormat();

rdblue · 2020-10-10T21:46:03Z

+        .collect(Collectors.toList());
+  }
+
+  public static List<Row> getRows(FlinkInputFormat inputFormat) throws IOException {


I think it is a bad practice to make helper methods in one test suite public and use them in another suite. Instead, helper methods should be moved to an appropriate test utility class. That way, we don't have test utility code that is hard to find because it lives in whatever test was written first.

OK, I added TestHelpers to contain getRows and getRowData.

rdblue · 2020-10-12T17:50:24Z

+      inputFormat.open(s);
+      while (!inputFormat.reachedEnd()) {
+        RowData row = inputFormat.nextRecord(null);
+        results.add((Row) converter.toExternal(row));


This seems strange to me. Why convert rows to external and convert them back to internal in the getRowData method? Why not move this implementation into getRowData and convert to external in this one?

Also, is there a better name for these methods? What about readRows or scan? Those would be a bit more clear about what is going on in these. The original was called runFormat, which is also a good name.

inputFormat.nextRecord() returns a record that is reused, so the returned RowData must be copied; otherwise every element in the result list is the same object. Currently, Flink has no explicit public API for copying RowData. The only class that copies RowData is RowDataSerializer, which is an internal class. DataStructureConverter.toExternal and toInternal can construct a new record by converting between RowData and Row. @JingsongLi @openinx, do you have any suggestions?

I think you can get serializer from FlinkSource.Builder.build().getType().createSerializer()

chenjunjiedada · 2020-10-13T09:38:15Z

  private TestHelpers() {
  }

+  public static RowData copyRowData(RowData from, RowType rowType) {


@JingsongLi, I cannot use RowDataSerializer directly because the returned RowData may contain a metadata column after merging with position deletes, so I created this function to do the copy.
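A field-by-field copy of this kind can be sketched in plain Java. This is a minimal sketch assuming a hypothetical row model (an Object[] whose leading slots are the projected columns, possibly followed by a reader-added metadata column) and one copier per projected column; ProjectedRowCopy is an illustrative name and not the TestHelpers implementation from this PR. Only as many fields as there are copiers are copied, so a trailing metadata column is dropped instead of reaching a copier built for the narrower schema. Each copier is assumed to handle null on its own.

```java
import java.util.List;
import java.util.function.UnaryOperator;

final class ProjectedRowCopy {
  private ProjectedRowCopy() {
  }

  // fieldCopiers.get(i) deep-copies column i and is assumed to accept null values.
  // The number of copiers defines the projected width; extra trailing fields
  // in the source row (such as a metadata column) are not copied.
  static Object[] copyProjected(Object[] from, List<UnaryOperator<Object>> fieldCopiers) {
    if (from.length < fieldCopiers.size()) {
      throw new IllegalArgumentException(
          "Row has " + from.length + " fields, projection needs " + fieldCopiers.size());
    }
    Object[] to = new Object[fieldCopiers.size()];
    for (int i = 0; i < to.length; i++) {
      to[i] = fieldCopiers.get(i).apply(from[i]);
    }
    return to;
  }
}
```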

rdblue · 2020-10-13T16:31:20Z

Looks like tests are failing with this:

org.apache.iceberg.flink.source.TestFlinkInputFormat > testPartitionTypes[format=orc] FAILED
    java.lang.NullPointerException
        at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.copy(BytePrimitiveArraySerializer.java:54)
        at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.copy(BytePrimitiveArraySerializer.java:33)
        at org.apache.iceberg.flink.TestHelpers.copyRowData(TestHelpers.java:75)
        at org.apache.iceberg.flink.TestHelpers.readRowData(TestHelpers.java:89)
        at org.apache.iceberg.flink.TestHelpers.readRows(TestHelpers.java:101)
        at org.apache.iceberg.flink.source.TestFlinkInputFormat.runFormat(TestFlinkInputFormat.java:105)
        at org.apache.iceberg.flink.source.TestFlinkInputFormat.run(TestFlinkInputFormat.java:66)

chenjunjiedada · 2020-10-14T02:44:11Z

Thanks for the reminder! It worked before, but I switched to using field getters to avoid null checks without realizing that the serializer cannot copy null values. It was late and I didn't wait for the build result...
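The failure in the stack trace above comes from passing a null field value to a copy routine that dereferences its input. A minimal sketch of the usual guard, assuming a hypothetical NullSafeCopy helper: wrap each per-field copier so that null is returned as null and only non-null values are copied. Cloning a byte[] stands in for BytePrimitiveArraySerializer.copy here.

```java
import java.util.function.UnaryOperator;

final class NullSafeCopy {
  private NullSafeCopy() {
  }

  // Returns a copier that passes null through unchanged and delegates
  // only non-null values to the wrapped copier.
  static <T> UnaryOperator<T> nullSafe(UnaryOperator<T> copier) {
    return value -> value == null ? null : copier.apply(value);
  }
}
```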

rdblue · 2020-10-14T23:12:01Z

Thanks, @chenjunjiedada! Good to have all of the read paths updated for row-level deletes!

probot-autolabeler Bot added data flink MR spark labels Sep 27, 2020

chenjunjiedada commented Sep 27, 2020

chenjunjiedada mentioned this pull request Sep 28, 2020

reuse container when reading parquet records #1522

Merged

rdblue reviewed Sep 28, 2020

chenjunjiedada force-pushed the flink-apply-delete branch from 8526b6d to dc2b9db October 10, 2020 08:01

rdblue reviewed Oct 10, 2020

chenjunjiedada force-pushed the flink-apply-delete branch from ab23ed6 to 5656be8 October 11, 2020 13:48

chenjunjiedada commented Oct 12, 2020

Comment thread flink/src/test/java/org/apache/iceberg/flink/source/TestFlinkInputFormatReaderDeletes.java

rdblue reviewed Oct 12, 2020

chenjunjiedada added 2 commits October 13, 2020 14:53

Flink: apply row-level delete when reading

ab3cad9

Conflicts: flink/src/main/java/org/apache/iceberg/flink/source/FlinkInputFormat.java flink/src/main/java/org/apache/iceberg/flink/source/FlinkSource.java flink/src/test/java/org/apache/iceberg/flink/source/TestFlinkInputFormat.java

use hive catalog to create table

3115544

chenjunjiedada force-pushed the flink-apply-delete branch from 5656be8 to 75dce7f October 13, 2020 07:05

rename the helper function and merge test helpers

265524f

chenjunjiedada force-pushed the flink-apply-delete branch from 75dce7f to 265524f October 13, 2020 07:07

copy RowData instead of converting back and forth

3a61448

chenjunjiedada force-pushed the flink-apply-delete branch from c8eecd3 to 3a61448 October 13, 2020 09:36

chenjunjiedada commented Oct 13, 2020

rdblue added this to the Java 0.10.0 Release milestone Oct 13, 2020

fix failed unit tests

9fb7c94

rdblue merged commit a238a90 into apache:master Oct 14, 2020

openinx mentioned this pull request Oct 21, 2020

Flink: write the CDC records into apache iceberg tables. #1639

Closed

jshmchenxi mentioned this pull request Mar 9, 2021

Iceberg: Apply row-level delete when reading trinodb/trino#7226

Closed
