Flink 1.20: Update Flink to use planned Avro reads by jbonofre · Pull Request #11386 · apache/iceberg

jbonofre · 2024-10-24T08:12:07Z

No description provided.

jbonofre · 2024-10-24T08:17:41Z

@pvary @aokolnychyi @rdblue this PR mimics what has been done in Spark to use Avro planned reads.

jbonofre · 2024-10-24T11:38:41Z

It seems the upsert test is stale. I'm investigating (maybe the schema type mapping).

jbonofre · 2024-10-24T16:55:08Z

The problem seems to be related to:

java.lang.ClassCastException: class java.lang.String cannot be cast to class org.apache.flink.table.data.StringData (java.lang.String is in module java.base of loader 'bootstrap'; org.apache.flink.table.data.StringData is in unnamed module of loader 'app')
        at org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169)
        at org.apache.flink.table.data.RowData.lambda$createFieldGetter$245ca7d1$1(RowData.java:221)
        at org.apache.iceberg.flink.data.RowDataProjection.getValue(RowDataProjection.java:172)
        at org.apache.iceberg.flink.data.RowDataProjection.isNullAt(RowDataProjection.java:193)
        at org.apache.iceberg.flink.data.RowDataUtil.clone(RowDataUtil.java:96)
        at org.apache.iceberg.flink.source.reader.RowDataRecordFactory.clone(RowDataRecordFactory.java:71)
        at org.apache.iceberg.flink.source.reader.RowDataRecordFactory.clone(RowDataRecordFactory.java:28)
        at org.apache.iceberg.flink.source.reader.ArrayPoolDataIteratorBatcher$ArrayPoolBatchIterator.next(ArrayPoolDataIteratorBatcher.java:98)
        at org.apache.iceberg.flink.source.reader.ArrayPoolDataIteratorBatcher$ArrayPoolBatchIterator.next(ArrayPoolDataIteratorBatcher.java:67)
        at org.apache.iceberg.flink.source.reader.IcebergSourceSplitReader.fetch(IcebergSourceSplitReader.java:96)
        at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58)
        at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:165)
        ... 6 more

So it seems related to how strings are read into the GenericRowData. I'm checking.

jbonofre · 2024-10-26T06:15:33Z

I fixed the string issue in ValueReaders.
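
For readers following the trace above, here is a minimal sketch of the failure mode and the general shape of the fix. It assumes, based only on the stack trace, that the row's `getString` casts an untyped field to the engine's internal string type. `InternalString`, `SketchRow`, and `StringReadSketch` are hypothetical stand-ins so the example runs without Flink on the classpath; they are not Flink or Iceberg classes, and this is not the actual change made in the PR.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for an engine-internal string type (the role StringData plays in the trace).
final class InternalString {
  private final byte[] utf8;

  private InternalString(byte[] utf8) {
    this.utf8 = utf8;
  }

  static InternalString fromString(String value) {
    return new InternalString(value.getBytes(StandardCharsets.UTF_8));
  }

  @Override
  public String toString() {
    return new String(utf8, StandardCharsets.UTF_8);
  }
}

// Hypothetical stand-in for a generic row: fields are stored untyped and cast on access.
final class SketchRow {
  private final Object[] fields;

  SketchRow(Object... fields) {
    this.fields = fields;
  }

  InternalString getString(int pos) {
    // The cast is where a java.lang.String field blows up, long after the row was built.
    return (InternalString) fields[pos];
  }
}

final class StringReadSketch {
  // A reader that emits java.lang.String: building the row succeeds, reading the field fails.
  static boolean readFailsWithRawString() {
    SketchRow row = new SketchRow("iceberg");
    try {
      row.getString(0);
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  // A reader that converts to the internal type at read time: the later cast succeeds.
  static String readWithInternalString() {
    SketchRow row = new SketchRow(InternalString.fromString("iceberg"));
    return row.getString(0).toString();
  }
}
```

The point of the sketch is that the wrong type is only detected when a field is accessed (here, during the clone in the source reader), so the conversion has to happen in the value reader itself.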

pvary · 2024-10-26T06:24:37Z

        Avro.read(Files.localInput(recordsFile))
            .project(schema)
-            .createReaderFunc(FlinkAvroReader::new)
+            .createResolvingReader(FlinkPlannedAvroReader::create)


QQ: Do we have remaining tests for the old reader? I usually try to keep at least a few tests for the deprecated features as well, so they are not broken unintentionally by future changes (and mark them as deprecated, so we don't forget to remove them when the feature is removed).
If there are other tests, do we test the same functions for the new reader?

jbonofre · 2024-10-28

That's a good point. I can add tests specific to the "old" FlinkAvroReader.
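
One way to keep a deprecated path covered alongside its replacement is to run the same expectation against every reader. The sketch below illustrates that idea only; `BothReadersSketch`, `legacyRead`, and `plannedRead` are hypothetical names that stand in for the old and new readers, and nothing here reflects how the Iceberg test suite is actually structured.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class BothReadersSketch {
  // Hypothetical "old" read path, marked deprecated so its tests are removed together with it.
  @Deprecated
  static List<String> legacyRead(List<byte[]> records) {
    List<String> out = new ArrayList<>();
    for (byte[] record : records) {
      out.add(new String(record, StandardCharsets.UTF_8));
    }
    return out;
  }

  // Hypothetical "new" read path that is expected to return the same rows.
  static List<String> plannedRead(List<byte[]> records) {
    List<String> out = new ArrayList<>(records.size());
    records.forEach(record -> out.add(new String(record, StandardCharsets.UTF_8)));
    return out;
  }

  // Runs one expectation against every registered reader and returns the names that failed.
  static List<String> failingReaders(List<String> expected) {
    List<byte[]> encoded = new ArrayList<>();
    for (String value : expected) {
      encoded.add(value.getBytes(StandardCharsets.UTF_8));
    }

    Map<String, Function<List<byte[]>, List<String>>> readers = new LinkedHashMap<>();
    readers.put("legacy", BothReadersSketch::legacyRead);
    readers.put("planned", BothReadersSketch::plannedRead);

    List<String> failures = new ArrayList<>();
    for (Map.Entry<String, Function<List<byte[]>, List<String>>> entry : readers.entrySet()) {
      if (!entry.getValue().apply(encoded).equals(expected)) {
        failures.add(entry.getKey());
      }
    }
    return failures;
  }
}
```

Because both readers go through the same assertions, a future change that breaks only the old path shows up by name instead of going unnoticed.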

jbonofre · 2024-10-28T06:47:41Z

I fixed reads for all types (primitives and arrays). Tests should be happy now 😄

RussellSpitzer · 2024-10-28T17:44:12Z

@pvary + @jbonofre how important is this for 1.7? If it is important, we need to wrap this up ASAP.

jbonofre · 2024-10-29T05:26:21Z

@RussellSpitzer planned Avro reads have been added to Spark (for Iceberg 1.7.x). This one is not a blocker for 1.7.0, but it would be good to have so that Flink benefits from the same performance boost as Spark.

Fokko · 2024-10-29

I agree with @pvary, but apart from that, it looks good to me 👍

RussellSpitzer · 2024-10-29T16:38:52Z

Alright! Chatted with folks and we will do the old-path tests in a follow up.

RussellSpitzer · 2024-10-29T16:54:28Z

Thanks @jbonofre for the PR and @Fokko and @pvary for the review. Let's add some additional tests for Spark and Flink in a follow-up for the old path.

pvary · 2024-11-04T11:09:10Z

@jbonofre: I think we should backport this change to Flink 1.19, and Flink 1.18 as well.

github-actions Bot added the flink label Oct 24, 2024

jbonofre requested review from RussellSpitzer, aokolnychyi, pvary and rdblue October 24, 2024 08:34

pvary reviewed Oct 25, 2024

Comment thread flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/data/FlinkPlannedAvroReader.java Outdated

jbonofre force-pushed the FLINK_AVRO_PLANNED_READS branch from ec483e0 to 0f3ca5b Compare October 26, 2024 06:15

pvary reviewed Oct 26, 2024

jbonofre force-pushed the FLINK_AVRO_PLANNED_READS branch 2 times, most recently from c0481ed to a4e7692 Compare October 28, 2024 06:46

Flink 1.20: Update Flink to use planned Avro reads

65b1165

jbonofre force-pushed the FLINK_AVRO_PLANNED_READS branch from a4e7692 to 65b1165 Compare October 28, 2024 06:52

jbonofre added this to the Iceberg 1.7.0 milestone Oct 28, 2024

Fokko approved these changes Oct 29, 2024

RussellSpitzer merged commit 602c2b2 into apache:main Oct 29, 2024

jbonofre mentioned this pull request Oct 30, 2024

Flink: Test both "new" Flink Avro planned reader and "deprecated" Avro reader #11430

Merged

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024

Flink 1.20: Update Flink to use planned Avro reads (apache#11386)

2ddb804

4 participants
