Flink: deprecate ReaderFunction with a new Converter interface to simplify user experience by stevenzwu · Pull Request #10956 · apache/iceberg

stevenzwu · 2024-08-17T17:54:52Z

No description provided.

pvary · 2024-08-18T05:44:48Z

+
+    @Override
+    public TypeInformation getProducedType() {
+      return TypeInformation.of(RowData.class);


Don't we need the RowType here?

You are right. We need to do something like this:

TypeInformation<RowData> typeInfo = FlinkCompatibilityUtil.toTypeInfo(FlinkSchemaUtil.convert(readSchema));

I will actually remove the whole IdentityConverter. Earlier I was thinking about creating a default IdentityConverter if the converter is null, but that is not needed anymore.

I will construct the proper type info when adding buildStream(env) in the PR for inferring parallelism.

pvary · 2024-08-18T05:47:01Z

+    RowDataFileScanTaskReader rowDataReader =
+        new RowDataFileScanTaskReader(tableSchema, readSchema, nameMapping, caseSensitive, filters);
+    return new LimitableDataIterator<>(
+        new ConverterFileScanTaskReader<>(rowDataReader, converter),


Would it be worth adding the converter to the reader instead of adding a new wrapper? It is getting a bit hard to follow this many nested readers.

I thought about that too. To execute the converter in this method, we would need to override or extend DataIterator. That is also non-trivial, as DataIterator is not a plain CloseableIterator. Running the converter inside ConverterFileScanTaskReader is a little simpler.
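For readers following the thread, the wrapper approach discussed above can be sketched in plain Java, without Flink or Iceberg on the classpath. This is only an illustration under stated assumptions: `Converter`, `TaskReader`, `ConvertingTaskReader`, and `readLengths` are hypothetical stand-ins invented for this sketch, not the classes or signatures in this PR. The idea shown is that the wrapper delegates to an inner reader and applies the converter to each record as it is pulled, so the iterator that consumes the reader does not need to change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ConverterSketch {
  /** Hypothetical stand-in for a converter: maps one input record to the output type. */
  interface Converter<I, O> {
    O apply(I input);
  }

  /** Hypothetical stand-in for a task reader: opens a task and yields its records. */
  interface TaskReader<T> {
    Iterator<T> open(List<String> task);
  }

  /** Wrapper reader: delegates to an inner reader and converts each record lazily. */
  static final class ConvertingTaskReader<I, O> implements TaskReader<O> {
    private final TaskReader<I> delegate;
    private final Converter<I, O> converter;

    ConvertingTaskReader(TaskReader<I> delegate, Converter<I, O> converter) {
      this.delegate = delegate;
      this.converter = converter;
    }

    @Override
    public Iterator<O> open(List<String> task) {
      Iterator<I> inner = delegate.open(task);
      return new Iterator<O>() {
        @Override
        public boolean hasNext() {
          return inner.hasNext();
        }

        @Override
        public O next() {
          // Conversion happens per record, only when the consumer asks for it.
          return converter.apply(inner.next());
        }
      };
    }
  }

  /** Reads a "task" (here just a list of strings) and converts each row to its length. */
  static List<Integer> readLengths(List<String> rows) {
    TaskReader<String> raw = task -> task.iterator();
    TaskReader<Integer> converted = new ConvertingTaskReader<>(raw, String::length);
    List<Integer> out = new ArrayList<>();
    converted.open(rows).forEachRemaining(out::add);
    return out;
  }

  public static void main(String[] args) {
    System.out.println(readLengths(Arrays.asList("a", "bb", "ccc")));
  }
}
```

The trade-off raised in the review still applies to this sketch: each wrapper adds one more layer of nesting to trace, in exchange for leaving the consuming iterator untouched.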


pvary · 2024-08-21T09:41:23Z

+import org.apache.iceberg.io.CloseableIterator;
+
+@Internal
+public class ConverterFileScanTaskReader<T> implements FileScanTaskReader<T> {


Maybe add this as a nested class in ConverterReaderFunction? Both classes are small and not used elsewhere, so this could help us keep the number of source files lower.

Not a strong opinion, just an idea.

That is a great idea. Will do.

pvary · 2024-08-21T09:44:38Z

+      {FileFormat.AVRO, 2, false},
+      {FileFormat.PARQUET, 2, true},
+      {FileFormat.PARQUET, 2, false},
+      {FileFormat.ORC, 2, true}


Why did you opt for this exact parametrization?

I am happy to have tests for Parquet in both cases, as that is the main use case for us now:

{FileFormat.PARQUET, 2, true}, {FileFormat.PARQUET, 2, false},

I might have opted for testing the future features and keeping one test for backward compatibility, like this:

{FileFormat.AVRO, 2, true}, {FileFormat.PARQUET, 2, true}, {FileFormat.PARQUET, 2, false}, {FileFormat.ORC, 2, true}

Sounds good to me. Will change.

pvary

LGTM. I raised some questions where I have no strong opinion, so that we decide on them consciously.


stevenzwu requested a review from pvary August 17, 2024 17:54

github-actions Bot added the flink label Aug 17, 2024

stevenzwu mentioned this pull request Aug 17, 2024

Flink: deprecate ReaderFunction to a new Reader interface that can also return output type info #10944

Closed

stevenzwu force-pushed the flip27-source-converter branch from 003d9e6 to be29eaf Compare August 17, 2024 18:06

pvary reviewed Aug 18, 2024


stevenzwu added 3 commits August 19, 2024 09:17

Flink: deprecate ReaderFunction with a new Converter interface to sim…

1fa6c95

…plify user experience

remove IdentityConverter as it is not needed

5769271

add type <T> to ResultTypeQueryable

24e315a

stevenzwu force-pushed the flip27-source-converter branch from 4219f03 to 24e315a Compare August 19, 2024 16:17

pvary reviewed Aug 21, 2024


pvary approved these changes Aug 21, 2024


address Peter's comments

cf75b28

stevenzwu merged commit 85cf79d into apache:main Aug 21, 2024

stevenzwu deleted the flip27-source-converter branch August 21, 2024 22:05

stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request Aug 22, 2024

Flink: backport PR apache#10956 for converter interface that deprecat…

a739b81

…es ReaderFunction

stevenzwu added a commit that referenced this pull request Aug 22, 2024

Flink: backport PR #10956 for converter interface that deprecates Rea…

7fec19f

…derFunction (#10985)

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024

Flink: deprecate ReaderFunction with a new Converter interface to sim…

a24407a

…plify user experience (apache#10956)

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024

Flink: backport PR apache#10956 for converter interface that deprecat…

a0eb013

…es ReaderFunction (apache#10985)

czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025

Flink: deprecate ReaderFunction with a new Converter interface to sim…

dae5efd

…plify user experience (apache#10956) (cherry picked from commit 85cf79d)

czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025

Flink: backport PR apache#10956 for converter interface that deprecat…

ef494ea

…es ReaderFunction (apache#10985) (cherry picked from commit 7fec19f)

stevenzwu merged 4 commits into apache:main from stevenzwu:flip27-source-converter
