Flink: deprecate ReaderFunction with a new Converter interface to simplify user experience #10956
003d9e6 to be29eaf
@Override
public TypeInformation getProducedType() {
  return TypeInformation.of(RowData.class);
Don't we need the RowType here?
You are right. We need to do something like this:
TypeInformation<RowData> typeInfo =
FlinkCompatibilityUtil.toTypeInfo(FlinkSchemaUtil.convert(readSchema));
I will actually remove the whole IdentityConverter. Earlier I was thinking about creating a default IdentityConverter when the converter is null, but that is no longer needed.
I will construct the proper type info when adding buildStream(env) in the PR for inferring parallelism.
RowDataFileScanTaskReader rowDataReader =
    new RowDataFileScanTaskReader(tableSchema, readSchema, nameMapping, caseSensitive, filters);
return new LimitableDataIterator<>(
    new ConverterFileScanTaskReader<>(rowDataReader, converter),
Would it be worth adding the converter to the reader instead of adding a new wrapper? It is getting a bit hard to follow this many nested readers.
I thought about that too. To run the converter in this method, we would need to override or extend DataIterator. That is also non-trivial, as DataIterator is not a plain CloseableIterator. Running the converter inside ConverterFileScanTaskReader is a little simpler.
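The wrapper approach described in this reply can be sketched in isolation. This is a minimal illustration, not the PR's implementation: `ClosableIter`, `TaskReader`, `ConvertingReader`, and `fromList` are hypothetical stand-ins (the real code works with Iceberg's `CloseableIterator` and `FileScanTaskReader`), and a plain `String` stands in for the scan task.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class ConverterWrapperSketch {

  // Hypothetical stand-in for a closeable iterator: iteration plus resource cleanup.
  interface ClosableIter<T> extends Iterator<T>, Closeable {}

  // Hypothetical stand-in for a task reader; a String replaces the real scan task type.
  interface TaskReader<T> {
    ClosableIter<T> open(String task);
  }

  // Wraps a reader of S and converts each record to T lazily, as it is pulled.
  static final class ConvertingReader<S, T> implements TaskReader<T> {
    private final TaskReader<S> delegate;
    private final Function<S, T> converter;

    ConvertingReader(TaskReader<S> delegate, Function<S, T> converter) {
      this.delegate = delegate;
      this.converter = converter;
    }

    @Override
    public ClosableIter<T> open(String task) {
      ClosableIter<S> source = delegate.open(task);
      return new ClosableIter<T>() {
        @Override
        public boolean hasNext() {
          return source.hasNext();
        }

        @Override
        public T next() {
          return converter.apply(source.next());
        }

        @Override
        public void close() throws IOException {
          // Closing the wrapper closes the underlying reader's iterator.
          source.close();
        }
      };
    }
  }

  // Test helper: exposes an in-memory list as a ClosableIter with a no-op close.
  static <T> ClosableIter<T> fromList(List<T> items) {
    Iterator<T> it = items.iterator();
    return new ClosableIter<T>() {
      @Override
      public boolean hasNext() {
        return it.hasNext();
      }

      @Override
      public T next() {
        return it.next();
      }

      @Override
      public void close() {}
    };
  }

  public static void main(String[] args) throws IOException {
    TaskReader<Integer> rows = task -> fromList(Arrays.asList(1, 2, 3));
    TaskReader<String> converted =
        new ConvertingReader<Integer, String>(rows, i -> "row-" + i);
    try (ClosableIter<String> it = converted.open("task-0")) {
      while (it.hasNext()) {
        System.out.println(it.next());
      }
    }
  }
}
```

The trade-off matches the discussion: the wrapper adds one more layer of nesting, but the existing iterator class stays untouched.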
4219f03 to 24e315a
import org.apache.iceberg.io.CloseableIterator;

@Internal
public class ConverterFileScanTaskReader<T> implements FileScanTaskReader<T> {
Maybe add this as a nested class of ConverterReaderFunction? Both classes are small and not used elsewhere, so this would help keep the number of source files down.
Not a strong opinion, just an idea
That is a great idea. Will do.
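As a rough illustration of the nesting suggestion, the helper can live as a private static nested class inside the only class that uses it. The names `ReaderFn` and `ConvertingIterator` are hypothetical stand-ins for illustration, not the classes from the PR.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class NestedReaderSketch {

  // Hypothetical stand-in for the reader function that owns the converter.
  static final class ReaderFn<T> {
    private final Function<String, T> converter;

    ReaderFn(Function<String, T> converter) {
      this.converter = converter;
    }

    Iterator<T> read(List<String> rawRecords) {
      return new ConvertingIterator<String, T>(rawRecords.iterator(), converter);
    }

    // Private and nested: only ReaderFn uses it, so it needs no source file of its own.
    private static final class ConvertingIterator<S, R> implements Iterator<R> {
      private final Iterator<S> source;
      private final Function<S, R> converter;

      ConvertingIterator(Iterator<S> source, Function<S, R> converter) {
        this.source = source;
        this.converter = converter;
      }

      @Override
      public boolean hasNext() {
        return source.hasNext();
      }

      @Override
      public R next() {
        return converter.apply(source.next());
      }
    }
  }

  public static void main(String[] args) {
    ReaderFn<Integer> lengths = new ReaderFn<>(String::length);
    Iterator<Integer> it = lengths.read(Arrays.asList("a", "bb", "ccc"));
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
```

Keeping the helper private also makes it clear that it is an implementation detail rather than part of the public surface.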
{FileFormat.AVRO, 2, false},
{FileFormat.PARQUET, 2, true},
{FileFormat.PARQUET, 2, false},
{FileFormat.ORC, 2, true}
Why did you opt for this exact parametrization?
I am happy to have tests for Parquet in both cases, as that is our main use case now:
{FileFormat.PARQUET, 2, true},
{FileFormat.PARQUET, 2, false},
I might have opted to test the future features and keep one test for backward compatibility, like this:
{FileFormat.AVRO, 2, true},
{FileFormat.PARQUET, 2, true},
{FileFormat.PARQUET, 2, false},
{FileFormat.ORC, 2, true}
Sounds good to me. Will change.
pvary
left a comment
LGTM. I have some questions where I have no strong opinion, but I wanted to raise them so that we decide on them consciously.
…es ReaderFunction
…plify user experience (apache#10956)
…es ReaderFunction (apache#10985)
…plify user experience (apache#10956) (cherry picked from commit 85cf79d)
…es ReaderFunction (apache#10985) (cherry picked from commit 7fec19f)