Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Writing the Thrift ColumnMetaData outside of the Parquet file footer was recently deprecated (apache/parquet-format#440), as was setting the ColumnChunk::file_offset field. In addition, the ColumnMetaData currently written inline has incorrect values for dictionary_page_offset and data_page_offset: they are relative to the start of the column chunk rather than absolute offsets from the start of the file.
Describe the solution you'd like
The current Parquet spec indicates the file_offset field should be set to 0, and ColumnMetaData should no longer be written inline with the data.
Describe alternatives you've considered
If the inline ColumnMetaData is not removed, dictionary_page_offset and data_page_offset should at least be set to correct values, i.e. absolute offsets within the file.
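
To illustrate what "correct values" would mean here, the following is a minimal, language-neutral sketch in Python, not the writer's actual code. `ChunkOffsets`, `to_file_offsets`, and `chunk_start` are hypothetical names introduced only for this example; the sketch assumes the writer knows the absolute file position at which the column chunk begins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChunkOffsets:
    """Hypothetical stand-in for the two offset fields of ColumnMetaData."""
    data_page_offset: int
    dictionary_page_offset: Optional[int] = None  # absent when there is no dictionary page


def to_file_offsets(relative: ChunkOffsets, chunk_start: int) -> ChunkOffsets:
    """Rebase chunk-relative offsets onto absolute positions in the file.

    `chunk_start` is assumed to be the absolute file position of the first
    byte of the column chunk.
    """
    dict_offset = relative.dictionary_page_offset
    return ChunkOffsets(
        data_page_offset=chunk_start + relative.data_page_offset,
        dictionary_page_offset=(
            None if dict_offset is None else chunk_start + dict_offset
        ),
    )


# Example: a chunk starting at byte 4096 of the file, with a 120-byte
# dictionary page followed by the first data page.
relative = ChunkOffsets(data_page_offset=120, dictionary_page_offset=0)
absolute = to_file_offsets(relative, chunk_start=4096)
print(absolute)
```

With the values above, the chunk-relative offsets 0 and 120 become the file offsets 4096 and 4216. A reader that seeks to the chunk-relative values would land in the wrong place for every chunk except one that starts at byte 0.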