ColumnMetaData should no longer be written inline with data #6115

@etseidl

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Writing the thrift ColumnMetaData outside of the Parquet file footer was recently deprecated (apache/parquet-format#440), as was setting the ColumnChunk::file_offset field. In addition, the ColumnMetaData currently written has incorrect values for dictionary_page_offset and data_page_offset: they are relative to the start of the chunk rather than being absolute offsets to the pages' locations in the file.

Describe the solution you'd like
The current Parquet spec indicates the file_offset field should be set to 0, and ColumnMetaData should no longer be written inline with the data.
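
As a rough illustration of the requested behavior, the sketch below builds the footer entry for a column chunk with file_offset left at 0 and the ColumnMetaData carried only in the footer. The structs are simplified, hypothetical stand-ins that borrow the thrift field names; they are not the parquet crate's actual types, and `footer_entry` is an assumed helper name.

```rust
/// Simplified stand-in for the thrift `ColumnMetaData` (hypothetical; only
/// the offset fields discussed in this issue are modeled).
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnMetaData {
    /// Absolute file offset of the first data page.
    pub data_page_offset: i64,
    /// Absolute file offset of the dictionary page, if the chunk has one.
    pub dictionary_page_offset: Option<i64>,
}

/// Simplified stand-in for the thrift `ColumnChunk` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnChunk {
    /// Deprecated field; the spec indicates it should be set to 0.
    pub file_offset: i64,
    /// Metadata for the chunk, stored in the file footer only.
    pub meta_data: Option<ColumnMetaData>,
}

/// Hypothetical helper: build the footer entry for a finished column chunk.
/// No copy of the metadata is serialized after the chunk's pages, so
/// `file_offset` has nothing to point at and stays 0.
pub fn footer_entry(meta: ColumnMetaData) -> ColumnChunk {
    ColumnChunk {
        file_offset: 0,
        meta_data: Some(meta),
    }
}
```

Readers would then locate pages through the footer's data_page_offset and dictionary_page_offset alone.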

Describe alternatives you've considered
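If the inline ColumnMetaData is not removed, dictionary_page_offset and data_page_offset should at least be set to correct values, i.e. offsets within the file rather than within the chunk.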
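
For this alternative, the correction amounts to adding the chunk's starting position in the file to each chunk-relative offset. A minimal sketch, assuming a hypothetical `PageOffsets` struct and `to_file_offsets` helper (neither is part of the parquet crate):

```rust
/// Hypothetical pair of page offsets for one column chunk. Field names
/// follow the thrift `ColumnMetaData` definition.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PageOffsets {
    pub data_page_offset: i64,
    pub dictionary_page_offset: Option<i64>,
}

/// Convert offsets measured from the start of the chunk into offsets
/// measured from the start of the file. `chunk_start` is the absolute
/// file position of the chunk's first byte.
pub fn to_file_offsets(chunk_start: i64, relative: PageOffsets) -> PageOffsets {
    PageOffsets {
        data_page_offset: chunk_start + relative.data_page_offset,
        // A chunk without a dictionary page keeps `None`.
        dictionary_page_offset: relative
            .dictionary_page_offset
            .map(|offset| chunk_start + offset),
    }
}
```

For example, a chunk that starts at byte 4 of the file, with its dictionary page at relative offset 0 and its first data page at relative offset 120, would be recorded with offsets 4 and 124.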

Labels: enhancement, parquet