
ARROW-9943: [C++] Recursively apply Arrow metadata when reading from Parquet#8366

Closed
pitrou wants to merge 3 commits into apache:master from pitrou:ARROW-9943-parquet-nested-metadata

Conversation

pitrou commented Oct 6, 2020

Member

This allows roundtripping complex types such as `list<dictionary<utf8>>`, `list<extension>`, etc.

Comment thread cpp/src/parquet/arrow/schema.cc Outdated

Member Author

@emkornfield I think you'll be able to add your changes here.

pitrou marked this pull request as draft October 6, 2020 16:42
pitrou force-pushed the ARROW-9943-parquet-nested-metadata branch from bbf193f to 289fb9c October 6, 2020 16:50
pitrou marked this pull request as ready for review October 6, 2020 16:50
pitrou commented Oct 6, 2020

Member Author

Also cc @jorisvandenbossche

pitrou force-pushed the ARROW-9943-parquet-nested-metadata branch from 289fb9c to 6673a8a October 6, 2020 16:53
github-actions Bot commented Oct 6, 2020

bkietz previously requested changes Oct 6, 2020
Comment thread cpp/src/parquet/arrow/schema.cc Outdated
Comment thread cpp/src/parquet/arrow/schema.cc Outdated
```cpp
if (origin_type.id() == ::arrow::Type::LIST) {
return RewrapListField;
return [](FieldVector fields) {
DCHECK_EQ(fields.size(), 1);
```

Contributor

Can a schema be deserialized that has more than one field here? In other words, should this return a user-facing error instead?

Member Author

Normally no, because we already checked that `origin_type->num_fields() == inferred_type->num_fields()`.

Comment thread cpp/src/parquet/arrow/schema.h Outdated
Comment thread cpp/src/parquet/arrow/schema.cc Outdated
Comment thread cpp/src/parquet/arrow/schema.cc Outdated
pitrou dismissed bkietz’s stale review October 7, 2020 09:38

Changes applied

pitrou commented Oct 7, 2020

Member Author

Will merge if CI green. Thank you for the reviews!

pitrou closed this in ef08a9d Oct 7, 2020
pitrou deleted the ARROW-9943-parquet-nested-metadata branch October 7, 2020 11:52

3 participants