Skip to content

ARROW-10259: [Rust] Add custom metadata to Field#9025

Closed
mqy wants to merge 14 commits into
apache:masterfrom
mqy:field-metadata
Closed

ARROW-10259: [Rust] Add custom metadata to Field#9025
mqy wants to merge 14 commits into
apache:masterfrom
mqy:field-metadata

Conversation

@mqy

@mqy mqy commented Dec 28, 2020

Copy link
Copy Markdown
Contributor

This PR adds the missing custom metadata to data type Field -- the requirement is specified by ARROW-10259.

To adapt existing tests for custom metadata, I updated Field's display: print the struct with debug, this will be improved in later PRs.

@github-actions

Copy link
Copy Markdown

@mqy mqy marked this pull request as draft December 28, 2020 08:06
@mqy mqy changed the title ARROW-10259: [Rust] Add custom_metadata to Field ARROW-10259: [Rust] Add custom metadata to Field Dec 28, 2020
@mqy mqy force-pushed the field-metadata branch 2 times, most recently from 3623c7e to 0ec54e3 Compare December 28, 2020 10:12
@mqy mqy marked this pull request as ready for review December 28, 2020 10:14
@mqy mqy marked this pull request as draft December 28, 2020 11:44
@nevi-me nevi-me self-requested a review December 29, 2020 07:02

@nevi-me nevi-me left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @mqy. Please enable the relevant integration tests (see 18b9281#diff-16170ebfd75c15cd0ef92a3e4f35dba353edcb11352101c363fc35cf3729267d). They should be marked with a TODO :)

Comment thread rust/arrow/src/datatypes.rs Outdated
Comment thread rust/arrow/src/datatypes.rs Outdated
Comment thread rust/arrow/src/datatypes.rs Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should serialise the metadata to a JSON struct in the form of:

{
  "metadata": [
    {"key": "k", "value": "v"}
  ]
}

There are integration tests that we need to enable, I'll find the relevant file in the repository.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in impl From<&Field> for ArrowJsonField, thanks!

@mqy mqy Dec 30, 2020

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nevi-me I just saw the comment "The JSON can either be an Object or an Array of Objects" from ARROW-8883: [Rust] [Integration] Enable more tests

That's interesting! But I haven't seen the Object example from testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/generated_custom_metadata.json.gz. So would you please give some pointers?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember where I encountered it, might be in the 0.14.1 test files. A quick way of seeing where the object example comes from, is to comment out that portion of the code, and run the integration tests.

@mqy mqy Dec 30, 2020

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember where I encountered it, might be in the 0.14.1 test files. A quick way of seeing where the object example comes from, is to comment out that portion of the code, and run the integration tests.

@nevi-me After comment out, datatypes::tests::schema_json failed, but the test data are not from testing module, instead are manually constructed. Files with "metadata" in the file name can only be found in these directories: 1.0.0-bigendian and 1.0.0-littleendian/. The Object format for Schema metadata came from #5907 by @grundprinzip

Anyway, I think it's no harm to support parsing Field metadata from JSON Object(Map), just as what Schema did.
What do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot to reply earlier. Yeah, there's no harm in keeping the behaviour as is in this PR

Comment thread rust/arrow/src/util/integration_util.rs Outdated
@mqy

mqy commented Dec 29, 2020

Copy link
Copy Markdown
Contributor Author

Thanks for working on this @mqy. Please enable the relevant integration tests (see 18b9281#diff-16170ebfd75c15cd0ef92a3e4f35dba353edcb11352101c363fc35cf3729267d). They should be marked with a TODO :)

@nevi-me thanks!
I had looked at recent PRs for reader/writer, tried to figure out what to do, finally got lost :( Your feed back helps a lot!.

@codecov-io

codecov-io commented Jan 1, 2021

Copy link
Copy Markdown

Codecov Report

Merging #9025 (d70f6bd) into master (5f2495d) will decrease coverage by 0.01%.
The diff coverage is 82.30%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9025      +/-   ##
==========================================
- Coverage   82.57%   82.56%   -0.02%     
==========================================
  Files         204      204              
  Lines       50330    50487     +157     
==========================================
+ Hits        41561    41684     +123     
- Misses       8769     8803      +34     
Impacted Files Coverage Δ
rust/arrow/src/array/builder.rs 85.89% <ø> (ø)
rust/datafusion/src/physical_plan/planner.rs 77.89% <ø> (ø)
rust/datafusion/src/sql/planner.rs 84.05% <33.33%> (+0.08%) ⬆️
rust/arrow/src/ipc/convert.rs 89.77% <47.61%> (-2.14%) ⬇️
rust/arrow/src/util/integration_util.rs 66.59% <81.96%> (-0.89%) ⬇️
rust/arrow/src/datatypes.rs 78.69% <86.99%> (+1.02%) ⬆️
rust/datafusion/src/logical_plan/dfschema.rs 85.52% <100.00%> (+0.06%) ⬆️
rust/datafusion/src/logical_plan/display.rs 94.93% <100.00%> (ø)
rust/datafusion/src/logical_plan/plan.rs 82.48% <100.00%> (ø)
rust/datafusion/src/physical_plan/expressions.rs 84.48% <100.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a911e67...d70f6bd. Read the comment docs.

@mqy mqy requested a review from nevi-me January 4, 2021 09:17
@mqy mqy marked this pull request as ready for review January 4, 2021 09:17
@nevi-me

nevi-me commented Jan 4, 2021

Copy link
Copy Markdown
Contributor

Hey @mqy, I'm back in the city, and have access to my desktop; so I'll be able to review this PR and help you enable integration tests during the week.

@mqy

mqy commented Jan 5, 2021

Copy link
Copy Markdown
Contributor Author

Hey @mqy, I'm back in the city, and have access to my desktop; so I'll be able to review this PR and help you enable integration tests during the week.

Welcome back!

@nevi-me

nevi-me commented Jan 5, 2021

Copy link
Copy Markdown
Contributor

Please enable generate_custom_metadata_case()in dev/archery/archery/integration/datagen.py. You'll see the Rust case disabled.
That integration test fails with

Converting /tmp/arrow-integration-kj0llr2i/generated_custom_metadata.json to /tmp/tmpecqxfv9u/70af42de_generated_custom_metadata.json_as_file
Error: ParseError("Field \'metadata\' must have exact one json map for a key-value pair")

@mqy mqy marked this pull request as draft January 6, 2021 10:15
@mqy mqy marked this pull request as ready for review January 6, 2021 11:02
@mqy

mqy commented Jan 6, 2021

Copy link
Copy Markdown
Contributor Author

@nevi-me the integration test passed at commit "ipc: build field metadata", please take some time to review, thanks!

@nevi-me nevi-me left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for patiently working on this @mqy. I'm happy with the implementation.

I have one change on with_metadata(&mut self, ...) -> Self;, where we should change the signature to not return the cloned Self.

I'll think about whether we should keep the {"k": "v"} format in the long run, or whether to standardise with {"key": "k", "value": "v"}. I might enounter more examples of this in the coming weeks as I work on testing the Rust Parquet work with other languages.

@alamb @jorgecarleitao may I please have a concurring review, more to check code style.

@carols10cents this might also be of interest to you, not sure if we have Flight tests that require Field to have custom metadata.

Comment thread rust/arrow/src/datatypes.rs Outdated
}

/// Merge field into self if it is compatible. Struct will be merged recursively.
/// NOTE: `self` may be updated to unexpected state in case of merge failure.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, I'm happy with someone who encounters an issue here in future, providing an alternative implementation that doesn't partially mutate &mut self if there's a failure.

@houqp any opinion here, as you contributed this function?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this "good now, better later" strategy is a good one. I agree with @nevi-me 's suggestion

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i agree as well. would be better to have an atomic merge implementation done as a separate PR in the future.

Comment thread rust/arrow/src/datatypes.rs Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot to reply earlier. Yeah, there's no harm in keeping the behaviour as is in this PR

@nevi-me nevi-me closed this in 98159f1 Jan 7, 2021

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice change -- thank you @mqy and @nevi-me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants