Skip to content

ARROW-1261: [Java] Add MapVector with reader and writer#4444

Closed
BryanCutler wants to merge 13 commits into
apache:masterfrom
BryanCutler:java-map-type-ARROW-1279
Closed

ARROW-1261: [Java] Add MapVector with reader and writer#4444
BryanCutler wants to merge 13 commits into
apache:masterfrom
BryanCutler:java-map-type-ARROW-1279

Conversation

@BryanCutler

Copy link
Copy Markdown
Member

This adds MapVector as a subclass of ListVector where the data vector is a Struct with 2 fields: "key" and "value". A new writer UnionMapWriter is added that extends UnionListWriter to simplify writing key, value fields. Similarly, the UnionMapReader is added to read key, value fields.

@BryanCutler

BryanCutler commented May 31, 2019

Copy link
Copy Markdown
Member Author

This is a WIP, currently need to add roundtrip tests in Java and complete unit tests - and add Javadocs of course :)

Java roundtrip tests passing
MapVector tests passing
Javadocs added

@emkornfield

Copy link
Copy Markdown
Contributor

@BryanCutler given its a WIP, do you still want a review?

@BryanCutler BryanCutler changed the title [WIP] ARROW-1279: [Java] Add MapVector with reader and writer [WIP] ARROW-1261: [Java] Add MapVector with reader and writer Jun 2, 2019
@BryanCutler

Copy link
Copy Markdown
Member Author

@BryanCutler given its a WIP, do you still want a review?

@emkornfield it might be better to hold off on a detailed review until I finish up with tests and make another pass through. If you'd like to take a high-level look now and discuss the new classes and APIs, that would be much appreciated! I basically extended everything from a List of key/value structs, but there are other ways to do it too.

@BryanCutler

Copy link
Copy Markdown
Member Author

Making a MapWriter that is able to constrain the StructVector and key Vector to be non-nullable got a little messy. The vector writers are all designed to only make nullable vectors, so I added a flag that can be set, so that when the writer creates a vector, it can be made as non-nullable. I tried a couple other ways to go about it, like using NonNullableStructVector, but it didn't work out very well so this was the cleanest way I could find to go about it. I didn't try to make a MapWriter from scratch, because that seemed overkill, but it could be possible also.

@BryanCutler

Copy link
Copy Markdown
Member Author

@bkietz I think this is ready to try integration tests with. How are things on the C++ side?

@codecov-io

codecov-io commented Jun 5, 2019

Copy link
Copy Markdown

Codecov Report

Merging #4444 into master will increase coverage by 1.05%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4444      +/-   ##
==========================================
+ Coverage   88.42%   89.47%   +1.05%     
==========================================
  Files         793      645     -148     
  Lines      101335    89835   -11500     
  Branches     1253        0    -1253     
==========================================
- Hits        89602    80384    -9218     
+ Misses      11486     9451    -2035     
+ Partials      247        0     -247
Impacted Files Coverage Δ
python/pyarrow/compat.py 57.14% <0%> (-32.97%) ⬇️
python/pyarrow/flight.py 80% <0%> (-20%) ⬇️
cpp/src/arrow/json/chunked-builder.cc 79.91% <0%> (-9.63%) ⬇️
python/pyarrow/util.py 47.45% <0%> (-8.48%) ⬇️
python/pyarrow/error.pxi 53.19% <0%> (-6.39%) ⬇️
cpp/src/arrow/compute/kernels/filter.h 93.75% <0%> (-6.25%) ⬇️
cpp/src/arrow/compute/kernels/filter.cc 95.23% <0%> (-4.77%) ⬇️
python/pyarrow/serialization.py 84.76% <0%> (-3.67%) ⬇️
cpp/src/arrow/status.cc 33.69% <0%> (-3.27%) ⬇️
python/pyarrow/tests/conftest.py 71.42% <0%> (-3.07%) ⬇️
... and 226 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3379ec2...f53d11e. Read the comment docs.

@BryanCutler BryanCutler changed the title [WIP] ARROW-1261: [Java] Add MapVector with reader and writer ARROW-1261: [Java] Add MapVector with reader and writer Jun 5, 2019
@BryanCutler

Copy link
Copy Markdown
Member Author

I think this is ready to review, although still need to work on integration testing. cc @emkornfield @siddharthteotia @pravindra @jacques-n

@emkornfield emkornfield left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me, mostly style comments. I still haven't fully wrapped my head around readers/writers concept yet so I probably shouldn't be the one to do a final review (will have a better sense of them after I finish my work on Unions in java though, in case no one else has bandwidth to review).

Comment thread java/vector/src/main/codegen/templates/AbstractFieldWriter.java
break;
case UNION:
UnionWriter writer = new UnionWriter(container.addOrGet(child.getName(), FieldType.nullable(MinorType.UNION.getType()), UnionVector.class), getNullableStructWriterFactory());
FieldType fieldType = new FieldType(addVectorAsNullable, MinorType.UNION.getType(), null, null);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the nulls being passed through here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are for DictionaryEncoding and metadata. If you want to do dictionary encoding, I don't think it is supported with the writers. If metadata is initialized to null, it creates an empty map.

Comment thread java/vector/src/main/codegen/templates/UnionMapWriter.java
Comment thread java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java Outdated
ByteArrayInputStream input = new ByteArrayInputStream(stream.toByteArray());
ArrowStreamReader arrowReader = new ArrowStreamReader(input, readerAllocator)) {
VectorSchemaRoot root = arrowReader.getVectorSchemaRoot();
Schema schema = root.getSchema();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make sure there is a unit or integration test that calls the columns by different names and we can still read/write the type.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be good. I'll add it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread java/vector/src/main/codegen/templates/UnionMapWriter.java

private MapWriteMode mode = MapWriteMode.OFF;
private StructWriter entryWriter;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the writer enforce uniqueness/sorted-ness (I suppose this would be difficult in the general case)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was discussed somewhere else, and was decided it is up to the application to ensure these things. The sortedKeys field is just used as a hint

@bkietz

bkietz commented Jun 6, 2019

Copy link
Copy Markdown
Member

@BryanCutler C++ side should be merged soon

@BryanCutler

Copy link
Copy Markdown
Member Author

@BryanCutler C++ side should be merged soon

Great, thanks @bkietz ! It looks like you have integration tests all ready in your PR, so after you merge then I should be able to enable and run map tests from here?

@BryanCutler

Copy link
Copy Markdown
Member Author

Thanks for reviewing @emkornfield , I made some updates but still need to add the test you suggested. I don't normally use the reader/writer classes so it would be good to have maybe @pravindra or @siddharthteotia take a look at these.

@tvamsikalyan

tvamsikalyan commented Jun 7, 2019

Copy link
Copy Markdown

Hi Bryan, Greetings!
@pravindra asked me to review this PR once.
I will review this PR over this weekend. If you need it before weekend, please let me know.
Thank you so much.
T. Vamsi Kalyan.

@tvamsikalyan tvamsikalyan left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me.
Only comment I have is one test case with nested types, example: map<int, list> would be useful in my opinion

@BryanCutler

Copy link
Copy Markdown
Member Author

Thanks @tvamsikalyan for the reveiew! I do have a test for reading/writing map<long, list> here https://github.com/apache/arrow/pull/4444/files#diff-4ca74488fffe65225bb3faf300664b23R482

@tvamsikalyan

tvamsikalyan commented Jun 11, 2019

Copy link
Copy Markdown

Thanks @tvamsikalyan for the reveiew! I do have a test for reading/writing map<long, list> here https://github.com/apache/arrow/pull/4444/files#diff-4ca74488fffe65225bb3faf300664b23R482

I see it now. Thank you so much for the reply and for the link.

@emkornfield

Copy link
Copy Markdown
Contributor

+1 LGTM

@BryanCutler BryanCutler deleted the java-map-type-ARROW-1279 branch June 11, 2019 17:57
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
This adds `MapVector` as a subclass of `ListVector` where the data vector is a Struct with 2 fields: "key" and "value".  A new writer `UnionMapWriter` is added that extends `UnionListWriter` to simplify writing key, value fields. Similarly, the `UnionMapReader` is added to read key, value fields.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes apache#4444 from BryanCutler/java-map-type-ARROW-1279 and squashes the following commits:

f53d11e <Bryan Cutler> Added test to write data as list with different field names
e68acd3 <Bryan Cutler> Expanded java docs for MapVector and UnionMapWriter
1b153e4 <Bryan Cutler> make StructVector respect nullable flag
f627ed0 <Bryan Cutler> revert changes from using NonNullableStructVector
3727643 <Bryan Cutler> use Preconditions.checkArgument in MapVector
7f602b8 <Bryan Cutler> Added split and transfer test
afe89e2 <Bryan Cutler> fix style checks and add javadocs
03c380f <Bryan Cutler> fixed initializeChildrenFromFields in MapVector
3a9c194 <Bryan Cutler> fix imports
c90347e <Bryan Cutler> Now using StructVector with nullable false for struct and key vectors, writers can set a nullable flag to create non-nullable vectors
3e620d9 <Bryan Cutler> make MapVector use NonNullableStructVector
e19f6cf <Bryan Cutler> Added roundtrip tests for MapVector Java IPC
4dcb622 <Bryan Cutler> initial MapVector with reader and writer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants