ARROW-1261: [Java] Add MapVector with reader and writer #4444
BryanCutler wants to merge 13 commits into
This is a WIP; I still need to get the Java roundtrip tests passing.
@BryanCutler given it's a WIP, do you still want a review?
@emkornfield it might be better to hold off on a detailed review until I finish up with tests and make another pass through. If you'd like to take a high-level look now and discuss the new classes and APIs, that would be much appreciated! I basically extended everything from a List of key/value structs, but there are other ways to do it too.
…, writers can set a nullable flag to create non-nullable vectors
Making a MapWriter that is able to constrain the StructVector and key Vector to be non-nullable got a little messy. The vector writers are all designed to only make nullable vectors, so I added a flag that can be set, so that when the writer creates a vector, it can be made as non-nullable. I tried a couple other ways to go about it, like using
@bkietz I think this is ready to try integration tests with. How are things on the C++ side?
Codecov Report
@@            Coverage Diff             @@
##           master    #4444      +/-   ##
==========================================
+ Coverage   88.42%   89.47%   +1.05%
==========================================
  Files         793      645     -148
  Lines      101335    89835   -11500
  Branches     1253        0    -1253
==========================================
- Hits        89602    80384    -9218
+ Misses      11486     9451    -2035
+ Partials      247        0     -247
I think this is ready to review, although I still need to work on integration testing. cc @emkornfield @siddharthteotia @pravindra @jacques-n
emkornfield left a comment
Seems reasonable to me, mostly style comments. I still haven't fully wrapped my head around the reader/writer concept yet, so I probably shouldn't be the one to do a final review (I will have a better sense of them after I finish my work on Unions in Java though, in case no one else has bandwidth to review).
    break;
  case UNION:
    UnionWriter writer = new UnionWriter(container.addOrGet(child.getName(), FieldType.nullable(MinorType.UNION.getType()), UnionVector.class), getNullableStructWriterFactory());
    FieldType fieldType = new FieldType(addVectorAsNullable, MinorType.UNION.getType(), null, null);
What are the nulls being passed through here?
They are for the DictionaryEncoding and the metadata. I don't think dictionary encoding is supported with the writers. If metadata is initialized to null, it creates an empty map.
    ByteArrayInputStream input = new ByteArrayInputStream(stream.toByteArray());
    ArrowStreamReader arrowReader = new ArrowStreamReader(input, readerAllocator)) {
  VectorSchemaRoot root = arrowReader.getVectorSchemaRoot();
  Schema schema = root.getSchema();
We should make sure there is a unit or integration test that calls the columns by different names and confirms we can still read and write the type.
Yeah, that would be good. I'll add it.
  private MapWriteMode mode = MapWriteMode.OFF;
  private StructWriter entryWriter;
Should the writer enforce uniqueness/sorted-ness (I suppose this would be difficult in the general case)?
I think that was discussed somewhere else, and it was decided that it is up to the application to ensure these things. The sortedKeys field is just used as a hint.
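Since the thread leaves uniqueness and ordering to the application, one option is for the application to check the keys of each map entry list before writing them. The following is only a minimal sketch of that idea in plain Java: `MapKeyChecks`, `hasUniqueKeys`, and `isStrictlySorted` are hypothetical names and are not part of Arrow or of this PR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical application-side helper; not an Arrow class.
class MapKeyChecks {

    /** Returns true when no key repeats within one map's entry list. */
    static <K> boolean hasUniqueKeys(List<K> keys) {
        Set<K> seen = new HashSet<>();
        for (K key : keys) {
            if (!seen.add(key)) {
                return false; // add() returns false for a key already seen
            }
        }
        return true;
    }

    /** Returns true when keys are strictly ascending, which also implies uniqueness. */
    static <K extends Comparable<K>> boolean isStrictlySorted(List<K> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Long> keys = Arrays.asList(1L, 3L, 7L);
        System.out.println("unique: " + hasUniqueKeys(keys));
        System.out.println("sorted: " + isStrictlySorted(keys));
    }
}
```

An application could run a check like this per row and only mark the keys as sorted when every row passes; whether that cost is acceptable depends on the workload.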
@BryanCutler C++ side should be merged soon
Great, thanks @bkietz ! It looks like you have integration tests all ready in your PR, so after you merge then I should be able to enable and run map tests from here?
Thanks for reviewing @emkornfield , I made some updates but still need to add the test you suggested. I don't normally use the reader/writer classes so it would be good to have maybe @pravindra or @siddharthteotia take a look at these.
Hi Bryan, Greetings!
tvamsikalyan left a comment
Looks good to me.
My only comment is that one test case with nested types, for example map<int, list>, would be useful in my opinion.
Thanks @tvamsikalyan for the review! I do have a test for reading/writing map<long, list> here https://github.com/apache/arrow/pull/4444/files#diff-4ca74488fffe65225bb3faf300664b23R482
I see it now. Thank you so much for the reply and for the link.
+1 LGTM
This adds `MapVector` as a subclass of `ListVector` where the data vector is a Struct with 2 fields: "key" and "value". A new writer `UnionMapWriter` is added that extends `UnionListWriter` to simplify writing key, value fields. Similarly, the `UnionMapReader` is added to read key, value fields.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes apache#4444 from BryanCutler/java-map-type-ARROW-1279 and squashes the following commits:

f53d11e <Bryan Cutler> Added test to write data as list with different field names
e68acd3 <Bryan Cutler> Expanded java docs for MapVector and UnionMapWriter
1b153e4 <Bryan Cutler> make StructVector respect nullable flag
f627ed0 <Bryan Cutler> revert changes from using NonNullableStructVector
3727643 <Bryan Cutler> use Preconditions.checkArgument in MapVector
7f602b8 <Bryan Cutler> Added split and transfer test
afe89e2 <Bryan Cutler> fix style checks and add javadocs
03c380f <Bryan Cutler> fixed initializeChildrenFromFields in MapVector
3a9c194 <Bryan Cutler> fix imports
c90347e <Bryan Cutler> Now using StructVector with nullable false for struct and key vectors, writers can set a nullable flag to create non-nullable vectors
3e620d9 <Bryan Cutler> make MapVector use NonNullableStructVector
e19f6cf <Bryan Cutler> Added roundtrip tests for MapVector Java IPC
4dcb622 <Bryan Cutler> initial MapVector with reader and writer
This adds `MapVector` as a subclass of `ListVector` where the data vector is a Struct with 2 fields: "key" and "value". A new writer `UnionMapWriter` is added that extends `UnionListWriter` to simplify writing key, value fields. Similarly, the `UnionMapReader` is added to read key, value fields.
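To make the "list of key/value structs" layout concrete, here is a minimal sketch in plain Java. It is a simplified model only: `MapLayoutSketch` and its methods are hypothetical and do not use or mirror the Arrow API. It assumes long keys and string values, keeps one flattened key column and one flattened value column, and uses list-style offsets to mark where each row's entries begin and end. Keys are rejected when null, following the non-nullable key vector discussed in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the layout; not an Arrow class.
class MapLayoutSketch {
    // Flattened struct children: entry i is the pair (keys[i], values[i]).
    private final List<Long> keys = new ArrayList<>();
    private final List<String> values = new ArrayList<>();
    // List-style offsets: row r owns entries [offsets[r], offsets[r + 1]).
    private final List<Integer> offsets = new ArrayList<>();

    MapLayoutSketch() {
        offsets.add(0);
    }

    /** Appends one map row; keys must be non-null, values may be null. */
    void appendRow(Map<Long, String> row) {
        for (Map.Entry<Long, String> entry : row.entrySet()) {
            if (entry.getKey() == null) {
                throw new IllegalArgumentException("map keys must not be null");
            }
            keys.add(entry.getKey());
            values.add(entry.getValue());
        }
        offsets.add(keys.size()); // close the row
    }

    int rowCount() {
        return offsets.size() - 1;
    }

    /** Rebuilds row r from the flattened key and value columns. */
    Map<Long, String> readRow(int row) {
        Map<Long, String> out = new LinkedHashMap<>();
        for (int i = offsets.get(row); i < offsets.get(row + 1); i++) {
            out.put(keys.get(i), values.get(i));
        }
        return out;
    }

    List<Integer> offsets() {
        return new ArrayList<>(offsets);
    }

    public static void main(String[] args) {
        MapLayoutSketch vector = new MapLayoutSketch();
        Map<Long, String> first = new LinkedHashMap<>();
        first.put(1L, "a");
        first.put(2L, "b");
        vector.appendRow(first);
        vector.appendRow(new LinkedHashMap<>()); // an empty map row
        System.out.println(vector.offsets());
        System.out.println(vector.readRow(0));
    }
}
```

Reusing the list offsets is what lets a map type extend a list type in a model like this: only the constraints on the child struct (two fields, non-null keys) are new.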