ARROW-11310: [Rust] implement JSON writer#9256
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9256 +/- ##
==========================================
- Coverage 81.89% 81.87% -0.02%
==========================================
Files 215 216 +1
Lines 52988 53097 +109
==========================================
+ Hits 43392 43474 +82
- Misses 9596 9623 +27
Continue to review full report at Codecov.
|
jorgecarleitao
left a comment
There was a problem hiding this comment.
Gave this one a quick look, and it looks good so far. I think the nulls are not accounted for.
I left some comments accordingly.
| } | ||
|
|
||
| pub fn as_list_array<S: OffsetSizeTrait>(arr: &ArrayRef) -> &GenericListArray<S> { | ||
| pub fn as_generic_list_array<S: OffsetSizeTrait>(arr: &ArrayRef) -> &GenericListArray<S> { |
There was a problem hiding this comment.
breaking change to make naming more consistent with array type.
|
@jorgecarleitao @nevi-me ready for review. |
jorgecarleitao
left a comment
There was a problem hiding this comment.
Great work, @houqp ! This looks super clean and IMO ready to go.
I would add a note to the readme where we discuss json readers. IMO this is a major feature!
| struct_array_to_jsonmap_array(as_struct_array(array), array.len()); | ||
| jsonmaps.into_iter().map(Value::Object).collect() | ||
| } | ||
| _ => { |
There was a problem hiding this comment.
I think there'd be demand for dictionary support, but we can add that separately
There was a problem hiding this comment.
Yes I agree -- on both counts :)
There was a problem hiding this comment.
yeah, i was planning to add dictionary support, but this patch set grew larger than I expected, so I decided to leave that to a follow up PR for easier review.
|
I plan to review this PR later today and merge unless I hear otherwise |
…s newline delimited json streams
## Rationale
Currently the Arrow json writer makes JSON that looks like this (one record per line):
```json
{"foo":1}
{"bar":1}
```
Which is not technically valid JSON, which would look something like this:
```
[
{"foo":1},
{"bar":1}
]
```
## New Features
This PR parameterizes the JSON writer so it can write in either format. Note I needed this feature for in IOx, in influxdata/influxdb_iox#870, and I want to propose contributing it back here).
## Other Changes:
1. Added the function `into_inner()` to retrieve the inner writer from the JSON writer, following the model of the Rust standard library (e.g. [BufReader::into_inner](https://doc.rust-lang.org/std/io/struct.BufReader.html#method.into_inner)
2. Per Rust standard pattern, I change the JSON writer so that it doesn't add any Buffering (via `BufReader`) itself, and instead allows the caller the choice of what type of buffering, if any, is needed.
3. Added / cleaned up a bunch of documentation and comments.
## Questions
I went with parameterizing the `Writer` output as a trait rather than runtime dispatch, for performance. This shouldn't have backwards compatible issues Given the writer has not yet been released yet (introduced by @houqp #9256)
However would people prefer a single `Writer` that took an `Options` struct or something to determine how it wrote out data?
Closes #9575 from alamb/alamb/json_arrays
Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
…s newline delimited json streams
## Rationale
Currently the Arrow json writer makes JSON that looks like this (one record per line):
```json
{"foo":1}
{"bar":1}
```
Which is not technically valid JSON, which would look something like this:
```
[
{"foo":1},
{"bar":1}
]
```
## New Features
This PR parameterizes the JSON writer so it can write in either format. Note I needed this feature for in IOx, in influxdata/influxdb_iox#870, and I want to propose contributing it back here).
## Other Changes:
1. Added the function `into_inner()` to retrieve the inner writer from the JSON writer, following the model of the Rust standard library (e.g. [BufReader::into_inner](https://doc.rust-lang.org/std/io/struct.BufReader.html#method.into_inner)
2. Per Rust standard pattern, I change the JSON writer so that it doesn't add any Buffering (via `BufReader`) itself, and instead allows the caller the choice of what type of buffering, if any, is needed.
3. Added / cleaned up a bunch of documentation and comments.
## Questions
I went with parameterizing the `Writer` output as a trait rather than runtime dispatch, for performance. This shouldn't have backwards compatible issues Given the writer has not yet been released yet (introduced by @houqp apache/arrow#9256)
However would people prefer a single `Writer` that took an `Options` struct or something to determine how it wrote out data?
Closes #9575 from alamb/alamb/json_arrays
Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Other than basic type support, this PR also covers the following cases: