Skip to content

ARROW-11773: [Rust] Support writing well formed JSON arrays as well as newline delimited json streams#9575

Closed
alamb wants to merge 1 commit into
apache:masterfrom
alamb:alamb/json_arrays
Closed

ARROW-11773: [Rust] Support writing well formed JSON arrays as well as newline delimited json streams#9575
alamb wants to merge 1 commit into
apache:masterfrom
alamb:alamb/json_arrays

Conversation

@alamb

@alamb alamb commented Feb 25, 2021

Copy link
Copy Markdown
Contributor

Rationale

Currently the Arrow json writer makes JSON that looks like this (one record per line):

{"foo":1}
{"bar":1}

Which is not technically valid JSON, which would look something like this:

[
  {"foo":1},
  {"bar":1}
]

New Features

This PR parameterizes the JSON writer so it can write in either format. Note I needed this feature for in IOx, in https://github.com/influxdata/influxdb_iox/pull/870, and I want to propose contributing it back here).

Other Changes:

  1. Added the function into_inner() to retrieve the inner writer from the JSON writer, following the model of the Rust standard library (e.g. BufReader::into_inner

  2. Per Rust standard pattern, I change the JSON writer so that it doesn't add any Buffering (via BufReader) itself, and instead allows the caller the choice of what type of buffering, if any, is needed.

  3. Added / cleaned up a bunch of documentation and comments.

Questions

I went with parameterizing the Writer output as a trait rather than runtime dispatch, for performance. This shouldn't have backwards compatible issues Given the writer has not yet been released yet (introduced by @houqp #9256)

However would people prefer a single Writer that took an Options struct or something to determine how it wrote out data?

//! ```
//!
//! Serialize record batches into line-delimited JSON bytes:
//! ## Writing JSON formatted byte streams

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc examples are probably the best way to see how to use this structure and what the changes looked like

@alamb

alamb commented Feb 25, 2021

Copy link
Copy Markdown
Contributor Author

cc @houqp

@houqp houqp left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@houqp

houqp commented Feb 25, 2021

Copy link
Copy Markdown
Member

I personally prefer the more composable formatter trait approach.

@github-actions

Copy link
Copy Markdown

@nevi-me

nevi-me commented Feb 25, 2021

Copy link
Copy Markdown
Contributor

Which is not technically valid JSON, which would look something like this:

It's streaming JSON or NDJSON (https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON)

@nevi-me nevi-me left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's sometimes inconvenient when trying to read the JSON files in applications that don't support the streaming format/structure. I'm fine with this change

writer,
started: false,
finished: false,
format: F::default(),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the default format? I can't tell from the code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a default format for Writer (I couldn't figure out how to make one).

This line makes an instance of the formatter. Though come to think of it, none of the formatters actually have state now 🤔 I could move some of the state into JsonArray maybe to make that clearer

@alamb alamb changed the title ARROW-11773: [Rust] Support writing well formed JSON arrays ARROW-11773: [Rust] Support writing well formed JSON arrays as well as newline delimited json streams Feb 25, 2021
@alamb

alamb commented Feb 26, 2021

Copy link
Copy Markdown
Contributor Author

The integration test failure seems like it is unrelated to the changes in this PR: https://issues.apache.org/jira/browse/ARROW-11717

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants