[SPARK-11745][SQL] Enable more JSON parsing options#9724

Closed
rxin wants to merge 3 commits into apache:master from rxin:SPARK-11745

rxin commented Nov 16, 2015

Contributor

This patch adds the following options to the JSON data source, for dealing with non-standard JSON files:

  • allowComments (default false): ignores Java/C++ style comments in JSON records
  • allowUnquotedFieldNames (default false): allows unquoted JSON field names
  • allowSingleQuotes (default true): allows single quotes in addition to double quotes
  • allowNumericLeadingZeros (default false): allows leading zeros in numbers (e.g. 00012)
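
As a rough illustration of how these options might be used from Python, here is a minimal sketch. The option names and defaults come from the list above; the sample records, the helper names `is_strict_json` and `read_lenient_json`, and the idea of passing in a reader such as `sqlContext.read` are assumptions made for this example, not part of the patch.

```python
import json

# Option names as listed in this PR description. Values are given as
# strings, the way data source options are usually configured.
LENIENT_JSON_OPTIONS = {
    "allowComments": "true",
    "allowUnquotedFieldNames": "true",
    "allowSingleQuotes": "true",  # already the default
    "allowNumericLeadingZeros": "true",
}

# Made-up records, one per option, that strict JSON does not allow.
SAMPLE_RECORDS = {
    "allowComments": '{"name": "a" /* inline comment */}',
    "allowUnquotedFieldNames": '{name: "a"}',
    "allowSingleQuotes": "{'name': 'a'}",
    "allowNumericLeadingZeros": '{"id": 00012}',
}


def is_strict_json(record):
    """Return True if Python's strict json module accepts the record."""
    try:
        json.loads(record)
        return True
    except ValueError:
        return False


def read_lenient_json(reader, path):
    """Apply the lenient options to a reader and load `path`.

    `reader` is assumed to behave like PySpark's DataFrameReader
    (for example `sqlContext.read`), exposing `options(**kwargs)`
    and `json(path)`.
    """
    return reader.options(**LENIENT_JSON_OPTIONS).json(path)
```

With a real Spark context this would be called as something like `read_lenient_json(sqlContext.read, "records.json")`, where the path is a placeholder. The strict-parser check only shows that each sample record is non-standard JSON; it says nothing about how Spark itself parses them.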

To avoid passing a lot of options throughout the json package, I introduced a new JSONOptions case class to define all JSON config options.
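
The idea of bundling the options into one immutable value can be sketched as follows. This is a hypothetical Python analogue, not the Scala `JSONOptions` class from the patch: the class name `JsonOptionsSketch`, its field names, and the `from_config_map` helper are all assumptions, and only the four options and defaults described above are modeled.

```python
from dataclasses import dataclass


def _to_bool(value):
    """Parse 'true'/'false' (case-insensitive) or a bool into a bool."""
    text = str(value).strip().lower()
    if text not in ("true", "false"):
        raise ValueError("expected 'true' or 'false', got %r" % (value,))
    return text == "true"


@dataclass(frozen=True)
class JsonOptionsSketch:
    # Defaults mirror the ones listed in the PR description.
    allow_comments: bool = False
    allow_unquoted_field_names: bool = False
    allow_single_quotes: bool = True
    allow_numeric_leading_zeros: bool = False

    @classmethod
    def from_config_map(cls, params):
        """Build options from a string-to-string map, using defaults for missing keys."""
        d = cls()
        return cls(
            allow_comments=_to_bool(
                params.get("allowComments", d.allow_comments)),
            allow_unquoted_field_names=_to_bool(
                params.get("allowUnquotedFieldNames", d.allow_unquoted_field_names)),
            allow_single_quotes=_to_bool(
                params.get("allowSingleQuotes", d.allow_single_quotes)),
            allow_numeric_leading_zeros=_to_bool(
                params.get("allowNumericLeadingZeros", d.allow_numeric_leading_zeros)),
        )
```

Passing a single object like this through the parser code means a new option only touches the options class and the place that reads it, rather than every function signature in between.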

Also updated documentation to explain these options.

Scala

[Image: screen shot 2015-11-15 at 6 12 12 pm]

Python

[Image: screen shot 2015-11-15 at 6 11 28 pm]

Contributor Author

This is now unused.

Contributor

Add samplingRatio?

Contributor Author

I think we skipped samplingRatio in the past because it had very little impact on performance, so in most cases it is better to just use 1.0. Maybe we should even deprecate that option.

SparkQA commented Nov 16, 2015

Test build #2061 has finished for PR 9724 at commit 00cfc19.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 16, 2015

Test build #45972 has finished for PR 9724 at commit 00cfc19.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin commented Nov 16, 2015

Contributor Author

Alright, I've updated it.

yhuai commented Nov 16, 2015

Contributor

LGTM pending Jenkins.

SparkQA commented Nov 16, 2015

Test build #45981 has finished for PR 9724 at commit d8ca56d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental): case class JSONOptions(

rxin commented Nov 16, 2015

Contributor Author

Thanks - I'm merging this in.

asfgit pushed a commit that referenced this pull request Nov 16, 2015
This patch adds the following options to the JSON data source, for dealing with non-standard JSON files:
* `allowComments` (default `false`): ignores Java/C++ style comments in JSON records
* `allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names
* `allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
* `allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers (e.g. 00012)

To avoid passing a lot of options throughout the json package, I introduced a new JSONOptions case class to define all JSON config options.

Also updated documentation to explain these options.

Scala

![screen shot 2015-11-15 at 6 12 12 pm](https://cloud.githubusercontent.com/assets/323388/11172965/e3ace6ec-8bc4-11e5-805e-2d78f80d0ed6.png)

Python

![screen shot 2015-11-15 at 6 11 28 pm](https://cloud.githubusercontent.com/assets/323388/11172964/e23ed6ee-8bc4-11e5-8216-312f5983acd5.png)

Author: Reynold Xin <rxin@databricks.com>

Closes #9724 from rxin/SPARK-11745.

(cherry picked from commit 42de525)
Signed-off-by: Reynold Xin <rxin@databricks.com>
asfgit closed this in 42de525 Nov 16, 2015
3 participants