Skip to content

[SPARK-20980] [SQL] Rename wholeFile to multiLine for both CSV and JSON#18202

Closed
gatorsmile wants to merge 3 commits into
apache:masterfrom
gatorsmile:renameCVSOption
Closed

[SPARK-20980] [SQL] Rename wholeFile to multiLine for both CSV and JSON#18202
gatorsmile wants to merge 3 commits into
apache:masterfrom
gatorsmile:renameCVSOption

Conversation

@gatorsmile

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

The current option name wholeFile is misleading for CSV users. Currently, it is not representing a record per file. Actually, one file could have multiple records. Thus, we should rename it. Now, the proposal is multiLine.

How was this patch tested?

N/A

@gatorsmile gatorsmile changed the title [SPARK-20980] [SQL] CSV: rename the option wholeFile to multiLine [SPARK-20980] [SQL] Rename the CSV option wholeFile to multiLine Jun 5, 2017
@gatorsmile

gatorsmile commented Jun 5, 2017

Copy link
Copy Markdown
Member Author

cc @cloud-fan @HyukjinKwon

@SparkQA

SparkQA commented Jun 5, 2017

Copy link
Copy Markdown

Test build #77733 has started for PR 18202 at commit 1fae66a.

@dongjoon-hyun

Copy link
Copy Markdown
Member

Retest this please.

parameters.getOrElse("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"), timeZone, Locale.US)

val wholeFile = parameters.get("wholeFile").map(_.toBoolean).getOrElse(false)
val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false)

@dongjoon-hyun dongjoon-hyun Jun 5, 2017

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is different. Each JSON file only can parse at most one record when wholeFile is on.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Thank you!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rethinking the issue, we need to rename both CSV and JSON to multiLine and fix the JSON parsing to make them consistent.

@gatorsmile gatorsmile changed the title [SPARK-20980] [SQL] Rename the CSV option wholeFile to multiLine [SPARK-20980] [SQL] Rename wholeFile to multiLine for both CSV and JSON Jun 5, 2017
@SparkQA

SparkQA commented Jun 5, 2017

Copy link
Copy Markdown

Test build #77746 has finished for PR 18202 at commit 1fae66a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon

HyukjinKwon commented Jun 5, 2017

Copy link
Copy Markdown
Member

I think the change itself looks good as targeted to me (if this is going to be included in 2.2.0 - I just saw https://issues.apache.org/jira/browse/SPARK-20980?focusedCommentId=16037416&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16037416). Looks we just need a decision. Probably, please let me cc @rxin who I believe came up the option name initially.

@rxin

rxin commented Jun 5, 2017

Copy link
Copy Markdown
Contributor

Wouldn't this break compatibility?

@SparkQA

SparkQA commented Jun 5, 2017

Copy link
Copy Markdown

Test build #77751 has finished for PR 18202 at commit f70994c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 5, 2017

Copy link
Copy Markdown

Test build #77752 has finished for PR 18202 at commit 5780ef5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon

HyukjinKwon commented Jun 5, 2017

Copy link
Copy Markdown
Member

Both options look added in 2.2 assuming from the JIRAs https://issues.apache.org/jira/browse/SPARK-19610 and https://issues.apache.org/jira/browse/SPARK-18352. If it targets 2.2.0, I guess It wouldn't.

@cloud-fan

Copy link
Copy Markdown
Contributor

let's hold it until RC4 finishes. If RC4 passes, we need to update this PR to support the old option name, otherwise we can just rename.

@HyukjinKwon

Copy link
Copy Markdown
Member

Hi all, it sounds RC4 vote was failed. Should we proceed this one?

@asfgit asfgit closed this in 2051428 Jun 15, 2017
@cloud-fan

Copy link
Copy Markdown
Contributor

ah lucky :) merging to master/2.2!

asfgit pushed a commit that referenced this pull request Jun 15, 2017
… JSON

The current option name `wholeFile` is misleading for CSV users. Currently, it is not representing a record per file. Actually, one file could have multiple records. Thus, we should rename it. Now, the proposal is `multiLine`.

N/A

Author: Xiao Li <gatorsmile@gmail.com>

Closes #18202 from gatorsmile/renameCVSOption.

(cherry picked from commit 2051428)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@felixcheung

Copy link
Copy Markdown
Member

@felixcheung

Copy link
Copy Markdown
Member

opened #18312

dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017
… JSON

### What changes were proposed in this pull request?
The current option name `wholeFile` is misleading for CSV users. Currently, it is not representing a record per file. Actually, one file could have multiple records. Thus, we should rename it. Now, the proposal is `multiLine`.

### How was this patch tested?
N/A

Author: Xiao Li <gatorsmile@gmail.com>

Closes apache#18202 from gatorsmile/renameCVSOption.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants