[SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled by srowen · Pull Request #23495 · apache/spark

srowen · 2019-01-09T03:05:12Z

What changes were proposed in this pull request?

Per the discussion in #23391 (comment), this proposes simply removing the old pre-Spark-3 time parsing behavior.

This is a rebase of #23411.

How was this patch tested?

Existing tests.

SparkQA · 2019-01-09T05:17:50Z

Test build #100949 has finished for PR 23495 at commit 9d878b0.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-01-09T06:27:16Z

retest this please

SparkQA · 2019-01-09T08:05:01Z

Test build #100955 has finished for PR 23495 at commit 9d878b0.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

MaxGekk · 2019-01-09T13:49:05Z

@@ -1452,105 +1452,103 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
  }

  test("backward compatibility") {


As far as I remember, this test requires https://github.com/apache/spark/pull/23495/files#diff-7f589e01d3e5e5ea284c1622527d4984L85 . I don't think the test can pass with the new parser.

SparkQA · 2019-01-09T19:05:12Z

Test build #100972 has finished for PR 23495 at commit 2e7ec33.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2019-01-11T14:53:39Z

Merged to master

…rser.enabled

## What changes were proposed in this pull request?

The SQL config `spark.sql.legacy.timeParser.enabled` was removed by #23495. The PR cleans up the SQL migration guide and the comment for `UnixTimestamp`.

Closes #23529 from MaxGekk/get-rid-off-legacy-parser-followup.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

## What changes were proposed in this pull request?

Per discussion in apache#23391 (comment) this proposes to just remove the old pre-Spark-3 time parsing behavior.

This is a rebase of apache#23411

## How was this patch tested?

Existing tests.

Closes apache#23495 from srowen/SPARK-26503.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>

gatorsmile · 2020-02-27T04:43:39Z

-      //      in the JSON object.
-      //  - For Spark before 1.5.1, we do not generate UDTs. So, we manually added the UDT value to
-      //      JSON objects generated by those Spark versions (col17).
-      //  - If the type is NullType, we do not write data out.


What is the reason we removed this test case? This test case sounds very critical.

cc @srowen @marmbrus @MaxGekk @cloud-fan

This was added in the PR #8806

We have to capture all such changes before the final release of Spark 3.0. We can delay the release but we have to capture all such changes. Please let us know if you are aware of any similar change we made in the upcoming 3.0 release.

I think we need to add more such test cases to ensure we do not break backward compatibility for any of the built-in data sources.

The test is gone because the old behavior is gone; that's all that's going on here.
See the OP with a link to the actual change. The key discussions were:

#23391 (comment)
#23391 (comment)

I think the TL;DR is that the legacy behavior is error-prone and already susceptible to getting wrong answers for old dates. That seems worth 'fixing' despite the forced behavior change.
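As a rough illustration of how old dates can come out differently between older and newer JDK date APIs, here is a minimal sketch. It is not code from this PR and does not use Spark's parser classes. It assumes that plain `java.text.SimpleDateFormat` is a reasonable stand-in for legacy-style parsing and `java.time.LocalDate` for the newer style. The object name `CalendarGap` and its helpers are hypothetical.

```scala
import java.text.SimpleDateFormat
import java.time.LocalDate
import java.util.{Locale, TimeZone}

// Hypothetical helper, for illustration only (not part of Spark).
object CalendarGap {
  private val MillisPerDay = 24L * 60 * 60 * 1000

  // Legacy-style parsing: SimpleDateFormat is backed by java.util.GregorianCalendar,
  // which interprets dates before the October 1582 cutover in the Julian calendar.
  def legacyEpochDay(s: String): Long = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.US)
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    Math.floorDiv(fmt.parse(s).getTime, MillisPerDay)
  }

  // java.time parsing: LocalDate uses the proleptic Gregorian (ISO) calendar
  // for all dates, including those before 1582.
  def modernEpochDay(s: String): Long =
    LocalDate.parse(s).toEpochDay

  // True when both APIs map the same text to the same day since the epoch.
  def agree(s: String): Boolean =
    legacyEpochDay(s) == modernEpochDay(s)
}
```

Under these assumptions, `CalendarGap.agree("2019-01-09")` holds, while `CalendarGap.agree("1000-01-01")` does not, because the two APIs place that text on different absolute days.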

Based on the latest discussions in #27710 (comment), we can't silently return wrong results. Backward compatibility is critical.

I defer to @MaxGekk and @cloud-fan on that thread. It's trading one set of problems for another but it could be the right thing. We will never get rid of the legacy behavior now, I'm pretty sure :)

Remove spark.sql.legacy.timeParser.enabled

9d878b0

srowen mentioned this pull request Jan 9, 2019

[SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled #23411

Closed

MaxGekk reviewed Jan 9, 2019

Remove legacy test

2e7ec33

MaxGekk approved these changes Jan 10, 2019

srowen closed this in 51a6ba0 Jan 11, 2019

MaxGekk mentioned this pull request Jan 12, 2019

[SPARK-26503][CORE][DOC][FOLLOWUP] Get rid of spark.sql.legacy.timeParser.enabled #23529

Closed

srowen deleted the SPARK-26503.2 branch January 21, 2019 17:56

gatorsmile mentioned this pull request Dec 23, 2019

[SPARK-26178][SQL] Use java.time API for parsing timestamps and dates from CSV #23150

Closed

gatorsmile reviewed Feb 27, 2020
