[SPARK-26903][SQL] Remove the TimeZone cache by MaxGekk · Pull Request #23812 · apache/spark

MaxGekk · 2019-02-16T22:45:16Z

What changes were proposed in this pull request?

This PR converts a time zone string to a TimeZone by first converting it to a ZoneId, which uses ZoneOffset internally. The JDK 8 ZoneOffset class already has a cache: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/time/ZoneOffset.java#l205 . As a result, Spark no longer needs to maintain its own cache of time zones.

The PR removes computedTimeZones from DateTimeUtils and uses ZoneId.of to convert the time zone ID string to a ZoneId, which is then converted to a TimeZone.
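
For readers who want to try the conversion outside Spark, here is a minimal sketch of the path described above, written in Java because the classes involved are plain JDK APIs. The class name TimeZoneLookup and the helper offsetIsCached are made up for this illustration and are not part of the PR; only the ZoneId.of(timeZoneId, ZoneId.SHORT_IDS) call is taken from the diff.

```java
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.TimeZone;

// Illustrative only: this class is not part of Spark.
public class TimeZoneLookup {
    // Same shape as the new getTimeZone: parse the string into a ZoneId
    // (allowing short IDs such as "PST"), then convert to a legacy TimeZone.
    static TimeZone getTimeZone(String timeZoneId) {
        ZoneId zoneId = ZoneId.of(timeZoneId, ZoneId.SHORT_IDS);
        return TimeZone.getTimeZone(zoneId);
    }

    // Hypothetical helper: checks whether two lookups of the same offset ID
    // return the very same ZoneOffset instance, which is what a JDK-side
    // cache would give for common offsets.
    static boolean offsetIsCached(String offsetId) {
        return ZoneOffset.of(offsetId) == ZoneOffset.of(offsetId);
    }

    public static void main(String[] args) {
        System.out.println(getTimeZone("PST").getID());
        System.out.println(getTimeZone("Europe/Moscow").getID());
        System.out.println(offsetIsCached("+08:00"));
    }
}
```

The sketch leaves caching entirely to the JDK, which is the design choice the PR argues for; whether that is fast enough for a given workload is what the benchmark commits in this PR measure.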

How was this patch tested?

The changes were tested by

SparkQA · 2019-02-17T01:01:43Z

Test build #102422 has finished for PR 23812 at commit e4a40b6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-02-17T01:15:31Z

Test build #102423 has finished for PR 23812 at commit f0e9d18.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-02-17T15:10:47Z

Test build #102427 has finished for PR 23812 at commit 37551d1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen

Side note: SimpleDateParam calls TimeZone.getTimeZone("GMT"). If you like, you could make that a constant here, or call into DateTimeUtils, to fully remove those calls.

srowen · 2019-02-22T04:16:40Z

@MaxGekk I'll merge this with a rebase, and if you check out my few comments above

# Conflicts: # sql/core/benchmarks/DateTimeBenchmark-results.txt

SparkQA · 2019-02-22T18:01:02Z

Test build #102655 has finished for PR 23812 at commit 18e6e15.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2019-02-23T03:40:26Z

+        ToUTCTimestamp(
+          Literal(Timestamp.valueOf("2015-07-24 00:00:00")), Literal("\"quote")) :: Nil)
+    }.getMessage
+    assert(msg == "Invalid ID for region-based ZoneId, invalid format: \"quote")


Small last one -- make this consistent with the test below and remove comment about escaping. In fact, maybe the bad zone ID should be obviously wrong, like "NoSuchZone"

I added a couple more test cases
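
To illustrate what such tests check, the sketch below calls ZoneId.of directly with the two bad IDs mentioned in this thread and captures the exception message. It is a standalone Java illustration rather than the Spark test itself (which goes through the ToUTCTimestamp expression), and the class and method names are hypothetical.

```java
import java.time.DateTimeException;
import java.time.ZoneId;

// Illustrative only: exercises the JDK call, not the Spark expression.
public class InvalidZoneIdCheck {
    // Returns the exception message for an invalid ID, or null if the ID parses.
    static String failureMessage(String timeZoneId) {
        try {
            ZoneId.of(timeZoneId, ZoneId.SHORT_IDS);
            return null;
        } catch (DateTimeException e) {
            // ZoneRulesException (unknown region) is a subclass of DateTimeException.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(failureMessage("\"quote"));
        System.out.println(failureMessage("NoSuchZone"));
        System.out.println(failureMessage("UTC"));
    }
}
```

Checking that the message contains the offending ID, rather than matching the full text, keeps such a test less sensitive to wording differences between JDK versions.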

SparkQA · 2019-02-23T12:35:27Z

Test build #102698 has finished for PR 23812 at commit 9fded33.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2019-02-23T15:44:29Z

Merged to master

advancedxy · 2021-01-22T06:04:55Z

-
  def getTimeZone(timeZoneId: String): TimeZone = {
-    computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
+    val zoneId = ZoneId.of(timeZoneId, ZoneId.SHORT_IDS)


Hi @MaxGekk, after upgrading from Spark 2.3 to Spark 3.0, we found that this behaviour change rejects some valid time zone IDs, for example:

// GMT+8:00 is a valid time zone if parsed with TimeZone.getTimeZone("GMT+8:00")
// However, ZoneId.of("GMT+8:00", ZoneId.SHORT_IDS) is rejected with an exception
from_unix_time("2020-01-01 10:00:00", "GMT+8:00")

What do you think about supporting these kinds of time zones, such as GMT+8:00?
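
The difference reported here can be reproduced with the JDK alone. The following Java sketch compares the two parsers on "GMT+8:00" and shows one possible workaround: zero-padding a single-digit hour before calling ZoneId.of. The class LegacyOffsetFormat and the helper padHour are assumptions made for this example and should not be read as the fix Spark adopted.

```java
import java.time.DateTimeException;
import java.time.ZoneId;
import java.util.TimeZone;

// Illustrative only: compares legacy and java.time parsing of "GMT+8:00".
public class LegacyOffsetFormat {
    // True if ZoneId.of accepts the ID, false if it throws.
    static boolean acceptedByZoneId(String id) {
        try {
            ZoneId.of(id, ZoneId.SHORT_IDS);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    // Hypothetical workaround: turn "+8:" into "+08:" so that the ID matches
    // the hh:mm offset form that ZoneId.of accepts.
    static String padHour(String id) {
        return id.replaceFirst("(?<sign>[+-])(?<hour>\\d):", "${sign}0${hour}:");
    }

    public static void main(String[] args) {
        // The legacy parser accepts a single-digit hour.
        System.out.println(TimeZone.getTimeZone("GMT+8:00").getID());
        System.out.println(acceptedByZoneId("GMT+8:00"));
        System.out.println(acceptedByZoneId(padHour("GMT+8:00")));
    }
}
```

Note that the legacy TimeZone.getTimeZone silently falls back to GMT for IDs it cannot parse, while ZoneId.of throws, so any compatibility layer has to decide which of the two behaviours to keep for truly invalid input.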

https://issues.apache.org/jira/browse/SPARK-34392

MaxGekk added 5 commits February 16, 2019 21:59

Benchmark results before

f5559fc

Benchmark results after

2a2cc39

Embed getting zoneId into getTimeZone

e378bb5

Re-run the benchmark

e4a40b6

Support short zone IDs

f0e9d18

MaxGekk changed the title ~~[SPARK-26903][SQL] Remove the cache of TimeZones~~ [SPARK-26903][SQL] Remove the TimeZone cache Feb 16, 2019

Check the exception caused by wrong time zone id

37551d1

srowen requested changes Feb 17, 2019

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala

Comment thread ...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala Outdated

srowen reviewed Feb 17, 2019

Comment thread ...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala Outdated

MaxGekk added 3 commits February 22, 2019 14:20

Merge remote-tracking branch 'origin/master' into timezone-cache

167ee60

# Conflicts: # sql/core/benchmarks/DateTimeBenchmark-results.txt

Check tested zoneId is presented in exception message

7e61d2a

Updating the migration guide

18e6e15

srowen reviewed Feb 23, 2019

Separate tests for invalid time zone ids

9fded33

srowen approved these changes Feb 23, 2019

srowen closed this in d0f2fd0 Feb 23, 2019

MaxGekk deleted the timezone-cache branch September 18, 2019 15:56

advancedxy reviewed Jan 22, 2021

wangyum Feb 7, 2021