[SPARK-26903][SQL] Remove the TimeZone cache #23812
Test build #102422 has finished for PR 23812 at commit
Test build #102423 has finished for PR 23812 at commit
Test build #102427 has finished for PR 23812 at commit
srowen
left a comment
Side note: SimpleDateParam calls TimeZone.getTimeZone("GMT"). If you like you could make that a constant here or call to DateTimeUtils to fully remove those calls.
@MaxGekk I'll merge this with a rebase, and if you check out my few comments above
# Conflicts: # sql/core/benchmarks/DateTimeBenchmark-results.txt
Test build #102655 has finished for PR 23812 at commit
    ToUTCTimestamp(
      Literal(Timestamp.valueOf("2015-07-24 00:00:00")), Literal("\"quote")) :: Nil)
  }.getMessage
  assert(msg == "Invalid ID for region-based ZoneId, invalid format: \"quote")
Small last one -- make this consistent with the test below and remove comment about escaping. In fact, maybe the bad zone ID should be obviously wrong, like "NoSuchZone"
I added a couple more test cases
Test build #102698 has finished for PR 23812 at commit
Merged to master
def getTimeZone(timeZoneId: String): TimeZone = {
  computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
  val zoneId = ZoneId.of(timeZoneId, ZoneId.SHORT_IDS)
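For readers following the hunk above, here is a minimal sketch of resolving a time zone ID through `ZoneId` instead of a Spark-side cache. It is an assumption based on the PR description, not a copy of the merged code: the object name `TimeZoneSketch` is hypothetical, and only `ZoneId.of`, `ZoneId.SHORT_IDS` and `TimeZone.getTimeZone(ZoneId)` are JDK calls.

```scala
import java.time.ZoneId
import java.util.TimeZone

// Hypothetical wrapper object, used here only to keep the sketch self-contained.
object TimeZoneSketch {
  // Resolve the ID with java.time first. ZoneId.SHORT_IDS maps legacy
  // abbreviations such as "PST" to region IDs such as "America/Los_Angeles".
  // Unlike TimeZone.getTimeZone(String), ZoneId.of throws on unknown or
  // malformed IDs instead of silently falling back to GMT.
  def getTimeZone(timeZoneId: String): TimeZone = {
    val zoneId = ZoneId.of(timeZoneId, ZoneId.SHORT_IDS)
    TimeZone.getTimeZone(zoneId)
  }
}
```

With this shape, a call such as `TimeZoneSketch.getTimeZone("NoSuchZone")` fails with a `java.time.DateTimeException` rather than returning GMT, which is the behaviour the test changes in this PR check for.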
Hi @MaxGekk, after upgrading from Spark 2.3 to Spark 3.0, we found that this behaviour change rejects some previously valid time zone IDs, for example:
// GMT+8:00 is a valid time zone when parsed with TimeZone.getTimeZone("GMT+8:00")
// However, ZoneId.of("GMT+8:00", ZoneId.SHORT_IDS) rejects it with an exception
from_unix_time("2020-01-01 10:00:00", "GMT+8:00")
What do you think about supporting this kind of time zone, such as GMT+8:00?
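One possible way to accept such IDs again, shown only as a sketch under stated assumptions, is to pad a single-digit hour before handing the string to `ZoneId.of`. The object `LegacyZoneIdSketch` and its helpers `normalize` and `resolve` are hypothetical names introduced for illustration; nothing here is taken from this PR.

```scala
import java.time.ZoneId
import scala.util.Try

// Hypothetical helper object; not part of Spark or of this PR.
object LegacyZoneIdSketch {
  // Matches IDs whose offset has a single-digit hour, e.g. "GMT+8:00".
  private val SingleDigitHour = """(.*[+-])(\d):(\d\d)""".r

  // Pad the hour to two digits so that "GMT+8:00" becomes "GMT+08:00".
  // IDs that do not match the pattern are returned unchanged.
  def normalize(timeZoneId: String): String = timeZoneId match {
    case SingleDigitHour(prefix, hour, minutes) => s"${prefix}0$hour:$minutes"
    case other => other
  }

  // Strict resolution, wrapped in Try because ZoneId.of throws on bad input.
  def resolve(timeZoneId: String): Try[ZoneId] =
    Try(ZoneId.of(normalize(timeZoneId), ZoneId.SHORT_IDS))
}
```

The trade-off in this sketch is that only the `h:mm` offset shape is relaxed; any other ID that `ZoneId.of` rejects still fails, so unknown zones keep raising an error instead of silently becoming GMT.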
What changes were proposed in this pull request?
In the PR, I propose to convert time zone string to TimeZone by converting it to ZoneId which uses ZoneOffset internally. The ZoneOffset class of JDK 8 has a cache already: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/time/ZoneOffset.java#l205 . In this way, there is no need to support cache of time zones in Spark.
The PR removes computedTimeZones from DateTimeUtils, and uses ZoneId.of to convert time zone id string to ZoneId and to TimeZone at the end.
How was this patch tested?
The changes were tested by