[SPARK-52837][CONNECT][PYTHON] Support TimeType in pyspark#51515
[SPARK-52837][CONNECT][PYTHON] Support TimeType in pyspark#51515dengziming wants to merge 5 commits into
Conversation
|
Hello @zhengruifeng @MaxGekk @peter-toth , please take a look at this, we should also support time literals in Connect. |
MaxGekk
left a comment
There was a problem hiding this comment.
@dengziming Please, open new ticket specifically for the python client.
|
@MaxGekk I have created a separate ticket. |
| return datetime.date.fromordinal(v + self.EPOCH_ORDINAL) | ||
|
|
||
|
|
||
| class TimeType(AtomicType): |
There was a problem hiding this comment.
this PR actually add a new datatype in pyspark, not just python client.
It should work with both python client and pyspark classic.
We'd better also add test for pyspark classic.
There was a problem hiding this comment.
And I just notice that the class hierarchy is not consistent with the JVM side:
case class TimeType(precision: Int) extends AnyTimeType
We'd better make them the same, by also introducing the AnyTimeType in pyspark. If we need to touch other existing date time types, we can do it in a separate PR.
There was a problem hiding this comment.
I'm not very professional with PySpark, this will take some time, I changed it to WIP by now.
0942af7 to
84d55ae
Compare
|
@zhengruifeng Fortunately, it's not at all complicated to make TimeType and DatetimeType consistent with JVM side definitions, I have finished these changes, please take a look. |
|
Hello @zhengruifeng @HyukjinKwon , please take a look at this PR when you are free, I have made a thorough check of the code. |
| class DatetimeType(AtomicType): | ||
| """Super class of all datetime data type.""" | ||
|
|
||
| def needConversion(self) -> bool: |
There was a problem hiding this comment.
DatetimeType should not have needConversion
| return True | ||
|
|
||
|
|
||
| class DateType(DatetimeType, metaclass=DataTypeSingleton): |
There was a problem hiding this comment.
DateType's needConversion is missing
|
|
||
| class TimestampType(AtomicType, metaclass=DataTypeSingleton): | ||
| """Timestamp (datetime.datetime) data type.""" | ||
| class AnyTimeType(DatetimeType): |
There was a problem hiding this comment.
let's keep the needConversion in each child class
|
|
||
| eventually(catch_assertions=True)(condition)() | ||
|
|
||
| def test_time_lit(self) -> None: |
There was a problem hiding this comment.
this test seems redundant.
Tests in test_column and test_functions will be reused on connect mode
There was a problem hiding this comment.
This test file is mainly for directly comparing behavior between connect and classic
zhengruifeng
left a comment
There was a problem hiding this comment.
In general LGTM, left a few comments
|
Thank you for the instructions @zhengruifeng . I fixed them, PTAL. |
|
thank @dengziming so much, merged to master |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
+1, LGTM. Thank you, @dengziming and all.
What changes were proposed in this pull request?
This supports TimeType literal in pyspark connect.
Why are the changes needed?
To align the Python Connect client with the Java/Scala Connect client.
Does this PR introduce any user-facing change?
Yes, we can use TimeType literal in several ways, for example,
PySparkSession.sql("SELECT TIME '12:13:14'")and
pyspark.sql.connect.functions.lit(datetime.time(12, 13, 14)).How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No