[SPARK-42051][SQL] Codegen Support for HiveGenericUDF#39555
Conversation
|
cc @cloud-fan @HyukjinKwon @dongjoon-hyun PTAL, thanks |
| } | ||
|
|
||
| private[hive] case class HiveGenericUDF( | ||
| case class HiveGenericUDF( |
There was a problem hiding this comment.
is it possible to rewrite HiveGenericUDF with Invoke? Then we can simply use RuntimeReplaceable
There was a problem hiding this comment.
Seems quite complicated to handle the hive value mapping and function wrapping
dongjoon-hyun
left a comment
There was a problem hiding this comment.
+1, LGTM. The current approach also looks reasonable to me.
Could you review this once more when you have some time, @cloud-fan and @sunchao ?
|
Also cc @LuciferYang , too |
|
Belated Happy new year! |
|
Since the last commit is for adding a test, I will merge this. thanks @dongjoon-hyun @LuciferYang. Merged to master |
|
Thank you, @yaooqinn and @LuciferYang ! |
|
late LGTM |
| extends DeferredObject with HiveInspectors { | ||
|
|
||
| private val wrapper = wrapperFor(oi, dataType) | ||
| private var func: () => Any = _ |
There was a problem hiding this comment.
@yaooqinn @dongjoon-hyun This change removes deferred evaluation and means it is no longer possible to implement short-circuiting in Hive generic UDFs. I filed https://issues.apache.org/jira/browse/SPARK-44616 for this.
There was a problem hiding this comment.
During the upgrade from Spark 3.3.1 to 3.5.1, we encountered syntax issues with this pr. The problem arose from DeferredObject currently passing a value instead of a function, which prevented users from catching exceptions in GenericUDF, resulting in semantic differences.
Here is an example case we encountered. Originally, the semantics were that str_to_map_udf would throw an exception due to issues with the input string, while merge_map_udf could catch the exception and return a null value. However, currently, any exception encountered by str_to_map_udf will cause the program to fail.
select merge_map_udf(str_to_map_udf(col1), parse_map_udf(col2), map("key", "value")) from table
There was a problem hiding this comment.
@yaooqinn is it easy to fix? If not we should probably revert it as this is not a critical perf improvement.
There was a problem hiding this comment.
Sorry for being late,my network glitches a lot recently. and thanks for reporting this issue. It’s easy to make a fix
What changes were proposed in this pull request?
As a subtask of SPARK-42050, this PR adds Codegen Support for
HiveGenericUDFWhy are the changes needed?
improve codegen coverage and performance
Does this PR introduce any user-facing change?
no
How was this patch tested?
new UT added