Description
chDB currently supports Python user-defined functions (UDFs) in queries, leveraging ClickHouse's native capability.
The existing implementation uses standard input/output as the communication channel and executes UDFs in a separate Python process.
While this approach is functional, it is not the most elegant or flexible solution for chDB. We propose refactoring the UDF execution mechanism to run UDFs directly within the main chDB process.
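To make the difference concrete, here is a minimal sketch of the two execution models. It is not chDB's actual implementation: the worker script, the tab-separated row format, and the helper names `run_out_of_process` and `run_in_process` are assumptions made for illustration only.

```python
import subprocess
import sys

# Hypothetical worker script (an assumption, not chDB code): it reads
# tab-separated argument rows from stdin and writes one result per line
# to stdout, mimicking a UDF that runs in a separate Python process.
WORKER_SOURCE = r'''
import sys

def sum_udf(lhs, rhs):
    return int(lhs) + int(rhs)

for line in sys.stdin:
    args = line.rstrip("\n").split("\t")
    print(sum_udf(*args))
'''


def run_out_of_process(rows):
    """Serialize rows, pipe them to a child process, and parse its stdout."""
    payload = "".join("\t".join(map(str, row)) + "\n" for row in rows)
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return [int(line) for line in completed.stdout.splitlines()]


def sum_udf(lhs, rhs):
    return int(lhs) + int(rhs)


def run_in_process(rows):
    """Call the UDF directly: no process spawn, no text serialization."""
    return [sum_udf(*row) for row in rows]


rows = [(12, 22), (1, 2)]
```

Both helpers return the same values for `rows`. The out-of-process path additionally pays for spawning an interpreter and for encoding and decoding every row as text, which is the overhead the proposal aims to remove.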
Key Benefits of In-Process UDF Execution:
- Superior Performance: Removing inter-process communication overhead enables faster UDF execution. It also makes batch processing scenarios easier to optimize.
- Extended Flexibility: In-process execution lays the groundwork for supporting more advanced UDF types in the future, such as custom aggregate functions and custom table functions.
import chdb
from chdb.udf import chdb_udf


@chdb_udf()
def sum_udf(lhs, rhs):
    return int(lhs) + int(rhs)


chdb.query("select sum_udf(12, 22)").show()