Refactor Python UDF Execution to In-Process #443

@wudidapaopao

chDB currently supports Python user-defined functions (UDFs) in queries by building on ClickHouse's native UDF capability.
The existing implementation executes each UDF in a separate Python process and uses standard input/output as the communication channel between that process and chDB.
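
To make the cost of this design concrete, here is a minimal sketch of the general stdin/stdout pattern. It is not chDB's actual implementation: the worker script, the tab-separated row format, and the name `call_udf_out_of_process` are assumptions chosen for illustration.

```python
import subprocess
import sys

# Hypothetical worker script (an assumption, not chDB's generated code):
# it reads tab-separated rows from stdin and prints one result per line.
WORKER_SOURCE = r'''
import sys
for line in sys.stdin:
    lhs, rhs = line.rstrip("\n").split("\t")
    print(int(lhs) + int(rhs))
'''


def call_udf_out_of_process(rows):
    """Send rows to a separate Python process and parse its replies."""
    # Every value is serialized to text before crossing the process boundary.
    payload = "".join(f"{lhs}\t{rhs}\n" for lhs, rhs in rows)
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    # Results come back as text and must be parsed again.
    return [int(line) for line in completed.stdout.splitlines()]


results = call_udf_out_of_process([(12, 22), (1, 2)])
```

Each call in this sketch pays for process startup, text serialization, and parsing of the reply, which is the kind of overhead the proposal aims to remove.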

While this approach is functional, it is not the most elegant or flexible solution for chDB. We propose refactoring the UDF execution mechanism to run UDFs directly within the main chDB process.

Key Benefits of In-Process UDF Execution:

  1. Superior Performance: Removing the inter-process communication overhead makes UDF execution faster. It also opens the door to better optimization of batch processing scenarios.

  2. Extended Flexibility: Lay the groundwork for supporting more advanced UDF types in the future, such as custom aggregate functions and custom table functions.

```python
import chdb
from chdb.udf import chdb_udf

@chdb_udf()
def sum_udf(lhs, rhs):
    return int(lhs) + int(rhs)

chdb.query("select sum_udf(12, 22)").show()
```
