Refactor Python UDF Execution to In-Process #443

chDB currently supports Python user-defined functions (UDFs) in queries, leveraging ClickHouse's native capability. 
The existing implementation uses standard input/output as the communication channel and executes UDFs in a separate Python process.
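To make the cost of this design concrete, here is a minimal sketch of the stdin/stdout pattern using only the standard library. It is not chDB's or ClickHouse's actual code: the worker script, the tab-separated row format, and the helper name `run_udf_out_of_process` are assumptions chosen for illustration.

```python
import subprocess
import sys

# Hypothetical worker: reads one tab-separated row per line from stdin
# and writes one result per line to stdout.
WORKER_SOURCE = r'''
import sys
for line in sys.stdin:
    lhs, rhs = line.rstrip("\n").split("\t")
    print(int(lhs) + int(rhs))
'''


def run_udf_out_of_process(rows):
    """Serialize rows to text, pipe them to a child Python process, parse the output."""
    payload = "".join(f"{lhs}\t{rhs}\n" for lhs, rhs in rows)
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return [int(line) for line in completed.stdout.splitlines()]


results = run_udf_out_of_process([(12, 22), (1, 2)])
print(results)
```

Every call in this sketch pays for process startup, text serialization, pipe transfer, and parsing of the results, which is the overhead the proposal aims to remove.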

This approach works, but it is neither the most elegant nor the most flexible fit for chDB. We propose refactoring UDF execution so that UDFs run directly inside the main chDB process.

**Key Benefits of In-Process UDF Execution:**

1. Superior Performance: Running UDFs in-process removes the inter-process communication overhead, so UDFs execute faster. It also makes batch processing easier to optimize.

2. Extended Flexibility: In-process execution lays the groundwork for more advanced UDF types in the future, such as custom aggregate functions and custom table functions.

```python
import chdb
from chdb.udf import chdb_udf

@chdb_udf()
def sum_udf(lhs, rhs):
    return int(lhs) + int(rhs)

chdb.query("select sum_udf(12, 22)").show()
```
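As a rough illustration of what in-process execution could look like, the sketch below keeps registered functions in a dictionary and applies them to a whole batch of column values with plain Python calls. The names `register_udf`, `call_udf_batch`, and `_UDF_REGISTRY` are hypothetical and are not part of the chDB API; the real design may differ substantially.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical in-process registry: UDF name -> Python callable.
_UDF_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_udf(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record the function under its own name and return it unchanged."""
    _UDF_REGISTRY[func.__name__] = func
    return func


def call_udf_batch(name: str, *columns: Sequence[Any]) -> List[Any]:
    """Apply a registered UDF row by row over equally sized argument columns."""
    func = _UDF_REGISTRY[name]
    # No subprocess, pipes, or text parsing: each row is a direct function call.
    return [func(*args) for args in zip(*columns)]


@register_udf
def sum_udf(lhs, rhs):
    return int(lhs) + int(rhs)


batch_result = call_udf_batch("sum_udf", [12, 1, 7], [22, 2, 8])
print(batch_result)
```

Because the engine and the function share one process in this sketch, a whole block of rows can be handed over in a single call, which is where batch-level optimizations would become possible.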
