[Python] Allow calling UDF kernels with field/scalar expressions #33048

Description

@asfimport

From #13687 (comment), where it came up while adding documentation on how to use UDFs in Python. When you just want to invoke a UDF with arrays, you can do pc.call_function("my_udf", [arr]), where arr is a materialized array.

But if you want to use your UDF in a context that needs an expression (e.g. a dataset projection), you need to be able to call the UDF with expressions as arguments. Currently, pc.call_function doesn't work that way (it expects actual, materialized arrays/scalars as arguments). As a workaround, you can use the private Expression._call:

# doesn't work with expressions
>>> pc.call_function("my_udf", [pc.field("col")])
...
TypeError: Got unexpected argument type <class 'pyarrow._compute.Expression'> for compute function
# workaround
>>> pc.Expression._call("my_udf", [pc.field("col")])
<pyarrow.compute.Expression my_udf(col)>

So we should try to improve the usability here. Some options:

  • See if we can change pc.call_function to also accept Expressions as arguments

  • Make the _call public, so one can do pc.Expression.call("my_udf", [..])

cc @westonpace @vibhatha

Reporter: Joris Van den Bossche / @jorisvandenbossche

Note: This issue was originally created as ARROW-17827. Please see the migration documentation for further details.
