feat: Add num_rows and TaskContext to CometUDFBridge.evaluate #4306
andygrove
left a comment
LGTM with one question for my understanding
Dropping
Even with that fixed, Workable fixes (move the suite into |
…nature PR apache#4306 added a numRows parameter to CometUDF.evaluate; merging main into this branch brought in the trait change but the six regexp UDF implementations still used the old single-argument signature, breaking comet-common compilation across all Spark profiles.
Followup to the apache/main merge: the framework's evaluate signature gained numRows in PR apache#4306, but the ArrayExistsUDF override was missed.
Which issue does this PR close?
Closes #.
Rationale for this change
CometUDFs can run on tokio threads while the original task thread is parked, so the TaskContext can't be reliably retrieved from Spark on the thread that runs the UDF. We now stash the TaskContext on the native side via the planner. We also need to know num_rows for CometUDFs that don't take input columns. These changes are already in #4267.
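To illustrate the threading problem, here is a minimal Java sketch, assuming the context lookup is thread-local. `CURRENT_CONTEXT`, `lookupOnWorker`, and `passExplicitly` are hypothetical names for illustration, and the single-thread executor only stands in for a tokio worker; none of this is code from the PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalContextSketch {
    // Hypothetical stand-in for a per-task context stored in a thread-local.
    static final ThreadLocal<String> CURRENT_CONTEXT = new ThreadLocal<>();

    // Looks the context up on the worker thread, where it was never set.
    static String lookupOnWorker(ExecutorService worker) throws Exception {
        return worker.submit(() -> CURRENT_CONTEXT.get()).get();
    }

    // Captures the context on the calling thread and passes it along explicitly.
    static String passExplicitly(ExecutorService worker) throws Exception {
        final String captured = CURRENT_CONTEXT.get();
        return worker.submit(() -> captured).get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            CURRENT_CONTEXT.set("task-42");
            System.out.println(lookupOnWorker(worker)); // the worker sees no context
            System.out.println(passExplicitly(worker)); // the captured value survives
        } finally {
            worker.shutdown();
            CURRENT_CONTEXT.remove();
        }
    }
}
```

Capturing the value once on the owning thread and handing it over as an argument is the same idea as stashing the TaskContext and passing it across the bridge.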
What changes are included in this PR?
Thread TaskContext and num_rows through CometUDFBridge.evaluate.
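As a rough sketch of why the row count has to travel with the call, consider the simplified contract below. `SketchUdf`, `CONSTANT_SEVEN`, and `PLUS_ONE` are hypothetical, plain `long[]` arrays stand in for Arrow vectors, and an `Object` stands in for the TaskContext; the real CometUDF and CometUDFBridge signatures may differ.

```java
import java.util.Arrays;
import java.util.List;

public class NumRowsSketch {
    // Hypothetical, simplified UDF contract (not the real CometUDF trait).
    interface SketchUdf {
        long[] evaluate(List<long[]> inputs, int numRows, Object taskContext);
    }

    // Zero-argument UDF: with no input columns, numRows is the only
    // source for the length of the output batch.
    static final SketchUdf CONSTANT_SEVEN = (inputs, numRows, ctx) -> {
        long[] out = new long[numRows];
        Arrays.fill(out, 7L);
        return out;
    };

    // One-argument UDF: numRows matches the input column length here.
    static final SketchUdf PLUS_ONE = (inputs, numRows, ctx) -> {
        long[] col = inputs.get(0);
        long[] out = new long[numRows];
        for (int i = 0; i < numRows; i++) {
            out[i] = col[i] + 1;
        }
        return out;
    };

    public static void main(String[] args) {
        // The context argument is unused in this sketch, so null is passed.
        System.out.println(Arrays.toString(CONSTANT_SEVEN.evaluate(List.of(), 3, null)));
        System.out.println(Arrays.toString(
            PLUS_ONE.evaluate(List.of(new long[] {1L, 2L, 3L}), 3, null)));
    }
}
```

Without the explicit `numRows`, the zero-argument case would have nothing to derive its output length from.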
How are these changes tested?
No new tests. Nothing in production on this branch invokes the bridge yet, so end-to-end coverage lands with #4267 when the dispatcher drives it for real. An earlier unit suite was dropped because of the Arrow shading boundary in common/: the suite compiled against unshaded Arrow but CI runs against the shaded jar, and the test-defined CometUDF subclasses cannot override the shaded interface from outside common/.