-
Notifications
You must be signed in to change notification settings - Fork 1.9k
FFI: return underlying trait type when converting from FFI structs #18672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FFI: return underlying trait type when converting from FFI structs #18672
Conversation
adff895 to
27d983c
Compare
27d983c to
bc0968b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves FFI (Foreign Function Interface) handling by introducing a library_marker_id mechanism to detect when FFI objects originate from the same library, avoiding unnecessary wrapper overhead during round trips. This optimization allows the system to return original implementations directly when FFI structures are passed back to their originating library.
Key changes:
- Added
library_marker_idfield to all FFI structs to identify the originating library - Modified
Fromimplementations to returnArc<dyn Trait>instead ofForeign*wrapper types, bypassing wrappers when detecting local library origin - Updated all test code and examples to use the new direct trait conversion pattern
Reviewed Changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| datafusion/ffi/src/lib.rs | Adds get_library_marker_id() function and test mock to identify library origin |
| datafusion/ffi/README.md | Documents the library marker ID concept and its purpose |
| docs/source/library-user-guide/upgrading.md | Provides migration guide for API changes from Foreign* to Arc<dyn Trait> |
| datafusion/ffi/src/execution_plan.rs | Adds library marker support and bypass logic to execution plans |
| datafusion/ffi/src/table_provider.rs | Updates table provider with marker checks and helper methods |
| datafusion/ffi/src/catalog_provider*.rs | Implements library marker checks for catalog providers |
| datafusion/ffi/src/schema_provider.rs | Adds marker-based bypass for schema providers |
| datafusion/ffi/src/ud*.rs | Updates all user-defined function types with library marker logic |
| datafusion/ffi/src/udaf/accumulator*.rs | Implements bypass for accumulator types with null pointer safety |
| datafusion/ffi/src/udwf/partition_evaluator.rs | Adds marker checks to partition evaluators |
| datafusion/ffi/tests/*.rs | Updates test code to use new Arc<dyn Trait> conversion pattern |
| datafusion-examples/examples/ffi/ffi_module_loader/* | Updates example to use new API pattern |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Thanks for the review @martin-g . Some of those are identical to the review from copilot, which I'm guessing you're using a AI bot to assist. It would be helpful to not get the duplicates if you can adjust it in that way. |
|
@alamb Thank you for the review! I believe I've addressed all of your comments. The changes I've pushed are only in documentation. |
|
Hi @timsaucer , Cool work! thanks! Right now we are relying on memory address of a static to determine if we are in a foreign/local context, and although that raw memory check makes it now, I am not sure if having some sort of global registry would help future proofing the whole improvement. The global registry could still hold the raw memory pointer but also some UUID generated when a new library gets registered. Do you think that would help? |
This is a good question. I did play around with doing some kind of registry, but I don't think it necessarily fits the model we currently have of how these structures are created and used. Take the There are other things one could use such a registry for. I haven't really thought of any so I'm not sure how your suggestion would make the appoach more future proof. Do you have a use case in mind already? If there's value add in doing so, we can rework this PR but it would likely be more complex because I suspect each one of these structs would need a custom registry. I think the advantage of what we have here is that it is very simple. The address for the static variable is guaranteed to be unique from one library or application to another. One thing this PR doesn't do and it would be awesome if I could figure out a way is to figure out if we have libraries that can access each other's internal memory structures. That's near impossible because Rust doesn't have a stable ABI so the same exact code built with two different versions of |
|
About that last suggestion, I did ask an AI agent and it did generate some code. So it may be possible, but that would also be of very limited utility as it would lock providers into building with the same toolchain. For purely internal development, they can also guarantee that already. But for anyone who wants to share libraries with one another in an open source project I don't think it's really worth the effort. |
Co-authored-by: Renato Marroquin <renatoj.marroquin@gmail.com>
Co-authored-by: Renato Marroquin <renatoj.marroquin@gmail.com>
Co-authored-by: Renato Marroquin <renatoj.marroquin@gmail.com>
datafusion/ffi/src/lib.rs
Outdated
| /// the same library. It is possible that the interplay between | ||
| /// foreign and local functions calls create one FFI struct that | ||
| /// references another. It is helpful to determine if a foreign | ||
| /// struct is truly foreign or in the same library. If we are in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please elaborate what foreign means in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some text
datafusion/ffi/src/lib.rs
Outdated
| /// This function works by checking the address of the library | ||
| /// marker. Each library that implements the FFI code will have | ||
| /// a different address for the marker. By checking the marker | ||
| /// address we can determine if a struct is truly Foreign or is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// address we can determine if a struct is truly Foreign or is | |
| /// address we can determine if a struct is truly foreign or is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so foreign means the function pointer comes from another library? can it be the same library but different version? like jvm hell?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be a second instance of the same library and that would give a different address. In practice I don't think this is a problem, though. The code would still work, they just wouldn't get the benefit of this marker id work. It would work as previous to this PR.
I don't think it's a problem in practice because it would mean importing the same library twice. The canonical example of datafusion-python doesn't do that. There may be an edge case somewhere. Related is this discussion, but that is a lot of effort for minimal return IMO.
| type Error = DataFusionError; | ||
|
|
||
| fn try_from(ffi_props: FFI_PlanProperties) -> Result<Self, Self::Error> { | ||
| if (ffi_props.library_marker_id)() == crate::get_library_marker_id() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we can add some comments here? afaiu it is the place where the plan properties are asserted as been foreign or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not an assertion, but a check. I think there's plenty of documentation in (1) docstring for library_marker_id (2) docstring for get_library_marker_id and (3) the README.md that I don't think we need an additional inline doc for every time we call this function.
| impl From<&FFI_SchemaProvider> for Arc<dyn SchemaProvider + Send> { | ||
| fn from(provider: &FFI_SchemaProvider) -> Self { | ||
| Self(provider.clone()) | ||
| if (provider.library_marker_id)() == crate::get_library_marker_id() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
| _usually_ do not need to worry about memory management. When interacting with | ||
| foreign code, this is not necessarily true. If you review the structures in | ||
| this crate, you will find that many of them implement the `Drop` trait and | ||
| perform a foreign call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would probably nice to remind 'Box::leak' and 'forget' when Rust delegates the destruction of allocated object to the callee site like Python?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you're asking for. And I don't think rust is delegating the destruction of the object to Python, but to the Rust code on the other side of the boundary which Python wraps around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think sometimes ago I found an article https://www.ralphminderhoud.com/blog/rust-ffi-wrong-way/ that states Rust when created a pointer to an object then shouldn't deallocate it, rather delegating the ownership to the callee site, C or Python in this case
#[no_mangle]
pub unsafe extrn "C" fn file_data_read(f: *mut FileData, path: *const libc::c_char) -> bool {
// Get pointer to heap allocated `FileData`
let mut file_data = Box::from_raw(f);
// Build path from c-string path argument
let p = PathBuf::from(CStr::from_ptr(path).to_str().unwrap_or_default());
// Read the data and provide some minimal error handling
let res = file_data.read(&p).is_ok();
// Forget the memory so Rust doesn't deallocate when `file_data` is dropped
std::mem::forget(file_data);
res
}
I was thinking it might be agood knowledge to anyone reading FFI documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exact text would you like to see? We account for the memory forgetting in the Box::into_raw during FFI struct creation and the corresponding Box::from_raw when we do the release().
I'm not quite sure what additional text you think would be helpful.
The author of that blog was focused on language interoperability, which is something we've decided is not a current goal of this crate. Rather we're focused on Rust backed implementations. The Python work simply exposes these structs via PyCapsule - we don't actually support direct interaction of these methods with python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not trying to be difficult, but I really don't understand what parts need additional explanation. I acknowledge as the author of most of this work it is likely something is obvious to me that isn't clear to others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never mind, it was quite a surprise for me when I first time found how leak or forget are used in FFI, I thought it might be beneficial for the reader, but you are right, this example would be quite out of the context for this doc.
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @timsaucer I'm not great FFi expert, but it seems fine for me, just couple of minor questions and thanks @alamb and @martin-g for the reviews.
|
Thank you for the review @comphead . I've tried to answer your questions, but I also think we talked past each other a bit in the comments. Please let me know if there's anything you think should block this PR from merging! |
this is a fair point 👍
I was thinking on a scenario where linking libraries would get us into trouble, but I guess that is also not applicable for the work here. thanks for the discussion @timsaucer ! |
Which issue does this PR close?
return_typeinFFI_ScalarUDFsince it will never be called due to thereturn_field_from_argsimplementation.Rationale for this change
This PR adds the concept of a
library_marker_idto the FFI crate. The reason for this is described in theREADMEtext as part of the PR. In our current use of the FFI library we get into issues where we round trip FFI structs back to their original library that now have FFI wrappers when they are no longer needed.What changes are included in this PR?
From<> for ForeigntoFrom<> for Arc<dyn ...>. The actual use case is essentially always to use these structs as their implementation of the trait they back.FFI_ScalarUDFAre these changes tested?
datafusion-python.main:Are there any user-facing changes?
This does change the API for the FFI functions.