ARROW-11719: [Rust][Datafusion] support creating memory table with merged schema#9537
houqp wants to merge 2 commits into
…rged schema
* Added `contains` method for `arrow::datatypes::Schema` and `arrow::datatypes::Field`
* Relax batch schema validation using `contains` check when creating a `MemTable` in datafusion
alamb
left a comment
This is a cool change @houqp -- thanks.
I am not sure if DataFusion will work with different schemas without some additional modifications. Specifically, when the schemas are actually subsets of each other with different numbers of columns -- here is what I came up with: https://github.com/influxdata/influxdb_iox/blob/main/query/src/provider/adapter.rs#L44-L70
I think the contains logic makes sense, and is actually quite interesting -- in IOx, we have similar code to effectively merge schemas. This implements compatible definitions of merge.
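To make the "different numbers of columns" concern concrete, here is a minimal sketch of the adapter idea: a batch that carries only a subset of the table's columns is padded with all-null columns so that it matches the full schema. This is not the IOx `SchemaAdapterStream` code. `Batch` and `adapt_to_schema` are hypothetical names, columns are modeled as plain `Vec<Option<i64>>` instead of Arrow arrays, and the null-padding behavior is an assumption.

```rust
/// Toy batch (hypothetical, not `arrow::record_batch::RecordBatch`):
/// column names plus one vector of optional values per column.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    names: Vec<String>,
    columns: Vec<Vec<Option<i64>>>,
}

/// Build a batch with exactly the columns listed in `target`, in that order.
/// Columns present in `batch` are copied; missing ones are filled with nulls.
fn adapt_to_schema(target: &[&str], batch: &Batch) -> Batch {
    // Every column in a batch has the same length, so the first one gives the row count.
    let num_rows = batch.columns.first().map_or(0, |c| c.len());
    let columns = target
        .iter()
        .map(|name| match batch.names.iter().position(|n| n.as_str() == *name) {
            Some(idx) => batch.columns[idx].clone(),
            // Assumed behavior: a column the batch lacks becomes all nulls.
            None => vec![None; num_rows],
        })
        .collect();
    Batch {
        names: target.iter().map(|n| n.to_string()).collect(),
        columns,
    }
}
```

With a step like this in front of the scan, every batch reaching the rest of the plan would have the same column count, which is the situation the rest of the engine presumably expects.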
/// Check to see if `self` is a superset of `other` schema. Here are the comparison rules:
///
/// * for every field `f` in other, the field in self with corresponding index should be a
👍 thank you for the clear comments
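As a rough illustration of the superset check described in that doc comment, the sketch below models it with simplified stand-in types. These are not the real `arrow::datatypes` structs. The field-level rule shown (same name, same type, and `self` at least as permissive about nulls) is an assumption, since the excerpt above is cut off before it states the full rule, and metadata is ignored entirely.

```rust
/// Simplified stand-in for an Arrow data type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

/// Simplified stand-in for `arrow::datatypes::Field`.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }

    /// Assumed rule: `self` contains `other` when name and type match and
    /// `self` accepts nulls whenever `other` does.
    fn contains(&self, other: &Field) -> bool {
        self.name == other.name
            && self.data_type == other.data_type
            && (self.nullable || !other.nullable)
    }
}

/// Simplified stand-in for `arrow::datatypes::Schema`.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    /// Index-wise superset check, including the field-count check mentioned
    /// later in this conversation.
    fn contains(&self, other: &Schema) -> bool {
        self.fields.len() == other.fields.len()
            && self
                .fields
                .iter()
                .zip(other.fields.iter())
                .all(|(mine, theirs)| mine.contains(theirs))
    }
}
```

Under these assumed rules the relation is not symmetric: a schema with a nullable column contains one where that column is non-nullable, but not the other way around.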
@alamb good call, I only assumed that it logically makes sense, but never checked myself whether it's actually implemented in datafusion. I have pushed a commit to check for fields count in the
I am a fan of your SchemaAdapterStream implementation; it looks like it would be useful to include the core of that logic in datafusion as well.
alamb
left a comment
I think this looks great @houqp - thank you. @nevi-me / @jorgecarleitao / @andygrove any comments?
    ],
)?;

match MemTable::try_new(schema2, vec![vec![batch]]) {
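The relaxed validation this test exercises can be sketched as follows. This is a toy model, not DataFusion's implementation: `Field`, `Schema`, `RecordBatch`, and `MemTable` here are hypothetical stand-ins, the `schema` helper and the error string are made up for illustration, and the field-level rule (same name, table side at least as permissive about nulls) is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    /// Assumed superset rule: same field count, and each field of `self`
    /// matches the corresponding field of `other` by name while being at
    /// least as permissive about nulls.
    fn contains(&self, other: &Schema) -> bool {
        self.fields.len() == other.fields.len()
            && self
                .fields
                .iter()
                .zip(&other.fields)
                .all(|(a, b)| a.name == b.name && (a.nullable || !b.nullable))
    }
}

/// Toy batch that only carries its schema (no data).
struct RecordBatch {
    schema: Schema,
}

struct MemTable {
    schema: Schema,
    batches: Vec<Vec<RecordBatch>>,
}

impl MemTable {
    /// Accept the partitions only if the table schema contains the schema of
    /// every batch, instead of requiring strict equality.
    fn try_new(schema: Schema, partitions: Vec<Vec<RecordBatch>>) -> Result<Self, String> {
        let all_ok = partitions
            .iter()
            .flatten()
            .all(|batch| schema.contains(&batch.schema));
        if all_ok {
            Ok(MemTable { schema, batches: partitions })
        } else {
            Err("batch schema is not contained in table schema".to_string())
        }
    }
}

/// Hypothetical helper for building schemas from `(name, nullable)` pairs.
fn schema(fields: &[(&str, bool)]) -> Schema {
    Schema {
        fields: fields
            .iter()
            .map(|(name, nullable)| Field { name: name.to_string(), nullable: *nullable })
            .collect(),
    }
}
```

In this model a table declared with a nullable column accepts a batch whose column is non-nullable, while a batch with a different column name or column count is rejected.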