Support ordering analysis with expressions (not just columns) by Replace OrderedColumn with PhysicalSortExpr #6501
alamb left a comment
I think this PR looks great and adds a neat feature -- thank you @mustafasrepo. cc @mingmwang in case you have any interest in reviewing this
However, because PhysicalSortExpr doesn't implement the Hash trait (there is no trivial way to support this trait, if any), we changed the EquivalentClass implementation so that it doesn't require the Hash trait anymore.
We hit something similar when trying to make LogicalPlan implement Hash (because of the LogicalPlan::Extension variant, which has an Arc<dyn UserDefinedLogicalNode>).
The solution we came up with was
And then implemented it like this: https://docs.rs/datafusion-expr/25.0.0/src/datafusion_expr/logical_plan/extension.rs.html#235-285
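For readers unfamiliar with the problem, here is a minimal sketch of one general way to hash through a trait object. It is an illustration under assumptions, not the code behind the link above: the trait `Node`, the struct `MyNode`, the `Wrapper` type, and the helper `hash_of` are all hypothetical names. The idea is that an object-safe method takes `&mut dyn Hasher` (a generic `H: Hasher` parameter would make the trait unusable as `dyn`), and the `Hash` impl of the owning type forwards to it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical object-safe trait. `dyn_hash` takes `&mut dyn Hasher`
/// because a generic hasher parameter is not allowed on a `dyn` trait method.
trait Node {
    fn dyn_hash(&self, state: &mut dyn Hasher);
}

/// Hypothetical concrete node that can simply derive `Hash`.
#[derive(Hash)]
struct MyNode {
    id: u32,
}

impl Node for MyNode {
    fn dyn_hash(&self, state: &mut dyn Hasher) {
        // `&mut dyn Hasher` itself implements `Hasher`, so it can be
        // passed where a sized `H: Hasher` is expected.
        let mut s = state;
        self.hash(&mut s);
    }
}

/// Hypothetical owner of a trait object that wants to implement `Hash`.
struct Wrapper(Arc<dyn Node>);

impl Hash for Wrapper {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Forward to the object-safe method; `&mut H` coerces to `&mut dyn Hasher`.
        self.0.dyn_hash(state);
    }
}

/// Helper (hypothetical) that hashes any `Hash` value with the std default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

With this shape, each implementor decides how to feed its own fields into the hasher, while the wrapper stays generic over the hasher type.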
/// Remove `entry` for the `in_data`, returns `true` if removal is successful (e.g `entry` is indeed in the `in_data`)
/// Otherwise return `false`
fn remove_from_vec<T: PartialEq>(in_data: &mut Vec<T>, entry: &T) -> bool {
Perhaps a more idiomatic way would be for this function to return Option<T> (which is what Some(in_data.remove()) returns)
That might allow you to avoid some of the other changes to remove later
Since we remove by passing in an element that is equal to one inside the vector, the caller already has the removed element: if we returned Option<T>, the value inside the Option would just be the entry argument to the function. Hence this function is more akin to HashSet remove. Also, inside the remove function we are only interested in whether the removal was successful, so with Option<T> we would need to introduce is_some checks there.
Hence I think the current API is clearer. However, if it is misleading or counterintuitive, I can implement your suggestion.
makes sense -- thank you for the response
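The two API shapes discussed in this thread can be sketched side by side. This is a minimal illustration under assumptions: only the signature of `remove_from_vec` appears in the diff above, so its body here is a guess at a straightforward implementation, and `take_from_vec` is a hypothetical name for the Option-returning alternative.

```rust
/// Bool-returning variant, in the spirit of `HashSet::remove`.
/// The body is an assumed implementation; only the signature is from the PR.
fn remove_from_vec<T: PartialEq>(in_data: &mut Vec<T>, entry: &T) -> bool {
    if let Some(idx) = in_data.iter().position(|item| item == entry) {
        // Drop the removed element; the caller already holds an equal `entry`.
        in_data.remove(idx);
        true
    } else {
        false
    }
}

/// Option-returning variant (hypothetical), in the spirit of `HashSet::take`.
fn take_from_vec<T: PartialEq>(in_data: &mut Vec<T>, entry: &T) -> Option<T> {
    // `?` returns `None` early when no element equals `entry`.
    let idx = in_data.iter().position(|item| item == entry)?;
    Some(in_data.remove(idx))
}
```

Both variants need only `PartialEq`, which is why they work for element types that cannot implement `Hash`. The cost is a linear scan per removal instead of a hash lookup.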
I will experiment with using |
Which issue does this PR close?
Closes #.
Rationale for this change
The OrderedColumn struct keeps columns that have an ordering, together with the ordering information. This struct is used during OrderingEquivalence calculations. However, the existing PhysicalSortExpr can keep track of this information. Also, PhysicalSortExpr supports not just columns but complex expressions as well. We can use PhysicalSortExpr instead of OrderedColumn.
What changes are included in this PR?
This PR removes the OrderedColumn struct and uses PhysicalSortExpr in its place.
However, because PhysicalSortExpr doesn't implement the Hash trait (there is no trivial way to support this trait, if any), we changed the EquivalentClass implementation so that it doesn't require the Hash trait anymore. For this reason, we have replaced the places in EquivalentClass where HashSet is used with Vec.
Are these changes tested?
Yes, existing tests should work. A new test is also added (in the window.slt file) to show that we can use complex expressions (not just Columns) during ordering equivalence calculations.
Are there any user-facing changes?