Skip to content

Support ordering analysis with expressions (not just columns) by Replace OrderedColumn with PhysicalSortExpr#6501

Merged
mustafasrepo merged 13 commits into
apache:mainfrom
synnada-ai:feature/remove_ordered_column
Jun 5, 2023
Merged

Support ordering analysis with expressions (not just columns) by Replace OrderedColumn with PhysicalSortExpr#6501
mustafasrepo merged 13 commits into
apache:mainfrom
synnada-ai:feature/remove_ordered_column

Conversation

@mustafasrepo

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

OrderedColumn struct keeps columns that have ordering, with ordering information. This struct is used during OrderingEquivalence calculations. However, existing PhysicalSortExpr can keep track of this information. Also PhysicalSortExpr supports not just, columns but complex expressions also.

We can use PhysicalSortExpr instead of OrderedColumn.

What changes are included in this PR?

This PR removes OrderedColumn struct and uses PhysicalSortExpr in its place.

However, because PhysicalSortExpr doesn't implement Hash trait (there is no trivial way to support this trait if any). We changed the EquivalentClass implementation so that it doesn't require Hash trait anymore.
For this reason, we have replaced places in EquivalentClass where HashSet is used with Vector.

Are these changes tested?

Yes existing tests should work, also new test is added (under window.slt file) to show that we can use complex expressions (not just Columns) during ordering equivalence calculations.

Are there any user-facing changes?

@github-actions github-actions Bot added core Core DataFusion crate physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels May 31, 2023
@alamb alamb changed the title Replace OrderedColumn with PhysicalSortExpr Support ordering analysis with expressions (not just columns) by Replace OrderedColumn with PhysicalSortExpr Jun 2, 2023

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR looks great -- thank you @mustafasrepo and adds a neat feature. cc @mingmwang in case you have any interest in reviewing this

However, because PhysicalSortExpr doesn't implement Hash trait (there is no trivial way to support this trait if any). We changed the EquivalentClass implementation so that it doesn't require Hash trait anymore.

We hit something similar when trying to make LogicalPlan implement hash (because of the LogicalPlan::Extension variant that has a Arc<dyn UserDefinedLogicalNode>

The solution we came up with was

https://docs.rs/datafusion-expr/25.0.0/datafusion_expr/logical_plan/trait.UserDefinedLogicalNode.html#tymethod.dyn_hash

And then implemented it like this: https://docs.rs/datafusion-expr/25.0.0/src/datafusion_expr/logical_plan/extension.rs.html#235-285

Comment thread datafusion/core/tests/sqllogictests/test_files/window.slt
Comment thread datafusion/physical-expr/src/equivalence.rs
Comment thread datafusion/physical-expr/src/equivalence.rs Outdated

/// Remove `entry` for the `in_data`, returns `true` if removal is successful (e.g `entry` is indeed in the `in_data`)
/// Otherwise return `false`
fn remove_from_vec<T: PartialEq>(in_data: &mut Vec<T>, entry: &T) -> bool {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a more idiomatic way would be for this function to return Option<T> (which is what Some(in_data.remove()) returns )

That might allow you to avoid some of the other changes to remove later

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since, we remove by giving element inside the vector. We already have removed element. If we return Option<T> the value inside Option will be entry argument to the function. Hence this function is more akin to HashSet remove. Also inside remove function we are interested in whether removal was successful, in this case we need to introduce is_some checks inside remove function.
Hence I think, current API is more clear, However, if it is misleading, or counter intuitive I can implement as your suggestion.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense -- thank you for the response

Comment thread datafusion/physical-expr/src/equivalence.rs
Comment thread datafusion/physical-expr/src/equivalence.rs
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
@mustafasrepo

Copy link
Copy Markdown
Contributor Author

I think this PR looks great -- thank you @mustafasrepo and adds a neat feature. cc @mingmwang in case you have any interest in reviewing this

However, because PhysicalSortExpr doesn't implement Hash trait (there is no trivial way to support this trait if any). We changed the EquivalentClass implementation so that it doesn't require Hash trait anymore.

We hit something similar when trying to make LogicalPlan implement hash (because of the LogicalPlan::Extension variant that has a Arc<dyn UserDefinedLogicalNode>

The solution we came up with was

https://docs.rs/datafusion-expr/25.0.0/datafusion_expr/logical_plan/trait.UserDefinedLogicalNode.html#tymethod.dyn_hash

And then implemented it like this: https://docs.rs/datafusion-expr/25.0.0/src/datafusion_expr/logical_plan/extension.rs.html#235-285

I will experiment with using dyn hash, I think this will simplify the structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants