-
Notifications
You must be signed in to change notification settings - Fork 2k
feat(proto): Add protobuf serialization for HashExpr #19379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
b2f4028
eaf4bf7
c2db472
2c89b17
6ccd80b
4012551
1cb6b1c
476415a
c68da54
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -124,7 +124,8 @@ impl std::fmt::Debug for HashExpr { | |
| .map(|e| e.to_string()) | ||
| .collect::<Vec<_>>() | ||
| .join(", "); | ||
| write!(f, "{}({})", self.description, cols) | ||
| let (s1, s2, s3, s4) = self.seeds(); | ||
| write!(f, "{}({cols}, [{s1},{s2},{s3},{s4}])", self.description) | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -258,12 +259,15 @@ impl Hash for HashTableLookupExpr { | |
| fn hash<H: std::hash::Hasher>(&self, state: &mut H) { | ||
| self.hash_expr.dyn_hash(state); | ||
| self.description.hash(state); | ||
| Arc::as_ptr(&self.hash_map).hash(state); | ||
| } | ||
| } | ||
|
|
||
| impl PartialEq for HashTableLookupExpr { | ||
| fn eq(&self, other: &Self) -> bool { | ||
| self.hash_expr.dyn_eq(&other.hash_expr) && self.description == other.description | ||
| self.hash_expr.dyn_eq(&other.hash_expr) | ||
| && self.description == other.description | ||
| && Arc::ptr_eq(&self.hash_map, &other.hash_map) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it seems like it was like this already, but want to make the question anyways: How it is that for two
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes in theory that would be better. But given the actual real world usage I think just saying "if they're not the same hash table treat them as not being the same" is good enough. The only case this doesn't cover is where two hash tables have the same data but given how this is used in HashJoinExec that's not possible (it would be a major bug!).
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, it looks correct to compare the pointer addresses today. I wonder how that decision will age though, if in the future someone starts relying on actual hash table comparison by value, do you think that can silently be rendered into a bug?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The alternative seems like it would have a lot of negative performance and complexity implications. Can we just document this on the trait? I feel like it's a pseudo-internal trait that we only use for implementation reasons, I don't know that there's an intention of users creating their own implementations even if it is theoretically possible to do so.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 I think just adding a comment for readers should be enough, that way if contributors need to perform comparisons in other places of the codebase over this struct they can easily take this into account.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will do. I also don't know why |
||
| } | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is hashing the pointer address necessary here? it looks exotic enough to justify a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could not hash it and let any sort of hash comparison fall back to equality on collision.
The point (a good pun with pointer?) is that we don't want to actually compare the hash maps (expensive) but if the pointer points to different hashmaps we can just assume the expression is different.