Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
include seeds in debug repr, include hash map pointer in hash/equality
  • Loading branch information
adriangb committed Dec 18, 2025
commit 6ccd80b29ac876bf0d7fb17181d90c84db4ea859
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,8 @@ impl std::fmt::Debug for HashExpr {
.map(|e| e.to_string())
.collect::<Vec<_>>()
.join(", ");
write!(f, "{}({})", self.description, cols)
let (s1, s2, s3, s4) = self.seeds();
write!(f, "{}({cols}, [{s1},{s2},{s3},{s4}])", self.description)
}
}

Expand Down Expand Up @@ -258,12 +259,15 @@ impl Hash for HashTableLookupExpr {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
self.hash_expr.dyn_hash(state);
self.description.hash(state);
Arc::as_ptr(&self.hash_map).hash(state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is hashing the pointer address necessary here? it looks exotic enough to justify a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could not hash it and let any sort of hash comparison fall back to equality on collision.
The point (a good pun with pointer?) is that we don't want to actually compare the hash maps (expensive) but if the pointer points to different hashmaps we can just assume the expression is different.

}
}

impl PartialEq for HashTableLookupExpr {
fn eq(&self, other: &Self) -> bool {
self.hash_expr.dyn_eq(&other.hash_expr) && self.description == other.description
self.hash_expr.dyn_eq(&other.hash_expr)
&& self.description == other.description
&& Arc::ptr_eq(&self.hash_map, &other.hash_map)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like it was like this already, but want to make the question anyways:

How it is that for two HashTableLookupExpr to be equal, the pointer address of hash_map need to be the same? Shouldn't it be up to the trait JoinHashMapType implementation to decide how to perform the equality comparison?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes in theory that would be better. But given the actual real world usage I think just saying "if they're not the same hash table treat them as not being the same" is good enough. The only case this doesn't cover is where two hash tables have the same data but given how this is used in HashJoinExec that's not possible (it would be a major bug!).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it looks correct to compare the pointer addresses today. I wonder how that decision will age though, if in the future someone starts relying on actual hash table comparison by value, do you think that can silently be rendered into a bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative seems like it would have a lot of negative performance and complexity implications. Can we just document this on the trait? I feel like it's a pseudo-internal trait that we only use for implementation reasons, I don't know that there's an intention of users creating their own implementations even if it is theoretically possible to do so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think just adding a comment for readers should be enough, that way if contributors need to perform comparisons in other places of the codebase over this struct they can easily take this into account.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I also don't know why JoinHashMapType is public, I opened an issue in #19408

}
}

Expand Down