Skip to content

refactor: add try_to_proto to HashTableLookupExpr#22451

Merged
Dandandan merged 7 commits into
apache:mainfrom
AnuragRaut08:feat/port-hash-table-lookup-expr-proto
May 26, 2026
Merged

refactor: add try_to_proto to HashTableLookupExpr#22451
Dandandan merged 7 commits into
apache:mainfrom
AnuragRaut08:feat/port-hash-table-lookup-expr-proto

Conversation

@AnuragRaut08
Copy link
Copy Markdown

Which issue does this PR close?

Part of #22435

What changes are included?

Adds try_to_proto to HashTableLookupExpr so it participates in the
expression-local serialization pattern introduced in #21929.

HashTableLookupExpr holds a runtime Arc<Map> that cannot be
serialized, so try_to_proto replaces it with lit(true). This is
safe because the filter is a performance optimisation only — lit(true)
passes all rows and the join produces correct results either way.

The centralized arm in to_proto.rs remains as a fallback for now.
Cleanup can follow in a separate PR once this lands.

Are these changes tested?

Yes — covered by the existing roundtrip_hash_table_lookup_expr_to_lit
test in datafusion/proto/tests/cases/roundtrip_physical_plan.rs.

Are there any user-facing changes?

No.

Part of apache#22435. Adds try_to_proto to HashTableLookupExpr so it
participates in the expression-local serialization pattern introduced
in apache#21929.

HashTableLookupExpr holds a runtime Arc<Map> that cannot be serialized,
so try_to_proto replaces it with lit(true). This is safe because the
filter is a performance optimisation only — lit(true) passes all rows
and the join produces correct results either way.

The centralized arm in to_proto.rs remains as a fallback for now.
@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label May 22, 2026
@AnuragRaut08
Copy link
Copy Markdown
Author

Thanks! CI is currently running. Please let me know if any adjustments are needed — happy to iterate quickly.

/// - `lit(true)` passes all rows so no valid rows are lost.
/// - In distributed execution the remote worker has no access to the
/// build-side hash table anyway.
pub fn try_to_proto(
Copy link
Copy Markdown
Contributor

@jayshrivastava jayshrivastava May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this must be moved to the impl PhysicalExpr for HashTableLookupExpr block to actually be used.

The way to test this code is being used is to delete the old paths in datafusion/datafusion/proto/src/physical_plan/to_proto.rs and datafusion/datafusion/proto/src/physical_plan/from_proto.rs.

Since this is a small PR, you may want to considering adding try_from_proto as well :)

/// - The filter is a performance optimisation, not a correctness requirement.
/// - `lit(true)` passes all rows so no valid rows are lost.
/// - In distributed execution the remote worker has no access to the
/// build-side hash table anyway.
Copy link
Copy Markdown
Contributor

@jayshrivastava jayshrivastava May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment comes from roundtrip_hash_table_lookup_expr_to_lit() in /datafusion/datafusion/proto/tests/cases/roundtrip_physical_plan.rs but I don't think it's very true.

IIUC, hash joins might build a HashTableLookupExpr during execution after the build side is done. These expressions get placed in the dynamic filter expr.

If you serialize before executing the plan, then there's no code path where there would be a HashTableLookupExpr in the plan today. If you deserialize and execute that plan, then the HashJoinExec may create a fresh HashTableLookupExpr for the dynamic filter. In this case, all the row pruning is preserved.

If you serialize after executing, then any potential HashTableLookupExpr would be replaced with lit(true). I don't think this has any impact. One must call reset_state to re-execute plans, in which case I would expect the HashTableLookupExpr to disappear. In this case, all the row pruning is preserved as well.

Maybe we can change the comment to explain these two cases? ^

Since HashTableLookupExpr is public it might be good to warn users that it does not get serialized. We could (a) file a ticket to track that HashTableLookupExpr are not serialized and (b) add a comment directly on HashTableLookupExpr.

Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AnuragRaut08
Thanks for working on this.
I agree with @jayshrivastava's point about moving this into the trait impl, and I also noticed a separate build issue around the proto wiring that looks blocking before merge.

/// - `lit(true)` passes all rows so no valid rows are lost.
/// - In distributed execution the remote worker has no access to the
/// build-side hash table anyway.
pub fn try_to_proto(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building on Jay's comment, I think this needs to live in impl PhysicalExpr for HashTableLookupExpr rather than as an inherent HashTableLookupExpr::try_to_proto method.

The serializer calls expr.try_to_proto(&ctx) through &dyn PhysicalExpr, so this implementation is not reached. Serialization still depends on the centralized downcast fallback in datafusion/proto/src/physical_plan/to_proto.rs.

Could you move this into the PhysicalExpr impl with the same #[cfg(feature = "proto")] signature? That should make the expression-local serialization path actually take effect.

/// - `lit(true)` passes all rows so no valid rows are lost.
/// - In distributed execution the remote worker has no access to the
/// build-side hash table anyway.
pub fn try_to_proto(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also seeing a separate build issue here. This new signature references physical_expr::proto_encode, datafusion_proto_common, and datafusion_proto_models unconditionally, but datafusion-physical-plan does not expose those dependencies in a normal build.

cargo check -p datafusion-physical-plan currently fails with unresolved imports for proto_encode, datafusion_proto_common, and datafusion_proto_models.

Could you gate this consistently with the existing PhysicalExpr::try_to_proto API and wire the dependencies through the appropriate proto feature?

@github-actions github-actions Bot added the proto Related to proto crate label May 24, 2026
@AnuragRaut08
Copy link
Copy Markdown
Author

@jayshrivastava @kosiew — addressed the requested changes:

-moved try_to_proto into impl PhysicalExpr for HashTableLookupExprso serialization dispatches correctly through the trait object
-removed reliance on the centralized downcast serialization path
-wired the proto feature/dependencies through datafusion-physical-plan
-updated the serialization comment to explain the pre/post execution cases

Validated locally with:

-cargo check -p datafusion-physical-plan --features proto -cargo check -p datafusion-proto
-cargo test -p datafusion-proto roundtrip_hash_table_lookup_expr_to_lit`

The roundtrip test now passes through the trait-based serialization path.

Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AnuragRaut08
Thanks for the updates. No blocking findings from my follow-up review.

One small non-blocking suggestion below.

// This is safe because dynamic filtering is a performance optimisation
// only — lit(true) passes all rows so correctness is preserved.
// When a serialized plan is re-executed, HashJoinExec reconstructs
// fresh dynamic filters at runtime anyway.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is helpful and explains why lit(true) is correctness-safe. One small optional tweak would be to mirror Jay's pre-execution and post-execution wording a bit more directly, and maybe mention reset_state since that is where the runtime reconstruction behavior is tied together.

Not blocking from my side, just a readability suggestion.

@AnuragRaut08
Copy link
Copy Markdown
Author

Thanks for the suggestion @kosiew. I updated the comment to more clearly distinguish the pre-execution vs post-execution serialization behavior and added the reset_state() detail around runtime reconstruction.

@kosiew kosiew added this pull request to the merge queue May 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks May 26, 2026
@kosiew kosiew added this pull request to the merge queue May 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 26, 2026
@Dandandan Dandandan added this pull request to the merge queue May 26, 2026
Merged via the queue into apache:main with commit eb3c564 May 26, 2026
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate proto Related to proto crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants