Skip to content

Port HashTableLookupExpr to use try_to_proto / try_from_proto #22435

@adriangb

Description

@adriangb

Sub-issue of #22418. Port HashTableLookupExpr to the per-expression proto pattern from #21929. This one is special: it is encode-only.

HashTableLookupExpr holds a runtime Arc<dyn JoinHashMapType> (the build-side hash table) that cannot be serialized, so try_to_proto returns lit(true). There is no try_from_proto — decode just produces a Literal.

⚠️ HashTableLookupExpr lives in datafusion-physical-plan, which does not expose the proto crates (datafusion_proto_models, datafusion_proto_common, proto_encode) in a normal build. Gate the new code with #[cfg(feature = "proto")] and make sure cargo check -p datafusion-physical-plan (no proto feature) still compiles — a previous PR broke the default build by referencing those crates unconditionally.

Where it lives

datafusion/physical-plan/src/joins/hash_join/partitioned_hash_eval.rs (crate datafusion-physical-plan).

Steps

  1. Add try_to_proto inside the impl PhysicalExpr for HashTableLookupExpr block (trait override, #[cfg(feature = "proto")]), returning lit(true) as a PhysicalExprNode.
    ⚠️ Not an inherent impl method — the serializer dispatches through &dyn PhysicalExpr, so an inherent method is silently never called and the central downcast fallback keeps doing the work.
  2. In this same PR, delete the HashTableLookupExpr arm from to_proto.rs. (No decode arm to remove — decode already yields a Literal.) Deleting the encode arm is the only proof your hook is reached.
  3. Fix the explanatory comment. The current "safe because lit(true) passes all rows" comment was disputed in review (refactor: add try_to_proto to HashTableLookupExpr #22451). The accurate story: a HashTableLookupExpr only exists during execution (built after the build side completes, placed in the dynamic filter). A plan serialized before execution contains none; on deserialize+execute the join builds a fresh one, so all pruning is preserved. A plan serialized after execution loses only the optimization, never correctness. State both cases.
  4. HashTableLookupExpr is pub: add a doc comment warning that it is not serialized, and consider filing a follow-up ticket to track that.

Tests

  • The existing roundtrip_hash_table_lookup_expr_to_lit test must still pass; update its comment to match step 3.
  • cargo test -p datafusion-proto --test proto_integration stays green.
  • #[cfg(test)] mod tests at the end of the file (clippy items-after-test-module).

Before you push

  • Fill in every section of the PR template.
  • cargo fmt --all, cargo clippy --all-targets --all-features -- -D warnings, and cargo check -p datafusion-physical-plan (no proto feature) must all pass.

Checklist

  • try_to_proto (→ lit(true)) in impl PhysicalExpr for HashTableLookupExpr (not inherent), #[cfg(feature="proto")]
  • central to_proto.rs encode arm deleted in this PR
  • explanatory comment corrected (both serialize-before / serialize-after cases)
  • doc comment on the pub type noting it isn't serialized
  • default build (cargo check -p datafusion-physical-plan) still compiles
  • fmt + clippy -D warnings clean; PR template filled in

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions