refactor: port InListExpr to use try_to_proto/try_from_proto hooks#22503
Conversation
ba148c1 to
6c749f3
Compare
- Implement PhysicalExpr::try_to_proto for InListExpr - Implement InListExpr::try_from_proto inherent method - Wire hooks in from_proto.rs and remove legacy logic in to_proto.rs - Add comprehensive unit tests with mocks to verify hooks in isolation - Pass all unit and integration tests with clean clippy
6c749f3 to
13308a8
Compare
|
I've refactored the unit tests in in_list.rs to align with the new testing patterns introduced in #22471 by @adriangb, utilizing the shared StubEncoder/StubDecoder utilities for Protobuf hook verification. I've also added several new test cases:
|
There was a problem hiding this comment.
Pull request overview
Ports InListExpr physical expression protobuf (de)serialization to the newer per-expression try_to_proto / try_from_proto hooks, continuing the ongoing effort to remove centralized downcast-based serialization logic from datafusion-proto.
Changes:
- Removed the centralized
InListExprdowncast arm fromdatafusion/proto/src/physical_plan/to_proto.rs. - Routed
ExprType::InListdecoding throughInListExpr::try_from_protoinfrom_proto.rs. - Added
InListExpr::try_to_proto,InListExpr::try_from_proto, and isolated unit tests using stub encode/decode contexts.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| datafusion/proto/src/physical_plan/to_proto.rs | Drops the centralized InListExpr serialization arm so the hook is exercised. |
| datafusion/proto/src/physical_plan/from_proto.rs | Dispatches ExprType::InList decoding to InListExpr::try_from_proto. |
| datafusion/physical-expr/src/expressions/in_list.rs | Implements the new proto hooks for InListExpr and adds targeted hook unit tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #[cfg(all(test, feature = "proto"))] | ||
| mod proto_tests { | ||
| use super::*; |
| parse_required_physical_expr( | ||
| e.expr.as_deref(), | ||
| ctx, | ||
| "expr", | ||
| input_schema, | ||
| proto_converter, | ||
| )?, | ||
| parse_physical_exprs(&e.list, ctx, input_schema, proto_converter)?, |
There was a problem hiding this comment.
I wonder if we should port some of these utility methods onto PhysicalExprEncodeCtx:
parse_required_physical_exprparse_physical_exprsserialize_physical_exprs
?
…st/Like (apache#22513) ## Which issue does this PR close? - Part of apache#22418 (decentralizing `datafusion-proto` serialization onto the expressions themselves; follows apache#21835, apache#22471, apache#22503). ## Rationale for this change Expressions migrating to the `try_to_proto` / `try_from_proto` hooks keep hand-rolling the same boilerplate that the central `datafusion-proto` match already factors out via `parse_physical_exprs`, `parse_required_physical_expr`, and `serialize_physical_exprs`. Those free functions can't be reused by expression authors: they take `PhysicalProtoConverterExtension` / `PhysicalPlanDecodeContext`, which the `PhysicalExprEncodeCtx` / `PhysicalExprDecodeCtx` surfaces deliberately hide. This was raised in review on apache#22503 — rather than re-implement the list maps and "missing required field" checks in every migrated expression, expose the same shapes on the ctx structs. ## What changes are included in this PR? - **`datafusion-physical-expr-common`**: three thin convenience methods, built on the existing `encode_child` / `decode` primitives: - `PhysicalExprEncodeCtx::encode_children_expressions` - `PhysicalExprDecodeCtx::decode_required_expression` (also standardizes the `Missing required field "<name>"` error so each expression no longer spells its own) - `PhysicalExprDecodeCtx::decode_children_expressions` - **`datafusion-physical-expr`**: adopt them in `InListExpr` and `LikeExpr`, removing the hand-rolled list maps and per-field `ok_or_else` checks. ### Behavior note `decode_required_expression` couples the presence check with the decode, so `LikeExpr` now decodes children left-to-right rather than validating both required fields up front. The end result is unchanged (a missing required field still errors), but a present sibling is decoded before a later missing field is reported. The `try_from_proto_rejects_missing_pattern` unit test is updated to reflect this. ## Are these changes tested? Yes — covered by existing tests, no new ones needed: - The isolated `proto_tests` modules in `in_list.rs` and `like.rs` already exercise all three helpers (list encode/decode, required decode, and the missing-field + child-error paths) through the migrated `try_to_proto` / `try_from_proto`. - The `datafusion-proto` round-trip integration tests (`roundtrip_inlist`, `roundtrip_like`, `roundtrip_filter_with_not_and_in_list`, `test_tpch_part_in_list_query_with_real_parquet_data`, etc.) continue to pass. - `cargo clippy --all-targets --features proto -D warnings` is clean on the touched crates; `cargo fmt --all` applied. ## Are there any user-facing changes? No. The per-expression `missing required field` error wording is preserved: `LikeExpr` is byte-for-byte identical, and `InList` only changes its label from `InList` to `InListExpr` (the actual type name). The format now lives in one place (`decode_required_expression`) instead of being hand-spelled per expression. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…expressions Ports the existing `try_to_proto` / `try_from_proto` implementations onto the helpers introduced in the previous commit (`expect_expr_variant!`) and the wider use of `decode_required_expression` / `decode_children_expressions` / `encode_children_expressions` from apache#22513. Covers every expression already migrated under apache#22418: - `Column`, `BinaryExpr` (originally apache#21929) - `LikeExpr` (apache#22471) - `InListExpr` (apache#22503) - `NegativeExpr` (apache#22483) `BinaryExpr` additionally switches its `l`/`r` legacy-decode arms to `decode_required_expression`, removing two more hand-rolled "missing required field" strings, and runs its `operands` encode/ decode loops through `encode_children_expressions` / `decode_children_expressions`. One existing test changes assertion text — `InListExpr`'s rejected- variant message was the only one using the article "an" instead of the macro's article-free "a"; updated to match. No wire-format change; `cargo test -p datafusion-proto --test proto_integration` is green (173 / 173).
…hapes in apache#22418, port already-migrated exprs (apache#22596) ## Which issue does this PR close? Part of apache#22418 / follow-up to apache#22513. ## Rationale for this change Every PR migrating a `PhysicalExpr` to `try_to_proto` / `try_from_proto` under apache#22418 re-implements the same two shapes that don't fit the existing helpers from apache#22513: 1. The outer `match &node.expr_type { ... }` that opens every `try_from_proto`: ```rust let try_cast = match &node.expr_type { Some(protobuf::physical_expr_node::ExprType::TryCast(x)) => x.as_ref(), _ => return internal_err!("PhysicalExprNode is not a TryCastExpr"), }; ``` 2. The hand-rolled "missing required field 'X'" error for non-expression fields like `arrow_type` on `CastExpr` / `TryCastExpr`. Each shape leaks across the 7+ remaining open migration PRs. Adding small helpers in `physical-expr-common` keeps the per-expression diff minimal and the error messages consistent. ## What changes are included in this PR? **Commit 1 — `feat(physical-expr-common): add proto helpers ...`** Two new helpers in `datafusion-physical-expr-common`, both gated on `feature = "proto"`: - `expect_expr_variant!` macro (re-exported at crate root) — matches `Option<ExprType>`, returns inner payload, errors with `"PhysicalExprNode is not a {variant}"`. - `proto_decode::require_proto_field<T>(opt, expr_name, field)` — mirrors `decode_required_expression` for non-`PhysicalExprNode` fields. Five unit tests cover the helpers (success + the two reject paths for the macro). **Commit 2 — `refactor(physical-expr): adopt new proto helpers in already-migrated expressions`** Ports every expression already on the new hooks: - `Column`, `BinaryExpr` (originally apache#21929) - `LikeExpr` (apache#22471) - `InListExpr` (apache#22503) - `NegativeExpr` (apache#22483) `BinaryExpr` additionally adopts `decode_required_expression` for its legacy `l`/`r` arms and `encode_children_expressions` / `decode_children_expressions` for the linearized `operands` path, removing two more hand-rolled "missing required field" strings. One existing test changes assertion text — `InListExpr`'s rejected-variant message was the only one using the article "an" instead of "a"; the macro emits article-free "a {Variant}" uniformly. The two commits are stacked for review: commit 1 is the helper addition only; commit 2 is the adoption. Either can be reviewed in isolation. ## Are these changes tested? Yes: - `cargo test -p datafusion-physical-expr-common --features proto` — new helper unit tests pass. - `cargo test -p datafusion-physical-expr --features proto proto_tests` — 23 / 23 per-expression proto tests pass (1 assertion-string update in InList). - `cargo test -p datafusion-proto --test proto_integration` — 173 / 173 pass; no wire-format change. - `cargo clippy --all-targets --all-features -- -D warnings` clean on the touched crates. ## Are there any user-facing changes? No. New API surface in `datafusion-physical-expr-common` (helpers gated on `feature = "proto"`); no change to serialized output. The macro `expect_expr_variant!` is exported at the crate root.
Which issue does this PR close?
InListExprto usetry_to_proto/try_from_proto#22425Rationale for this change
This PR migrates InListExpr to use the new try_to_proto and try_from_proto hooks, following the modular serialization architecture established in #21835 . This moves serialization logic into the expression itself, improving encapsulation and decentralizing the datafusion-proto logic as part of the broader effort in #22418
What changes are included in this PR?
PhysicalExpr::try_to_proto for InListExpr.InListExpr::try_from_protoinherent method.from_proto.rsand removed the central downcast arm in to_proto.rs.in_list.rsusing mock drivers to verify roundtrips and error handling.Are these changes tested?
Yes, these changes are covered by both new and existing tests:
proto_teststo in_list.rs using mock drivers to verifytry_to_protoandtry_from_protoin isolation. These cover successful roundtrips, incorrect node types, and missing required fields.clippy --all-targets --all-featureswith zero warnings.Are there any user-facing changes?
No.