
feat(rel): IR expression lowering to DataFusion Expr #691

Merged
DecisionNerd merged 2 commits into main from feature/574-expr-lowering
Jun 1, 2026

Conversation

@DecisionNerd

@DecisionNerd DecisionNerd commented Jun 1, 2026

Owner

Summary

  • Implements ExprLowerer in gf-rel/src/expr.rs — pure, I/O-free transformation from ExprArena → DataFusion Expr
  • Adds VarMap (VarId → column name) and LoweringError (UnknownFunction, UnsupportedExpr, UnboundVar)
  • Covers all IrLiteral variants (Null, Bool, Int, Float, Str, Duration, DateTime)
  • All BinaryOpKind variants: comparison, logical, arithmetic (Add/Sub/Mul/Div/Mod/Pow), collection (In), string predicates (StartsWith/EndsWith/Contains/RegexMatch)
  • All UnaryOpKind variants: Not, Neg, IsNull, IsNotNull
  • FunctionCall via a built-in lookup table: toUpper/toLower/trim/ltrim/rtrim/concat, toString/toInteger/toFloat/toBoolean, abs/ceil/floor/round/sqrt/power, char_length
  • Parameter → Expr::Placeholder; Case → DataFusion Case expr; ListLiteral/MapLiteral → UnsupportedExpr (deferred)
  • 18 unit tests
  • ListLiteral is stubbed as UnsupportedExpr; full DataFusion array support is deferred to #576 (rust: lower NodeScan and fixed-hop Expand to DataFusion LogicalPlan)
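A minimal, std-only sketch of the lowering shape summarized above: IR nodes live in an arena addressed by id, a VarMap resolves VarId to a column name, and a recursive lower() returns either a lowered expression or a LoweringError. All types here are hypothetical stand-ins (ExprArena, IrExpr, BinOp and DfExpr are toy types, and DfExpr only mimics the shape of DataFusion's Expr); this is not the gf-rel code.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the gf-ir types; the real definitions are not shown in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct VarId(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExprId(pub usize);

#[derive(Clone, Debug)]
pub enum IrLiteral { Null, Bool(bool), Int(i64), Str(String) }

#[derive(Clone, Copy, Debug)]
pub enum BinOp { Eq, And, Add }

#[derive(Clone, Debug)]
pub enum IrExpr {
    Literal(IrLiteral),
    VarRef(VarId),
    BinaryOp { op: BinOp, lhs: ExprId, rhs: ExprId },
    Parameter(String),
}

/// Flat arena: children are referenced by index, not by pointer.
pub struct ExprArena { nodes: Vec<IrExpr> }

impl ExprArena {
    pub fn new() -> Self { ExprArena { nodes: Vec::new() } }
    pub fn push(&mut self, e: IrExpr) -> ExprId {
        self.nodes.push(e);
        ExprId(self.nodes.len() - 1)
    }
    pub fn get(&self, id: ExprId) -> &IrExpr { &self.nodes[id.0] }
}

/// Toy target expression (not DataFusion's Expr).
#[derive(Debug, PartialEq)]
pub enum DfExpr {
    Literal(String),
    Column(String),
    Binary(Box<DfExpr>, &'static str, Box<DfExpr>),
    Placeholder(String),
}

/// VarId -> column name (a plain HashMap in this sketch).
pub type VarMap = HashMap<VarId, String>;

#[derive(Debug, PartialEq)]
pub enum LoweringError { UnboundVar(VarId) }

pub struct ExprLowerer<'a> { arena: &'a ExprArena, var_map: &'a VarMap }

impl<'a> ExprLowerer<'a> {
    pub fn new(arena: &'a ExprArena, var_map: &'a VarMap) -> Self { ExprLowerer { arena, var_map } }

    /// Pure, I/O-free recursion over the arena rooted at `id`.
    pub fn lower(&self, id: ExprId) -> Result<DfExpr, LoweringError> {
        match self.arena.get(id) {
            IrExpr::Literal(lit) => Ok(DfExpr::Literal(match lit {
                IrLiteral::Null => "NULL".to_string(),
                IrLiteral::Bool(b) => b.to_string(),
                IrLiteral::Int(i) => i.to_string(),
                IrLiteral::Str(s) => format!("'{s}'"),
            })),
            // A variable must be bound in the VarMap, otherwise UnboundVar.
            IrExpr::VarRef(v) => self
                .var_map
                .get(v)
                .map(|col| DfExpr::Column(col.clone()))
                .ok_or(LoweringError::UnboundVar(*v)),
            IrExpr::BinaryOp { op, lhs, rhs } => {
                let sym = match op { BinOp::Eq => "=", BinOp::And => "AND", BinOp::Add => "+" };
                Ok(DfExpr::Binary(Box::new(self.lower(*lhs)?), sym, Box::new(self.lower(*rhs)?)))
            }
            // Cypher `$name` parameters become placeholders to be bound later.
            IrExpr::Parameter(name) => Ok(DfExpr::Placeholder(format!("${name}"))),
        }
    }
}
```

Because children are lowered with `?`, the first error anywhere in the tree (for example an unbound variable deep inside a predicate) aborts the whole lowering.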

Closes #574

🤖 Generated with Claude Code

Note

Add IR expression lowering to DataFusion Expr in gf-rel

  • Introduces ExprLowerer in crates/gf-rel/src/expr.rs to convert IR expressions from an ExprArena into executable DataFusion Expr values.
  • Supports literals, variable references, property access, binary/unary ops, CASE expressions, parameters, and a set of Cypher-style built-ins (string, cast, math functions).
  • Adds VarMap to map IR VarIds to DataFusion column name strings, and LoweringError for unbound variables, unknown functions, and unsupported IR variants.
  • Re-exports ExprLowerer, VarMap, and LoweringError from the gf-rel crate root for downstream consumers.
  • ListLiteral and MapLiteral are not supported and return UnsupportedExpr errors.
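The built-in table can be pictured as one name match. This is a sketch under stated assumptions: function names are matched case-insensitively, Builtin and resolve_builtin here are hypothetical, and the target names (upper, btrim, Int64, and so on) are illustrative rather than the exact DataFusion calls the PR emits. Following the final commit (110ea62), size and length are deliberately left unmapped.

```rust
/// What a Cypher built-in lowers to in this sketch (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Builtin {
    /// Scalar function call, identified by an illustrative SQL name.
    Scalar(&'static str),
    /// CAST to an illustrative Arrow type name.
    Cast(&'static str),
}

#[derive(Debug, PartialEq)]
pub enum LoweringError {
    UnknownFunction(String),
}

/// Resolves a Cypher function name (case-insensitive matching is an assumption).
pub fn resolve_builtin(name: &str) -> Result<Builtin, LoweringError> {
    let resolved = match name.to_ascii_lowercase().as_str() {
        // String functions
        "toupper" => Builtin::Scalar("upper"),
        "tolower" => Builtin::Scalar("lower"),
        "trim" => Builtin::Scalar("btrim"),
        "ltrim" => Builtin::Scalar("ltrim"),
        "rtrim" => Builtin::Scalar("rtrim"),
        "concat" => Builtin::Scalar("concat"),
        // Conversions become casts rather than function calls
        "tostring" => Builtin::Cast("Utf8"),
        "tointeger" => Builtin::Cast("Int64"),
        "tofloat" => Builtin::Cast("Float64"),
        "toboolean" => Builtin::Cast("Boolean"),
        // Math functions map one-to-one
        "abs" => Builtin::Scalar("abs"),
        "ceil" => Builtin::Scalar("ceil"),
        "floor" => Builtin::Scalar("floor"),
        "round" => Builtin::Scalar("round"),
        "sqrt" => Builtin::Scalar("sqrt"),
        "power" => Builtin::Scalar("power"),
        // Unambiguously string-typed, so it stays mapped
        "char_length" | "character_length" => Builtin::Scalar("character_length"),
        // Everything else, including size/length, is rejected
        _ => return Err(LoweringError::UnknownFunction(name.to_string())),
    };
    Ok(resolved)
}
```

Returning an error for unrecognized names, instead of passing them through, keeps typos and not-yet-supported functions from reaching the query engine.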

Macroscope summarized 110ea62.

Summary by CodeRabbit

  • New Features

    • Add end-to-end expression translation so query expressions (literals, variables, property access, unary/binary ops, function calls, parameters/placeholders, and CASE) are now converted into executable query expressions.
    • Better handling and reporting for unknown functions, unsupported constructs, and unbound variables.
  • Documentation

    • Milestone status noted to reflect progress on expression lowering and query planning.

- Add ExprLowerer<'a> with lower(ExprId) -> Result<DfExpr, LoweringError>
- Add VarMap: VarId → DataFusion column name string
- Add LoweringError: UnknownFunction, UnsupportedExpr, UnboundVar
- Mappings: all IrLiteral variants, VarRef, PropertyAccess, all
  BinaryOpKind variants (including Mod/Pow/In/StringOps/RegexMatch),
  all UnaryOpKind variants, FunctionCall (built-in table), Parameter
  (Expr::Placeholder), Case, ListLiteral stub, MapLiteral stub
- Built-in function table: toUpper/toLower/trim/concat, toString/
  toInteger/toFloat, abs/ceil/floor/round/sqrt/power, char_length
- Add datafusion + gf-ontology deps to gf-rel
- 18 unit tests covering all variants, compound predicates, unbound
  variable error, unknown function error

Closes #574

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
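The operator coverage listed in the commit message can be summarized as a table from IR operator to target form. This sketch is hypothetical: BinaryOpKind and UnaryOpKind only mirror the variants named in this PR (comparison and logical operators are reduced to a few representatives), Target is a toy type, and starts_with, ends_with, contains and power are illustrative targets rather than confirmed calls in gf-rel.

```rust
/// Hypothetical mirror of the IR's BinaryOpKind (subset named in this PR).
#[derive(Clone, Copy, Debug)]
pub enum BinaryOpKind {
    Eq, Lt, And, Or,
    Add, Sub, Mul, Div, Mod, Pow,
    In, StartsWith, EndsWith, Contains, RegexMatch,
}

/// Hypothetical mirror of the IR's UnaryOpKind.
#[derive(Clone, Copy, Debug)]
pub enum UnaryOpKind { Not, Neg, IsNull, IsNotNull }

/// How an operator is expressed on the target side in this sketch.
#[derive(Debug, PartialEq)]
pub enum Target {
    Infix(&'static str),    // binary operator, e.g. `a + b`
    Function(&'static str), // scalar function call, e.g. `power(a, b)`
    InList,                 // membership test against a list
}

pub fn binary_target(op: BinaryOpKind) -> Target {
    use BinaryOpKind as B;
    match op {
        B::Eq => Target::Infix("="),
        B::Lt => Target::Infix("<"),
        B::And => Target::Infix("AND"),
        B::Or => Target::Infix("OR"),
        B::Add => Target::Infix("+"),
        B::Sub => Target::Infix("-"),
        B::Mul => Target::Infix("*"),
        B::Div => Target::Infix("/"),
        B::Mod => Target::Infix("%"),
        // Assumes no infix exponent operator on the target side, so `^` becomes a call.
        B::Pow => Target::Function("power"),
        B::In => Target::InList,
        // String predicates are modelled as boolean-returning functions.
        B::StartsWith => Target::Function("starts_with"),
        B::EndsWith => Target::Function("ends_with"),
        B::Contains => Target::Function("contains"),
        B::RegexMatch => Target::Infix("~"),
    }
}

/// Renders a unary operator around an already-lowered operand.
pub fn render_unary(op: UnaryOpKind, operand: &str) -> String {
    match op {
        UnaryOpKind::Not => format!("NOT {operand}"),
        UnaryOpKind::Neg => format!("-{operand}"),
        UnaryOpKind::IsNull => format!("{operand} IS NULL"),
        UnaryOpKind::IsNotNull => format!("{operand} IS NOT NULL"),
    }
}
```

Keeping the mapping exhaustive (no wildcard arm) means adding a new IR operator is a compile error until it is given a lowering.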
@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Contributor


Walkthrough

Implements ExprLowerer in gf-rel to lower GF IR ExprArena expressions to DataFusion Exprs, adds VarMap and LoweringError, wires the module and dependencies, and includes comprehensive unit tests for literals, variables, operators, functions, parameters, and case expressions.

Changes

Expression Lowering Implementation

  • Wiring and exports (crates/gf-rel/Cargo.toml, crates/gf-rel/src/lib.rs): Adds the gf-ontology path dependency and ensures datafusion uses the workspace configuration. Declares the expr module and re-exports ExprLowerer, LoweringError, and VarMap.
  • Type contracts and error types (crates/gf-rel/src/expr.rs): Adds VarMap mapping VarId → column name, the LoweringError enum (UnknownFunction, UnsupportedExpr, UnboundVar), and ExprLowerer<'a> with a constructor that prepares ontology-aware property names.
  • Core lowering and helpers (crates/gf-rel/src/expr.rs): Implements ExprLowerer::lower() and the helpers lower_binary, lower_unary, lower_case, resolve_prop_col, build_prop_names, lower_literal, and resolve_builtin, which maps Cypher-like built-ins to DataFusion scalar functions or casts. Emits errors for unsupported constructs such as list/map literals and for unknown functions.
  • Unit tests and scaffolding (crates/gf-rel/src/expr.rs): Adds the test helper make_lowerer and unit tests covering literal lowering, variable binding and unbound-variable errors, unary/binary operators, compound predicates, function-call resolution and the unknown-function error, and parameter placeholder lowering.
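lower_case can be sketched generically over how child expressions are lowered. The assumptions: a Cypher CASE has an optional subject (the simple form, CASE x WHEN ...), ordered WHEN/THEN pairs, and an optional ELSE. The output struct copies the field names of DataFusion's Case (expr, when_then_expr, else_expr) but is a toy type, and IrCase is hypothetical.

```rust
/// Toy mirror of the shape of DataFusion's Case expression.
#[derive(Debug, PartialEq)]
pub struct Case<E> {
    pub expr: Option<Box<E>>,                 // subject of a simple CASE, if any
    pub when_then_expr: Vec<(Box<E>, Box<E>)>, // (WHEN, THEN) pairs in source order
    pub else_expr: Option<Box<E>>,            // ELSE branch, if any
}

/// Hypothetical IR CASE node whose children are arena ids.
pub struct IrCase {
    pub subject: Option<usize>,
    pub branches: Vec<(usize, usize)>,
    pub default: Option<usize>,
}

/// Lowers every child with `lower`, stopping at the first error.
pub fn lower_case<E, X>(
    case: &IrCase,
    lower: impl Fn(usize) -> Result<E, X>,
) -> Result<Case<E>, X> {
    let lower_boxed = |id: usize| lower(id).map(Box::new);
    let expr = case.subject.map(&lower_boxed).transpose()?;
    let when_then_expr = case
        .branches
        .iter()
        .map(|&(w, t)| -> Result<(Box<E>, Box<E>), X> { Ok((lower_boxed(w)?, lower_boxed(t)?)) })
        .collect::<Result<Vec<_>, X>>()?;
    let else_expr = case.default.map(&lower_boxed).transpose()?;
    Ok(Case { expr, when_then_expr, else_expr })
}
```

Taking the child lowering as a closure keeps the CASE logic independent of the concrete expression type; in a real lowerer it would be a call back into lower().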

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Description check: ❓ Inconclusive. The PR description gives a comprehensive overview of the changes, objectives, and implementation details, but does not formally complete the description template. Resolution: complete the PR description template by filling in the Type of Change, Related Issues, Testing, and Checklist sections to meet repository standards.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title 'feat(rel): IR expression lowering to DataFusion Expr' accurately and concisely summarizes the main feature added: conversion of IR expressions to DataFusion expressions.
  • Linked Issues check: ✅ Passed. The implementation delivers all core requirements from #574: ExprLowerer with arena/ontology/var_map initialization, a lower() method returning DfExpr, VarMap mapping, the LoweringError enum, and 18 unit tests covering literals, operators, functions, and compound predicates.
  • Out of Scope Changes check: ✅ Passed. All changes are in scope: Cargo.toml adds workspace dependencies (gf-ontology, datafusion), expr.rs implements the lowering logic specified in #574, and lib.rs exports the new public API. The ListLiteral/MapLiteral deferral to #576 is intentional and documented.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.


@codecov

codecov Bot commented Jun 1, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.08%. Comparing base (a27f8f5) to head (110ea62).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #691   +/-   ##
=======================================
  Coverage   97.08%   97.08%           
=======================================
  Files           2        2           
  Lines         274      274           
  Branches       41       41           
=======================================
  Hits          266      266           
  Misses          5        5           
  Partials        3        3           
Flag Coverage Δ
full-coverage 97.08% <ø> (ø)


Components Coverage Δ
parser ∅ <ø> (∅)
planner ∅ <ø> (∅)
executor ∅ <ø> (∅)
storage ∅ <ø> (∅)
ast ∅ <ø> (∅)
types ∅ <ø> (∅)

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a27f8f5...110ea62.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/gf-rel/src/expr.rs (1)

152-160: 💤 Low value

Comment doesn't match behavior.

The comment says elements are encoded "as a JSON-string literal for now," but the arm actually lowers the elements only to discard them and return UnsupportedExpr. Consider correcting the comment to reflect that this is a validation-then-reject path.
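The validate-then-reject behavior described in this nitpick looks roughly like the following. LoweringError, lower_elem and lower_list_literal are hypothetical simplifications (elements are plain variable names, and "lowering" an element only checks that it is bound); the point is the ordering: an error inside an element surfaces before the UnsupportedExpr for the list itself.

```rust
#[derive(Debug, PartialEq)]
pub enum LoweringError {
    UnboundVar(String),
    UnsupportedExpr(&'static str),
}

/// Hypothetical element lowering: succeeds only for bound variable names.
fn lower_elem(name: &str, bound: &[&str]) -> Result<String, LoweringError> {
    if bound.contains(&name) {
        Ok(name.to_string())
    } else {
        Err(LoweringError::UnboundVar(name.to_string()))
    }
}

/// Validates every element, then rejects the list literal as unsupported.
pub fn lower_list_literal(elems: &[&str], bound: &[&str]) -> Result<String, LoweringError> {
    // Validation pass: any error inside an element is returned first.
    let _validated: Vec<String> = elems
        .iter()
        .map(|e| lower_elem(e, bound))
        .collect::<Result<Vec<String>, LoweringError>>()?;
    // The lowered elements are discarded; list literals are not supported yet.
    Err(LoweringError::UnsupportedExpr("ListLiteral"))
}
```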

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gf-rel/src/expr.rs` around lines 152 - 160, The comment for the
IrExpr::ListLiteral arm is inaccurate: it claims elements are encoded "as a
JSON-string literal for now" while the code actually lowers each element (via
self.lower on ids), discards the result (elems?), then returns
LoweringError::UnsupportedExpr; update the comment to state this is performing
validation (lowering each id to ensure subexpressions are valid) and then
intentionally rejecting ListLiteral with an UnsupportedExpr error (reference
IrExpr::ListLiteral, ids, self.lower, elems, and LoweringError::UnsupportedExpr)
so the comment matches the implemented behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cbc22d80-62d3-4df9-82b5-37ff11b13e87

📥 Commits

Reviewing files that changed from the base of the PR and between a27f8f5 and ac31c35.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock, !**/*.lock
📒 Files selected for processing (3)
  • crates/gf-rel/Cargo.toml
  • crates/gf-rel/src/expr.rs
  • crates/gf-rel/src/lib.rs

Comment thread crates/gf-rel/src/expr.rs Outdated
Comment on lines +332 to +334
"size" | "length" | "char_length" => Some(
datafusion::functions::unicode::expr_fn::char_length(a.remove(0)),
),

Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect how size/length are produced in the IR to confirm argument typing.
rg -nP -C3 '\b(size|length)\b' --type=rust -g '*ir*' -g '!**/gf-rel/**'

Repository: DecisionNerd/graphforge

Length of output: 696


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find other occurrences of the "size"/"length" function mapping
rg -nP '\"(size|length)\"' crates/gf-rel -S

# 2) Inspect the implementation around gf-rel/src/expr.rs where the mapping exists
sed -n '300,380p' crates/gf-rel/src/expr.rs

# 3) See how `char_length` is used elsewhere (possible evidence of string-only routing)
rg -nP 'char_length\(' -S crates/gf-rel

# 4) Locate IR/function construction sites for openCypher functions `size`/`length`
rg -nP '\b(size|length)\b' --type=rust -S crates | head -n 200

# 5) If there are type-checking passes/IR nodes, search for "String"/"List"/"Path" in the same area
rg -nP 'String|list|path' crates/gf-rel -S | head -n 200

Repository: DecisionNerd/graphforge

Length of output: 5513


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find where resolve_builtin is called (to see what IR/type constraints exist)
rg -n "resolve_builtin\(" crates/gf-rel/src/expr.rs

# Inspect the surrounding code paths where built-ins are lowered
sed -n '240,420p' crates/gf-rel/src/expr.rs

# Inspect the expression lowerer for any type checks/casts around function calls
sed -n '420,650p' crates/gf-rel/src/expr.rs

# Search in gf-ir (or plan/lowering) for the built-in function typing rules
rg -n "size\"|\"length\"|\"char_length" crates/gf-ir crates/gf-plan crates/gf-core crates/gf-rel -S

Repository: DecisionNerd/graphforge

Length of output: 15112


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect how FunctionCall lowering happens around the earlier resolve_builtin call.
sed -n '1,220p' crates/gf-rel/src/expr.rs

# Locate the IrExpr::FunctionCall definition (and any type info attached).
rg -n "enum IrExpr|IrExpr::FunctionCall|FunctionCall" crates/gf-ir -S

# Inspect gf-ir expr module around FunctionCall.
rg -n "FunctionCall" crates/gf-ir/src -S

Repository: DecisionNerd/graphforge

Length of output: 9898


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n --hidden --no-ignore -S 'char_length|\"size\"|\"length\"' crates/gf-ir crates/gf-plan crates/gf-ast crates/gf-cypher

Repository: DecisionNerd/graphforge

Length of output: 177


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n --hidden --no-ignore -S '"size"' crates/gf-ir crates/gf-plan crates/gf-cypher
rg -n --hidden --no-ignore -S '"length"' crates/gf-ir crates/gf-plan crates/gf-cypher
rg -n --hidden --no-ignore -S 'char_length' crates/gf-ir crates/gf-plan crates/gf-cypher
rg -n --hidden --no-ignore -S 'FunctionCall \{[^}]*name' crates/gf-ir/src/binder.rs
sed -n '520,620p' crates/gf-ir/src/binder.rs

Repository: DecisionNerd/graphforge

Length of output: 49


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Inspect the binder section that handles FunctionCall lowering (for any built-in remapping)
ls -la crates/gf-ir/src/binder.rs
sed -n '480,620p' crates/gf-ir/src/binder.rs

# 2) Search for openCypher built-in names in gf-ir tests/golden IR snapshots
rg -n -S -F '"size"' crates/gf-ir/tests crates/gf-*/tests || true
rg -n -S -F '"length"' crates/gf-ir/tests crates/gf-*/tests || true
rg -n -S -F 'char_length' crates/gf-ir/tests crates/gf-*/tests || true

# 3) Search in generated/golden snapshots for FunctionCall names
rg -n -S 'FunctionCall' crates/gf-ir/tests/ir_goldens || true
rg -n -S -F 'name: "size"' crates/gf-ir/tests crates/gf-*/tests || true
rg -n -S -F 'name: "length"' crates/gf-ir/tests crates/gf-*/tests || true

# 4) Search parser/AST handling for these function names
rg -n -S -F 'size' crates/gf-cypher/src crates/gf-ast/src || true
rg -n -S -F 'length' crates/gf-cypher/src crates/gf-ast/src || true
rg -n -S 'char_length' crates/gf-cypher crates/gf-ast || true

Repository: DecisionNerd/graphforge

Length of output: 8863


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect binder around FunctionCall handling (likely earlier in the file than the snippet we saw)
rg -n "Expr::FunctionCall\\(FunctionCall" crates/gf-ir/src/binder.rs
sed -n '400,620p' crates/gf-ir/src/binder.rs

# Direct string searches for built-in names across gf-ir / gf-cypher / gf-plan
rg -n -S -F 'char_length' crates/gf-ir crates/gf-plan crates/gf-cypher || true
rg -n -S -F '"size"' crates/gf-ir crates/gf-plan crates/gf-cypher || true
rg -n -S -F '"length"' crates/gf-ir crates/gf-plan crates/gf-cypher || true
rg -n -S -F 'size(' crates/gf-ir crates/gf-plan crates/gf-cypher || true
rg -n -S -F 'length(' crates/gf-ir crates/gf-plan crates/gf-cypher || true

# Look for any central built-in function / signature dispatch tables
rg -n -S -i 'builtin|built-in|function signature|signature|resolve_function|FunctionCall' crates/gf-ir/src crates/gf-plan/src crates/gf-rel/src | head -n 200

Repository: DecisionNerd/graphforge

Length of output: 9695


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find where FunctionCall is handled outside gf-rel/expr.rs
rg -n "FunctionCall" crates/gf-plan crates/gf-*/src -S

# Look for built-in function name literals in the repo
rg -n -S -F '"size"' crates || true
rg -n -S -F '"length"' crates || true
rg -n -S -F 'char_length' crates || true

# Look for any function signature/type checking tables that might mention "size"/"length"/"char_length"
rg -n -S -i "builtin|built-in|function signature|signature|typecheck|type check|validate|semantics|cypher.*function" crates/gf-* || true

Repository: DecisionNerd/graphforge

Length of output: 7848


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Look for special-casing of built-in functions in gf-cypher parser/binder
rg -n -S -i '\bsize\b' crates/gf-cypher/src crates/gf-ir/src || true
rg -n -S -i '\blength\b' crates/gf-cypher/src crates/gf-ir/src || true
rg -n -S -i 'char_length' crates/gf-cypher/src crates/gf-ir/src || true

# Also inspect gf-cypher parser expr around any FunctionCall handling that could rewrite size/length
rg -n -S 'FunctionCall' crates/gf-cypher/src/parser/expr.rs

Repository: DecisionNerd/graphforge

Length of output: 1339


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect gf-ir FunctionCall definition for any type/signature info
sed -n '280,380p' crates/gf-ir/src/expr.rs

# Inspect gf-ast FunctionCall struct for any typing hints
sed -n '540,640p' crates/gf-ast/src/ast.rs

# Inspect binder around the FunctionCall lowering site (approx where fn_name is joined)
sed -n '420,590p' crates/gf-ir/src/binder.rs

# Search for any typechecking/validation pass mentioning FunctionCall or built-in functions
rg -n -S -i 'typecheck|type check|signature|built-in|builtin|function call|FunctionCall' crates/gf-ir/src crates/gf-plan/src crates/gf-core/src || true

# Also search in gf-plan for any DataType/Utf8 usage
rg -n -S 'DataType::|Utf8' crates/gf-plan/src crates/gf-ir/src crates/gf-core/src crates/gf-rel/src || true

Repository: DecisionNerd/graphforge

Length of output: 14797


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Any tests or special-cases in gf-rel for size/length/char_length
rg -n -S -F '"size"' crates/gf-rel/src/expr.rs crates/gf-rel/src || true
rg -n -S -F '"length"' crates/gf-rel/src/expr.rs crates/gf-rel/src || true
rg -n -S -F 'char_length' crates/gf-rel/src || true

# 2) Search for any semantic/typechecking around FunctionCall in gf-ir/gf-plan
rg -n -S -i 'functioncall.*type|type.*functioncall|typecheck.*function|built-in.*function|signature.*function' crates/gf-ir/src crates/gf-plan/src crates/gf-core/src || true

# 3) Find any occurrence of size/length as raw function names in code (without requiring quotes)
rg -n -S -i '\bsize\s*\(' crates/gf-* || true
rg -n -S -i '\blength\s*\(' crates/gf-* || true
rg -n -S -i '\bchar_length\s*\(' crates/gf-* || true

Repository: DecisionNerd/graphforge

Length of output: 769


🌐 Web query:

DataFusion unicode expr_fn char_length expected input type

💡 Result:

In Apache DataFusion, the char_length function (also available as character_length and length) is designed to operate on string expressions [1][2]. The expected input type for these functions is a string-compatible type, primarily Utf8 or LargeUtf8 [1][3][2]. Key points regarding the input type and usage:

1. Input requirement: the function expects a text/string expression [1][2]. Using it on non-string types (such as binary types) without an explicit cast may result in errors or unexpected behavior, because the function counts the characters in a string, not the bytes in a binary buffer [1][4][2].
2. Function aliases: char_length, character_length, and length are functional equivalents in DataFusion and perform the same operation [2].
3. Implementation: in recent versions, these functions are maintained in the unicode module of the datafusion-functions crate [5][6][7].
4. Non-string data: to get the length of a binary (bytea) field, explicitly cast the input to text (e.g. URL::TEXT) if you need the character length, or use octet_length if you need the byte count [4].
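The character-versus-byte distinction above, in std-only Rust. char_length and octet_length here are hypothetical helpers named after the SQL functions, not DataFusion's implementation; they count Unicode scalar values (Rust chars) and UTF-8 bytes respectively, which is the distinction the result describes.

```rust
/// Number of characters (Unicode scalar values), the quantity char_length reports.
pub fn char_length(s: &str) -> usize {
    s.chars().count()
}

/// Number of bytes in the UTF-8 encoding, the quantity octet_length reports.
pub fn octet_length(s: &str) -> usize {
    s.len()
}
```

For "h\u{e9}llo" (héllo) the two differ: 5 characters but 6 bytes, since é takes two bytes in UTF-8.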



Fix Cypher size/length lowering to be type-aware

crates/gf-rel/src/expr.rs currently maps "size" | "length" | "char_length" to DataFusion's unicode::expr_fn::char_length(...) in resolve_builtin, and ExprLowerer forwards IrExpr::FunctionCall arguments without any typing or casts. This makes size()/length() incorrect for non-string inputs (e.g. Cypher collections and paths, where size/length have element-count semantics) and can produce DataFusion type errors, because char_length is defined only for string (Utf8/LargeUtf8) expressions. Implement a type-aware dispatch: keep char_length for strings, and lower size/length to the appropriate collection/path-length semantics.
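A type-aware dispatch along the lines suggested here might look like the following, assuming the argument's static type is known (which, per the follow-up commit, waits on M13 type inference). ArgType, Lowered and resolve_length_like are hypothetical, and the exact per-type rules are an assumption; the Lowered variants name the intended semantics (character, element, or relationship count) rather than specific DataFusion functions. When the type is unknown, the sketch refuses to lower, mirroring the conservative choice made in 110ea62.

```rust
/// Hypothetical static type of the argument, as a future type-inference pass might report it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ArgType { Str, List, Path, Unknown }

/// Intended semantics of the lowered call.
#[derive(Debug, PartialEq)]
pub enum Lowered {
    CharLength,        // string: number of characters
    ElementCount,      // list: number of elements
    RelationshipCount, // path: number of hops
}

#[derive(Debug, PartialEq)]
pub enum LoweringError {
    UnknownFunction(String),
    UnsupportedExpr(String),
}

pub fn resolve_length_like(name: &str, arg: ArgType) -> Result<Lowered, LoweringError> {
    match (name, arg) {
        // String-only names stay mapped regardless of inferred type.
        ("char_length" | "character_length", _) => Ok(Lowered::CharLength),
        ("size" | "length", ArgType::Str) => Ok(Lowered::CharLength),
        ("size" | "length", ArgType::List) => Ok(Lowered::ElementCount),
        ("length", ArgType::Path) => Ok(Lowered::RelationshipCount),
        // Polymorphic name without a usable type: refuse instead of guessing.
        ("size" | "length", _) => Err(LoweringError::UnsupportedExpr(format!("{name}({arg:?})"))),
        _ => Err(LoweringError::UnknownFunction(name.to_string())),
    }
}
```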

- Split char_length (string-only, keep mapped) from size/length
  (type-polymorphic: strings OR lists; defer to M13 type inference
  rather than silently mapping to char_length and producing DataFusion
  type errors on collection inputs)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@DecisionNerd

Owner Author

Fixes Applied Successfully

Fixed 1 file based on 1 CodeRabbit feedback item.

Files modified:

  • crates/gf-rel/src/expr.rs

Change: Split the size/length mapping from char_length. char_length/character_length remain mapped to DataFusion's unicode char_length (unambiguously string). size()/length() now fall through to the wildcard arm (→ LoweringError::UnknownFunction) since they're type-polymorphic in openCypher and cannot be safely dispatched without M13 type inference.

Commit: 110ea62


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rust: implement IR expression lowering to DataFusion Expr (gf-rel)

1 participant