expr: propagate MFP expression fallibility through filter pushdown by antiguru · Pull Request #36721 · MaterializeInc/materialize

antiguru · 2026-05-25T01:49:47Z

Motivation

Even with the monotonicity fixes in #36702 (merged) and #36706, workload-replay was still hitting the "persist filter pushdown correctness violation!" panic. The audit log for that failure (database-issues#9656, u103) showed:

mfp = MapFilterProject {
  expressions: [CastStringToUuid(Column(3, "merchant_group_id"))],
  predicates:  [(2, Not(IsNull(Column(1, "checksum"))))],
  projection: [0, 1, 2, 4, 5, 6],
  input_arity: 6
}
upper_bounds: [cast_timestamp_tz_to_mz_timestamp(Coalesce(deleted_at, '9999-12-31'))]

stats: checksum lower="bar" upper="bar"; deleted_at = 2025-06-16 (single value)
result: Err((DataflowErrorSer(143), 1779629555000, +1))

The interpreter computed:

predicate NOT IsNull(checksum) → {True} (stats say non-null).
upper-bound check 2025-06-16 >= mz_now → {False} (frontier was past 2025-06-16).
AND({True}, {False}) = {False}, fallible = false.

→ may_keep=false, may_error=false, may_skip=true → Discard the part.
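The decision above can be sketched with a toy model. Everything in it is hypothetical and simplified (Summary, and, and decide are illustrative names, not the mz-expr API, and NULL is ignored): a summary is the set of boolean values a filter may take over a part plus a may-error flag, and a part is skippable only if it can neither keep a row nor produce an error.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified interpreter summary: the boolean values a filter
/// may evaluate to over a part, plus whether evaluating it might error.
#[derive(Clone, Debug, PartialEq)]
struct Summary {
    values: BTreeSet<bool>,
    fallible: bool,
}

impl Summary {
    /// A summary that always evaluates to `b` and never errors.
    fn just(b: bool) -> Self {
        Summary { values: BTreeSet::from([b]), fallible: false }
    }

    /// Abstract AND: combine every pair of possible values; the result may
    /// error if either side may error.
    fn and(&self, other: &Summary) -> Summary {
        let mut values = BTreeSet::new();
        for &a in &self.values {
            for &b in &other.values {
                values.insert(a && b);
            }
        }
        Summary { values, fallible: self.fallible || other.fallible }
    }
}

#[derive(Debug, PartialEq)]
enum Decision {
    Keep,
    Discard,
}

/// A part may be discarded only if no row can pass and no row can error.
fn decide(s: &Summary) -> Decision {
    let may_keep = s.values.contains(&true);
    let may_error = s.fallible;
    if may_keep || may_error {
        Decision::Keep
    } else {
        Decision::Discard
    }
}
```

With the incident's inputs, a {True} predicate ANDed with a {False} upper bound yields {False} with fallible = false, so decide returns Discard, matching what the interpreter chose.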

But the actual MFP runtime evaluated the predicate (True), then evaluated cast_string_to_uuid("merchant_group_id_a") (not a valid UUID), which errored, so the whole row was emitted as Err. The discarded part did in fact contain an error row that should have been emitted.
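For contrast, a hypothetical sketch of the runtime ordering for this row: the predicate runs first, then the mapped expression, and an expression error becomes the row's result. The UUID check is a simplified stand-in for cast_string_to_uuid (it accepts only the canonical 8-4-4-4-12 hex form), and the temporal upper bound is not modeled.

```rust
/// Hypothetical outcome of evaluating one row through a toy MFP.
#[derive(Debug, PartialEq)]
enum RowResult {
    /// A predicate rejected the row.
    Filtered,
    /// The row passed and the expression produced a value.
    Row(String),
    /// The expression errored, so the whole row becomes an error.
    Error(String),
}

/// Simplified stand-in for cast_string_to_uuid: canonical 8-4-4-4-12 hex only.
fn cast_string_to_uuid(s: &str) -> Result<String, String> {
    let groups: Vec<&str> = s.split('-').collect();
    let lens = [8, 4, 4, 4, 12];
    let ok = groups.len() == 5
        && groups
            .iter()
            .zip(lens)
            .all(|(g, n)| g.len() == n && g.chars().all(|c| c.is_ascii_hexdigit()));
    if ok {
        Ok(s.to_ascii_lowercase())
    } else {
        Err(format!("not a valid UUID: {s:?}"))
    }
}

/// Predicate NOT IsNull(checksum) first; the expression runs only if it passes,
/// and its error (if any) is the row's result.
fn evaluate(checksum: Option<&str>, merchant_group_id: &str) -> RowResult {
    if checksum.is_none() {
        return RowResult::Filtered;
    }
    match cast_string_to_uuid(merchant_group_id) {
        Ok(uuid) => RowResult::Row(uuid),
        Err(e) => RowResult::Error(e),
    }
}
```

The same row that the interpreter considered skippable comes out of evaluate("bar", "merchant_group_id_a") as an error, which is the discrepancy the audit caught.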

Description

This is a different bug class from the monotonicity fixes in #36702 / #36706 / #36708; it is independent and can land on its own. The runtime evaluator (SafeMfpPlan::evaluate_inner) runs every MFP expression once all preceding predicates pass, and any expression error becomes the row's error result. The abstract interpreter's mfp_filter / mfp_plan_filter, however, only ANDs together the predicates and temporal bounds, so the AND result misses the fallibility of any expression whose result column isn't referenced by a predicate or bound.

This PR overrides mfp_filter and mfp_plan_filter on ColumnSpecs to set the returned summary's fallible bit if any of the MFP expressions' specs are fallible. This is conservative (we'll keep parts where a predicate could, but doesn't have to, fail, even though predicate short-circuiting would have prevented the expression from running), but it is sound and matches the runtime semantics.
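A minimal sketch of the change in a toy model. The names are hypothetical and the real override on ColumnSpecs in mz-expr is not reproduced here: the pre-fix filter folds AND over predicates and bounds only, while the fixed one also ORs in the fallibility of every MFP expression, whether or not a predicate references its column.

```rust
use std::collections::BTreeSet;

/// Hypothetical simplified summary: possible boolean results plus a may-error flag.
#[derive(Clone, Debug, PartialEq)]
struct Summary {
    values: BTreeSet<bool>,
    fallible: bool,
}

impl Summary {
    fn just(b: bool) -> Self {
        Summary { values: BTreeSet::from([b]), fallible: false }
    }

    /// Abstract AND over all value pairs; fallible if either input is.
    fn and(&self, other: &Summary) -> Summary {
        let mut values = BTreeSet::new();
        for &a in &self.values {
            for &b in &other.values {
                values.insert(a && b);
            }
        }
        Summary { values, fallible: self.fallible || other.fallible }
    }

    fn may_fail(&self) -> bool {
        self.fallible
    }
}

/// Pre-fix behaviour: only predicates and temporal bounds feed the result.
fn filter_predicates_only(predicates: &[Summary], bounds: &[Summary]) -> Summary {
    predicates
        .iter()
        .chain(bounds)
        .fold(Summary::just(true), |acc, s| acc.and(s))
}

/// Post-fix behaviour (sketch): additionally mark the result fallible if any
/// MFP expression may error, conservatively ignoring predicate short-circuiting.
fn filter_with_expr_fallibility(
    expr_fallible: &[bool],
    predicates: &[Summary],
    bounds: &[Summary],
) -> Summary {
    let mut result = filter_predicates_only(predicates, bounds);
    result.fallible |= expr_fallible.iter().any(|&f| f);
    result
}
```

With the incident's inputs, both versions produce {False}, but only the fixed one reports may_fail, which is enough to keep the part.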

The default trait-level implementation (used by Trace) is unchanged, since Trace is about pushdownability rather than soundness.

Verification

New regression test interpret::tests::test_mfp_unreferenced_fallible_expression: builds an MFP with one always-erroring expression (cast_string_to_uuid("not-a-uuid")) and one always-passing predicate, and asserts that the interpreter's summary reports may_fail(). It fails on the pre-fix code and passes with the fix.
cargo test -p mz-expr --lib: all tests pass.
Workload-replay should stop hitting the audit panic on this class of MFP (cast/parse expression columns that aren't gated by a predicate).

Cost

Any MFP with a fallible expression in its expressions list will now keep parts that would previously have been discarded by temporal-bound checks. The tighter version would only mark fallibility when the predicates' spec admits True (so the expression would actually run at runtime); that's a follow-up if the conservative version costs too much in practice.
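The tighter follow-up could look roughly like this in the same kind of toy model. It is a hypothetical sketch that assumes every predicate is checked before any expression runs (the runtime only requires the preceding predicates to pass) and ignores NULL: expression fallibility counts only if the combined predicate summary admits True.

```rust
use std::collections::BTreeSet;

/// Hypothetical simplified summary: possible boolean results plus a may-error flag.
#[derive(Clone, Debug, PartialEq)]
struct Summary {
    values: BTreeSet<bool>,
    fallible: bool,
}

impl Summary {
    fn just(b: bool) -> Self {
        Summary { values: BTreeSet::from([b]), fallible: false }
    }

    /// Abstract AND over all value pairs; fallible if either input is.
    fn and(&self, other: &Summary) -> Summary {
        let mut values = BTreeSet::new();
        for &a in &self.values {
            for &b in &other.values {
                values.insert(a && b);
            }
        }
        Summary { values, fallible: self.fallible || other.fallible }
    }
}

/// Tighter variant (sketch): expressions can only error if the predicates
/// can all pass for some row, i.e. their combined summary admits `true`.
fn filter_tight(expr_fallible: &[bool], predicates: &[Summary], bounds: &[Summary]) -> Summary {
    let preds = predicates
        .iter()
        .fold(Summary::just(true), |acc, s| acc.and(s));
    let exprs_may_run = preds.values.contains(&true);
    let mut result = bounds.iter().fold(preds, |acc, s| acc.and(s));
    result.fallible |= exprs_may_run && expr_fallible.iter().any(|&f| f);
    result
}
```

A part whose predicates can never pass stays skippable even with a fallible expression, while the database-issues#9656 case (predicate True, bound False) is still kept.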

DAlperin · 2026-05-26T16:02:43Z

Could we add a proptest alongside test_timestamp_plus_interval_dynamic_monotone to verify the monotonicity claim directly against the function impl?

The runtime MFP evaluator (SafeMfpPlan::evaluate_inner) runs every expression once all preceding predicates pass, so an expression that errors on the actual data turns the whole row into an Err. The abstract interpreter's mfp_filter / mfp_plan_filter, however, only ANDs together the predicates and temporal bounds, so the AND result misses the fallibility of any expression whose result column isn't referenced by a predicate or bound. Persist filter pushdown then discards parts that actually produce error rows, tripping the audit panic in persist_source.rs.

Concretely, from the audit log on database-issues#9656:

  expressions:  [cast_string_to_uuid(merchant_group_id)]
  predicates:   [NOT IsNull(checksum)]
  upper_bounds: [cast_timestamp_tz_to_mz_timestamp(coalesce(deleted_at, ...))]

Stats say checksum is non-null and deleted_at is in the past. The interpreter computed AND({True}, {False}) = {False} with fallible=false and discarded the part. The actual evaluator: the predicate passes, cast_string_to_uuid is evaluated next, errors on the row's merchant_group_id value, and the whole row is emitted as Err. The audit catches the discrepancy.

Override mfp_filter and mfp_plan_filter on ColumnSpecs so that the returned summary's fallible bit is set if any of the MFP expressions' specs are fallible. This is conservative (we'll keep parts where a predicate could, but doesn't have to, fail, even though predicate short-circuiting would have prevented the expression from running), but it's sound and matches the runtime semantics.

Adds a regression test that builds an MFP with one always-erroring expression and one always-passing predicate, and asserts that the interpreter's summary reports may_fail.

antiguru mentioned this pull request May 25, 2026

expr: fix non-monotone annotations on timestamp/date/interval functions #36702

Merged

antiguru force-pushed the claude/github-issue-9656-followup-pushdown branch from 6b3a33e to ecd05d3 Compare May 26, 2026 15:55

antiguru force-pushed the claude/github-issue-9656-mfp-expr-fallibility branch from 2d99c58 to 718e095 Compare May 26, 2026 15:56

antiguru force-pushed the claude/github-issue-9656-followup-pushdown branch from ecd05d3 to 2aab955 Compare May 26, 2026 17:06

antiguru force-pushed the claude/github-issue-9656-mfp-expr-fallibility branch from 718e095 to 94aec8d Compare May 26, 2026 17:07

antiguru force-pushed the claude/github-issue-9656-mfp-expr-fallibility branch from 94aec8d to 5c7a128 Compare May 26, 2026 17:08

antiguru changed the base branch from claude/github-issue-9656-followup-pushdown to main May 26, 2026 17:09

antiguru wants to merge 1 commit into main from claude/github-issue-9656-mfp-expr-fallibility