expr: propagate MFP expression fallibility through filter pushdown#36721
Draft
antiguru wants to merge 1 commit into
Draft
expr: propagate MFP expression fallibility through filter pushdown#36721antiguru wants to merge 1 commit into
antiguru wants to merge 1 commit into
Conversation
6b3a33e to
ecd05d3
Compare
2d99c58 to
718e095
Compare
Member
|
Could we add a proptest alongside test_timestamp_plus_interval_dynamic_monotone to verify the monotonicity claim directly against the function impl? |
ecd05d3 to
2aab955
Compare
718e095 to
94aec8d
Compare
The runtime MFP evaluator (SafeMfpPlan::evaluate_inner) runs every
expression once all preceding predicates pass, so an expression that
errors on the actual data turns the whole row into an Err. The
abstract interpreter's mfp_filter / mfp_plan_filter, however, only
ANDs together the predicates and temporal bounds — so the AND result
misses the fallibility of any expression whose result column isn't
referenced by a predicate or bound. Persist filter pushdown then
discards parts that actually produce error rows, tripping the audit
panic in persist_source.rs.
Concretely from the audit log on database-issues#9656:
expressions: [cast_string_to_uuid(merchant_group_id)]
predicates: [NOT IsNull(checksum)]
upper_bounds: [cast_timestamp_tz_to_mz_timestamp(coalesce(deleted_at, ...))]
Stats say checksum is non-null and deleted_at is in the past. The
interpreter computed AND({True}, {False}) = {False} with fallible=false
and discarded the part. The actual evaluator: predicate passes,
cast_string_to_uuid is evaluated next, errors on the row's
merchant_group_id value, and the whole row is emitted as Err. Audit
catches the discrepancy.
Override mfp_filter and mfp_plan_filter on ColumnSpecs so that the
returned summary's fallible bit is set if any of the MFP expressions'
specs are fallible. This is conservative (we'll keep parts where a
predicate could-but-doesn't-have-to fail, even though predicate
short-circuiting would have prevented the expression from running), but
it's sound and matches the runtime semantics.
Adds a regression test that builds an MFP with one always-erroring
expression and one always-passing predicate, asserts that the
interpreter's summary may_fail.
94aec8d to
5c7a128
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Even with the monotonicity fixes in #36702 (merged) and #36706, workload-replay was still hitting
persist filter pushdown correctness violation!. The audit log on that failure (database-issues#9656,u103) showed:The interpreter computed:
NOT IsNull(checksum)→{True}(stats say non-null).2025-06-16 >= mz_now→{False}(frontier was past 2025-06-16).AND({True}, {False}) = {False},fallible = false.→
may_keep=false, may_error=false, may_skip=true→ Discard the part.But the actual MFP runtime ran the predicate (True), then evaluated the
cast_string_to_uuid("merchant_group_id_a")expression (NOT a valid UUID), which errored, and the whole row was emitted asErr. The discarded part actually had an error row to emit.Description
This is a different bug class from the monotonicity fixes in #36702 / #36706 / #36708; it's independent and can land on its own. The runtime evaluator (
SafeMfpPlan::evaluate_inner) runs every MFP expression once all preceding predicates pass; any expression error propagates as the row's error result. The abstract interpreter'smfp_filter/mfp_plan_filter, however, only ANDs together the predicates and temporal bounds — so the AND result misses the fallibility of any expression whose result column isn't referenced by a predicate or bound.This PR overrides
mfp_filterandmfp_plan_filteronColumnSpecsto set the returned summary'sfalliblebit if any of the MFP expressions' specs are fallible. Conservative (we'll keep parts where a predicate could-but-doesn't-have-to fail, even though predicate short-circuiting would have prevented the expression from running) but sound and matches the runtime semantics.The default trait-level implementation (used by
Trace) is unchanged, sinceTraceis about pushdownability rather than soundness.Verification
interpret::tests::test_mfp_unreferenced_fallible_expression: builds an MFP with one always-erroring expression (cast_string_to_uuid("not-a-uuid")) and one always-passing predicate, asserts that the interpreter's summarymay_fail(). Fails on the pre-fix code; passes on the fix.cargo test -p mz-expr --lib— all tests pass.Cost
Any MFP with a fallible expression in its
expressionslist will now keep parts that would previously have been discarded by temporal-bound checks. The tighter version would only mark fallibility when the predicates' spec admitsTrue(so the expression would actually run at runtime); that's a follow-up if the conservative version costs too much in practice.