2 changes: 0 additions & 2 deletions misc/python/materialize/parallel_workload/settings.py
@@ -41,8 +41,6 @@ def _missing_(cls, value):
ADDITIONAL_SYSTEM_PARAMETER_DEFAULTS = {
# Uses a lot of memory, hard to predict how much
"memory_limiter_interval": "0",
-# TODO: Remove when https://github.com/MaterializeInc/database-issues/issues/9656 is fixed
-"persist_stats_filter_enabled": "false",
# See https://materializeinc.slack.com/archives/CTESPM7FU/p1758195280629909, should reenable when it performs better
"enable_compute_logical_backpressure": "false",
# Allows the `Scenario.RepeatRow` scenario to call `repeat_row`. Having
140 changes: 140 additions & 0 deletions src/expr/src/interpret.rs
@@ -1763,6 +1763,146 @@ mod tests {
assert!(range_out.may_contain(Datum::Null));
}

/// Regression test for database-issues#9656.
///
/// Adding an `Interval` to a `Timestamp` is non-monotone in the interval
/// argument: the lex order of intervals (months, days, micros) does not
/// respect calendar-month arithmetic with day-clamping. The interpreter
/// must therefore not assume monotonicity, otherwise persist filter
/// pushdown can incorrectly conclude that a part has no matching rows.
#[mz_ore::test]
#[cfg_attr(miri, ignore)]
fn test_add_timestamp_interval_non_monotone() {
use chrono::NaiveDateTime;
use mz_repr::adt::interval::Interval;
use mz_repr::adt::timestamp::CheckedTimestamp;
use mz_repr::{Datum, Row};

let arena = RowArena::new();

// The setup: a timestamp literal `t = 2024-01-31 00:00:00`, and an
// interval column whose stats-range spans
// `[{0 months, 31 days, 0 us}, {1 month, 0 days, 0 us}]`. In lex order,
// the 31-day interval is the lower bound and the 1-month interval is
// the upper bound. The function values at the endpoints are:
// t + {0,31,0} = 2024-03-02
// t + {1, 0,0} = 2024-02-29
// But an *interior* interval like {0, 60, 0} maps to 2024-03-31, which
// lies far outside `[Feb 29, Mar 2]`. Under the (incorrect) monotone
// assumption, the interpreter would conclude the output is in that
// narrow window, and rule out predicates like `>= 2024-03-15`.
let ts_lit = |s: &str| {
let mut row = Row::default();
row.packer().push(Datum::Timestamp(
CheckedTimestamp::from_timestamplike(
NaiveDateTime::parse_from_str(s, "%Y-%m-%dT%H:%M:%S").unwrap(),
)
.unwrap(),
));
MirScalarExpr::Literal(Ok(row), ReprScalarType::Timestamp.nullable(false))
};
let interval = |months: i32, days: i32, micros: i64| {
Datum::Interval(Interval {
months,
days,
micros,
})
};

// Expression: `(timestamp_lit + interval_col) >= 2024-03-15`.
let expr = ts_lit("2024-01-31T00:00:00")
.call_binary(MirScalarExpr::column(0), AddTimestampInterval)
.call_binary(ts_lit("2024-03-15T00:00:00"), Gte);

let relation = ReprRelationType::new(vec![ReprScalarType::Interval.nullable(false)]);
let mut interpreter = ColumnSpecs::new(&relation, &arena);
interpreter.push_column(
0,
ResultSpec::value_between(interval(0, 31, 0), interval(1, 0, 0)),
);

let range_out = interpreter.expr(&expr).range;
// The actual data may include e.g. `{0, 60, 0}` → 2024-03-31, which
// satisfies `>= 2024-03-15`. The interpreter must admit `True` so that
// filter pushdown does not skip the part. Under the buggy
// `(true, true)` annotation, the output range would be
// `[Feb 29, Mar 2]`, all of which is `< Mar 15`, and the interpreter
// would (wrongly) admit only `False`.
assert!(
range_out.may_contain(Datum::True),
"interpreter incorrectly ruled out matching rows; \
add_timestamp_interval is not monotone in the interval argument",
);
}

/// Regression test for `date_bin_timestamp`, which is non-monotone in the
/// `stride` argument: a larger stride can bin a source timestamp to an
/// *earlier* result than a smaller stride, because the bin alignment to
/// the unix epoch depends on the stride magnitude rather than on lex order.
#[mz_ore::test]
#[cfg_attr(miri, ignore)]
fn test_date_bin_timestamp_non_monotone() {
Contributor


Does it actually fail without the PR? The QA LLM review claims not, but I didn't verify:

MEDIUM — test_date_bin_timestamp_non_monotone is a tautology and does not actually regression-test the fix

File: src/expr/src/interpret.rs:1844-1903

The test sets up date_bin(stride_col, 2024-01-01 12:00:00) >= 2024-01-01 00:00:00 with stride_col ∈ [{0,1,0}, {0,2,0}] and asserts that the interpreter admits both True and False. The endpoint evaluations are:

  • date_bin({0,1,0}, 2024-01-01 12:00) = 2024-01-01 00:00 → >= is True.
  • date_bin({0,2,0}, 2024-01-01 12:00) = 2023-12-31 00:00 → >= is False.

Because the endpoints already produce opposite boolean answers under the (buggy) (true, true) annotation, the interpreter's union of {False} and {True} already admits both outcomes. The test therefore passes regardless of whether date_bin_timestamp is marked (true, true) or (false, true).
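
The tautology can be illustrated with a simplified model of endpoint-only reasoning. This is a sketch, not the `mz-expr` interpreter: `Outcomes` and `endpoint_outcomes` are hypothetical names, and timestamps are reduced to hours relative to 2024-01-01 00:00 (so the 1-day endpoint output is `0` and the 2-day endpoint output is `-24`).

```rust
/// Boolean outcomes that an endpoint-only analysis admits for
/// `f(x) >= threshold`. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
struct Outcomes {
    may_be_true: bool,
    may_be_false: bool,
}

/// Models the monotone assumption: the output of `f` is taken to lie
/// between its values at the two ends of the input range.
fn endpoint_outcomes(at_lo: i64, at_hi: i64, threshold: i64) -> Outcomes {
    let (min, max) = (at_lo.min(at_hi), at_lo.max(at_hi));
    Outcomes {
        // Some value in [min, max] satisfies `>= threshold`.
        may_be_true: max >= threshold,
        // Some value in [min, max] fails `>= threshold`.
        may_be_false: min < threshold,
    }
}
```

With a threshold of `0` (2024-01-01 00:00) the model admits both outcomes even under the monotone assumption, so an assertion on `may_be_true` cannot detect the bug. With a threshold of `12` (2024-01-01 12:00) it admits only `False`, which is the situation a regression test needs.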

Verified empirically: I reverted src/expr/src/scalar/func.rs:2057 to is_monotone = "(true, true)" while leaving the rest of the PR in place, and ran cargo test --lib -p mz-expr test_date_bin_timestamp_non_monotone — the test still passes.

In contrast, the companion test test_add_timestamp_interval_non_monotone is genuinely diagnostic — reverting the add_timestamp_interval annotation makes it fail with interpreter incorrectly ruled out matching rows.
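
For reference, the arithmetic behind that companion test can be reproduced with the standard library alone. This is a minimal sketch, assuming months are added first (with the day clamped to the length of the target month) and days afterwards; `days_from_civil`, `days_in_month`, and `add_interval` are illustrative helpers, not functions from `mz-expr` or `chrono`.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (the well-known `days_from_civil` algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, in [0, 399]
    let mp = if m > 2 { m - 3 } else { m + 9 }; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of (March-based) year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

fn days_in_month(y: i64, m: i64) -> i64 {
    match m {
        4 | 6 | 9 | 11 => 30,
        2 if (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 => 29,
        2 => 28,
        _ => 31,
    }
}

/// Adds `months` (clamping the day of month), then `days`, and returns the
/// result as a day number since the Unix epoch.
fn add_interval(date: (i64, i64, i64), months: i64, days: i64) -> i64 {
    let (y, m, d) = date;
    let total = y * 12 + (m - 1) + months;
    let (ny, nm) = (total.div_euclid(12), total.rem_euclid(12) + 1);
    let nd = d.min(days_in_month(ny, nm));
    days_from_civil(ny, nm, nd) + days
}
```

Starting from `(2024, 1, 31)`, the interval `(0 months, 31 days)` sorts before `(1 month, 0 days)` lexicographically, yet it lands on 2024-03-02 while the latter lands on 2024-02-29, and `(0 months, 60 days)` lands on 2024-03-31, outside both endpoint results.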

The in-test comment is also self-contradictory: it asserts "both endpoint evaluations give the same boolean answer" while the lines just above explicitly list one endpoint satisfying >= and the other not.

Impact: The fix to date_bin_timestamp / date_bin_timestamp_tz is correct, but the test that's supposed to lock it in provides false assurance. A later refactor that re-promotes the annotation would not be caught by cargo test -p mz-expr --lib (the PR description points to this command as the verification gate). Given that filter-pushdown correctness bugs are P1/test-blocker (see #9656), losing the regression coverage is a real durability concern.

Suggested fix: Compare against a timestamp strictly between the two endpoint outputs so that the buggy lex-mapping rules out the matching-rows case. For example:

// Endpoint outputs are 2023-12-31 00:00 and 2024-01-01 00:00. An interior
// stride of `{0, 1 day, 12h-worth-of-micros}` bins source to 2024-01-01 12:00,
// which is outside the lex-endpoint box. Comparing against 2024-01-01 12:00:00
// forces the interpreter to (wrongly, under the buggy annotation) rule out
// True.
let expr = MirScalarExpr::column(0)
    .call_binary(ts_lit("2024-01-01T12:00:00"), DateBinTimestamp)
    .call_binary(ts_lit("2024-01-01T12:00:00"), Gte);

// ... same column setup ...

assert!(
    range_out.may_contain(Datum::True),
    "date_bin is not monotone in the stride argument; \
     interpreter must not rule out matching rows",
);

Under (true, true), the lex range [2023-12-31, 2024-01-01] is entirely < 2024-01-01 12:00:00, so the interpreter would only admit False and the may_contain(True) assertion would fail — making it a real regression test. Under the fix's (false, true), the stride flat-map yields anything(), both outcomes are reachable, and the assertion passes.

(Concretely, an interior stride such as {0, 1, 43200000000} — 1 day + 12 hours — does bin 2024-01-01 12:00:00 to itself, so the predicate is genuinely achievable. The test doesn't need to construct that value; it just needs to assert the interpreter doesn't rule out True.)
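
The three stride values discussed above can be checked with a small standalone sketch. It assumes the stride has already been flattened to microseconds and that bins are aligned to the Unix epoch; `bin_unix_epoch` and the constants are hypothetical names, not the `date_bin` implementation in `func.rs`.

```rust
const MICROS_PER_HOUR: i64 = 3_600_000_000;
const MICROS_PER_DAY: i64 = 24 * MICROS_PER_HOUR;

/// 2024-01-01 00:00:00 is 19_723 days after the Unix epoch.
const JAN_1_2024: i64 = 19_723 * MICROS_PER_DAY;

/// Bins `source_micros` (microseconds since the epoch) to the start of its
/// `stride_micros`-wide bucket, with buckets aligned to the epoch.
fn bin_unix_epoch(stride_micros: i64, source_micros: i64) -> i64 {
    assert!(stride_micros > 0, "stride must be positive");
    // For a positive divisor, `div_euclid` rounds toward negative infinity,
    // so sources before the epoch are floored as well.
    source_micros.div_euclid(stride_micros) * stride_micros
}
```

For the source 2024-01-01 12:00:00 (`JAN_1_2024 + 12 * MICROS_PER_HOUR`), a 1-day stride yields `JAN_1_2024`, a 2-day stride yields `JAN_1_2024 - MICROS_PER_DAY`, and a 36-hour stride yields the source itself, which is above both endpoint results.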

Member Author


Great find! Fixed in the latest commit.

use chrono::NaiveDateTime;
use mz_repr::adt::interval::Interval;
use mz_repr::adt::timestamp::CheckedTimestamp;
use mz_repr::{Datum, Row};

let arena = RowArena::new();

let ts_lit = |s: &str| {
let mut row = Row::default();
row.packer().push(Datum::Timestamp(
CheckedTimestamp::from_timestamplike(
NaiveDateTime::parse_from_str(s, "%Y-%m-%dT%H:%M:%S").unwrap(),
)
.unwrap(),
));
MirScalarExpr::Literal(Ok(row), ReprScalarType::Timestamp.nullable(false))
};
let interval = |months: i32, days: i32, micros: i64| {
Datum::Interval(Interval {
months,
days,
micros,
})
};

// Expression: `date_bin(stride_col, 2024-01-01 12:00:00) > 2024-01-01 06:00:00`.
// stride_col ranges over `[1 day, 2 days]`.
//
// Endpoint evaluations:
// 1 day stride → bins to 2024-01-01 00:00:00
// 2 day stride → bins to 2023-12-31 00:00:00
//
// Interior strides produce results *outside* that endpoint box. For
// example, a 1.5-day stride (i.e. `{0 months, 1 day, 12 h micros}`,
// which sorts between the two endpoints in lex order) bins
// 2024-01-01 12:00:00 to exactly 2024-01-01 12:00:00 — well above the
// endpoint maximum of 2024-01-01 00:00:00. With the buggy
// `(true, true)` annotation, the interpreter narrows the output to
// `[Dec 31 00:00, Jan 1 00:00]`, both of which are `<= Jan 1 06:00`,
// so the predicate is wrongly proved `False`. With the non-monotone
// fix the output is `anything()`, so `True` is correctly admitted.
let expr = MirScalarExpr::column(0)
.call_binary(ts_lit("2024-01-01T12:00:00"), DateBinTimestamp)
.call_binary(ts_lit("2024-01-01T06:00:00"), Gt);

let relation = ReprRelationType::new(vec![ReprScalarType::Interval.nullable(false)]);
let mut interpreter = ColumnSpecs::new(&relation, &arena);
interpreter.push_column(
0,
ResultSpec::value_between(interval(0, 1, 0), interval(0, 2, 0)),
);

let range_out = interpreter.expr(&expr).range;
assert!(
range_out.may_contain(Datum::True),
"date_bin is not monotone in the stride argument; \
interior strides can produce outputs outside the endpoint-bounded \
box, so the interpreter must admit True for `>`-style predicates",
);
}

#[mz_ore::test]
fn test_trace() {
use super::Trace;
35 changes: 27 additions & 8 deletions src/expr/src/scalar/func.rs
@@ -181,15 +181,21 @@ fn add_float64(a: f64, b: f64) -> Result<f64, EvalError> {
}
}

-#[sqlfunc(is_monotone = "(true, true)", is_infix_op = true, sqlname = "+")]
+// `Interval` is lex-ordered (months, days, micros), but adding an interval to a
+// timestamp adds *calendar* months (with day-clamping) which does not respect
+// that ordering: e.g. `i1 = {0 months, 31 days}` is lex-less than
+// `i2 = {1 month, 0 days}`, but `2024-01-31 + i1 = 2024-03-02` is greater than
+// `2024-01-31 + i2 = 2024-02-29`. Day-clamping plus preserved sub-day time also
+// breaks monotonicity in the first argument near month boundaries.
+#[sqlfunc(is_monotone = "(false, false)", is_infix_op = true, sqlname = "+")]
fn add_timestamp_interval(
a: CheckedTimestamp<NaiveDateTime>,
b: Interval,
) -> Result<CheckedTimestamp<NaiveDateTime>, EvalError> {
add_timestamplike_interval(a, b)
}

-#[sqlfunc(is_monotone = "(true, true)", is_infix_op = true, sqlname = "+")]
+#[sqlfunc(is_monotone = "(false, false)", is_infix_op = true, sqlname = "+")]
fn add_timestamp_tz_interval(
a: CheckedTimestamp<DateTime<Utc>>,
b: Interval,
@@ -212,15 +218,16 @@ where
Ok(CheckedTimestamp::from_timestamplike(T::from_date_time(dt))?)
}

-#[sqlfunc(is_monotone = "(true, true)", is_infix_op = true, sqlname = "-")]
+// See `add_timestamp_interval` for why this is not monotone.
+#[sqlfunc(is_monotone = "(false, false)", is_infix_op = true, sqlname = "-")]
fn sub_timestamp_interval(
a: CheckedTimestamp<NaiveDateTime>,
b: Interval,
) -> Result<CheckedTimestamp<NaiveDateTime>, EvalError> {
sub_timestamplike_interval(a, b)
}

-#[sqlfunc(is_monotone = "(true, true)", is_infix_op = true, sqlname = "-")]
+#[sqlfunc(is_monotone = "(false, false)", is_infix_op = true, sqlname = "-")]
fn sub_timestamp_tz_interval(
a: CheckedTimestamp<DateTime<Utc>>,
b: Interval,
@@ -249,7 +256,12 @@ fn add_date_time(
Ok(CheckedTimestamp::from_timestamplike(dt)?)
}

-#[sqlfunc(is_monotone = "(true, true)", is_infix_op = true, sqlname = "+")]
+// Monotone in `date` (dates have no sub-day component, so day-clamping at month
+// boundaries only causes results to collapse, never to reverse), but not in
+// `interval`: e.g. `{0 months, 31 days}` is lex-less than `{1 month, 0 days}`,
+// but adding the former to `2024-01-31` gives `2024-03-02` while the latter
+// gives `2024-02-29`.
+#[sqlfunc(is_monotone = "(true, false)", is_infix_op = true, sqlname = "+")]
fn add_date_interval(
date: Date,
interval: Interval,
@@ -852,8 +864,9 @@ fn sub_interval(a: Interval, b: Interval) -> Result<Interval, EvalError> {
.ok_or_else(|| EvalError::IntervalOutOfRange(format!("{a} - {b}").into()))
}

+// See `add_date_interval` for why this is not monotone in `interval`.
#[sqlfunc(
-is_monotone = "(true, true)",
+is_monotone = "(true, false)",
is_infix_op = true,
sqlname = "-",
propagates_nulls = true
@@ -2036,7 +2049,12 @@ where
Ok(CheckedTimestamp::from_timestamplike(res)?)
}

-#[sqlfunc(is_monotone = "(true, true)", sqlname = "bin_unix_epoch_timestamp")]
+// Non-monotone in `stride`: the result is `origin + floor((source - origin) /
+// stride) * stride`. For a fixed source like `2024-01-01 12:00:00`, a 1-day
+// stride bins to `2024-01-01 00:00:00`, but a 2-day stride bins to
+// `2023-12-31 00:00:00` — i.e. the lex-larger interval produces an earlier
+// timestamp. Monotone in `source`.
+#[sqlfunc(is_monotone = "(false, true)", sqlname = "bin_unix_epoch_timestamp")]
fn date_bin_timestamp(
stride: Interval,
source: CheckedTimestamp<NaiveDateTime>,
@@ -2047,7 +2065,8 @@ fn date_bin_timestamp(
date_bin(stride, source, origin)
}

-#[sqlfunc(is_monotone = "(true, true)", sqlname = "bin_unix_epoch_timestamptz")]
+// See `date_bin_timestamp` for why this is not monotone in `stride`.
+#[sqlfunc(is_monotone = "(false, true)", sqlname = "bin_unix_epoch_timestamptz")]
fn date_bin_timestamp_tz(
stride: Interval,
source: CheckedTimestamp<DateTime<Utc>>,
2 changes: 0 additions & 2 deletions test/sqllogictest/filter-pushdown.slt
@@ -326,7 +326,6 @@ Explained Query:

Source materialize.public.t
filter=(((#1{t} - case when (#0{x} = 0) then 1 day else 2 days end) < 2023-10-02 15:55:31.918))
-pushdown=(((#1{t} - case when (#0{x} = 0) then 1 day else 2 days end) < 2023-10-02 15:55:31.918))

Target cluster: quickstart

@@ -349,7 +348,6 @@ materialize.public.mv9:

Source materialize.public.t
filter=((mz_now() > timestamp_to_mz_timestamp((#1{t} - case when (#0{x} = 0) then 1 day else 2 days end))))
-pushdown=((mz_now() > timestamp_to_mz_timestamp((#1{t} - case when (#0{x} = 0) then 1 day else 2 days end))))

Target cluster: quickstart
