workload-replay: Handle larger captured workloads#36686
Draft
def- wants to merge 8 commits into
Draft
Conversation
1b5bcea to
b20f0f1
Compare
Contributor
Author
|
Currently blocked by https://linear.app/materializeinc/issue/CLU-95/thread-coordinator-panicked-at-srccompute-clientsrcas-of & https://github.com/MaterializeInc/database-issues/issues/9656#issuecomment-4523776124 Edit: Rebased on top of potential fix, retrying: https://buildkite.com/materialize/release-qualification/builds/1250 |
`Interval` is lex-ordered by (months, days, micros), but adding an
interval to a timestamp or date adds *calendar* months with day-clamping
and then days as 24-hour periods. That arithmetic does not respect the
lex order:
t = 2024-01-31
i1 = {0 months, 31 days} → t + i1 = 2024-03-02
i2 = {1 month, 0 days} → t + i2 = 2024-02-29
In lex order `i1 < i2`, yet `t + i1 > t + i2`. For timestamps, the
day-clamping also collapses near-boundary inputs into the same date
while preserving sub-day time, so the first argument is non-monotone
too:
t1 = 2024-01-30 23:59:59, i = {1 month} → 2024-02-29 23:59:59
t2 = 2024-01-31 00:00:00, i = {1 month} → 2024-02-29 00:00:00
For dates the first argument *is* monotone (no sub-day precision means
clamping only collapses, never reverses), but the interval argument has
the same problem.
These annotations are consumed by the abstract interpreter that drives
persist filter pushdown. Marking these functions monotone meant the
interpreter computed the output range by evaluating the function only
at the endpoints of the input interval range — and a stats range like
`[{0m,31d}, {1m,0d}]` would yield the narrow output range
`[2024-02-29, 2024-03-02]` even though interior intervals
(e.g. `{0m, 60d}`) actually produce timestamps far outside that window.
Filter pushdown could then incorrectly conclude a part had no matching
rows, tripping the `persist filter pushdown correctness violation!`
audit in `persist_source.rs`.
Companion fix to b607993, which corrected the analogous annotations
for `add_time_interval`, `sub_time_interval`, `mul_interval`, and
`div_interval`.
Fixes database-issues#9656.
The override was added to suppress the audit panic from database-issues#9656; with the monotonicity annotations on timestamp/date + interval corrected, filter pushdown should be sound again. Removing this lets CI re-exercise the audit and surface any remaining latent causes.
- Drop useless .into() in the new interpret.rs regression test (clippy). - Update filter-pushdown.slt: sub_timestamp_interval is no longer pushdownable now that it's marked non-monotone, so the expected pushdown= lines for 'timestamp - INTERVAL day' queries are gone. This is the known tradeoff for soundness — even day-only intervals share the non-monotonic annotation since the abstract interpreter can't distinguish them statically.
date_bin(stride, source) = origin + floor((source - origin) / stride) * stride. For a fixed source like 2024-01-01 12:00:00, a 1-day stride bins to 2024-01-01 00:00:00 but a 2-day stride bins to 2023-12-31 00:00:00 — i.e. the lex-larger stride produces an earlier output. Same class of bug as the timestamp/date + interval monotonicity issues. Demotes date_bin_timestamp and date_bin_timestamp_tz from (true, true) to (false, true). Still monotone in source.
b20f0f1 to
8408777
Compare
Postgres-style `age(a, b)` is annotated `is_monotone = "(true, true)"`,
but it is non-monotone in either argument.
First argument: the carry logic in the calendar-aware difference borrows
a whole month's worth of days when the day field goes negative, which
breaks the lex order of `Interval` (months, days, micros) at month
boundaries:
age(2024-03-31, 2024-02-15) = {1 month, 16 days}
age(2024-04-01, 2024-02-15) = {1 month, 15 days} ← lex-smaller
age(2024-05-01, 2024-02-15) = {2 months, 15 days}
Second argument: after the algorithm's final sign-revert, the same
day-borrow makes the result dip non-monotonically as `b` crosses a
month boundary:
age(2024-02-29, 2024-03-30) = {-1 month, -1 day}
age(2024-02-29, 2024-03-31) = {-1 month, -2 days} ← lex-smaller
age(2024-02-29, 2024-04-01) = {-1 month, -1 day}
Demotes both `age_timestamp` and `age_timestamp_tz` to
`(false, false)`. Adds regression tests in `interpret.rs` exercising
the abstract interpreter via a `>=` predicate, mirroring the existing
tests for the analogous `add_timestamp_interval` and
`date_bin_timestamp` bugs.
Same bug class as 5257b0d and 51e86ed.
8408777 to
7701761
Compare
The runtime MFP evaluator (SafeMfpPlan::evaluate_inner) runs every
expression once all preceding predicates pass, so an expression that
errors on the actual data turns the whole row into an Err. The
abstract interpreter's mfp_filter / mfp_plan_filter, however, only
ANDs together the predicates and temporal bounds — so the AND result
misses the fallibility of any expression whose result column isn't
referenced by a predicate or bound. Persist filter pushdown then
discards parts that actually produce error rows, tripping the audit
panic in persist_source.rs.
Concretely from the audit log on database-issues#9656:
expressions: [cast_string_to_uuid(merchant_group_id)]
predicates: [NOT IsNull(checksum)]
upper_bounds: [cast_timestamp_tz_to_mz_timestamp(coalesce(deleted_at, ...))]
Stats say checksum is non-null and deleted_at is in the past. The
interpreter computed AND({True}, {False}) = {False} with fallible=false
and discarded the part. The actual evaluator: predicate passes,
cast_string_to_uuid is evaluated next, errors on the row's
merchant_group_id value, and the whole row is emitted as Err. Audit
catches the discrepancy.
Override mfp_filter and mfp_plan_filter on ColumnSpecs so that the
returned summary's fallible bit is set if any of the MFP expressions'
specs are fallible. This is conservative (we'll keep parts where a
predicate could-but-doesn't-have-to fail, even though predicate
short-circuiting would have prevented the expression from running), but
it's sound and matches the runtime semantics.
Adds a regression test that builds an MFP with one always-erroring
expression and one always-passing predicate, asserts that the
interpreter's summary may_fail.
7701761 to
d094b5c
Compare
The previous commits in this stack marked add/sub_timestamp_interval (and the tz variants) as non-monotone in both arguments to preserve correctness in the presence of calendar-month / day-clamping arithmetic. But that conservative annotation costs pushdown for common temporal-filter predicates like 't_col - INTERVAL ''1'' day < literal', which were a motivating use case for filter pushdown in the first place. Recovers that pushdown by extending SpecialBinary with a DynamicMonotone handler: when the interval argument is a literal with months == 0, the operation reduces to adding a fixed number of microseconds, which is monotone in both arguments. In that case the interpreter computes a tight output range; in all other cases it falls back to anything(), preserving the soundness fix. The SpecialBinary's pushdownable hint is set to (true, false) so the Trace pass routes 't_col +/- INTERVAL_lit' predicates through pushdown regardless of months — at runtime the dynamic check decides whether to narrow.
cdc64ce to
6c2255b
Compare
Captured workloads from production environments often have features the local replay did not yet support; replaying them previously failed at various points before useful test work could happen. * config: add source_ingestion_*, balanced_*, and highcpu_* cluster replica sizes so CREATE CLUSTER REPLICA against these production size names is accepted locally. * objects: accept quoted database names with non-identifier characters (e.g. hyphens) in CREATE CONNECTION; parse double-quoted REFERENCE / EXTERNAL REFERENCE parts so dotted references like `"db-name".schema.table` are split correctly. * objects: upstream CREATE TABLE statements now use IF NOT EXISTS — the same physical Postgres/MySQL/SQL Server table can back multiple MZ sources and we were failing the second time. * objects: strip options from captured SQL that don't survive local replay: empty `TEXT COLUMNS = ()` (planner panic), `EXCLUDE COLUMNS` (names upstream columns we don't recreate), Kafka `START OFFSET = (…)` (local topics have 1 partition), `VERSION = N` on CREATE SINK (internal-only option). * column: extend type coverage so `<scalar>[]` arrays, `interval`, `oid`, `real`, and Materialize-specific list/record/aclitem variants produce sensible non-NULL values; `character` (no size) now emits a single char so it fits CHAR(1). * column: function-call defaults like `pg_catalog.now()` are skipped in COPY and Kafka paths, where they would be inserted as literal strings and rejected. * column: avro_type falls back to `string` for non-primitive types so fastavro can parse the generated schema. * util / mzcompose: workload files can now be stored compressed as `.yml.zst` and are loaded transparently via the new `load_workload` helper. Glob `*.yml` also picks up the compressed variants. Large captured workloads compress ~30× with zstd, which keeps them under GitHub's 100 MiB per-file limit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6c2255b to
9c02d48
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Captured workloads from production environments often have features the local replay did not yet support; replaying them previously failed at various points before useful test work could happen.
"db-name".schema.tableare split correctly.TEXT COLUMNS = ()(planner panic),EXCLUDE COLUMNS(names upstream columns we don't recreate), KafkaSTART OFFSET = (…)(local topics have 1 partition),VERSION = Non CREATE SINK (internal-only option).<scalar>[]arrays,interval,oid,real, and Materialize-specific list/record/aclitem variants produce sensible non-NULL values;character(no size) now emits a single char so it fits CHAR(1).pg_catalog.now()are skipped in COPY and Kafka paths, where they would be inserted as literal strings and rejected.stringfor non-primitive types so fastavro can parse the generated schema..yml.zstand are loaded transparently via the newload_workloadhelper. Glob*.ymlalso picks up the compressed variants.