WMR soft limit #18966

Merged
ggevay merged 3 commits into MaterializeInc:main from ggevay:wmr-limit2
May 12, 2023

Conversation

@ggevay

@ggevay ggevay commented Apr 26, 2023

Contributor

This PR implements the WMR soft limit in the form that we agreed on with @aalexandrov. (I'll update the design doc tomorrow.) The "soft" means that we don't error out when we reach the limit, but just stop iterating, and consider the current state as the final result.

The default limit is infinite, i.e., no limit.

We postponed the hard limit (erroring out when reaching the limit), because we now have proper dataflow cancellation for WMR queries. We can still consider a hard limit later (which would be easy to add after this PR).
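
To make the "soft" behaviour concrete, here is a toy model in Rust. It is a sketch only, not the rendering code from this PR: the function name `iterate_soft_limit`, the `BTreeSet<u64>` state, and the way iterations are counted are all assumptions made for illustration.

```rust
use std::collections::BTreeSet;
use std::num::NonZeroU64;

/// Repeatedly applies `step` until the state stops changing (a fixpoint) or
/// `max_iters` iterations have run. Reaching the limit is not an error:
/// the state at that point is simply returned as the final result.
/// `None` models the default of "no limit".
fn iterate_soft_limit<F>(
    init: BTreeSet<u64>,
    max_iters: Option<NonZeroU64>,
    step: F,
) -> (BTreeSet<u64>, u64)
where
    F: Fn(&BTreeSet<u64>) -> BTreeSet<u64>,
{
    let mut state = init;
    let mut iters: u64 = 0;
    loop {
        if let Some(limit) = max_iters {
            if iters >= limit.get() {
                break; // soft limit reached: stop quietly, keep current state
            }
        }
        let next = step(&state);
        iters += 1;
        if next == state {
            break; // fixpoint reached before the limit
        }
        state = next;
    }
    (state, iters)
}
```

In this model, a step that adds `x + 1` for every `x` below 5, started from `{1}` with a limit of 2, returns `{1, 2, 3}` after 2 iterations, whereas with no limit it runs on to `{1, 2, 3, 4, 5}`.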

We moved away from system/session variables, based on feedback on the design doc from Surfaces and Frank.

Instead of system/session variables, users can specify the limit using new SQL syntax. (Edit: I'm in the process of changing this to a more standard options clause, while leaving it in the same place.)
WITH MUTUALLY RECURSIVE MAXITERATIONS 42
This syntax allows for separate limits for each WMR block, which is important when a query has multiple WMR blocks.

The first commit is just changing iteration numbers to use u64 instead of usize, based on a discussion in the office hours. The second commit runs cargo fmt, which I will squash into the first one, of course. (Just wanted to separate the diff for review.)

The third commit is the meat.

@aalexandrov, is the EXPLAIN format ok? (HIR, MIR, LIR) Note that I'm not currently testing linear_chains, because it doesn't seem to work for WMR (and nobody seems to be using it). We can discuss whether to fix it or deprecate it. (I'll open an issue tomorrow.)

Keyword

The keyword is tentative; we can still decide that before merging, but I like MAXITERATIONS. (@ggnall) Some possible alternatives:

  • MAXDEPTH 42: users might think of recursive function calls, which is not what's happening here.
  • LIMIT 42: It's not clear what we are limiting: iterations, records, time, ... Also, it could be confused with ORDER BY's LIMIT.
  • (ITERATE 42 TIMES): This one would also be ok I guess, but I'd vote for simplicity.
  • RECURSIONLIMIT: Might be ok as well, but I wanted to avoid (internal) confusion with our pub const RECURSION_LIMIT, which is for something totally different.
  • MAXRECURSION: Edit: The MySQL option with the same keyword would correspond to a hard limit, so let's not align our soft limit's keyword with that.
  • ITERATIONS: (from Jan) We don't have to call it a limit, since running to an exactly specified number of iterations is the same as stopping when either a limit is reached or a fixpoint is reached. But I'm a bit worried that some users won't make the extra mental step of realizing that it might execute fewer iterations if a fixpoint is reached earlier, and might worry that it will take more steps than necessary.

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated design doc: WMR limits design doc #18538
    • (The design doc doesn't yet reflect the move away from session/system variables, or the reduced importance of the hard limit now that we have dataflow shutdown. I'll update the design doc a bit later.)
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
    • We do have a protobuf change for the LIR LetRec, but I asked Jan whether we need to be backwards-compatible, and he said "Nope, we are not durably storing serialized IRs anywhere, I’m pretty sure. So no need to be backwards-compatible in the protobufs"
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@ggevay ggevay added A-optimization Area: query optimization and transformation A-compute Area: compute labels Apr 26, 2023
@ggevay ggevay requested a review from a team April 26, 2023 10:22
@ggevay ggevay requested a review from a team as a code owner April 26, 2023 10:22
@ggevay ggevay marked this pull request as draft April 26, 2023 10:23
@ggevay ggevay force-pushed the wmr-limit2 branch 7 times, most recently from 1fa57d7 to ac23eef Compare April 26, 2023 15:57
Comment thread src/sql-parser/src/ast/defs/query.rs Outdated
@ggevay ggevay added the T-proto Theme: `$T ⇔ Proto$T` conversions and `*.proto` files label Apr 26, 2023
@ggevay ggevay requested a review from aalexandrov April 26, 2023 18:54
@ggevay ggevay marked this pull request as ready for review April 26, 2023 18:54
@ggevay

ggevay commented Apr 26, 2023

Contributor Author

Ready for review! @MaterializeInc/surfaces, @aalexandrov, @vmarcos (rendering).

@ggevay ggevay requested a review from a team April 26, 2023 19:07
Comment thread src/sql-parser/src/ast/defs/query.rs Outdated
}
CteBlock::MutuallyRecursive(list) => {
for cte in list.iter() {
CteBlock::MutuallyRecursive(MutRecBlock {

Contributor

As an aside (don't need to do anything in this PR): having logic in the parser is unfortunate and it'd be nice for it to not be here.

Comment thread src/sql-parser/src/parser.rs Outdated

@philip-stoev philip-stoev left a comment

Contributor

Thank you very much for the tests -- I could not figure out any additional ones to add.

The feature held up under manual experimentation, and I could not get it to become wedged.

The diff looks less formidable with whitespace ignored.

@philip-stoev

Contributor

Actually here is a test that you could push -- it confirms that the number of iterations is reset at every timestamp:

CREATE TABLE t1 (f1 INTEGER);
CREATE MATERIALIZED VIEW v1 AS WITH MUTUALLY RECURSIVE MAXITERATIONS 2 cnt (f1 INTEGER) AS (SELECT f1 FROM t1 UNION ALL SELECT f1+1 AS f1 FROM cnt) SELECT * FROM cnt;
INSERT INTO t1 VALUES (1);
SELECT * FROM v1;
UPDATE t1 SET f1 = 2;
SELECT * FROM v1;

@teskje teskje left a comment

Contributor

General compute parts LGTM, I just have a couple stylistic comments. I'll defer to @aalexandrov for the transform changes.

Comment thread src/compute-client/src/explain/text.rs Outdated
Comment thread src/compute-client/src/plan/mod.rs Outdated
Comment thread src/compute/src/render/mod.rs Outdated
Comment thread src/compute/src/render/mod.rs Outdated
Comment thread src/compute-client/src/plan.proto Outdated
repeated mz_expr.id.ProtoLocalId ids = 1;
repeated ProtoPlan values = 2;
repeated uint64 max_iters = 4;
repeated bool max_iters_present = 5;

Contributor

I assume there is no repeated optional uint64 that would make this second field unnecessary?

Relatedly, but not really relevant to this PR, I have been wondering why LetRec is represented with separate lists for ids and values (and now max_iters). ISTM that instead having a single bindings list with (id, value, max_iter) entries would be more convenient since it doesn't require ensuring that the lists have the same length all the time. But I'm probably missing something.
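
The trade-off between the two layouts can be sketched in Rust as follows. This is an illustration under stated assumptions, not the actual `LetRec` definition: the type names, the `u64` ids, and the `String` stand-in for a plan are all hypothetical.

```rust
use std::num::NonZeroU64;

/// Parallel-list layout: the three lists must be kept the same length,
/// which can only be checked at runtime.
struct LetRecParallel {
    ids: Vec<u64>,
    values: Vec<String>,
    max_iters: Vec<Option<NonZeroU64>>,
}

impl LetRecParallel {
    /// Returns true if all three lists have the same length.
    fn is_consistent(&self) -> bool {
        self.ids.len() == self.values.len() && self.ids.len() == self.max_iters.len()
    }
}

/// Single-list layout: one entry per binding, so lengths cannot diverge.
struct Binding {
    id: u64,
    value: String,
    max_iter: Option<NonZeroU64>,
}

struct LetRecBindings {
    bindings: Vec<Binding>,
}

/// Converts the parallel layout into the single-list layout,
/// returning `None` if the lists disagree in length.
fn to_bindings(p: LetRecParallel) -> Option<LetRecBindings> {
    if !p.is_consistent() {
        return None;
    }
    let bindings = p
        .ids
        .into_iter()
        .zip(p.values)
        .zip(p.max_iters)
        .map(|((id, value), max_iter)| Binding { id, value, max_iter })
        .collect();
    Some(LetRecBindings { bindings })
}
```

With the single-list layout the consistency check disappears entirely, which is the convenience the comment refers to.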

Contributor

We discussed that this might be a better representation but decided to not change it as part of the ongoing epic. I think it must be a skunkworks project or something similar, unless we find ~3-4 days of in-band work to do this as part of addressing tech debt.

Contributor

Makes sense, thanks!

Regarding max_iters_present, Alex mentioned in another comment:

One general comment: we should absolutely prohibit people setting the MAX_ITERATIONS for some binding to 0, otherwise we run the risk of all sorts of incorrect optimizations for those blocks (or unnecessarily complicated transform code).

Doing that would allow us to drop max_iters_present here and just use 0 to mean "no limit set".
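
A minimal sketch of that sentinel encoding, assuming the limit is held in memory as `Option<NonZeroU64>` (the function names `limit_into_proto` and `limit_from_proto` are hypothetical):

```rust
use std::num::NonZeroU64;

/// Encodes an optional limit as a plain u64, using 0 for "no limit set".
/// This is only unambiguous because 0 is never a valid user-supplied limit.
fn limit_into_proto(limit: Option<NonZeroU64>) -> u64 {
    limit.map_or(0, NonZeroU64::get)
}

/// Decodes the wire value; 0 maps back to `None`.
fn limit_from_proto(proto: u64) -> Option<NonZeroU64> {
    NonZeroU64::new(proto)
}
```

Because a single `u64` per binding carries both "present" and "value", a separate `max_iters_present` list is no longer needed.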

Contributor Author

Yes, I'm throwing an error for 0. (while creating the HIR from the AST)

I thought about that, but I thought the code is cleaner this way. Special values are always a little bit scary; maybe somebody decides to suddenly allow 0. But in this case the risk would be low, so I can change it if that is preferred.

Contributor

I'd prefer it, since it statically removes the possibility that the two lists might become inconsistent (e.g. have different lengths), so there is one thing less to check at runtime. But your argument about special values is valid too, so I don't oppose leaving things as they are.

Contributor Author

Changed it.

Comment thread src/expr/src/explain/text.rs Outdated
Comment thread src/expr/src/explain/text.rs Outdated
Comment thread src/transform/src/normalize_lets.rs Outdated

@vmarcos vmarcos left a comment

Contributor

Rendering changes look good to me; it would be ideal if we'd include test(s) for nested WMR blocks with different limits also in execution (test/sqllogictest/with_mutually_recursive.slt), though, not only in planning.

Get::PassArrangements l1
raw=true
With Mutually Recursive
cte [MaxIterations None] l1 =

@aalexandrov aalexandrov Apr 27, 2023

Contributor

Can we not print out [MaxIterations None] (which I assume would be the default most of the time)?

@aalexandrov aalexandrov Apr 27, 2023

Contributor

nit: stylistically parameters of AST nodes have been printed in snake_case elsewhere and key value pairs use $key=$val, so I think

cte [max_iterations=10] l1 =

is a bit more consistent with the rest of the format.
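
As a rough illustration of the requested output (snake_case key, `$key=$val`, nothing printed when no limit is set), a formatting helper might look like the sketch below. The function `cte_header` is hypothetical and is not the EXPLAIN code in this PR.

```rust
use std::num::NonZeroU64;

/// Renders the header line of a recursive binding, for example
/// `cte [max_iterations=10] l1 =`, omitting the attribute when no limit is set.
fn cte_header(name: &str, max_iters: Option<NonZeroU64>) -> String {
    match max_iters {
        Some(n) => format!("cte [max_iterations={}] {} =", n, name),
        None => format!("cte {} =", name),
    }
}
```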

Contributor Author

Yes, fixed, thx!

@aalexandrov

aalexandrov commented Apr 27, 2023

Contributor

One general comment: we should absolutely prohibit people setting the MAX_ITERATIONS for some binding to 0, otherwise we run the risk of all sorts of incorrect optimizations for those blocks (or unnecessarily complicated transform code). There are some NonZero~ types in https://doc.rust-lang.org/stable/std/num/index.html and I guess the most precise way to enforce this constraint is to use one of those.
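
A small sketch of how a `NonZero` type can enforce this at the point where the user's value is validated. The names `plan_iteration_limit` and `LimitError` are hypothetical; only `std::num::NonZeroU64` is taken from the standard library.

```rust
use std::num::NonZeroU64;

#[derive(Debug, PartialEq)]
enum LimitError {
    /// The user asked for a limit of 0, which is rejected.
    ZeroNotAllowed,
}

/// Validates a user-supplied iteration limit.
/// `None` means the user did not specify a limit at all.
fn plan_iteration_limit(raw: Option<u64>) -> Result<Option<NonZeroU64>, LimitError> {
    match raw {
        None => Ok(None),
        // `NonZeroU64::new` returns `None` for 0, which becomes an error here.
        Some(n) => NonZeroU64::new(n)
            .map(Some)
            .ok_or(LimitError::ZeroNotAllowed),
    }
}
```

Once the value has passed this check, later stages can rely on the type alone and never need to handle a zero limit. `Option<NonZeroU64>` is also guaranteed to be the same size as `u64`.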

@ggevay

ggevay commented Apr 27, 2023

Contributor Author

Thanks for the reviews! In addition to addressing the inline comments, I made the following changes:

I added the test from @philip-stoev.

I added the test that @vmarcos suggests.

I changed the LIR EXPLAIN to not print it if None, as @aalexandrov suggested.

we should absolutely prohibit people setting the MAX_ITERATIONS for some binding to 0, otherwise we run the risk of all sorts of incorrect optimizations for those blocks (or unnecessarily complicated transform code).

Yes, I'm throwing an error for 0. (while creating the HIR from the AST)

There are some NonZero~ types

Hmm, nice! I changed the code to use this. (I didn’t properly implement MzReflect for NonZeroU64, but I hope that’s ok. We are not planning to use lowertest for WMR I guess, since we are moving away from lowertest anyway. Edit: Sorry, it's failing the tests. I'll fix it.)

Comment thread src/compute-client/src/plan/mod.rs Outdated
ids: ids.into_proto(),
max_iters: max_iters
.into_iter()
.map(|d| match d {

@aalexandrov aalexandrov May 1, 2023

Contributor

minor nit: can you add this

impl RustType<u64> for Option<NonZeroU64> {
    fn into_proto(&self) -> u64 {
        match self {
            Some(d) => d.get(),
            None => 0,
        }
    }

    fn from_proto(proto: u64) -> Result<Self, TryFromProtoError> {
        Ok(NonZeroU64::new(proto))
    }
}

after this line

rust_type_id![bool, f32, f64, i32, i64, String, u32, u64, Vec<u8>];

and then use max_iters.into_proto() and proto.max_iters.into_rust()?? This will allow other people who want to encode Option<NonZeroU64> to do so without the boilerplate.

Contributor Author

Thx, done

@aalexandrov aalexandrov self-requested a review May 1, 2023 21:18

@aalexandrov aalexandrov left a comment

Contributor

Looks good. From my original suggestions, I think the only thing missing is changing the output format of the new attribute from

[MaxIterations $x]

to

[max_iterations=$x]

@ggevay ggevay marked this pull request as draft May 5, 2023 20:14
@ggevay ggevay force-pushed the wmr-limit2 branch 4 times, most recently from ea7a9ba to 8192f22 Compare May 9, 2023 18:16
@ggevay ggevay marked this pull request as ready for review May 9, 2023 19:27
@ggevay

ggevay commented May 9, 2023

Contributor Author

I've addressed all comments, including changing the syntax based on the SQL design principles notion doc and SELECT's expected group size option. An example for the syntax:

WITH MUTUALLY RECURSIVE (ITERATION LIMIT 6)
  cnt (i int) AS (
    (WITH MUTUALLY RECURSIVE (ITERATION LIMIT = 3)
       cnt (i int) AS (
         SELECT 1 AS i
         UNION
         SELECT i+1 FROM cnt)
       SELECT i FROM cnt
    )
    UNION
    SELECT i+100 FROM cnt)
SELECT i FROM cnt;

We can decide the exact keywords later, after we finalize the keywords for WITH MUTUALLY RECURSIVE itself. (One option that came up was WITH REPEATEDLY, which doesn't have RECURSIVE in it, so the word ITERATION would be ok for it.)

@benesch, @ggnall, could you please take a quick look at the syntax?

The surfaces code changed because of the different option parsing. @mjibson, could you please check the new parsing and planning?

I also updated the design doc.

@def- def- left a comment

Contributor

Some interesting spots from code coverage report marked inline: https://buildkite.com/materialize/coverage/builds/89

Comment thread src/compute/src/render/mod.rs
tokens.insert(object.id, object_token);
// Import declared indexes into the rendering context.
for (idx_id, idx) in &dataflow.index_imports {
let export_ids = dataflow.export_ids().collect();

Contributor

Interesting block to test

@ggevay ggevay May 9, 2023

Contributor Author

This is indeed not covered, thanks! I'll add a test for this. (This PR makes only a trivial change to this part, but it's still somewhat new code. It was introduced with Frank's initial implementation of WMR.)

Contributor Author

Added a test in aebe703

Comment thread src/compute/src/render/mod.rs
Comment thread src/expr/src/explain/text.rs Outdated

if ctx.config.linear_chains {
writeln!(f, "{}With Mutually Recursive", ctx.indent)?;
write!(f, "{}With Mutually Recursive", ctx.indent)?;

Contributor

This entire block is untested it seems?

@ggevay ggevay May 9, 2023

Contributor Author

Yes, because the linear chains option currently doesn't work with WMR, see https://github.com/MaterializeInc/materialize/issues/19012.

@madelynnblue madelynnblue left a comment

Contributor

Surfaces parts lgtm

@ggevay ggevay force-pushed the wmr-limit2 branch 4 times, most recently from 8b6d024 to 037dd79 Compare May 10, 2023 14:09
@benesch

benesch commented May 11, 2023

Contributor

New syntax looks much better, nice! I signal boosted in #devex (https://materializeinc.slack.com/archives/C015RHB3LDR/p1683779552919369) for additional feedback. "Iteration limit" sounds a little funny to my ear, but no strong feelings.

@morsapaes

morsapaes commented May 11, 2023

Contributor

We can decide the exact keywords later, after we finalize the keywords for WITH MUTUALLY RECURSIVE itself. (One option that came up was WITH REPEATEDLY, which doesn't have RECURSIVE in it, so the word ITERATION would be ok for it.)

Any user that is familiar with recursion in SQL will have a mental mapping for WITH RECURSIVE, so it doesn't sound right to use an entirely different term for it (like WITH REPEATEDLY). Are we forced to use the MUTUALLY keyword, or would it be possible to relax this to WITH RECURSIVE, and spell out how we depart from the SQL standard idiom?

For the same reason, my preference for the iteration limit would also be for a keyword that is used in other systems, like MAXRECURSION or MAXDEPTH. AFAIU, these refer to the maximum depth level, though (which seems to be how most databases limit recursive queries?), so if that's a fundamentally wrong way to think about it, it might be safer to use MAXITERATIONS.

@ggevay

ggevay commented May 11, 2023

Contributor Author

Any user that is familiar with recursion in SQL will have a mental mapping for WITH RECURSIVE, so it doesn't sound right to use an entirely different term for it (like WITH REPEATEDLY).

True!

Are we forced to use the MUTUALLY keyword, or would it be possible to relax this to WITH RECURSIVE, and spell out how we depart from the SQL standard idiom?

From my point of view, simply using WITH RECURSIVE sounds ok, but Frank said that Nikhil might not want that. The only potential issue that I can see is that we have a semantic difference, i.e., some recursive queries have different results between Postgres and Materialize. However,

  • These cases are a bit exotic. Hopefully not many people are relying on the weirder parts of Postgres' recursion semantics.
  • Besides the keyword, we also have another syntactic difference: The user has to specify the types for each recursive CTE explicitly. So the user has to pause for a moment and look at our docs when porting a recursive query from Postgres to Materialize, and then in the docs we can place a prominent warning about the semantic difference.

What do you think, @frankmcsherry, @benesch, @aalexandrov?

For the same reason, my preference for the iteration limit would also be for a keyword that is used in other systems, like MAXRECURSION or MAXDEPTH. AFAIU, these refer to the maximum depth level, though (which seems to be how most databases limit recursive queries?), so if that's a fundamentally wrong way to think about it, it might be safer to use MAXITERATIONS.

MySQL errors out when reaching MAXRECURSION, which would correspond to our hard limit. In contrast, the soft limit (this PR) simply produces the current state as the final result when reaching the limit. For this reason, I think we shouldn't align the keyword.

RECURSION LIMIT sounds ok to me, though.

@benesch

benesch commented May 11, 2023

Contributor

From my point of view, simply using WITH RECURSIVE sounds ok, but Frank said that Nikhil might not want that. The only potential issue that I can see is that we have a semantic difference, i.e., some recursive queries have different results between Postgres and Materialize.

Yeah, I want to keep the door open for supporting the SQL standard's semantics for WITH RECURSIVE. It may be important for a customer one day.

@petrosagg had the take of: why bother with the RECURSIVE keyword at all? Just allow CTEs in a normal WITH block to refer to one another. I think that's more plausible, because that's a strict extension to the SQL spec, rather than

MySQL errors out when reaching MAXRECURSION, which would correspond to our hard limit. In contrast, the soft limit (this PR) simply produces the current state as the final result when reaching the limit. For this reason, I think we shouldn't align the keyword.

Oh, interesting! That makes sense for debugging (you want to watch it progress), but kind of scary if you were running in production. Should we be very explicit with our keyword and include something like SOFT in the name? SOFT ITERATION LIMIT, for example? Although I'm not sure it's exactly a "soft limit"—I think of a "soft" vs "hard" as "changeable upon request" vs "enforced", whereas this is about "errors or not if limit reached." Naming is hard!

Just noodling:

WITH ... (ITERATION LIMIT = 256, REQUIRE FIXPOINT = false)
WITH ... (ITERATION LIMIT = 256, REQUIRE FIXPOINT = true)
WITH ... (ITERATION LIMIT = 256) -- REQUIRE FIXPOINT defaults to true

I think to avoid blocking this PR any longer we should move forward with either ITERATION LIMIT or RECURSION LIMIT, but let's sync again on the syntax holistically once this is closer to stabilization.
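
The REQUIRE FIXPOINT idea floated above could behave roughly as in the following sketch. Everything here is an assumption for illustration: the function `iterate_with_options`, the `LimitReached` error, the `u64` state, and the choice to treat a fixpoint as "reached" only once a further step confirms that the state no longer changes.

```rust
use std::num::NonZeroU64;

#[derive(Debug, PartialEq)]
struct LimitReached {
    iterations: u64,
}

/// Runs `step` at most `limit` times.
/// - If a step leaves the state unchanged, the fixpoint is returned.
/// - If the limit is exhausted first, the outcome depends on `require_fixpoint`:
///   `true` reports an error (a "hard" limit), `false` returns the current
///   state (the "soft" limit of this PR).
fn iterate_with_options<F>(
    init: u64,
    limit: NonZeroU64,
    require_fixpoint: bool,
    step: F,
) -> Result<u64, LimitReached>
where
    F: Fn(u64) -> u64,
{
    let mut state = init;
    for _ in 0..limit.get() {
        let next = step(state);
        if next == state {
            return Ok(state); // fixpoint confirmed within the limit
        }
        state = next;
    }
    if require_fixpoint {
        Err(LimitReached { iterations: limit.get() })
    } else {
        Ok(state)
    }
}
```

With a step of `|x| (x + 1).min(3)` starting from 0, a limit of 256 yields `Ok(3)` either way, while a limit of 2 yields `Ok(2)` when `require_fixpoint` is false and an error when it is true.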

@teskje

teskje commented May 12, 2023

Contributor

Another idea I had was to just use the keyword ITERATIONS without calling it a limit: #18538 (comment).

@ggevay

ggevay commented May 12, 2023

Contributor Author

I like Petros' and Jan's simplifying ideas:

  • Simply WITH, no RECURSIVE is needed.
  • Simply ITERATIONS, no LIMIT is needed.

I'm not sure about the soft vs. hard limit. I think if the docs are very explicit about not erroring but simply stopping by default, then it's ok to simply stop by default.

Merging now with ITERATION LIMIT, and then let's decide these things as a follow-up, to already let people use the feature (internally), and also to avoid rebasing this PR again and again.

@ggevay ggevay merged commit f2bfd87 into MaterializeInc:main May 12, 2023
@ggevay ggevay mentioned this pull request May 12, 2023
5 tasks

Labels

A-compute Area: compute A-optimization Area: query optimization and transformation T-proto Theme: `$T ⇔ Proto$T` conversions and `*.proto` files

9 participants