WMR limits design doc#18538
Conversation
f1c41f2 to
6ba9e63
Compare
|
|
||
| For use case 1., B. has the usability advantage over a system variable that it avoids the danger that multiple users of the same system don't know about each other's limit needs, and one of them sets a lower limit than what the other would want. However, I guess users will typically just set an infinite limit whenever they touch the limit at all, so there won't be any significant conflicts. Also note that a usability advantage of the system variable approach is that it allows power users to opt out of the hard limit once and for all, i.e., avoid having to type it for every query. Taking into account both of these considerations, I would vote for a system variable. | ||
|
|
||
| For use case 2., B. has the slight usability advantage over a session variable that it avoids the danger that someone forgets to reset the variable after a debugging session. However, I don't think this slight advantage would justify adding a new clause to the parser. **@MaterializeInc/surfaces**, would you agree? |
There was a problem hiding this comment.
I'm personally not a huge fan of having a session variable affect the semantics of a query. I know there are other existing variables that can similarly affect the result of a query (search_path, database, transaction_isolation), but none of them fundamentally changes the behavior of how a query is executed.
It's also not entirely clear to me how the number of iterations is being set for views with WMR queries. Is it just the default value of the session variable? If so, that value can change in between versions, which means that a version upgrade can alter the behavior of an existing view which also feels wrong.
I would feel better if the number of iterations were explicitly in the query itself instead of derived from a session variable.
There was a problem hiding this comment.
It's also not entirely clear to me how the number of iterations is being set for views with WMR queries. Is it just the default value of the session variable?
The session variable is for the debugging use case, which I meant mainly for one-off selects, during the development of a query.
If so, that value can change in between versions, which means that a version upgrade can alter the behavior of an existing view which also feels wrong.
Hmm, yes, this sounds indeed a bit weird. Although the session variable's default value would be infinite in all versions, because that is for the debugging use case, but the system variable's default (which is for the "stopping divergent queries" use case) might indeed change between versions.
I would feel better if the number of iterations were explicitly in the query itself instead of derived from a session variable.
I see your point!
|
I strongly prefer the SQL syntax to limit the number of iterations. Both because multiple WMR queries and sessions are a thing, because you want the behavior of your query to be represented in its text, and because there are independent reasons to want to limit the number of iterations (beyond debugging). I'm not sure I understand the value of the soft limit vs the hard limit, if they are both settable. |
|
Given that these variables can change the behavior of the query, including both its correct output and its error states, how certain are we that those change will not befoul things like replicas that are meant to produce identical output? |
We need both because we have conflicting requirements for what needs to happen when we reach the limit:
|
I was planning to bake the value into the dataflow already in LIR. |
c89f209 to
23e409d
Compare
|
Ok, you've convinced me that the session variable / system variable approach is just too hacky! |
dbe4bad to
1e46606
Compare
7f3bbe9 to
5fed142
Compare
|
There are still several options on how exactly to extend SQL. Specifically, the next question to decide is whether we would be ok with a limit setting that is scoped to an entire query, or we definitely want to scope it to a WMR clause right now. I have a usability discussion on this here. Scoping to a WMR clause sounds appealing from the user point of view, but before too quickly voting for this, please read the "implementation difficulties" section of the design doc. @frankmcsherry, Could you please check that I didn't miss any more needed changes in (I've made minor adjustments to the design doc, but I have not yet moved the "Extending SQL" section from the future work section to the main part. I'll do a big restructuring once we decide on the above question.) |
| [motivation]: #motivation | ||
|
|
||
| I can imagine 3 use cases, of which the first 2 seem urgent to me: | ||
| 1. _Stopping divergent queries ([#16800](https://github.com/MaterializeInc/materialize/issues/16800)):_ It is very easy to accidentally write a WMR query that runs forever (actually, until OOM in most cases, but that would often take a very long time). This, coupled with the problem that Ctrl+C (or dropping a view) currently doesn't cancel a dataflow ([#2392](https://github.com/MaterializeInc/materialize/issues/2392)), is an unpleasant user experience, as the user has to manually DROP the entire cluster. |
There was a problem hiding this comment.
as the user has to manually DROP the entire cluster.
Nit: They have to drop all replicas in the cluster, not the cluster itself.
There was a problem hiding this comment.
Another unpleasantry coming from this is that once you've re-created your replicas you they have to rehydrate their long-running dataflows, which might take time during which the output from these dataflows is unavailable.
|
|
||
| We should think about how bad it is when a user has a divergent WMR query running unstoppably. It is certainly a bit annoying due to having to then drop the entire cluster. However, it usually shouldn't be catastrophic, since most of these cases [will happen not on production clusters, but just on a cluster specifically started for experimental queries under development](https://materializeinc.slack.com/archives/C02PPB50ZHS/p1680178620218979?thread_ts=1680177489.365979&cid=C02PPB50ZHS). Note that it can conceivably happen that whether a query diverges depends on input data (e.g., whether there is a cycle in a graph), and then a user might not realize that it can be divergent before going to production. But I guess we don't have to be too afraid of this, because in general, it is advisable to put each critical and/or risky production dataflow on its own cluster anyway. | ||
|
|
||
| I think we should focus on making the initial experience of WMR as smooth as possible, because learning the concept itself is scary enough without having to know technical tricks such as "if it looks stuck, drop your cluster in addition to Ctrl+C". I would argue that giving an annoying user experience to a large portion of initial WMR users is not worth it to avoid a bit of inconvenience in some complex use cases that will probably be much less common, especially at the beginning. |
There was a problem hiding this comment.
I would agree with that, particularly because it is easy to see when you run into the hard limit (you get an error), while it is not easy to know that a dataflow you have created is stuck counting up or is still running after you think you have already dropped it.
There was a problem hiding this comment.
We have proper dataflow cancellation now!
| - The hard limit would still avoid OOMing for divergent dataflows in some cases. | ||
| - Without the hard limit, users would often wonder what's going on: is each iteration taking a long time for some reason, or do I have an infinite loop? | ||
| - We find that having proper dataflow cancellation is hard, and decide that we won't have it for a while. In this case, the default will probably have to stay a finite number for the foreseeable future. | ||
| - We find that users (or ourselves) often run into the limit not by a buggy query, but due to the logic of their query and the characteristics of their (correct) input data. In this case, we either should raise the default to a bigger number, or make it infinite. To track how often this happens, we could build a tool at some point that examines production catalogs and gathers how many has a non-default limit. |
There was a problem hiding this comment.
We should have a feature flag!
There was a problem hiding this comment.
For changing the default value without a new release. (@antiguru agrees.)
There was a problem hiding this comment.
(Hard limit postponed.)
| # Unresolved questions | ||
| [unresolved-questions]: #unresolved-questions | ||
|
|
||
| - It would be good to know how soon we'll have proper dataflow cancellation (see [#2392](https://github.com/MaterializeInc/materialize/issues/2392), plus [here](https://materializeinc.slack.com/archives/C02PPB50ZHS/p1680179894283039?thread_ts=1680177489.365979&cid=C02PPB50ZHS) and [here](https://materializeinc.slack.com/archives/C02PPB50ZHS/p1680199694602249)). However, I think we should already implement a basic solution for both use cases 1. and 2., and later revisit the default setting of the hard limit when we have proper dataflow cancellation. |
There was a problem hiding this comment.
I'm optimistic, but I also don't think it will be soon. Sufficient people have raised concerns about the unknown impact drop_dataflow has on timely and MZ, so we at least need to test this internally for a couple weeks. And then all bets are off regarding how long fixing the findings will take.
There was a problem hiding this comment.
I think there is some new information on that front that should be reflected in the doc.
There was a problem hiding this comment.
There was a problem hiding this comment.
Thanks for writing this up! I think we need to:
- Have a small list of open questions at the end of the doc.
- Seek input from @frankmcsherry and @antiguru and maybe @ggnall for that list and agree on some answers.
Based on (2) we can update the doc and refine the list in (1). It might take 2-3 iterations to do it this way, but it would probably be more productive than putting all options of the decision tree on the table at once and ask for a complete decision path from everybody.
| I can imagine 3 use cases, of which the first 2 seem urgent to me: | ||
| 1. _Stopping divergent queries ([#16800](https://github.com/MaterializeInc/materialize/issues/16800)):_ It is very easy to accidentally write a WMR query that runs forever (actually, until OOM in most cases, but that would often take a very long time). This, coupled with the problem that Ctrl+C (or dropping a view) currently doesn't cancel a dataflow ([#2392](https://github.com/MaterializeInc/materialize/issues/2392)), is an unpleasant user experience, as the user has to manually DROP the entire cluster. | ||
| 2. _Debugging state between iterations ([#18362](https://github.com/MaterializeInc/materialize/issues/18362)):_ WMR queries are not so obvious to write, so users often need to debug their queries during development. In such cases, it can be enlightening to inspect the intermediate states between iterations. If we had an option to limit the number of iterations, the user could just run the query repeatedly with successively larger limits. Multiple people expressed their wish for this feature, e.g., [here](https://materializeinc.slack.com/archives/C02CB7L4TCG/p1678266977745819?thread_ts=1678266846.169599&cid=C02CB7L4TCG). | ||
| 3. (_Making the actual logic of the query stop at a fixed number of iterations:_ Some algorithms are unintuitive to express as a loop running until a fixpoint, and instead need a loop that runs a specific number of times. Note that in the above two use cases, we don't expect to run into the limit in production under normal operation, but in this use case a limit will be part of the logic of production queries. Therefore, this use case would have more stringent stabilization requirements than the above two.) |
There was a problem hiding this comment.
What is written here under (3) is actually meant to be tracked by #18362, but #18362 is referenced under (2).
There was a problem hiding this comment.
I changed the text a bit to make it more clear that 3. is about algorithms that require a fixed number of steps.
|
|
||
| ## Rendering | ||
|
|
||
| The rendering part of the design is simple and [uncontroversial](https://materializeinc.slack.com/archives/C041KPFCR98/p1679655472232159?thread_ts=1679653114.252349&cid=C041KPFCR98). We will use `branch_when` to access the pointstamps containing the iteration numbers (after [the consolidation of the result of an iteration](https://github.com/MaterializeInc/materialize/blob/26f23fd7271cf92c7c8b0dc4ba4711999b45999a/src/compute/src/render/mod.rs#L643)). This will give us two streams: rows that are within the limit, and rows that are outside. The latter one we can either throw away or route into the error stream, depending on whether we want to just gracefully stop at the limit with the current state as the final result, or error out when reaching the limit. |
There was a problem hiding this comment.
The latter one we can either [...] route into the error stream
Is this necessary. We might end up consuming a lot of memory just to communicate an message like
The fixpoint wasn't reached within (3) iterations
I can imagine that some derived statistics for the diff in each WMR binding will be useful, but allocating resources proportional to all records that contribute to those non-empty diff does not seem necessary.
If I see such an error, I would most probably want to tweak using the WMR debugging features to get a sense of what exactly is going wrong.
There was a problem hiding this comment.
I'd reduce it down to one record before actually putting it into the error stream.
| # Unresolved questions | ||
| [unresolved-questions]: #unresolved-questions | ||
|
|
||
| - It would be good to know how soon we'll have proper dataflow cancellation (see [#2392](https://github.com/MaterializeInc/materialize/issues/2392), plus [here](https://materializeinc.slack.com/archives/C02PPB50ZHS/p1680179894283039?thread_ts=1680177489.365979&cid=C02PPB50ZHS) and [here](https://materializeinc.slack.com/archives/C02PPB50ZHS/p1680199694602249)). However, I think we should already implement a basic solution for both use cases 1. and 2., and later revisit the default setting of the hard limit when we have proper dataflow cancellation. |
There was a problem hiding this comment.
I think there is some new information on that front that should be reflected in the doc.
| [unresolved-questions]: #unresolved-questions | ||
|
|
||
| - It would be good to know how soon we'll have proper dataflow cancellation (see [#2392](https://github.com/MaterializeInc/materialize/issues/2392), plus [here](https://materializeinc.slack.com/archives/C02PPB50ZHS/p1680179894283039?thread_ts=1680177489.365979&cid=C02PPB50ZHS) and [here](https://materializeinc.slack.com/archives/C02PPB50ZHS/p1680199694602249)). However, I think we should already implement a basic solution for both use cases 1. and 2., and later revisit the default setting of the hard limit when we have proper dataflow cancellation. | ||
| - How large is the iteration overhead on a real cluster with multiple worker threads? If small, maybe we should set the default hard limit to 10000 instead of 1000? |
There was a problem hiding this comment.
After reading the doc, I'm enumerating the list of open questions that I can see:
- Are people OK with the provided use-case distinction. At first glance, to me it seems that (3) is just the (possibly only viable) mechanism to implement (2). We can reduce the discussion surface if we make this assumption. What are some good reasons to keep use cases (2) and (3) separate? Beyond that, I would be good to get confirmation from @frankmcsherry that at least (1) and ({2, 3}) present different goals and it's OK to not aim for a single solution for both.
- Assuming Q1 is cleared, I guess there are enough arguments to steer away from modeling the hard limit for (1) as a system variable. The remaining options are a query-level limit or a per WMR-clause limit. The first alternative is strictly simpler. Do we have strongly arguments for per WMR-clause hard limits? If not, what are the semantics of the global limit (do we sum up the
PointStampcomponents w.r.t. this global limit)? - Assuming Q1 is cleared, do we want to agree on a solution for ({2, 3}) as part of this doc? I think a per-WMR limit there makes more sense.
There was a problem hiding this comment.
- and 3. are similar, but the main difference is that in the debugging use case (2.), you only want to temporarily change the value, i.e., during a debugging session, so including it in the SQL definition doesn't matter much. This means that a session variable might have been ok for it. (This point is not so important anymore, because we moved away from the session/system variables approach, because a system variable was deemed not suitable for use case 1., and I'd say if we have a SQL extension for 1., then it's easy to also have that for 2.)
There was a problem hiding this comment.
Another difference is that a cross join is kind of ok when just debugging, but it would be bad for 3., so a broadcast join would be needed for 3. if we want to simulate the counting in the dataflow itself rather than using the timestamp.
| - D. in a WHERE clause with a special, unmaterializable function, e.g., `WHERE iteration_number() < 100`. | ||
|
|
||
| First of all, note that A. is not good: | ||
| - If we simply want to set a limit for an entire query, i.e., set the same limit for all WMR clauses inside a single query, then B. is better than A., because a SELECT might appear as a subquery. |
There was a problem hiding this comment.
Can't we just say that outer limits override inner limits?
There was a problem hiding this comment.
The user might want to set separate limits. Maybe she wants to debug the outer WMR, for which the inner WMR needs to have no limit.
|
|
||
| The B., C., and D. possibilities differ in usability for queries that have multiple WMR clauses. In these situations, a per-WMR limit (C. or D.) definitely wins in usability over having just one limit for the entire query (B.). | ||
|
|
||
| Note that having separate limits for separate WMR clauses in one query is especially important for the **compositionality** of views. Let's say that Alice writes a view that has just a single WMR. Then Bob writes another query that uses Alice's view, and has a WMR itself. The view is inlined, so Bob's query now has two WMRs. But Bob shouldn't need to care about the internals of Alice's view, and might not even know that the view has a WMR, so he thinks his WMR is the only one. And then he sets a limit for the entire query, which will unexpectedly mess up Alice's WMR that is inlined from inside the view. This problem is somewhat relevant for all the three use cases, but especially for 3. |
There was a problem hiding this comment.
I think most of the people will agree that having separate limits for separate WMR clauses is the best option. I'm also pretty confident that this is the only viable option for solving (3) that does not require manual fix listed in #18362. I'm not sure if we want to allow two separate options for each WMR clause if we want to have different solutions for (1) and ({2, 3}).
There was a problem hiding this comment.
We went with separate limits for separate WMR clauses in the end.
Whether to have also a hard limit is postponed.
|
|
||
| #### C. | ||
|
|
||
| For C., the main difficulty is that `NormalizeLets` merges *sequenced* or *independent* WMR clauses (and sometimes *nested* ones as well, but only when the inner one is not really recursive, I think) into one WMR clause. Setting just one limit of this merged WMR clause would be wrong. One solution would be to have a per-let-binding limit in MIR and LIR (and in rendering), so that different let bindings that came from different WMR clauses with differing limits could retain their differing limits. This needs some improvements in `NormalizeLets`: |
There was a problem hiding this comment.
only when the inner one is not really recursive
If this is correct we can probably just propagate the limit from the other block (or maybe limit + 1, I'm not sure).
There was a problem hiding this comment.
I'm planning to set it to infinite.
|
(The current state of the doc doesn't yet reflect the move away from system/session variables to the SQL approach. I was planning to do a big rewrite/refactoring of the doc once we decide on some open questions within the SQL approach.) |
|
Important new development: it seems that we will actually have dataflow cancellation for WMR dataflows (between iterations) when the user drops a view or hits Ctrl+C. I'll rewrite the design doc: we'll probably postpone implementing the hard limit, and just implement the soft limit for now. |
5fed142 to
680a079
Compare
c919d86 to
087acb2
Compare
|
I rewrote the design doc:
|
teskje
left a comment
There was a problem hiding this comment.
LGTM, provided @MaterializeInc/surfaces agrees with the SQL changes.
Thanks for making the text more concise!
| - We accept an optional `=` before the number. | ||
| - We use the `generate_extracted_config!` macro to process the options. | ||
|
|
||
| The exact keywords are not final yet, and also depends on what the exact keyword will be for WITH MUTUALLY RECURSIVE. |
There was a problem hiding this comment.
Bike-shedding thought: We could frame the soft limit not as a limit but just a fixed number of iterations (e.g. WITH MUTUALLY RECURSIVE (ITERATIONS = 100). The user would think of this in terms of "I want to run 100 iterations" rather than "I want to run until fixpoint or until 100 iterations are reached". Of course both are the same things in our case but the former is perhaps a bit more simple. Also, not using the term "limit" here would free us to use it for a hard limit, if we decided to implement one in the future. Otherwise we may have a hard time finding a good keyword for the hard limit that's not confusing to users.
There was a problem hiding this comment.
Hmm, sounds good, I'll keep this in mind for the final keyword decision process, thanks!
087acb2 to
b1ca241
Compare
|
The soft limit PR is merged, so merging this design doc also. |
Edit: I completely rewrote the design doc. Originally, it was proposing two limits, a hard limit and a soft limit, but then proper dataflow cancellation was implemented (#18718), so now it only proposes the soft limit.
It would be useful to limit the number of iterations in WITH MUTUALLY RECURSIVE, primarily for debugging purposes. This is a design doc for this.
Rendered.
Motivation
Tips for reviewer
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way) and therefore is tagged with aT-protolabel.