[WMR] `NormalizeLets` refreshes types until fixpoint by ggevay · Pull Request #18595 · MaterializeInc/materialize

ggevay · 2023-04-04T18:53:53Z

This takes approach 1. from https://github.com/MaterializeInc/database-issues/issues/5487: Run the key inference in a pessimistic fixpoint loop in refresh_types in NormalizeLets. It seems to me that this covers a lot of cases.

Note that the new fixpoint loop runs not just the key inference, but our entire type inference, including nullability analysis. Therefore, the nullability of types changed in many tests. This should be fine.

@philip-stoev, RQG testing would be really great to do here. I added a fixpoint loop that runs our type inference repeatedly, which is a lot of code, so anything might happen... I also added some non-trivial asserts. If possible, please include exotic scalar functions to exercise all corners of type inference.
https://m.media-amazon.com/images/I/41+f7nqnB3L.jpg

Motivation

This PR adds a known-desirable feature. https://github.com/MaterializeInc/database-issues/issues/5487

Tips for reviewer

First commit is from #18554, so please review it there.

Before reviewing the new comment in ColumnKnowledge, I suggest looking at the new test that I added in column_knowledge.slt as well as the test change in normalize_lets.slt.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered.
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:

philip-stoev · 2023-04-05T10:03:20Z

I am on it.

philip-stoev · 2023-04-05T10:43:50Z

+                // `refresh_types` might diverge.
+                // (Note that `prior` can't come from a shadowed binding, because there is no
+                // shadowing.)
+                assert!(prior.iter().all(|prior| {


That is quite the mouthful. I am hitting this assertion . will try to simplify the query now.

Test case

CREATE TABLE t1 (f1 INTEGER PRIMARY KEY , f2 INTEGER NOT NULL, f3 INTEGER); INSERT INTO t1 VALUES (1,1,1); EXPLAIN WITH MUTUALLY RECURSIVE c1 (f1 INTEGER, f2 INTEGER, f3 INTEGER) AS ( SELECT * FROM t1 UNION ALL SELECT f3 + 3 + 1 AS f1 , f1 + 7 + 1 AS f2 , f2 + f3 + 1 AS f3 FROM c1 WHERE f3 < 100 ) SELECT * FROM c1;

Please add to SLT once fixed.

@ggevay I keep hitting that assertion no matter how much I dumb down the generated WMR queries. Could it be that it triggers on all queries that have the + operator? Unfortunately by using increment I ensure that the queries I run are halting, so it needs to work for testing correctness.

EDIT: nevermind, it happened on a query without the +

Thanks for this test! The problem is that we run the new fixpoint loop not from Top, but from what a previous run of NormalizeLets gives us, which means that our nullability analysis would have to infer at least the same non-nullability now that it did in that earlier run of NormalizeLets, which seems almost impossible to ensure. For example, what happens in this query is that the earlier NormalizeLets operated on this:

Union Get u1 Project (#3..=#5) Map (((#2 + 3) + 1), ((#0 + 7) + 1), ((#1 + #2) + 1)) Filter (#2 < 100) Get l0

where non-nullability of #2 inside the Map comes from the Filter, but in a later run of NormalizeLets we have the Filter and Map in the opposite order (due to a CanonicalizeMfp I guess):

Union Get u1 Project (#3..=#5) Filter (#2 < 100) Map (((#2 + 3) + 1), ((#0 + 7) + 1), ((#1 + #2) + 1)) Get l0

where we would need to propagate non-null knowledge backwards to infer the same non-nullability.

We could fix this by always starting the fixpoint loop from scratch (from Top) instead of what we currently have in the Get types, but @aalexandrov pointed out an other issue with the pessimistic approach, so I might completely rewrite the PR to the optimistic approach instead (i.e., 2. from https://github.com/MaterializeInc/materialize/issues/18553).

ggevay · 2023-04-05T13:09:00Z

Turns out that the optimistic approach (i.e., 2. from https://github.com/MaterializeInc/database-issues/issues/5487) would be better for several reasons:

(Philip's query shows that the pessimistic approach would have to start from Top, complicating the initialization.)
It is hard to be sure that our nullability analysis is actually monotonic.
As @aalexandrov pointed out, our unique key analysis might very well be non-monotonic, because in some situations it might decide to omit some keys to avoid an exponential blowup when, e.g., all subsets of the columns would be keys.
In general, the optimistic approach is more precise. The pessimistic approach would be kind of ok for the key inference (with accounting for non-monotonicity by unioning the sets of keys at every iteration), but we are also inferring nullability here, which is better to do optimistically, as agreed in the ColumnKnowledge PR.

I'll rewrite the PR.

This means that the key and nullability info in the types will propagate among all bindings, even recursive ones. Closes #18553

ggevay · 2023-04-12T14:22:06Z

I've put rewriting this on hold. I'll do the WMR limits first, and then will get back to this. It's surprisingly tricky.

ggevay · 2023-05-24T19:50:06Z

Closing this for now, as it's super complicated. It would need at least several days, maybe more than a week to properly do this. I suggest getting back to this when we have

more WMR users, and/or
more examples where the weak key inference is actually hurting some optimization.

I added some thoughts on the issue about how we could approach this, but there are some open questions.

ggevay added A-optimization Area: query optimization and transformation A-compute Area: compute labels Apr 4, 2023

ggevay force-pushed the wmr-keys branch 8 times, most recently from 523710a to faf449e Compare April 5, 2023 09:55

ggevay marked this pull request as ready for review April 5, 2023 09:58

ggevay requested review from a team, aalexandrov and philip-stoev April 5, 2023 09:58

philip-stoev reviewed Apr 5, 2023

View reviewed changes

ggevay force-pushed the wmr-keys branch 8 times, most recently from 2a0d46f to 8944879 Compare April 6, 2023 17:35

ggevay marked this pull request as draft April 6, 2023 18:12

ggevay added 4 commits April 6, 2023 20:15

[WMR] NormalizeLets refreshes types until fixpoint

cb1225d

This means that the key and nullability info in the types will propagate among all bindings, even recursive ones. Closes #18553

wip

f6a543d

wip

3922eff

debugging

3b27f38

ggevay added 3 commits April 6, 2023 20:15

debugging

2d0bba6

run CI

29f229a

wip

5711405

ggevay force-pushed the wmr-keys branch from 8944879 to 5711405 Compare April 6, 2023 18:16

Update inside binding loop

18aa418

ggevay closed this May 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WMR] `NormalizeLets` refreshes types until fixpoint#18595

[WMR] `NormalizeLets` refreshes types until fixpoint#18595
ggevay wants to merge 8 commits into
MaterializeInc:mainfrom
ggevay:wmr-keys

ggevay commented Apr 4, 2023 •

edited

Loading

Uh oh!

philip-stoev commented Apr 5, 2023

Uh oh!

philip-stoev Apr 5, 2023

Uh oh!

philip-stoev Apr 5, 2023

Uh oh!

philip-stoev Apr 5, 2023 •

edited

Loading

Uh oh!

ggevay Apr 5, 2023

Uh oh!

ggevay commented Apr 5, 2023

Uh oh!

ggevay commented Apr 12, 2023

Uh oh!

ggevay commented May 24, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ggevay commented Apr 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Tips for reviewer

Checklist

Uh oh!

philip-stoev commented Apr 5, 2023

Uh oh!

philip-stoev Apr 5, 2023

Choose a reason for hiding this comment

Uh oh!

philip-stoev Apr 5, 2023

Choose a reason for hiding this comment

Uh oh!

philip-stoev Apr 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ggevay Apr 5, 2023

Choose a reason for hiding this comment

Uh oh!

ggevay commented Apr 5, 2023

Uh oh!

ggevay commented Apr 12, 2023

Uh oh!

ggevay commented May 24, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ggevay commented Apr 4, 2023 •

edited

Loading

philip-stoev Apr 5, 2023 •

edited

Loading