dataflow,sinks: produce periodic updates for TAIL with progress=true #7272
aljoscha merged 2 commits into MaterializeInc:main from aljoscha:7257-periodic-tail-progress on Jul 7, 2021.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| # Copyright Materialize, Inc. and contributors. All rights reserved. | ||
| # | ||
| # Use of this software is governed by the Business Source License | ||
| # included in the LICENSE file at the root of this repository. | ||
| # | ||
| # As of the Change Date specified in that file, in accordance with | ||
| # the Business Source License, use of this software will be governed | ||
| # by the Apache License, Version 2.0. | ||
| | ||
| # | ||
| # Make sure that TAIL WITH (progress=true) emits periodic progress | ||
| # messages even if there's no new data. | ||
| # | ||
| | ||
| $ set-regex match=\d{13} replacement=<TIMESTAMP> | ||
| | ||
| > CREATE TABLE t1 (f1 INTEGER); | ||
| | ||
| > INSERT INTO t1 VALUES (123); | ||
| | ||
| > BEGIN | ||
| | ||
| > DECLARE c CURSOR FOR TAIL t1 WITH (progress=true); | ||
| | ||
| # Verify there is a progress row available in the first batch. | ||
| > FETCH 2 c | ||
| <TIMESTAMP> false 1 123 | ||
| <TIMESTAMP> true <null> <null> | ||
| | ||
| # Now ask (and possibly wait) for another progress row. | ||
| > FETCH 1 c | ||
| <TIMESTAMP> true <null> <null> | ||
| | ||
| > COMMIT | ||
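
The `$ set-regex` line in the test above is what lets the expected output say `<TIMESTAMP>` instead of a concrete value. A minimal Python sketch of that normalization, assuming the pattern is applied to each result line before comparison (the helper name `normalize` and the sample timestamp are made up for illustration):

```python
import re

# Same pattern as in the test file: any run of 13 digits,
# i.e. a millisecond-precision Unix timestamp.
TIMESTAMP = re.compile(r"\d{13}")


def normalize(line: str) -> str:
    """Replace 13-digit timestamps with a stable placeholder (hypothetical helper)."""
    return TIMESTAMP.sub("<TIMESTAMP>", line)


# The timestamp below is an invented example value, not real test output.
data_row = normalize("1625650000000 false 1 123")
progress_row = normalize("1625650000001 true <null> <null>")
```

Shorter numbers such as the inserted value `123` do not match the pattern, so the data columns stay comparable as written.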
since this is called unconditionally below, why do we need to create a progress event here?
This ensures that there will be a progress statement in the same "batch" of data sent back to the TAIL client. With this (which is also the previous behavior that some tests ensure) you get:
Without this, you would get:
The reason (for the behaviour, not for why the client and tests want it like this) is that, while the sink is processing a batch, the input frontier has not yet "caught up" to the upper of that batch. Only on the next invocation of the sink would the input frontier be at the upper of the batch. That is why, if we don't emit the progress update right away based on the batch upper, it arrives as a separate batch.
This affects what a `FETCH ALL c` would return on the client, which is also affected by a `TIMEOUT`, if any.
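
To make the difference concrete, here is a rough Python model of the two behaviours described in this comment. It is only a sketch: `run_sink`, the row layout, and the timestamps are hypothetical and do not reflect the actual sink code, and the model assumes one client-visible batch per sink invocation.

```python
from typing import List, Optional, Tuple

# A row is (timestamp, progressed, diff, value), mirroring the TAIL output
# columns in the test. All names here are hypothetical.
Row = Tuple[int, bool, Optional[int], Optional[int]]


def run_sink(invocations, eager_progress: bool, last_progress: int = 0) -> List[List[Row]]:
    """Simulate sink invocations and return the batches sent to the client.

    Each invocation is (batch, input_frontier); batch is None or
    (updates, upper), where updates is a list of (time, diff, value).
    """
    sent: List[List[Row]] = []
    for batch, input_frontier in invocations:
        out: List[Row] = []
        if batch is not None:
            updates, upper = batch
            out.extend((t, False, diff, value) for t, diff, value in updates)
            if eager_progress and upper > last_progress:
                # Trust the batch upper instead of waiting for the frontier.
                out.append((upper, True, None, None))
                last_progress = upper
        if input_frontier > last_progress:
            # The unconditional, frontier-driven progress update.
            out.append((input_frontier, True, None, None))
            last_progress = input_frontier
        if out:
            sent.append(out)
    return sent


# Invocation 1: a batch with upper 101 arrives while the frontier is still 100.
# Invocation 2: no data, but the frontier has caught up to 101.
invocations = [(([(100, 1, 123)], 101), 100), (None, 101)]

with_eager = run_sink(invocations, eager_progress=True, last_progress=100)
without_eager = run_sink(invocations, eager_progress=False, last_progress=100)
```

In this model `with_eager` contains a single batch holding both the data row and the progress row, while `without_eager` contains two batches, with the progress row arriving on its own in the second one.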
It just feels wrong to declare the upper of the batch as a new frontier if timely hasn't explicitly told us that the input has advanced. Also, I don't think there are any guarantees as to how the records are broken up when you FETCH a cursor. So long as we send data with the same semantics as before we should be good. Maybe @frankmcsherry can comment more on how to handle this.
EDIT: `FETCH ALL c` would block forever on a cursor with TAIL, no?
I'm very happy to drop that extra code! 👍 It's just that this is the current behaviour that tests expect.
I think the `ALL` just means "all available", because the tests use this. See here for reference: https://github.com/MaterializeInc/materialize/blob/main/src/materialized/tests/sql.rs#L247
Something is guaranteeing that if we see a batch with some upper then we won't see any timestamp less than that in the future (I'm not sure what guarantees this, but we document it in the tail docs), so this should be safe. FETCH for TAIL is special and will return up to the requested number of rows as long as some are available. If no rows are available it'll return back to the user.
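
As a rough illustration of the FETCH behaviour described here, a small Python sketch. It ignores `TIMEOUT` and any waiting entirely, and the `fetch` function and row strings are hypothetical stand-ins rather than Materialize's implementation:

```python
from collections import deque
from typing import Deque, List, Optional


def fetch(available: Deque[str], count: Optional[int] = None) -> List[str]:
    """Return up to `count` already-available rows; `None` models FETCH ALL.

    Hypothetical model: never blocks, and returns an empty list when
    nothing is available.
    """
    limit = len(available) if count is None else min(count, len(available))
    return [available.popleft() for _ in range(limit)]


rows = deque(["row-1", "row-2", "row-3"])
first = fetch(rows, 2)   # two rows are available, so both are returned
second = fetch(rows)     # "ALL" here means all available: the one remaining row
third = fetch(rows, 5)   # nothing available: returns to the caller empty-handed
```

Under this reading, how many rows a single FETCH returns depends on how the sink grouped its output, which is why emitting the progress row in the same batch as the data is visible to clients.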
It's the arrangement implementation that guarantees it. Change batches are only handed out once the frontier passes their upper. The arrangement operator will start emitting those batches but the frontier itself will not propagate to the downstream operator yet.
In a way, the upper of the batch gives us a quicker update than waiting for the frontier update.
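
The guarantee can be pictured with a toy model in Python. This is a sketch under simplifying assumptions (totally ordered integer timestamps, a single input), and `ToyArrangement` is a hypothetical class, not the differential dataflow arrangement API:

```python
from typing import List, Tuple

Update = Tuple[int, int]  # (time, value)


class ToyArrangement:
    """Hypothetical stand-in for an arrangement operator."""

    def __init__(self) -> None:
        self.pending: List[Update] = []
        self.lower = 0  # upper of the most recently released batch

    def insert(self, time: int, value: int) -> None:
        # Inputs may not go back in time past an upper that was already released.
        assert time >= self.lower, "update is earlier than a released upper"
        self.pending.append((time, value))

    def advance(self, frontier: int) -> Tuple[List[Update], int]:
        """Release every pending update with time < frontier as one batch."""
        assert frontier >= self.lower
        batch = sorted(u for u in self.pending if u[0] < frontier)
        self.pending = [u for u in self.pending if u[0] >= frontier]
        self.lower = frontier
        return batch, frontier  # (updates, upper)


arr = ToyArrangement()
arr.insert(100, 123)
arr.insert(105, 456)
batch1, upper1 = arr.advance(101)  # only the update at time 100 is released
arr.insert(103, 789)
batch2, upper2 = arr.advance(110)  # later batch: every time is >= upper1
```

Because a batch is only released once the frontier has reached its upper, a consumer that sees `upper1` can treat it as a progress statement without waiting for the separate frontier notification.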