persist,dataflow: verify expected as_of when replaying streams#9659
Conversation
2172325 to
0d0ae6f
Compare
danhhz
left a comment
There was a problem hiding this comment.
I'm happy to deal with the merge conflicts. I was holding that PR until our dec benchmarking stuff is done, anyway, so I'll also make sure to hold it for this.
| Ok(snapshot) => { | ||
| let snapshot_since = snapshot.since(); | ||
|
|
||
| if PartialOrder::less_than(&as_of_frontier, &snapshot_since) { |
There was a problem hiding this comment.
oh interesting, i was imagining that we'd have to thread this into the snapshot call. i don't see why this wouldn't work though
| fn replay( | ||
| &self, | ||
| snapshot: Result<DecodedSnapshot<K, V>, Error>, | ||
| as_of_frontier: &Antichain<u64>, |
There was a problem hiding this comment.
doesn't this mean the contract of the operator needs to be that it forwards all data to this frontier before emitting it? I don't see where that happens. we should also add a test for this
same q for the listen_source operator.
the persisted_source operator shouldn't need to do any forwarding itself, but probably wants a debug assert (and maybe also a test?)
There was a problem hiding this comment.
it's not 100% clear to me wether it needs to do that. We have code in the source rendering code that does that:
. Maybe @frankmcsherry has an opinion on this: should the replay source advance times on the data it emits to theas_of, or should it just verify that its since/compaction frontier is not beyond the as_of and rely on the rest of the source pipeline to advance the timestamp?
I'll add the tests for sure, though!
|
@danhhz Well, this is interesting! https://buildkite.com/materialize/tests/builds/25477#5dc126e5-1b90-4aea-b32a-17ba8a2cccd5/126-663 |
I think either is fine. I anticipate the Ideally platform will soon start us on the path of discussing the interface, and what things COMPUTE knows that it can tell STORAGE to allow it to push more stuff down. E.g. MFPs, and snapshot/no, tail/no. What worked great for MFPs so far was an API that allowed the callee to extract any of the constraints to indicate that they will accept responsibility for them (e.g. peel off parts of a MFP, leaving the remaining work in the MFP). If we could do the same with the other arguments, it might make for a fine negotiation protocol, in which the default implementation does not remove anything, until it reaches a point where it is confident that it can guarantee the work is done. |
4bc531f to
714b09e
Compare
|
@dan i added:
|
714b09e to
10b3a4d
Compare
danhhz
left a comment
There was a problem hiding this comment.
LGTM! though I didn't look at the coord changes closely
10b3a4d to
53ed22a
Compare
|
I'm just seeing this now - will look at this after lunch |
|
|
||
| // NOTE: Tables are not sources, but to a large part of the system they look | ||
| // like they are, e.g. they are rendered as a SourceConnector::Local. | ||
| self.sources.insert(entry.id(), frontiers); |
There was a problem hiding this comment.
Does this mean that update_upper has been getting messages for both a table and its index, but has been ignoring the non-index message? Or does the since setting happen only once here? Can you explain a bit more (or add a test) that illustrates the problem this is solving?
There was a problem hiding this comment.
@matt: My understanding is that:
-
Right now only the table's primary index sends back upper information. We use that information to derive a
sincefor both that index, and the raw persisted data (which we set inpersisted_table_allow_compactionto allow the persisted data to also compact up to thatsince) -
The persisted data is compacted by a separate thread, and on restart is only valid up to some
since. -
On restart, we reload the table's primary index with the persisted data, but we don't set its
sinceto reflect that the data is only valid after some times.
@aljoscha I'm not sure that line 499 is doing anything, because unlike a regular source i don't think you can actually use the table the way you can use a unmaterialized source (for now at least). I believe this instead needs to set the since of the table's primary index. I don't have high confidence in this analysis yet though
There was a problem hiding this comment.
Ok, that's useful context. I agree with your recommendation.
There was a problem hiding this comment.
@mjibson There are existing tests for this, basically all tests under test/persistence/user-tables (see https://github.com/MaterializeInc/materialize/tree/main/test/persistence/user-tables). The idea of the tests is that they do a thing, then materialized is stopped and restarted and then the after tests are run to verify.
Persistent user tables are like sources in a way, a source that reads from persistence is created to read from the underlying persistent stream. What I did, is change the API to force dataflow to hand the expected as_of_frontier to the replay() operator (this is the thing that replays persisted data), and to verify that the since (aka compaction frontier) of the persisted stream is compatible with the requested as_of_frontier. This failed all the tests, because we're not correctly calculating an as_of_frontier, because we're not taking the since of the persisted stream into account. The first commit in this PR fixes that, to make the tests pass again.
@ruchirK It is a valid place to put it: with that line, we correctly determine a since and tests pass, without it they don't.
Here is where the since that we put in when bootstrapping is picked up:
materialize/src/coord/src/coord.rs
Line 4118 in b4e6726
materialize/src/coord/src/coord.rs
Line 4143 in b4e6726
The workflow (when restarting is roughly this):
- coordinator bootstraps its metadata, which now includes reading the persistence since for tables and registering it with the source frontiers
- coordinator notices that it needs to render a dataflow for the table
- the code I linked to above is consulted to determine a since
- the index frontier of the arrangement that is about to be rendered is registered
- the "source" is being rendered.
So, as I mentioned above, to a lot of the system the table looks like a source. Not sure that's good, though... 😅
There was a problem hiding this comment.
That makes sense. This construction is a little bit nonintuitive to me though because we use the table's since on startup to initialize the primary index's since and then never update it again. That happens to work implicitly because we know once we've rendered the primary index, we'll never use the table's since again, but it feels very action-at a distance.
It seems like it is more work to directly initialize the index's since frontier, so I'm fine with this, but perhaps would you consider also updating the table source's since frontier when we allow compaction on the index/persisted data and setting the table source's since frontier to zero when we first create it? The way the code is structured now, we only keep an entry in self.sources for a table id after a restart which i suspect is correct right now, but might lead to surprising effects in the future
There was a problem hiding this comment.
I didn't do that initially because of the weird nature of Table "sources", but I will do that! I think it might even fix a bug with disabling/enabling user-indexes. I will have to pick that up again in the new year, though.
53ed22a to
57c7285
Compare
|
@danhhz I had to revert the change that forwards the timestamps of replayed records by the One case where this happens is when restarting with To me, this indicates that we need to forward the timestamp only at the boundary of the source, that is internally we work with the timestamps we get from persistence and then forward at the boundary to the rest of the dataflow. This is what we're currently doing (before this change). So this change would only check that the expected ((I can elaborate in a call if needed.)) |
|
From persist's correctness perspective, I think the |
It sounds like there may be a bug to shake out here about what sinks assume about the data shot at them? It doesn't seem like this is a persistence glitch, even if undoing it prevents naughty behavior elsewhere. I'm interested .. either the call or reading about your conclusions. If we need to file something about sinks, or about getting a better |
|
Oh boy 🙈 , I had a very important typo in there: when I wrote |
a8b3c50 to
f4cb6e7
Compare
|
@ruchirK I updated this with what we talked about last year. Could you please take another look? We're now maintaining a since frontier for the persistence source, even though no-one will currently look at that. It feels a bit more correct to do it, though. |
|
@ruchirK Gentlest of pings. 😅 (I know you've been OoO, just making sure you're aware, because it blocks some things.) |
|
reviewing this one now! |
ruchirK
left a comment
There was a problem hiding this comment.
I looked through the coordinator bits and this all seemed reasonable to me. I left a few small comments.
I'm trying to reason about "are there any other use cases that might be affected by this change" (e.g. exactly once sinks?) and it seems to me that the answer is no, because ... the table always has an undeleteable primary index, and so any query / view definition / sink definition done on the table uses the upper, since of its primary index. is that rationale correct?
| if let Some(df) = df { | ||
| let since_ts = { | ||
| match &table.persist { | ||
| Some(persist) => Some(persist.since_ts), |
There was a problem hiding this comment.
have not yet read where persist.since_ts gets set, but why would this be nonzero when creating a new table?
There was a problem hiding this comment.
read it, and it still seems like new tables should get a since of 0 (because new tables will be newly persisted collections?)
There was a problem hiding this comment.
This is for the case where coordinator bootstraps from scratch but we still have the persist data directory. Probably not very likely today, but maybe more likely when persistence stores to S3? I'm happy to simplify this to just always initialize to [0], though.
| /// TODO: In the future the coordinator should perhaps track a table's upper and | ||
| /// since frontiers directly as it currently does for sources. | ||
| fn persisted_table_allow_compaction(&self, since_updates: &[(GlobalId, Antichain<Timestamp>)]) { | ||
| fn persisted_table_allow_compaction( |
There was a problem hiding this comment.
is the todo above this line now stale?
There was a problem hiding this comment.
I think maybe not? But I also don't know exactly what the comment was aiming at. As it is, we still need this specialized code for tracking the sinces of the backing source, since they are not covered yet by the regular tracking logic. We still have the situation that tables sources always have an index stuck to them, and there can never be a bare "table source".
Before, it could happen that a dataflow was being rendered at an expected `as_of` but the persistent streams that were replayed as part of the dataflow had compacted beyond that. This would manifest in subtle ways, see for example: https://github.com/MaterializeInc/materialize/issues/8606. Now, we thread through the expected `as_of_frontier` and give a meaningful error message when the expected `as_of` cannot be served. Note: it's not possible to provoke this new error message from tests. It will only surface if/when we have a bug in the logic that determines the as_of frontier (on the coordinator) and forwards it to dataflow rendering. Fixes #8608
f4cb6e7 to
ad3524f
Compare
Before, it could happen that a dataflow was being rendered at an
expected
as_ofbut the persistent streams that were replayed as partof the dataflow had compacted beyond that. This would manifest in subtle
ways, see for example:
https://github.com/MaterializeInc/database-issues/issues/2634.
Now, we thread through the expected
as_of_frontierand give ameaningful error message when the expected
as_ofcannot be served.Note: it's not possible to provoke this new error message from tests. It
will only surface if/when we have a bug in the logic that determines the
as_of frontier (on the coordinator) and forwards it to dataflow
rendering.
Fixes MaterializeInc/database-issues#2635
@danhhz This will have conflicts with #9615.
Checklist
This change is