Skip to content

clear durability frontier on source drop#11945

Closed
guswynn wants to merge 1 commit into
MaterializeInc:mainfrom
guswynn:dur
Closed

clear durability frontier on source drop#11945
guswynn wants to merge 1 commit into
MaterializeInc:mainfrom
guswynn:dur

Conversation

@guswynn

@guswynn guswynn commented Apr 20, 2022

Copy link
Copy Markdown
Contributor

The SourceToken's created by import_source/create_source are ultimately held in the capture timely operator:

&client_token,
move |event| {
WorkerResponse::Response(Response {
subscription_id,
data: Ok(event),
})
},
));
err.inner.capture_into(UnboundedEventPusher::new(
self.worker_tx.clone(),
token,

Apparently, for those to be dropped, the durability_capability of the source operator needs to be exhausted (advanced to &[]. This capability, as far as I can tell, is derived from the data capability of the source, and can lag behind.

Curiously, SimpleSource sources always advance this cap, and are correctly dropped when DROP SOURCE is used. SourceReader sources (the majority, including kafka), continue to advance the durability cap to the LAST durability_frontier of the timestamp binding box"

// Maintain a capability set that tracks the durability frontier.
durability_cap.downgrade(timestamp_histories.durability_frontier());
. This pr causes us to exhaust that frontier when we remove a source.

Motivation

  • This PR fixes a previously unreported bug.

Tips for reviewer

I could be misunderstanding some subtle ordering issue here, and its unclear to me if advancing the durability frontier causes the system to make some sort of misguided claim that "the source is finished, forever". If that is the case, then we will likely need to change the durability_frontier api to return an Option<Antichain<Timestamp>> instead, to indicate that the current state of the source is "dead". However, this may not matter, as dropped sources have id's who are never reused, so in some sense, the entirety of the source as we know it is durably persisted. One subtlety here is that sources can live and produce more data for a short time after they are dropped, as the system realizes they are dead (well, i may be wrong about this).

Separately, @petrosagg ts_source_mapping looks like it is NEVER read, and should probably be removed?

Testing

  • This PR has adequate test coverage / QA involvement has been duly considered.

Release notes

This PR includes the following user-facing behavior changes:

@guswynn

guswynn commented Apr 22, 2022

Copy link
Copy Markdown
Contributor Author

closing in favor at #11987, with possible adjustments

@guswynn guswynn closed this Apr 22, 2022
@guswynn guswynn deleted the dur branch May 18, 2022 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant