Skip to content

persist: take persistence since into account when calculating source since#9656

Merged
aljoscha merged 1 commit into
MaterializeInc:mainfrom
aljoscha:is-9655-consider-persistence-since
Jan 11, 2022
Merged

persist: take persistence since into account when calculating source since#9656
aljoscha merged 1 commit into
MaterializeInc:mainfrom
aljoscha:is-9655-consider-persistence-since

Conversation

@aljoscha
Copy link
Copy Markdown
Contributor

@aljoscha aljoscha commented Dec 17, 2021

Before, all sources started out with a since frontier of [0] when
created, without looking at the since frontier (aka compaction
frontier) of persisted streams that are involved in a persistent
source.

Now, we look at the since of the primary stream when starting to read from a
source. We only need to look at the primary stream because this is what
carries the actual data of the source that we replay when starting up.

This was not strictly a bug, because we never compact persistent
sources. We do have to resolve this issue, though, in order to
enable compaction.

Fixes MaterializeInc/database-issues#2940

Note: I'll have to find someone from the coordinator team to also take a look at this because I'm messing with sinces and that often has the potential to introduce incorrectness.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR adds a release note for any user-facing behavior changes.

This change is Reviewable

@aljoscha aljoscha requested review from danhhz and ruchirK December 17, 2021 13:44
Copy link
Copy Markdown
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm but we should get a coord person to look at it

Copy link
Copy Markdown
Contributor

@ruchirK ruchirK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment thread src/coord/src/persistcfg.rs Outdated
// because `seal` is `seal(u64)` but the return value suggests there could be more. Also: if
// the upper is a true multi-dimensional frontier in the future, the logic that determines a
// common upper seal timestamp will become a bit more complicated.
// TODO: We have a mismatch here: we know that these frontiers always contains only one
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: now it should be "these frontiers always contain" (don't bother redoing ci just for this)

Comment thread src/coord/src/coord.rs
SourceConnector::External { persist, .. } => {
persist.as_ref().map(|p| p.primary_stream.since_ts)
}
_ => None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: seems like you could save yourself the unwrap and make this Some(0) for the default case (also below)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I thought about that, but I wanted to keep it that way because the Some(since_ts) really is a weird way of turning the one since_ts into an iterator. I wanted to keep that signalling, though I can understand why it's weird.

@aljoscha aljoscha force-pushed the is-9655-consider-persistence-since branch from b220bf8 to adc4fb9 Compare January 6, 2022 14:29
Comment thread src/coord/src/coord.rs Outdated
self.sources.insert(entry.id(), frontiers);
}
CatalogItem::Table(table) => {
println!("bootstrapping table: {:?}", table);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intentional or debugging info?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this one's not meant for merging. I have to clean this up once #9659 is in because this PR partially contains changes from that other one.

Comment thread src/dataflow-types/src/types.rs Outdated
/// And when we restart, we start from step 1., at which time we are guaranteed not to have a
/// source running already.
pub upper_seal_ts: u64,
// TODO: This duplicates the above description but seems fine?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah seems fine, let's remove the TODO :)

…since

Before, all sources started out with a since frontier of [0] when
created, without looking at the since frontier (aka compaction
frontier) of persisted streams that are involved in a persistent
source.

Now, we look at the since of the primary stream when starting to read
from a source. We only need to look at the primary stream because this
is what carries the actual data of the source that we replay when
starting up.

This was not strictly a bug, because we never compact persistent
sources. We do have to resolve this issue, though, in order to
enable compaction.

Fixes #9655
@aljoscha aljoscha force-pushed the is-9655-consider-persistence-since branch from adc4fb9 to 39cca42 Compare January 11, 2022 10:18
@aljoscha aljoscha merged commit 99616c2 into MaterializeInc:main Jan 11, 2022
@aljoscha aljoscha deleted the is-9655-consider-persistence-since branch January 11, 2022 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants