Skip to content

storage: read debezium transactional metadata from persist#12798

Closed
benesch wants to merge 2 commits into
MaterializeInc:mainfrom
benesch:txn-reading
Closed

storage: read debezium transactional metadata from persist#12798
benesch wants to merge 2 commits into
MaterializeInc:mainfrom
benesch:txn-reading

Conversation

@benesch

@benesch benesch commented Jun 1, 2022

Copy link
Copy Markdown
Contributor

Resolve a TODO to read the Debezium transactional metadata source out of
persist, rather than re-rendering the source.

This PR will unblock creating a pod per source (#12770), but it is
blocked on reverting (#12082), which is no longer necessary now that TCP
boundary has been removed.

Motivation

  • This PR adds a known-desirable feature.

Testing

  • This PR has adequate test coverage / QA involvement has been duly considered.

Tips for reviewer

  • Go commit by commit.
  • @guswynn @frankmcsherry the first commit is for you two, since it essentially reverts Correctly shutdown sources on Drop #12082 because Petros's PR has made Correctly shutdown sources on Drop #12082 unnecessary.
  • @petrosagg @cjubb39 the second commit is for you two. It is a lot scarier than it looks. The tl;dr is that I parameterized the DebeziumTransactionMetadata struct so that it either stores the global ID of the transaction metadata source or the persist shard ID, depending on where in the codebase we are. This matches the approach taken for DataflowDesc. The reason the diff is so large is because converting from SourceDesc<GlobalId> to SourceDesc<CollectionMetadata> involves a massive amount of boilerplate since the DebeziumTransactionMetadata struct is nested so deeply. I factored the boilerplate into map_storage_metadata methods on each type, so it should be very mechanical to review.

Release notes

This PR includes the following user-facing behavior changes:

  • n/a

benesch added 2 commits June 1, 2022 00:13
PR MaterializeInc#12082 converted source tokens to thread-safe `Arc`s to be compatible
with the TCP storage/compute boundary, but since MaterializeInc#12216 replaced the TCP
boundary with persist we can go back to Rcs.
Resolve a TODO to read the Debezium transactional metadata source out of
persist, rather than re-rendering the source.

This PR will unblock creating a pod per source (MaterializeInc#12770), but it is
blocked on reverting (MaterializeInc#12082), which is no longer necessary now that TCP
boundary has been removed.
@benesch benesch marked this pull request as ready for review June 1, 2022 04:18
@guswynn

guswynn commented Jun 1, 2022

Copy link
Copy Markdown
Contributor

first commit looks good to me, but note that it can be minimized by just changing the Arc<SyncActivator> in the SourceToken into an Activator, and using Rc::upgrade over Arc::upgrade. This has the advantage of avoiding extra interior-mutability, but is likely immeasurable in practice

@petrosagg

Copy link
Copy Markdown
Contributor

Copying my comment from the other PR:

Yikes, that recursive struct mapping introduced quite a bit of boilerplate. I forgot that we rely on the storaged instance having the metadata in its local state.

I explored an alternative path that makes storage ingestion descriptions work just like dataflow descriptions where they must carry with them a source_imports map with the metadata of any dependent sources. PR here #12805 What do you think?

@benesch

benesch commented Jun 1, 2022

Copy link
Copy Markdown
Contributor Author

Superseded by #12805.

@benesch benesch closed this Jun 1, 2022
@benesch benesch deleted the txn-reading branch June 1, 2022 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants