Skip to content

storage: create a separate service for each source#12770

Merged
benesch merged 1 commit into
MaterializeInc:mainfrom
benesch:pod-per-source
Jun 1, 2022
Merged

storage: create a separate service for each source#12770
benesch merged 1 commit into
MaterializeInc:mainfrom
benesch:pod-per-source

Conversation

@benesch

@benesch benesch commented May 31, 2022

Copy link
Copy Markdown
Contributor

Create a separate orchestrator service for each source. This means a
process per source when running locally and a pod per source when
running in Kubernetes. This is our storage scalability story for
Materialize Platform.

This will cause some fallout. As written, this creates a process for
every system table, which is hefty.

Fix MaterializeInc/cloud#2708.

Motivation

  • This PR adds a known-desirable feature.

Tips for reviewer

Review just the last commit! The first two commits are from #12798 and will be rebased away after that PR merges.

Testing

  • This PR has adequate test coverage / QA involvement has been duly considered.

Release notes

This PR includes the following user-facing behavior changes:

  • n/a

@benesch benesch requested review from necaris and petrosagg May 31, 2022 06:40
ports: vec![
ServicePort {
name: "controller".into(),
port_hint: 2100,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be assigned.ports["controller"], and similarly for below?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nope! assigned is only in scope on L420-429. The idea is that a service requests a particular port and you may or may not actually get that port. With the process orchestrator, for example, not every storaged process can get port 2100. But in Kubernetes since each pod is isolated you can hand out 2100 to every pod.

memory_limit: None,
scale: NonZeroUsize::new(1).unwrap(),
labels: HashMap::new(),
availability_zone: None,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to block this for a prototype but something we should be careful about, if we care about co-locating storaged pods in an AZ for network traffic reasons (ingress / egress, latency)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we'll need to think through how we want to allow users to constrain their sources to AZs!

@necaris necaris left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far, per the existing orchestrator semantics in Kubernetes!

benesch added a commit to benesch/materialize that referenced this pull request May 31, 2022
Since MaterializeInc#12216, tables are now entirely handled by the controller. There
is no longer a need to send a `CreateSourceCommand` to the `storaged`
process for table sources.

We should eventually adjust the types here to enforce this statically,
but this quick fix will unblock MaterializeInc#12770.
benesch added a commit that referenced this pull request May 31, 2022
Since #12216, tables are now entirely handled by the controller. There
is no longer a need to send a `CreateSourceCommand` to the `storaged`
process for table sources.

We should eventually adjust the types here to enforce this statically,
but this quick fix will unblock #12770.
benesch added a commit to benesch/materialize that referenced this pull request Jun 1, 2022
Resolve a TODO to read the Debezium transactional metadata source out of
persist, rather than re-rendering the source.

This PR will unblock creating a pod per source (MaterializeInc#12770), but it is
blocked on reverting (MaterializeInc#12082), which is no longer necessary now that TCP
boundary has been removed.
benesch added a commit to benesch/materialize that referenced this pull request Jun 1, 2022
Resolve a TODO to read the Debezium transactional metadata source out of
persist, rather than re-rendering the source.

This PR will unblock creating a pod per source (MaterializeInc#12770), but it is
blocked on reverting (MaterializeInc#12082), which is no longer necessary now that TCP
boundary has been removed.
benesch added a commit to benesch/materialize that referenced this pull request Jun 1, 2022
Resolve a TODO to read the Debezium transactional metadata source out of
persist, rather than re-rendering the source.

This PR will unblock creating a pod per source (MaterializeInc#12770), but it is
blocked on reverting (MaterializeInc#12082), which is no longer necessary now that TCP
boundary has been removed.
@benesch benesch marked this pull request as ready for review June 1, 2022 04:24
@benesch

benesch commented Jun 1, 2022

Copy link
Copy Markdown
Contributor Author

@petrosagg this is ready for review! I expect all tests to pass and will rebase after #12798 merges.

@petrosagg

Copy link
Copy Markdown
Contributor

Yikes, that recursive struct mapping introduced quite a bit of boilerplate. I forgot that we rely on the storaged instance having the metadata in its local state.

I explored an alternative path that makes storage ingestion descriptions work just like dataflow descriptions where they must carry with them a source_imports map with the metadata of any dependent sources. PR here #12805 What do you think?

@benesch

benesch commented Jun 1, 2022

Copy link
Copy Markdown
Contributor Author

I explored an alternative path that makes storage ingestion descriptions work just like dataflow descriptions where they must carry with them a source_imports map with the metadata of any dependent sources. PR here #12805 What do you think?

Looks great!

@benesch benesch enabled auto-merge (rebase) June 1, 2022 14:27

@petrosagg petrosagg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM!

Create a separate orchestrator service for each source. This means a
process per source when running locally and a pod per source when
running in Kubernetes. This is our storage scalability story for
Materialize Platform.

This will cause some fallout. As written, this creates a process for
every system table, which is hefty.

Fix MaterializeInc/cloud#2708.
@benesch benesch merged commit 184975c into MaterializeInc:main Jun 1, 2022
@benesch benesch deleted the pod-per-source branch June 1, 2022 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants