compute: move mz_dataflow_initial_output_duration_seconds metric to the controller #21533
Merged
teskje merged 5 commits on Sep 5, 2023
0d940d3 to fc6a422
This PR has higher risk. In addition to having a knowledgeable reviewer, it may be useful to add observability and/or a feature flag.
vmarcos approved these changes Sep 5, 2023
Looks fine to me, thanks!
fc6a422 to 2c9beb6
This commit removes the `CollectionMetric` infrastructure from the compute replica code. There was only a single per-collection metric, `mz_dataflow_initial_output_duration`, which we want to have as a controller export instead. No other per-collection metrics are currently planned on the replica side, so there is no need to keep `CollectionMetric` around.
This commit performs some cleanup work on the `ReplicaTask` logic. This mostly consists of factoring out logic into methods and accessing needed state via `self` rather than passing it around in parameters. This commit also introduces `observe_command` and `observe_response` methods, which will be the place where the `initial_output_duration_seconds` metric is updated.
This commit adds the `mz_dataflow_initial_output_duration_seconds` metric back in, this time on the controller side. The metric is per-replica, so maintaining it within the replica tasks is easiest. To do so, the replica task needs to observe compute commands and responses more closely, so that it learns about new collections being created and about their frontiers advancing beyond their `as_of`s.
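The bookkeeping described in the last two commit messages can be illustrated with a minimal sketch. Only the names `ReplicaTask`, `observe_command`, `observe_response`, and `as_of` come from the commit messages; the `Command` and `Response` enums, the integer IDs and timestamps, the explicit `now` parameter, and the `HashMap` standing in for the metric are assumptions made for illustration and do not reflect the actual controller code. In particular, a frontier is modeled here as a single timestamp.

```rust
use std::collections::HashMap;
use std::time::Instant;

// Hypothetical stand-ins for the real compute protocol types.
type CollectionId = u64;
type Timestamp = u64;

enum Command {
    CreateDataflow { id: CollectionId, as_of: Timestamp },
}

enum Response {
    FrontierUpper { id: CollectionId, upper: Timestamp },
}

#[derive(Default)]
struct ReplicaTask {
    // Collections that have not produced initial output yet,
    // with their `as_of` and the time they were created.
    pending: HashMap<CollectionId, (Timestamp, Instant)>,
    // Stand-in for a per-replica metric keyed by collection ID
    // (assumption: the real code would use a metrics library).
    initial_output_duration_seconds: HashMap<CollectionId, f64>,
}

impl ReplicaTask {
    // Called for every command sent to the replica.
    // `now` is passed in explicitly to keep the sketch deterministic.
    fn observe_command(&mut self, cmd: &Command, now: Instant) {
        match cmd {
            Command::CreateDataflow { id, as_of } => {
                self.pending.insert(*id, (*as_of, now));
            }
        }
    }

    // Called for every response received from the replica.
    fn observe_response(&mut self, resp: &Response, now: Instant) {
        match resp {
            Response::FrontierUpper { id, upper } => {
                if let Some(&(as_of, created_at)) = self.pending.get(id) {
                    // Initial output is complete once the frontier
                    // has advanced beyond the collection's `as_of`.
                    if *upper > as_of {
                        let secs = now.duration_since(created_at).as_secs_f64();
                        self.initial_output_duration_seconds.insert(*id, secs);
                        self.pending.remove(id);
                    }
                }
            }
        }
    }
}
```

Because each replica task holds its own state, the recorded value is naturally per-replica and needs no per-worker label. Later frontier updates for the same collection are ignored in this sketch, since the collection is removed from `pending` once its duration has been recorded.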
2c9beb6 to 3261d30
def- approved these changes Sep 5, 2023
Based on https://buildkite.com/materialize/coverage/builds/214 seems to be covered in tests. Edit: Actually let me rerun, I think there was a problem.
Author
Looks like it finished, but I'm not sure how to interpret the results.
Contributor
Rerun seems fine: https://buildkite.com/materialize/coverage/builds/219
Author
TFTRs!
This PR moves the `mz_dataflow_initial_output_duration_seconds` metric from being a replica export to a controller export. The main benefit we get from this is that the `worker_id` label is removed, which both makes the metric easier to use and greatly reduces its cardinality.

As part of this change, this PR:
Motivation
Part of https://github.com/MaterializeInc/database-issues/issues/5547.
Design doc: #19717.
Tips for reviewer
Look at the commits separately!
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.