compute: mz_dataflow_initial_output_duration_seconds #20126
This commit introduces a (still empty) `CollectionMetrics` type that will contain per-collection metrics, and adds it to the `CollectionState` of non-transient collections. No metrics are collected for transient collections, to reduce metric cardinality.
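A minimal sketch of how the "metrics only for non-transient collections" rule could look. The `Option` field, the `is_transient` flag, and the constructor are assumptions made for illustration; only the type names `CollectionMetrics` and `CollectionState` come from the commit message.

```rust
/// Still-empty placeholder for per-collection metrics, as described in the commit.
#[derive(Debug, Default)]
pub struct CollectionMetrics {}

/// Hypothetical, heavily simplified stand-in for the real `CollectionState`.
pub struct CollectionState {
    /// `None` for transient collections, to keep metric cardinality low.
    metrics: Option<CollectionMetrics>,
}

impl CollectionState {
    /// `is_transient` is an assumed parameter, not taken from the PR.
    pub fn new(is_transient: bool) -> Self {
        let metrics = if is_transient {
            None
        } else {
            Some(CollectionMetrics::default())
        };
        Self { metrics }
    }

    /// Returns the metrics handle, if this collection tracks metrics at all.
    pub fn metrics(&self) -> Option<&CollectionMetrics> {
        self.metrics.as_ref()
    }
}
```

Keeping the metrics behind an `Option` means every call site has to handle the transient case explicitly, so it is hard to create a metric series for a short-lived collection by accident.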
    ///
    // TODO(teskje): Now that we explicitly track the collection's `as_of`, we might be able to
    // simplify this to an `Antichain<Timestamp>` again.
    reported_frontier: ReportedFrontier,
Made private to make it more likely that the different paths that update the reported frontier also call `observe_snapshot_produced`. Unfortunately, most of these paths are in the same module, so nothing but discipline keeps us from modifying `reported_frontier` directly.
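The encapsulation described in this comment can be sketched roughly as follows. This is an illustration under loud assumptions, not the PR's code: the frontier is reduced to a plain `u64`, the setter `set_reported_frontier` and the `snapshot_produced` flag are hypothetical, and the rule "the snapshot counts as produced once the frontier passes the `as_of`" is an assumption about what `observe_snapshot_produced` checks.

```rust
mod collection {
    /// Hypothetical, simplified stand-in for the PR's `CollectionState`.
    pub struct CollectionState {
        // Private: code outside this module cannot assign the field directly.
        reported_frontier: u64,
        as_of: u64,
        snapshot_produced: bool,
    }

    impl CollectionState {
        pub fn new(as_of: u64) -> Self {
            Self {
                reported_frontier: as_of,
                as_of,
                snapshot_produced: false,
            }
        }

        pub fn reported_frontier(&self) -> u64 {
            self.reported_frontier
        }

        /// The only public way to update the frontier; it always runs the hook.
        pub fn set_reported_frontier(&mut self, frontier: u64) {
            self.reported_frontier = frontier;
            self.observe_snapshot_produced();
        }

        /// Assumption: the snapshot has been produced once the reported
        /// frontier has advanced beyond the collection's `as_of`.
        fn observe_snapshot_produced(&mut self) {
            if self.reported_frontier > self.as_of {
                self.snapshot_produced = true;
            }
        }

        pub fn snapshot_produced(&self) -> bool {
            self.snapshot_produced
        }
    }
}
```

As the comment notes, Rust privacy is per module: any other code placed inside `mod collection` could still write `reported_frontier` without going through the setter, so the guarantee only holds for callers outside the module.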
antiguru left a comment:
Thanks, I think it looks fine. I wonder if we can improve the testing of this PR. If I recall correctly, we don't have a way to test the metrics exported by a replica, so it's not easy to do this at the moment. Maybe we should invest a bit more in this? (Or file an issue so we don't forget!)
Added an mzcompose-based cluster test. And it found a bug, so huge thanks for insisting on a test @antiguru!
This commit introduces a new replica metric, `mz_dataflow_initial_output_duration_seconds`, that tracks the time from the installation of a compute collection until it first produced any outputs. This is not necessarily the time until a dataflow has been fully hydrated (depending on your definition of 'hydration'), but might be a good stand-in.
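To make the measurement concrete, here is a small sketch of a record-once timer. `InitialOutputTimer` and its methods are hypothetical names, not the types used in the PR, and the caller passes in timestamps so the behavior is easy to check; how the real implementation obtains time and exports the value is not shown in the source.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: measures the time from installation of a collection
/// until its first output, and records that value at most once.
pub struct InitialOutputTimer {
    installed_at: Instant,
    initial_output_duration: Option<Duration>,
}

impl InitialOutputTimer {
    pub fn new(installed_at: Instant) -> Self {
        Self {
            installed_at,
            initial_output_duration: None,
        }
    }

    /// Call whenever output is observed. Returns the measured duration on the
    /// first call only; later calls leave the recorded value untouched.
    pub fn observe_output(&mut self, now: Instant) -> Option<Duration> {
        if self.initial_output_duration.is_some() {
            return None;
        }
        let elapsed = now.saturating_duration_since(self.installed_at);
        self.initial_output_duration = Some(elapsed);
        Some(elapsed)
    }

    /// The recorded duration, or `None` if no output has been observed yet.
    pub fn initial_output_duration(&self) -> Option<Duration> {
        self.initial_output_duration
    }
}
```

Recording only the first observation matches the stated semantics of the metric: it describes how long the collection took to produce its initial output, not how long any later update took.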
This commit adds a new mzcompose test to verify that the `mz_dataflow_initial_output_duration_seconds` metric works as expected. Future dataflow metrics can be covered in the same test.
vmarcos left a comment:
This will be extremely helpful, thanks!
TFTRs!
This PR introduces a new replica metric, `mz_dataflow_initial_output_duration_seconds`, that tracks the time from the installation of a compute collection until it first produced any outputs. This is not necessarily the time until a dataflow has been fully hydrated (depending on your definition of 'hydration'), but might be a good stand-in.

From the design doc (#19717):
Demo
You can try the metric out by running these commands in psql:
Then check the metrics endpoint and you should see an entry with a time close to the duration of that final SELECT:
When you drop the index, the metrics entry will go away too.
Motivation
Part of MaterializeInc/database-issues#5547.
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.