compute: introduce a CollectionState#20090
Merged
Merged
Conversation
2b5b529 to
fd82450
Compare
6 tasks
vmarcos
reviewed
Jun 30, 2023
vmarcos
left a comment
Contributor
There was a problem hiding this comment.
This change is pushing the code in a healthy direction, so thanks for investing the time here! The first commit is pretty uncontroversial IMO, and I had a couple of smaller comments on the second.
6b5645e to
63604ee
Compare
Contributor
Author
|
Alright, this is RFAL! The changes I made since you commented, @vmarcos:
|
This commit just moves some code around, to slightly improve readability. * Add a `ComputeState` constructor method. This allows us to make two of the `ComputeState` fields immediately private and we will likely be able to tighten the interface further in the future. * Factor out collection dropping code from `handle_allow_compaction` into a `drop_collection` method.
74101ac to
bcaf8a1
Compare
antiguru
approved these changes
Jul 20, 2023
antiguru
left a comment
Member
There was a problem hiding this comment.
This seems like a sane thing to do! Thank you
Prior to this commit, the per-collection state tracked by compute was spread over several `BTreeMap`s in `ComputeState`: * `sink_tokens` * `sink_write_frontiers` * `flow_control_probes` * `reported_frontiers` This makes it hard to keep track of the collection state as a whole and invites bugs caused by inconsistencies (e.g. we might forget to drop all of the state for a collection). This commit refactors the `ComputeState` by replacing the four per-collection fields mentioned above with a single `collections` map. Values in this map are `CollectionState` objects that each contain the entire state specific to a single collection. An exception is trace state, which is kept in the `TraceManager` and pulling that apart seemed too much of a hassle.
bcaf8a1 to
bae5597
Compare
Contributor
Author
|
TFTRs! |
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Prior to this PR, the per-collection state tracked by compute was spread over several
BTreeMaps inComputeState:*
sink_tokens*
sink_write_frontiers*
flow_control_probes*
reported_frontiersThis made it hard to keep track of the collection state as a whole and invited bugs caused by inconsistencies (e.g. we might forget to drop all of the state for a collection).
This PR refactors the
ComputeStateby replacing the four per-collection fields mentioned above with a singlecollectionsmap. Values in this map areCollectionStateobjects that each contain the entire state specific to a single collection. An exception is trace state, which is kept in theTraceManagerand pulling that apart seemed too much of a hassle.Motivation
As part of implementing the proposed
mz_dataflow_initial_output_duration_secondscompute metric (draft PR: #20126), I noticed that with the current code that would mean adding two new per-collectionComputeStatefields, one for keeping the the delete-on-drop collection metrics and one for remembering the time at which the collection was installed. I figured it was worth to invest some time to clean up our collection tracking.Tips for reviewer
This PR includes a minor cleanup commit that I already added in #18442. It would be nice if we could finally land that too.
Most of the changes are pretty mechanical, replacing a direct field access on
ComputeStatewith an access tocollectionsand then one of its fields. A notable exception ifreport_compute_frontierswhere I replaced the previous logic of iterating over trace frontiers then sink frontiers by a single iteration over all collections. I then inlined the frontier update closure which was only needed to deduplicate code. I suggest you look at both implementations side-by-side to convince yourself that they do the same thing, rather than trying to decipher the diff.Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel.