Skip to content

Add BlueprintZoneFilter to all_omicron_zones#5348

Merged
andrewjstone merged 15 commits into
mainfrom
sunshowers/spr/wip-add-blueprintzonefilter-to-all_omicron_zones
Apr 11, 2024
Merged

Add BlueprintZoneFilter to all_omicron_zones#5348
andrewjstone merged 15 commits into
mainfrom
sunshowers/spr/wip-add-blueprintzonefilter-to-all_omicron_zones

Conversation

@sunshowers

@sunshowers sunshowers commented Mar 29, 2024

Copy link
Copy Markdown
Contributor

I'm not actually sure these are quite correct, but putting these up there for
an early look.

TODO:

  • Add some tests, there are a bunch of decisions we made here.
  • Are all these decisions correct?
  • Also need to do this filtering for direct joins against the bp_omicron_zone table -- but that will be easier once the zones are a simple column on [WIP] [reconfigurator] start removing expunged zones #5211.

Created using spr 1.3.6-beta.1
@sunshowers sunshowers marked this pull request as draft March 29, 2024 21:55
@andrewjstone

Copy link
Copy Markdown
Contributor

This is closer, I added a single test so far that ensures that filtering around external dns works. A few more tests may be warranted. With #5438, @sunshowers added the disposition field and filtered the one join on bp_omicron_zones we found. I also think the decisions made here are correct, so checked that box above as well.

@andrewjstone andrewjstone changed the title [WIP] add BlueprintZoneFilter to all_omicron_zones Add BlueprintZoneFilter to all_omicron_zones Apr 10, 2024
@andrewjstone andrewjstone marked this pull request as ready for review April 10, 2024 16:43
@andrewjstone

Copy link
Copy Markdown
Contributor

I think this is good to go.

@andrewjstone andrewjstone requested a review from jgallagher April 10, 2024 16:44

@jgallagher jgallagher left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the PR where we're also supposed to fix up DataStore::vpc_resolve_to_sleds() (which implicitly uses all_omicron_zones, sorta, by directly querying the bp_omicron_zone table, but doesn't currently consider the zone disposition)? Nevermind! The vpc_resolve_to_sleds() fix landed in #5238

Comment thread dev-tools/omdb/src/bin/omdb/db.rs Outdated
Comment thread nexus/reconfigurator/execution/src/dns.rs Outdated
Comment thread nexus/types/src/deployment.rs Outdated
) -> impl Iterator<Item = (Uuid, &OmicronZoneConfig)> {
self.blueprint_zones.iter().flat_map(|(sled_id, z)| {
z.zones.iter().map(|z| (*sled_id, &z.config))
self.blueprint_zones.iter().flat_map(move |(sled_id, z)| {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  • Can we remove all_blueprint_zones() entirely now? (Last time I looked, every caller immediately mapped the results of that function to just the OmicronZoneConfig, but was calling it because this function didn't take a filter yet.)
  • If not, could this function be implemented in terms of it instead? self.all_blueprint_zones(filter).map(|(sled_id, z)|(sled_id, &z.config)) I think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I have a wip change that needs all_blueprint_zones.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll go with door number 2 then - can we rewrite the implementation of this to use all_blueprint_zones()?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Done in 81df651

Comment thread nexus/types/src/deployment.rs Outdated
Self::InService => match filter {
BlueprintZoneFilter::All => true,
BlueprintZoneFilter::SledAgentPut => true,
BlueprintZoneFilter::Crucible => true,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this name. This:

all_omicron_zones(BlueprintZoneFilter::Crucible)

looks like it's filtering to the zones that have zone type Crucible, to me. I'm not even 100% sure what we're saying here, exactly. I would understand SledFilter::ShouldHaveCrucibleZones, but what does BlueprintZoneFilter::Crucible mean when applied to a non-crucible zone?

(I have similar concerns about ::External and ::SledAgentPut, but this seemed like the easiest one on which I could state the concern.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would understand SledFilter::ShouldHaveCrucibleZones, but what does BlueprintZoneFilter::Crucible mean when applied to a non-crucible zone?

I feel like this comment somewhat questions the overall premise of the BlueprintZoneFilter. We could indeed have SledFilter filter out any unnecessary zones by looking at zone disposition, but that would likely be somewhat incoherent as the sled is now filtering zones based on disposition disposition. I totally get what you are saying here though, and I think you have a point. You are talking about filtering on behaviors or what should happen as a result of a filter matched with a disposition, rather than individual zones, since a filter for an individual zone doesn't do anything for other zones. However, crucible zones are kinda special and they do go on all disks, unless the disposition is expunged. I wonder if instead of specifying crucible, we could cover other zones with this similar property, such as SledAgentPut, and even VpcFirewall. Those all get deployed unless expunged. The downside of that is lack of specificity in the caller.

I'm kinda ranting here, but I wonder if there's a larger change to be made...

@sunshowers sunshowers Apr 10, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... the main reason we introduced BlueprintZoneFilter, versus for example passing in a callback that returns a boolean, is to ensure that we have an accounting about every place that is making decisions about which zones it's considering.

In this case, BlueprintZoneFilter::Crucible is used to identify the set of zones for which we expect active dataset records to exist.

But you bring a really important point up: that BlueprintZoneFilter::Crucible can return non-Crucible zones in a manner that is somewhat surprising. So... what if BlueprintZoneFilter::Crucible also filtered based on zone kind? The first thing ensure_crucible_dataset_records_exist does is to not consider non-Crucible zones, so it would fit right in.

Also, I think it may be worth renaming this to CrucibleDatasetRecords or similar.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth changing the name as well. I will go ahead and do that. As for SledAgentPut, I think that one is actually extremely useful. It provided a solid mechanism for limiting which zones actually get included in the put message to sled agent.

Right now, External really only maps to external DNS, unless I missed something. Maybe it's actually worth making that one more specific?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I retract my statement on External. I like how it covers multiple things. How about renaming it to ExternalResources though?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, or we could change the name of External to ExternallyReachable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up changing the name to ExternallyReachable in 514ff8c

I'm happy to change it again if you all disagree.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also changed Crucible to CrucibleDatasets

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you bring a really important point up: that BlueprintZoneFilter::Crucible can return non-Crucible zones in a manner that is somewhat surprising. So... what if BlueprintZoneFilter::Crucible also filtered based on zone kind? The first thing ensure_crucible_dataset_records_exist does is to not consider non-Crucible zones, so it would fit right in.

Yeah, I like this, although I think I'm still getting hung up on the fact that this primarily looks at the zone disposition. We want this call:

.all_omicron_zones(BlueprintZoneFilter::SomethingSomething)

to return (a) all crucible zones that (b) are still running in some capacity (either in service or quiesced). ::Crucible sounds like it's just returning all the crucible zones; CrucibleDatasets sounds like either "zones that provide datasets" or "zones that need datasets", but neither of those feel like they capture the "still running in some capacity" part of it.

I'm not sure where that leave me, other than vaguely unsatisfied with the names (with my apologies!). I feel like I want a word for "in service or quiesced". Running doesn't seem right, but if it were, I'd propose something like:

  • ::RunningCrucible (only non-expunged zones with type ::Crucible)
  • ::Running (all non-expunged zones, which clients can additionally filter by type if they want to)

Ooh, or we could change the name of External to ExternallyReachable.

I like this name a lot more; I feel like it captures the intent of what we want (i.e., we filter by this before applying any changes related to making zones reachable from outside the rack).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the changes that @jgallagher, @sunshowers, and I discussed in a video chat.

Comment thread nexus/types/src/deployment.rs Outdated
Comment thread nexus/types/src/deployment.rs Outdated

@jgallagher jgallagher left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sticking with this - I really like how the Shoud... names read at call sites.

@andrewjstone andrewjstone merged commit cc182ee into main Apr 11, 2024
@andrewjstone andrewjstone deleted the sunshowers/spr/wip-add-blueprintzonefilter-to-all_omicron_zones branch April 11, 2024 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants