initial inventory for automated update#4291
Conversation
This commit matches commit 581e902 in branch dap/nexus-inventory. I've just rebased the changes onto "main" here.
I first read that as "chicken sandwiches" and was very confused. I'm not even that hungry! |
| // if we try to ask MGS about it, we have to wait for MGS to time out | ||
| // its attempt to reach it (currently several seconds). This choice | ||
| // enables inventory to complete much faster, at the expense of not | ||
| // being able to identify this particular condition. |
There was a problem hiding this comment.
I wonder if we should still try to query the SPs ignition says aren't present, but on some lower frequency (maybe even a separate background task entirely? and/or in a subsystem that's more related to faults than inventory, since "I can talk to an SP that ignition says isn't there" is definitely abnormal?). I'm nervous about baking in blind spots.
There was a problem hiding this comment.
I think that makes sense. I'd like to defer it for now. I don't think making this choice now makes it any harder to do that in the future.
There was a problem hiding this comment.
No argument on deferring it. Maybe create an issue after this lands so we don't lose track? Seems like the kind of thing that would only happen on an already-bad day.
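For anyone picking up the follow-up issue, here is a minimal sketch of the trade-off discussed above. It assumes a hypothetical, simplified `IgnitionTarget` type and a hypothetical `partition_targets` helper (neither is from the PR): the main inventory pass queries only the slots ignition reports as present, and the skipped slots are returned so a separate, lower-frequency task could still probe them.

```rust
/// Hypothetical, simplified view of what ignition reports for one slot.
/// The real MGS types carry much more information.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IgnitionTarget {
    pub slot: u32,
    pub present: bool,
}

/// Splits targets into (slots to query now, slots skipped as "not present").
///
/// Querying an absent SP means waiting for MGS to time out, so the main
/// pass skips those. The skipped list is returned rather than discarded so
/// that some other task could check it later for the abnormal case of an
/// SP that answers even though ignition says it is not there.
pub fn partition_targets(targets: &[IgnitionTarget]) -> (Vec<u32>, Vec<u32>) {
    let mut to_query = Vec::new();
    let mut skipped = Vec::new();
    for target in targets {
        if target.present {
            to_query.push(target.slot);
        } else {
            skipped.push(target.slot);
        }
    }
    (to_query, skipped)
}
```

Returning the skipped slots keeps the blind spot visible to callers without slowing down the common path.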
| PRIMARY KEY (inv_collection_id, hw_baseboard_id) | ||
| ); | ||
|
|
||
| CREATE TYPE IF NOT EXISTS omicron.public.caboose_which AS ENUM ( |
There was a problem hiding this comment.
We chatted extensively about this; I'll attempt to summarize here:
- We don't love this enum; it feels a little goofy.
- One alternative is to instead have caboose_slot_0/caboose_slot_1 foreign keys in inv_sp/inv_rot that refer to rows in inv_caboose. This is a 1-to-at-most-1 relationship, but it allows us to encode in the schema that either all or none of the caboose data is available.
- Currently, inv_caboose doesn't have a single primary key, so adding a foreign key to it is awkward at best. We could either add an artificial primary key to inv_caboose, or try to shift things a bit to use sw_caboose_id as a FK.
There was a problem hiding this comment.
I went ahead and did this in 58c010f. Honestly, I could go either way on the result. CabooseWhich does feel janky. But it also reflects exactly what we're getting from MGS: that is, each row in this table reflects one response from the get-caboose endpoint, and that essentially represents the parameters to that request. It's kind of a nice property that no row in an inv_* table represents data from multiple collection requests:
- it guides the schema design -- there's one table for each kind of observation (with maybe additional tables if there are a bunch of fields that can be present or absent together)
- it makes it easy to map the collection request responses to database rows
- it makes it easy to have the uniform set of (inv_collection_id, time_collected, source) fields. We do have those here, but it's arguably misleading because the source and time_collected fields on inv_service_processor and inv_root_of_trust don't apply to the caboose fields
- This example sounds awfully specific but I feel like there's something general here: imagine if we wanted in the future to update the database as we collect data instead of all at once at the end. We'd have this unfortunate situation of having to either insert an inv_service_processor record and then update it later or else hang onto it (don't insert it) until we've tried to collect all the things that might go into it.
I don't think these are big deals for this particular case. Rather, I came to this after exploring much different ways to structure this -- like an inv_sled table that might include pieces of information from both the sled agent (like the current host OS) and the SP (like the current host flash contents), etc. I really disliked this because in the face of partial failures you have all these partial rows and then everything has to be NULL-able. That's how I got to the "rows should not include data from multiple sources" rule.
Put differently: I don't think this specific violation of that rule is that bad, but without that rule, I found myself spinning in circles for a long time about how to design the schema. It's pretty compelling to just say "each source of observation is a table; each observation is a row; then apply the usual database normalization rules".
All that said, I'm kind of ambivalent in the end. I think I slightly prefer the previous thing with caboose_which but I'm interested in your thoughts.
There was a problem hiding this comment.
All the properties you describe about caboose_which make sense. I think I still slightly-to-moderately prefer the changes in 58c010f, but the more I look at it the more I think it's largely a superficial preference. If you start to feel more strongly that you want to go back to caboose_which I could certainly live with that, even if it's just to maintain the schema design guidance.
This feels clearest to me on this issue in particular:
imagine if we wanted in the future to update the database as we collect data instead of all at once at the end. We'd have this unfortunate situation of having to either insert an inv_service_processor record and then update it later or else hang onto it (don't insert it) until we've tried to collect all the things that might go into it.
I think it's the same either way. With caboose_which, partway through a collection insertion we could have rows in inv_sp that do not have corresponding rows in inv_caboose, which feels functionally the same as inv_sp having NULL slotN_inv_caboose_id foreign keys. The latter does mean when we do collect a caboose we have to do an insert+update instead of just an insert, but from a query / data representation point of view, either way you don't know whether the caboose is missing because we couldn't collect it or if it just hasn't been collected yet (absent other info like the presence of an error).
There was a problem hiding this comment.
I think it's the same either way.
Yeah, from a representation perspective, that's true. I had in my mind an implicit rule that we wouldn't want to write a record and then update it later in the same operation. But that's somewhat arbitrary, too. (Not doing this does make it more complicated to infer anything from partially-inserted collections, and to measure progress based on what's present, but now we're talking about several layers of hypotheticals that aren't worth dealing with now.)
There was a problem hiding this comment.
As I was reading through inv_root_of_trust and inv_service_processor I was wondering where the references to the cabooses were, and then reached this comment thread and the remaining tables. Thinking about this a bit, I think we should stick with the caboose_which and inv_caboose tables as they are now rather than embedding fields in the sp and rot tables which would require a write + update.
I don't think the slight convenience or aesthetically pleasing look of the sp and rot tables is strong enough to violate the rule of "one collection per source = one row in one table". That's a really powerful thing to allow us to reason about the system and my gut is telling me we'll be happy to have that later.
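To make the "each observation is a row" rule concrete, here is a small in-memory sketch. The variant names match the SLOT values that omdb prints (SpSlot0, SpSlot1, RotSlotA, RotSlotB); everything else (`CabooseObservation`, `Cabooses`, the string baseboard key) is a hypothetical simplification, not the PR's actual types or schema.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Which caboose a get-caboose request asked about.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CabooseWhich {
    SpSlot0,
    SpSlot1,
    RotSlotA,
    RotSlotB,
}

/// Hypothetical payload for one response. Each value comes from exactly
/// one request, so `source` always describes the whole row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CabooseObservation {
    pub source: String,
    pub version: String,
}

/// One entry per (baseboard, which) pair, loosely mirroring one row per
/// observation. A missing entry means "not collected", with no NULL-able
/// columns and no insert-then-update.
#[derive(Debug, Default)]
pub struct Cabooses {
    rows: BTreeMap<(String, CabooseWhich), CabooseObservation>,
}

impl Cabooses {
    /// Records one response. Returns false, changing nothing, if this
    /// (baseboard, which) pair was already recorded.
    pub fn record(
        &mut self,
        baseboard: &str,
        which: CabooseWhich,
        obs: CabooseObservation,
    ) -> bool {
        match self.rows.entry((baseboard.to_string(), which)) {
            Entry::Vacant(slot) => {
                slot.insert(obs);
                true
            }
            Entry::Occupied(_) => false,
        }
    }

    pub fn get(
        &self,
        baseboard: &str,
        which: CabooseWhich,
    ) -> Option<&CabooseObservation> {
        self.rows.get(&(baseboard.to_string(), which))
    }
}
```

With this shape, recording a caboose is always a pure insert, which is the property the thread ends up preferring over foreign keys embedded in the SP and RoT rows.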
| name: c.name, | ||
| // The MGS API uses an `Option` here because old SP versions did not | ||
| // supply it. But modern SP versions do. So we should never hit | ||
| // this `unwrap_or()`. |
There was a problem hiding this comment.
Should we modify MGS to remove this Option altogether before (or as a part of) this PR? I'm inclined to say "yes"; it's a trivial change in MGS's sp_component_caboose_get endpoint.
There was a problem hiding this comment.
Sure, I'll take a swing at that.
| /// with separate records, even though they might come from the same source | ||
| /// (in this case, a single MGS request). | ||
| /// | ||
| /// We make heavy use of maps, sets, and Arcs here because many of these things |
There was a problem hiding this comment.
This part of the comment makes me nervous, but I think unnecessarily so. If we actually have Arcs pointing to each other, we can end up with undroppable cycles, but after reading over the structs I don't think we do, right? The Arc<T> types in Collection are:
- BaseboardId (does not contain any Arcs)
- Caboose (does not contain any Arcs)
and then CabooseFound keeps an Arc<Caboose> (which is also fine).
There was a problem hiding this comment.
I'll reword this a bit to clarify that no two objects point at "each other". It's more about the fact that some objects are pointed-to by many other things within the Collection.
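The sharing pattern described here can be sketched as follows. This is an illustration under assumptions, not the PR's `Collection`: the field sets are trimmed down, and `SpEntry`, `RotEntry`, and `intern` are hypothetical names. The point is that many entries hold an `Arc` to the same `BaseboardId`, while `BaseboardId` itself holds no `Arc`, so no reference cycle can form.

```rust
use std::collections::BTreeSet;
use std::sync::Arc;

/// Leaf type: contains no Arcs, so it cannot take part in a cycle.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct BaseboardId {
    pub part_number: String,
    pub serial_number: String,
}

/// Hypothetical, trimmed-down entries that each point at a baseboard.
#[derive(Debug)]
pub struct SpEntry {
    pub baseboard: Arc<BaseboardId>,
    pub power: String,
}

#[derive(Debug)]
pub struct RotEntry {
    pub baseboard: Arc<BaseboardId>,
    pub active_slot: char,
}

#[derive(Debug, Default)]
pub struct Collection {
    pub baseboards: BTreeSet<Arc<BaseboardId>>,
    pub sps: Vec<SpEntry>,
    pub rots: Vec<RotEntry>,
}

impl Collection {
    /// Returns the shared handle for this baseboard, inserting it into the
    /// set the first time it is seen.
    pub fn intern(&mut self, id: BaseboardId) -> Arc<BaseboardId> {
        // `Arc<T>: Borrow<T>`, so the set can be searched by `&BaseboardId`.
        if let Some(existing) = self.baseboards.get(&id) {
            return existing.clone();
        }
        let shared = Arc::new(id);
        self.baseboards.insert(shared.clone());
        shared
    }
}
```

Because every `Arc` points "downward" to a leaf, dropping the `Collection` releases all of its references.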
| // `inv_service_processor` using an explicit list of columns | ||
| // and values. Without the following statement, If a new | ||
| // required column were added, this would only fail at | ||
| // runtime. |
There was a problem hiding this comment.
Big 👍 on this comment (and the solution it's describing). Very clear what's going on here.
|
|
||
| opctx.authorize(authz::Action::Modify, &authz::INVENTORY).await?; | ||
|
|
||
| loop { |
There was a problem hiding this comment.
Can we get all the collection IDs to delete in a single query instead of looping and having to delete one at a time? This is grossly oversimplified (in particular, I'm putting the error count in directly as a column), but given
create table coll (id int primary key, started timestamp, nerrors int);
a query like this should return all IDs we need to prune:
select id from coll where id not in (
-- keep the 3 most recent collections...
(select id from coll order by started desc limit 3)
union
-- and the single most recent collection that had no errors (if it wasn't
-- already saved by the "3 most recent" above)
(select id from coll where nerrors = 0 order by started desc limit 1)
);
There was a problem hiding this comment.
I think this is a promising approach. But I'm a little worried it'll take a while to make this real (first real SQL, then real Diesel code), and also that when we do, we may find the database winds up doing a table scan (or rejecting the SQL because we've configured it to disallow that). As an example: one of the stated assumptions is that the number of collections here could be huge. In that case, the highest-level subquery here will produce only 3 rows, but the query itself will be trying to select any collections not in that set, which will in turn return a very large result set. So we'll need a LIMIT there. We also want to start with the oldest ones, so we'll want an ORDER BY timestamp. At this point, there are enough variables here that I'm not sure what query plan Cockroach will use. I think ideal would be to do the subquery, then scan the index (by timestamp) and just skip over any rows that are in the subquery and stop when we reach the limit. I hope it will do that but it's hard to be sure until we do that work.
I think this is probably all solvable (or else we'll find out why it's not), but if the current code is at least correct and not pathological, I'd rather defer this than spend the time now to work all this out.
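The selection rule being debated can be written down independently of SQL. The sketch below works on an in-memory slice and mirrors the simplified `coll` table from the suggestion above (id, started, nerrors); `Coll` and `ids_to_prune` are hypothetical names, and this says nothing about what query plan a database would choose. It also includes the LIMIT and oldest-first ordering raised in the reply.

```rust
use std::collections::BTreeSet;

/// Mirrors the simplified `coll` table: id, start time, error count.
#[derive(Debug, Clone)]
pub struct Coll {
    pub id: u32,
    pub started: u64,
    pub nerrors: u32,
}

/// Returns up to `limit` collection IDs to prune, oldest first, keeping the
/// `nkeep` most recently started collections plus the most recent
/// collection that had no errors.
pub fn ids_to_prune(colls: &[Coll], nkeep: usize, limit: usize) -> Vec<u32> {
    // Newest first.
    let mut by_recency: Vec<&Coll> = colls.iter().collect();
    by_recency.sort_by(|a, b| b.started.cmp(&a.started));

    // The "subquery": the set of IDs that must survive.
    let mut keep: BTreeSet<u32> =
        by_recency.iter().take(nkeep).map(|c| c.id).collect();
    if let Some(c) = by_recency.iter().find(|c| c.nerrors == 0) {
        keep.insert(c.id);
    }

    // Walk oldest first, skip anything in the keep set, stop at the limit.
    by_recency
        .iter()
        .rev()
        .filter(|c| !keep.contains(&c.id))
        .map(|c| c.id)
        .take(limit)
        .collect()
}
```

The final iterator chain is the in-memory analogue of "scan the index by timestamp, skip rows in the subquery, stop at the limit".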
| // break it up if these transactions become too big. But we'd need a | ||
| // way to stop other clients from discovering a collection after we | ||
| // start removing it and we'd also need to make sure we didn't leak a | ||
| // collection if we crash while deleting it. |
There was a problem hiding this comment.
Do we also need to prune no-longer-referenced hw_baseboard_id or sw_caboose rows? It's a little hard to imagine hw_baseboard_id getting "too big" since it only gets a row for each physical component the rack sees, but maybe in a large/long-lived multirack deployment? sw_caboose gets a few new rows for every update, so probably grows somewhat faster but still not all that quickly.
There was a problem hiding this comment.
In principle, eventually, yes. I think this is not urgent and we will notice before it becomes so.
| } | ||
|
|
||
| impl diesel::query_builder::QueryFragment<diesel::pg::Pg> for InvCabooseInsert { | ||
| fn walk_ast<'b>( |
There was a problem hiding this comment.
I have no useful suggestion here, but would like to register my complaint that this is much harder to read than the equivalent query written out in raw SQL would be.
This reverts commit 58c010f.
|
@jgallagher I've made a bunch of changes since your review, but hopefully no surprises. Besides the stuff that came up in your review:
That's all I've got planned so I think this is ready for re-review. |
|
Looks like the helios deploy test failure is legit (or at least related to the changes): |
andrewjstone
left a comment
There was a problem hiding this comment.
@davepacheco This looks great. I only skimmed most of the DB transaction stuff, but the overall picture makes sense to me. I'll leave it to John to approve due to my skimming and his expertise.
| generation INT8 NOT NULL | ||
| ); | ||
|
|
||
| /* |
| /// Prune inventory collections stored in the database, keeping at least | ||
| /// `nkeep`. | ||
| /// | ||
| /// This function removes as many collections as possible while preserving |
There was a problem hiding this comment.
The latest nkeep are provided by timestamps, which aren't really global. Right now collection is at 10 min intervals, so as long as only 1 nexus performs a collection per interval this should be fine. However, I could see problems arising around order, although quite unlikely due to our 500ms limitation around syncing.
I don't really think this is something worth worrying about but figured I'd ask for completeness sake. Are there autoincrementing IDs we could use for collections rather than UUIDs as foreign keys and then sort by those? Would this present other issues with dueling Nexuses?
There was a problem hiding this comment.
Yeah, using timestamps is definitely a little fuzzy. In this case though I think that reflects the reality that collections are not atomic and they don't have a total order. Two Nexus instances could totally run collections concurrently that have start/done times that overlap (and I think that's fine). Consumers can decide if they want the most-recently-started or most-recently-finished (or even the-one-containing-the-most-recent-collection-time-for-the-specific-item-that-I-care-about). We could potentially use a sequence to assign a total order to these but I don't think it would have a useful semantic meaning -- at best it'd be a proxy for "which one committed to the database first" and I'm not sure that's useful.
Okay so my argument is basically "report the facts (the start/done timestamp) and let consumers decide what they want". But that just punts your question to "okay, well, which ones should we keep when we're pruning them?". And I think the answer here is to tune both the frequency and nkeep such that it doesn't really matter if we choose "wrong" -- i.e., if two collections start/finish at about the same time but for some reason a consumer might reasonably want either one, we should probably just keep both. But my expectation here is that all consumers for now would probably want the same thing, which is the latest "time_started" one, and as long as "nkeep" is more than 1 then it doesn't matter which ones we keep if two overlap because there's always a newer one which is what consumers actually want.
There was a problem hiding this comment.
Ah, ok. That makes sense. I was actually thinking that we could eliminate the overlapping collections to a degree by having each nexus check that there isn't a collection currently running - or rather that one hasn't started within some bound (say collection_interval / 2) before kicking off another. With that, collections should be very close to totally ordered by time if not always so.
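The check proposed here is small enough to sketch. `should_start_collection` is a hypothetical helper, not something in the PR, and it assumes the caller has already computed how long ago the most recent collection (by any Nexus) started. As noted earlier in the thread, clocks are not perfectly synchronized, so this narrows the overlap window rather than eliminating it.

```rust
use std::time::Duration;

/// Decides whether this Nexus should kick off a new collection.
///
/// `since_last_start` is the time elapsed since the most recent collection
/// started, or `None` if no collection exists yet. A new collection is
/// allowed only once at least half the collection interval has passed.
pub fn should_start_collection(
    since_last_start: Option<Duration>,
    collection_interval: Duration,
) -> bool {
    match since_last_start {
        None => true,
        Some(elapsed) => elapsed >= collection_interval / 2,
    }
}
```

With the 10-minute interval mentioned above, a Nexus would skip its turn if another collection began less than 5 minutes ago.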
|
Thanks for taking a look @andrewjstone! |
jgallagher
left a comment
There was a problem hiding this comment.
This looks great! Just a handful of small nits.
| pub serial_number: String, | ||
| } | ||
|
|
||
| impl<'a> From<&'a BaseboardId> for HwBaseboardId { |
There was a problem hiding this comment.
Tiny nit / question - if we have to .clone() all the fields of BaseboardId, should this be impl From<BaseboardId> instead, and push the clone to the callsite? I'm not sure how this is used, but that might avoid some clones, if there are cases where a caller has a BaseboardId that they want to convert into a HwBaseboardId and not use again.
Similar question about other From<&T> impls in this file.
There was a problem hiding this comment.
Agreed. Fixed in a55216d. I changed this one and the SwCaboose one. I did not change the InvCollection one because in that case, the source object (a Collection) is potentially huge.
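A sketch of the by-value conversion, using simplified stand-ins for the two types (the real `HwBaseboardId` may have more fields than shown here): taking `BaseboardId` by value lets the strings move instead of being cloned, and a caller who still needs the original pays for the clone explicitly.

```rust
/// Simplified stand-in for the inventory-side type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BaseboardId {
    pub part_number: String,
    pub serial_number: String,
}

/// Simplified stand-in for the database model type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HwBaseboardId {
    pub part_number: String,
    pub serial_number: String,
}

impl From<BaseboardId> for HwBaseboardId {
    fn from(c: BaseboardId) -> Self {
        // Fields are moved out of `c`; no String is cloned here.
        HwBaseboardId {
            part_number: c.part_number,
            serial_number: c.serial_number,
        }
    }
}
```

A caller that is done with the value writes `HwBaseboardId::from(id)`; one that still needs it writes `HwBaseboardId::from(id.clone())`. For a potentially huge source object such as a whole `Collection`, keeping `From<&T>` avoids forcing that clone, which matches the exception described above.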
| resolver: internal_dns::resolver::Resolver, | ||
| creator: String, | ||
| nkeep: u32, | ||
| disable: bool, |
There was a problem hiding this comment.
Just making sure I understand: this is a setting at the Nexus config level (i.e., the TOML file baked into the Nexus zone) and cannot change at runtime, right? If we needed to flip this switch, how would we?
There was a problem hiding this comment.
That's right. To my knowledge we do not yet have a way to apply dynamic config at runtime. The intent here is that if we really needed to, we could modify the TOML file inside each Nexus zone to disable this task. Then we'd restart Nexus. It's obviously not great but I've frequently found these sorts of facilities essential in the mitigation of production incidents in the past. (A step up might be a support API for pausing any background task in a particular Nexus instance by name. But that wouldn't survive Nexus restart without storing that config somewhere.)
| datastore.clone(), | ||
| ); | ||
|
|
||
| // Nexus starts our very background task, so we should find a collection |
There was a problem hiding this comment.
Nit typo - "starts our very background"
There was a problem hiding this comment.
Not a typo, but poorly written. I reworded it in a55216d.
|
|
||
| // Background task: inventory collector | ||
| let task_inventory_collection = { | ||
| let watcher = inventory_collection::InventoryCollector::new( |
There was a problem hiding this comment.
Nit / question - is this variable misnamed? Looks like register takes watchers as its last arg, but this is the task implementation itself, right?
| RotSlotB baseboard part "FAKE_SIM_SIDECAR" serial "SimSidecar1": board "SimSidecarRot" | ||
|
|
||
| errors: | ||
| error: MGS "http://[100::1]:12345": listing ignition targets: Communication Error: error sending request for url (http://[100::1]:12345/ignition): error trying to connect: tcp connect error: Network is unreachable (os error <<redacted>>): error sending request for url (http://[100::1]:12345/ignition): error trying to connect: tcp connect error: Network is unreachable (os error <<redacted>>): error trying to connect: tcp connect error: Network is unreachable (os error <<redacted>>): tcp connect error: Network is unreachable (os error <<redacted>>): Network is unreachable (os error <<redacted>>) |
There was a problem hiding this comment.
Ug, sorry for this; I really need to clean up the duplicated: duplicated: duplicated: errors from MGS
| let index = u16::try_from(i).map_err(|e| { | ||
| Error::internal_error(&format!( | ||
| "failed to convert error index to u16 (too \ | ||
| many errors in inventory collection?): {}", |
There was a problem hiding this comment.
Trivial nit - rustfmt won't line up split strings
There was a problem hiding this comment.
Ugh. This keeps happening and I don't notice. I'm not sure why. I wonder if it happens when some other change (like a symbol rename) causes this block to be reformatted when I'm not actually working on it. Anyway, fixed in a55216d.
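For context on the snippet under review, here is a self-contained sketch of the same checked conversion. `error_indexes` is a hypothetical helper and plain `String` errors stand in for the PR's `Error::internal_error`; only the `u16::try_from` pattern and the message text come from the quoted code.

```rust
use std::convert::TryFrom;

/// Pairs each error message with a u16 index, failing instead of silently
/// truncating if there are more errors than a u16 can number.
pub fn error_indexes(errors: &[String]) -> Result<Vec<(u16, &str)>, String> {
    errors
        .iter()
        .enumerate()
        .map(|(i, message)| {
            let index = u16::try_from(i).map_err(|e| {
                format!(
                    "failed to convert error index to u16 (too \
                     many errors in inventory collection?): {}",
                    e
                )
            })?;
            Ok((index, message.as_str()))
        })
        .collect()
}
```

Collecting an iterator of `Result` values into a single `Result` stops at the first failed conversion.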
| } | ||
| } | ||
|
|
||
| /// A SQL common table expression (CTE) used to insert into `inv_caboose` |
There was a problem hiding this comment.
I think this block comment is referencing code that no longer exists, right?
|
I think I've addressed the outstanding feedback and I intend to land this once the repo re-opens after the latest customer update. |
The RoT can report four different 512-byte pages (CMPA, and CFPA active/inactive/scratch). Given multiple RoT artifacts that are viable (match the right board, etc.) but are signed with different keys, these pages are required to identify which archive was signed with a key that the RoT will accept. This PR adds collection of these pages to the inventory system added in #4291.

The implementation here is fairly bulky but very mechanical, and is implemented almost identically to the way we collect cabooses: there's an `rot_page_which` to identify which of the four kinds of page it is, and a table for storing the relatively small number of raw page data values. Most of the changes in this PR resulted from "find where we're doing something for cabooses, then do the analogous thing for RoT pages". There are a couple minor quibbles in the unit tests that I'll point out by leaving comments below.

The RoT pages now show up when viewing a collection through omdb (note that the quite long base64 string is truncated; there's a command line flag to override the truncation and show the full string):

```console
$ omdb db inventory collections show e2f84867-010d-4ac3-bbf3-bc1e865da16b > x.txt
note: database URL not specified. Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:43301/omicron?sslmode=disable
note: database schema version matches expected (11.0.0)
collection: e2f84867-010d-4ac3-bbf3-bc1e865da16b
collector:  e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (likely a Nexus instance)
started:    2023-11-14T18:51:54.900Z
done:       2023-11-14T18:51:54.942Z
errors:     0

Sled SimGimlet00
    part number: FAKE_SIM_GIMLET
    power:    A2
    revision: 0
    MGS slot: Sled 0 (cubby 0)
    found at: 2023-11-14 18:51:54.924602 UTC from http://[::1]:42341
    cabooses:
        SLOT     BOARD        NAME      VERSION GIT_COMMIT
        SpSlot0  SimGimletSp  SimGimlet 0.0.1   ffffffff
        SpSlot1  SimGimletSp  SimGimlet 0.0.1   ffffffff
        RotSlotA SimGimletRot SimGimlet 0.0.1   eeeeeeee
        RotSlotB SimGimletRot SimGimlet 0.0.1   eeeeeeee
    RoT pages:
        SLOT         DATA_BASE64
        Cmpa         Z2ltbGV0LWNtcGEAAAAAAAAAAAAAAAAA...
        CfpaActive   Z2ltbGV0LWNmcGEtYWN0aXZlAAAAAAAA...
        CfpaInactive Z2ltbGV0LWNmcGEtaW5hY3RpdmUAAAAA...
        CfpaScratch  Z2ltbGV0LWNmcGEtc2NyYXRjaAAAAAAA...
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: -
    RoT: slot B SHA3-256: -

Sled SimGimlet01
    part number: FAKE_SIM_GIMLET
    power:    A2
    revision: 0
    MGS slot: Sled 1 (cubby 1)
    found at: 2023-11-14 18:51:54.935038 UTC from http://[::1]:42341
    cabooses:
        SLOT     BOARD        NAME      VERSION GIT_COMMIT
        SpSlot0  SimGimletSp  SimGimlet 0.0.1   ffffffff
        SpSlot1  SimGimletSp  SimGimlet 0.0.1   ffffffff
        RotSlotA SimGimletRot SimGimlet 0.0.1   eeeeeeee
        RotSlotB SimGimletRot SimGimlet 0.0.1   eeeeeeee
    RoT pages:
        SLOT         DATA_BASE64
        Cmpa         Z2ltbGV0LWNtcGEAAAAAAAAAAAAAAAAA...
        CfpaActive   Z2ltbGV0LWNmcGEtYWN0aXZlAAAAAAAA...
        CfpaInactive Z2ltbGV0LWNmcGEtaW5hY3RpdmUAAAAA...
        CfpaScratch  Z2ltbGV0LWNmcGEtc2NyYXRjaAAAAAAA...
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: -
    RoT: slot B SHA3-256: -

Switch SimSidecar0
    part number: FAKE_SIM_SIDECAR
    power:    A2
    revision: 0
    MGS slot: Switch 0
    found at: 2023-11-14 18:51:54.904 UTC from http://[::1]:42341
    cabooses:
        SLOT     BOARD         NAME       VERSION GIT_COMMIT
        SpSlot0  SimSidecarSp  SimSidecar 0.0.1   ffffffff
        SpSlot1  SimSidecarSp  SimSidecar 0.0.1   ffffffff
        RotSlotA SimSidecarRot SimSidecar 0.0.1   eeeeeeee
        RotSlotB SimSidecarRot SimSidecar 0.0.1   eeeeeeee
    RoT pages:
        SLOT         DATA_BASE64
        Cmpa         c2lkZWNhci1jbXBhAAAAAAAAAAAAAAAA...
        CfpaActive   c2lkZWNhci1jZnBhLWFjdGl2ZQAAAAAA...
        CfpaInactive c2lkZWNhci1jZnBhLWluYWN0aXZlAAAA...
        CfpaScratch  c2lkZWNhci1jZnBhLXNjcmF0Y2gAAAAA...
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: -
    RoT: slot B SHA3-256: -

Switch SimSidecar1
    part number: FAKE_SIM_SIDECAR
    power:    A2
    revision: 0
    MGS slot: Switch 1
    found at: 2023-11-14 18:51:54.915680 UTC from http://[::1]:42341
    cabooses:
        SLOT     BOARD         NAME       VERSION GIT_COMMIT
        SpSlot0  SimSidecarSp  SimSidecar 0.0.1   ffffffff
        SpSlot1  SimSidecarSp  SimSidecar 0.0.1   ffffffff
        RotSlotA SimSidecarRot SimSidecar 0.0.1   eeeeeeee
        RotSlotB SimSidecarRot SimSidecar 0.0.1   eeeeeeee
    RoT pages:
        SLOT         DATA_BASE64
        Cmpa         c2lkZWNhci1jbXBhAAAAAAAAAAAAAAAA...
        CfpaActive   c2lkZWNhci1jZnBhLWFjdGl2ZQAAAAAA...
        CfpaInactive c2lkZWNhci1jZnBhLWluYWN0aXZlAAAA...
        CfpaScratch  c2lkZWNhci1jZnBhLXNjcmF0Y2gAAAAA...
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: -
    RoT: slot B SHA3-256: -
```

There's also a new `omdb` subcommand to report the RoT pages (which does not truncate, but if we think it should that'd be easy to change):

```console
$ omdb db inventory rot-pages
note: database URL not specified.
Will search DNS. note: (override with --db-url or OMDB_DB_URL) note: using database URL postgresql://root@[::1]:43301/omicron?sslmode=disable note: database schema version matches expected (11.0.0) ID DATA_BASE64 099ba572-a978-4592-ae7a-452629377904 c2lkZWNhci1jZnBhLWluYWN0aXZlAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= 0e9dc5b0-b190-43da-acb6-84450fdfdb94 c2lkZWNhci1jbXBhAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= 80923bac-fbcc-46e0-b861-9dba906c14f7 
Z2ltbGV0LWNmcGEtaW5hY3RpdmUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= 98cc4225-a791-4092-99c6-81e27e8d8ffa c2lkZWNhci1jZnBhLWFjdGl2ZQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= a32eaf95-a20e-4570-8860-e0fb584a2ff1 
c2lkZWNhci1jZnBhLXNjcmF0Y2gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= c941810a-1c6a-4dda-9c71-41a0caf62ace Z2ltbGV0LWNtcGEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= e96042d0-ae8a-435c-9118-1b71e8a9a651 
Z2ltbGV0LWNmcGEtYWN0aXZlAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= fdc27064-4338-4cbe-bfe5-622b11a9afbc Z2ltbGV0LWNmcGEtc2NyYXRjaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
This PR implements the first round of hardware/software inventory for automated update. See RFD 433 for background. There's a summary of the new data model in dbinit.sql.
I'm sorry this change is so big. Here are the key pieces:
- nexus/types: types related to software inventory (used in a few places)
- schema/crdb and nexus/db-model: database schema/model described in RFD 433
- nexus/db-queries: datastore queries to insert or delete an entire inventory Collection
- nexus/inventory: new crate with Collector and builder interface. This crate only collects inventory -- it doesn't do anything with the database.
- nexus/src/app/background: a new background task that uses these other pieces to collect inventory, write it to the database, and clean up old collections
- omdb support for showing inventory data from the database

What's not here (and will be in future PRs, not this one):
Some other stuff came along for the ride here. I'm happy to separate these if that's useful but they're each pretty small:
- omicron-dev run-all, as well as all tests that set up a ControlPlaneTestContext, now run a Management Gateway Service backed by the same simulated SPs used in the existing MGS tests. This was easy to do, convenient for future inventory work, and it was necessary to test the omdb changes.
- omdb does not call usdt::register_probes(), so we don't have (for example) the diesel-dtrace probes in omdb. I added a call in pool.rs to cover these. This isn't quite once per process, but it's close, and ensures that anybody who uses our database layer will get these probes. This was a one line change.
- pool_authorized() because it was non-pub and there was only one caller and I found the name confusing.

Here's some example output. I used omicron-dev run-all to get everything going:

Here's omdb nexus background-tasks show for the new "inventory" task:

Here's using omdb to poke around the inventory data: