Skip to content

Add internal API and omdb command to expunge a sled#5234

Merged
jgallagher merged 12 commits into
mainfrom
john/expunge-sled-via-omdb
Mar 13, 2024
Merged

Add internal API and omdb command to expunge a sled#5234
jgallagher merged 12 commits into
mainfrom
john/expunge-sled-via-omdb

Conversation

@jgallagher

Copy link
Copy Markdown
Contributor

The omdb interface might be overly cautious; feedback welcome. You can try this against omicron-dev run-all; it looks roughly like:

note: using Nexus URL http://[::1]:12221
note: using database URL postgresql://root@[::1]:42285/omicron?sslmode=disable
note: database schema version matches expected (39.0.0)
WARNING: sled b6d65341-167c-41df-9b5c-41cded99c229 IS PRESENT in the most recent inventory collection; are you sure you want to mark it expunged?
WARNING: This operation will PERMANENTLY and IRRECOVABLY mark sled b6d65341-167c-41df-9b5c-41cded99c229 (sim-b6d65341) expunged. To proceed, type the sled's serial number.
sled serial〉sim-b6d65341
expunged sled b6d65341-167c-41df-9b5c-41cded99c229 (previous policy: InService(Provisionable))

This handles the "internal" half of #5134

Comment thread dev-tools/omdb/src/bin/omdb/nexus.rs

@sunshowers sunshowers left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good overall, thanks for doing it! Just a few comments.

Comment thread dev-tools/omdb/src/bin/omdb/db.rs
log: &slog::Logger,
) -> anyhow::Result<Arc<DataStore>> {
let db_url = self.resolve_pg_url(omdb, log).await?;
eprintln!("note: using database URL {}", &db_url);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use the logger?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's supposed to, but I'm not positive. Probably a question for @davepacheco; I think the logger is only passed into all the omicron types that need one, and output from omdb itself goes to stdout or stderr.

@sunshowers sunshowers Mar 11, 2024

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I think one way to handle this would be to have some kind of tag passed in as a kv to indicate that the message is user-facing, and in the log handler do filtering based on that. (You can also show such messages in a different manner, e.g. "warning: <text>" instead of [timestamp WARN] text.)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of very many things that get emitted directly to stdout/stderr. That's basically how omdb works today. There are definitely tradeoffs to doing it that way but I don't think it makes sense to change just this one and rethinking this approach seems a lot broader than this PR.

Comment thread dev-tools/omdb/src/bin/omdb/nexus.rs Outdated
Comment thread dev-tools/omdb/src/bin/omdb/nexus.rs Outdated
Comment thread dev-tools/omdb/src/bin/omdb/nexus.rs Outdated
Comment thread dev-tools/omdb/src/bin/omdb/nexus.rs Outdated
Comment thread dev-tools/omdb/src/bin/omdb/nexus.rs Outdated
Comment thread dev-tools/omdb/src/bin/omdb/nexus.rs
Comment thread dev-tools/omdb/src/bin/omdb/nexus.rs
Comment thread dev-tools/omdb/src/bin/omdb/main.rs
Comment on lines +1097 to +1101
// This is an extremely dangerous and irreversible operation. We put a
// couple of safeguards in place to ensure this cannot be called without
// due consideration:
//
// 1. We'll require manual input on stdin to confirm the sled to be removed

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine for now, but I kind of expect we're going to want to write automated tests around this sort of thing (maybe not using omdb, though).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. @benjaminleonard made some suggestions about how we might design the external API to mimic this kind of confirmation flow, but I did not implement any of that for the internal API. I put these guards on omdb, but (a) we can relax them (or provide flags to skip them) if needed, or (b) tests may be able to hit the internal API directly.

let opctx = OpContext::for_tests(log.clone(), datastore.clone());
let opctx = &opctx;

// First, we need to look up the sled so we know its serial number.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean it won't work for PCs? Or we just won't necessarily be able to do the inventory collection check?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I don't know enough about how PC-based racks are set up to answer this question. I think PCs still have a serial number in the sled table, right? If their sled-agent reports the same serial number, I think both this and the inventory check will work.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still work on PCs. This is how we are able to add and remove sleds on the a4x2 testbed.

Comment thread dev-tools/omdb/src/bin/omdb/nexus.rs Outdated
Comment thread dev-tools/omdb/src/bin/omdb/nexus.rs Outdated
Comment on lines +1111 to +1116
let (_authz_sled, sled) = LookupPath::new(opctx, &datastore)
.sled_id(args.sled_id)
.fetch()
.await
.with_context(|| format!("failed to find sled {}", args.sled_id))?;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only reason we need the database connection? What about having an API instead to fetch sled info? (This is not a big deal either way.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use it both for this and fetching the latest inventory, which doesn't have APIs either (but could). Using the database connection in this subcommand seems a little sketchy to me, but I wasn't sure about adding sled info + inventory internal API commands just to support guard rails. I'll keep this as-is for now but it's easy to revisit (or replace if/when we develop those APIs for some other consumers).

@jgallagher jgallagher merged commit 4f101be into main Mar 13, 2024
@jgallagher jgallagher deleted the john/expunge-sled-via-omdb branch March 13, 2024 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants