Better error message when there's no default IP pool#4880
Conversation
| /// a specific id was requested | ||
| ById(Uuid), | ||
| /// a session token was requested | ||
| BySessionToken(String), |
There was a problem hiding this comment.
This is not used. What's weird is it was introduced in #814, but it is also not used there.
There was a problem hiding this comment.
Thanks for removing this. I'm not sure how much sense it made in this context to begin with.
| /// (caller summarizes it) | ||
| ByCompositeId(String), | ||
| /// object selected by criteria that would be confusing to call an ID | ||
| ByOther(String), |
There was a problem hiding this comment.
A catchall name like this is surely a code smell, but I can't think of a better name.
There was a problem hiding this comment.
It's okay but I'm just not 100% about it. I know it may be somewhat redundant, but I do kind of like ByQuery as it stays general but also guides the consumer into giving it something query like.
| LookupType::ByCompositeId(label) => { | ||
| format!("{} with id \"{}\"", t, label) | ||
| } | ||
| LookupType::ByOther(msg) => msg, |
There was a problem hiding this comment.
This is the point of all this — I need a variant that doesn't stick all that extra structure in the message.
| let lookup_type = | ||
| LookupType::ByOther("default IP pool for current silo".to_string()); |
There was a problem hiding this comment.
Other than stripping the ID part of the string (which is helpful because it'll not longer look like an error) I don't know that this provides that much more actionable context. Something like this might be more helpful in the logs:
LookupType::ByQuery(format!("ip_pool where ip_pool_resource { resource_id: {}, is_default: true }", authz_silo_id));It doesn't have to be exactly that. I know this doesn't communicate in plain english the error (as you've done above), but it does communicate some specifics of what tables to look at if something unexpected is happening.
cc @internet-diglett do you have thoughts?
There was a problem hiding this comment.
I considered options more like that, giving the stuff we select by, but to a certain extent I think it’s revealing too much about how the concept of default pool is implemented. It feels weird for an API to be talking about a join table by name.
There was a problem hiding this comment.
How about combining the two? As with other errors in the system, there's both an internal and external message here. Internally we want enough context to know what to dig into if something's gone wrong. Externally we want a general message for the users to know what's failed at a high level.
There was a problem hiding this comment.
True, interesting idea. Mostly only the 500-level errors have internal messages, except InvalidValue is a 400-level one used for things like disk size being too big and it does have that. It would be pretty invasive to change ObjectNotFound to have that.
omicron/common/src/api/external/error.rs
Lines 28 to 73 in c392c76
There was a problem hiding this comment.
Changing ObjectNotFound might be the right way to do it, but that's not what I specifically had in mind.
In the logs above, there's the external event message as you'd expect but also a serialized version of the actual error object. Even if the "query" part of this never showed up in an error, we'd still see it in the logs (we, at least under the context at which this was initially seen).
Now, it could be that we really only see errors serialized out like that in the logs under very special circumstances (maybe only saga events?). Were that the case, it may not be worth it.
There was a problem hiding this comment.
I see. If the LookupType itself gets logged, in theory I could put internal/external on that and we'd see it that way, without affecting the way the error message is written out for the API response.
It looks like we do see the full object in the logs because this happens during the instance create saga — this is from log output after running the test edited in this PR.
$ ntlog | rg 'default IP pool'
{"msg":"recording saga event","v":0,"name":"test_instance_ephemeral_ip_no_default_pool_error","level":20,"time":"2024-01-24T16:45:33.208653-06:00","hostname":"Davids-M1-MBP.local","pid":98315,"component":"SecStore","component":"nexus","component":"ServerContext","name":"391cc54f-028b-4b57-befe-0d3a4239ec4e","event_type":"Failed(ActionFailed { source_error: Object {\"ObjectNotFound\": Object {\"lookup_type\": Object {\"ByOther\": String(\"default IP pool for current silo\")}, \"type_name\": String(\"ip-pool\")}} })","node_id":"35","saga_id":"522d6f2d-b616-43a3-ae88-6c85f2e8596b"}
{"msg":"request completed","v":0,"name":"test_instance_ephemeral_ip_no_default_pool_error","level":30,"time":"2024-01-24T16:45:33.943636-06:00","hostname":"Davids-M1-MBP.local","pid":98315,"uri":"/v1/instances?project=springfield-squidport","method":"POST","req_id":"1d8fbacd-e242-495f-a033-1a0d7edfe88b","remote_addr":"127.0.0.1:55531","local_addr":"127.0.0.1:55523","component":"dropshot_external","name":"391cc54f-028b-4b57-befe-0d3a4239ec4e","error_message_external":"not found: default IP pool for current silo","error_message_internal":"not found: default IP pool for current silo","latency_us":1388504,"response_code":"404"}
Will think about this.
|
Merging this as a net improvement. Will think about making it more better later. |
Closes #4864
This is a bad error message to get when the problem is that there is no default IP pool configured for your current silo:
"Default pool for current silo" is not an id, so why would we call it one? This is better:
This PR is just making that possible.