Conversation
addressing comments
command/debug.go
Outdated
| healthInfo, err := c.cachedClient.Sys().Health() | ||
| if err != nil { | ||
| c.captureError("server-status.health", err) | ||
| return |
There was a problem hiding this comment.
Although it's true that an error on the health endpoint will probably mean one on the seal-status endpoint too, I'd rather we didn't assume that. Can you revert this change please?
command/debug.go
Outdated
| sealInfo, err := c.cachedClient.Sys().SealStatus() | ||
| if err != nil { | ||
| c.captureError("server-status.seal", err) | ||
| return |
command/debug.go
Outdated
| defer resp.Body.Close() | ||
| err = jsonutil.DecodeJSONFromReader(resp.Body, &data) | ||
| if err != nil { | ||
| c.captureError("inFlightReq-status", err) |
There was a problem hiding this comment.
It looks like the first arg to captureError is the target, which we're now calling requests right?
command/debug.go
Outdated
| return | ||
| } | ||
|
|
||
| if data != nil && len(data) > 0 { |
There was a problem hiding this comment.
I don't think we need this conditional do we?
There was a problem hiding this comment.
yeah, there should always be at least one entry which is the /v1/sys/in-flight-req related info
http/handler.go
Outdated
| ClientRemoteAddr: r.RemoteAddr, | ||
| ReqPath: r.URL.Path, | ||
| }) | ||
| if err != nil { |
There was a problem hiding this comment.
What error are we handling here?
There was a problem hiding this comment.
Ah, thanks for catching this!
vault/core.go
Outdated
| } | ||
|
|
||
| type InFlightRequests struct { | ||
| l sync.RWMutex |
There was a problem hiding this comment.
I'd like us to avoid having both a mutex and sync.Map; if we're going to use a mutex to guard the map, we may as well just use a regular map, since it's way cheaper than sync.Map if concurrent access is protected otherwise.
That said, I think you could get rid of the lock if you stored InFlightReqData instead of *InflightReqData. The race occurs when we try to update data referenced by a pointer while another goroutine is trying to load the data. If instead updating consists of copying the data out of the map, updating that copy, then storing a copy back into the map, I don't think it'll be racy.
There was a problem hiding this comment.
going to remove the mutex.
vault/testing.go
Outdated
|
|
||
| func TestCoreWithCustomResponseHeaderAndUI(t testing.T, CustomResponseHeaders map[string]map[string]string, enableUI bool) (*Core, [][]byte, string) { | ||
| confRaw := &server.Config{ | ||
| LogRequestsLevel: "basic", |
There was a problem hiding this comment.
This isn't a valid log level, right?
There was a problem hiding this comment.
yeah, I wanted to make sure invalid values are not translated to valid ones
There was a problem hiding this comment.
But we probably shouldn't do that kind of thing outside of a test specific to the feature. Unless it's somehow relevant to custom response header behaviour?
vault/core.go
Outdated
| currentInFlightReqMap := make(map[string]InFlightReqData) | ||
| c.inFlightReqData.InFlightReqMap.Range(func(key, value interface{}) bool { | ||
| // there is only one writer to this map, so skip checking for errors | ||
| v, _ := value.(InFlightReqData) |
There was a problem hiding this comment.
Minor nit, but: by using the two-argument form, you're deliberately ignoring errors. Either we don't believe errors are possible, in which case you can use the single-argument form which panics on error, or we do, in which case we should handle the error.
There was a problem hiding this comment.
I think at some point, you mentioned that there is only one writer for this and there is no need to check if the assertion worked or not. The comment I have on line 3006 talks about that. But, happy to add the error check for this.
There was a problem hiding this comment.
I'm ok with there being no error check, I'm just saying if we're not going to check for errors because we think they can't happen, why not use the single-argument form? If we're right that errors can't happen, then they're equivalent. If we're wrong, we'll learn of it because of panics.
There was a problem hiding this comment.
Sounds good! I think it is highly unlikely that a panic happens. I am going to use the single-arguments form.
Adding a trace capability to the request handling or HTTP layer that shows, for each requests received but not yet answered:
This would provide a point-in-time snapshot of what user requests Vault is handling, highlighting any deadlocks or abnormally long response times.
The API reporting on this information should be called as part of the debug command. It also support unauthenticated requests if a new profiling config option is set.
Adding a new metric for the total number of in-flight requests.
Adding documentation for the metric and the new endpoint.
Here is a sample result:
{ "7ecdd692-d934-e668-365d-b4a6664e1548": { "start_time": "2021-11-03T14:54:45.774893-07:00", "client_remote_address": "127.0.0.2:53771", "request_path": "/v1/secret/data/68", "duration": "230750 microseconds" } }