VAULT-31409: trace postUnseal function#28895
Conversation
|
CI Results: |
didn't bother to check the error from traceFile.Close()
helper/trace/debug_trace.go
Outdated
| path := fmt.Sprintf("%s/%s_%s", os.TempDir(), filePrefix, time.Now().Format(time.RFC3339)) | ||
| traceFile, err := os.Create(path) |
There was a problem hiding this comment.
still want to make this more similar to VAULT_STACKTRACE_WRITE_TO_FILE for the sake of consistency
after talking to Kuba it sounds like it's a good idea to try to move stuff out of core, so even if there's no immediate need for a generic debug trace function it's still fair to add it
also some usability improvements from manual testing
helper/trace/debug_trace.go
Outdated
| } | ||
| } | ||
|
|
||
| traceFile, err := filepath.Abs(fmt.Sprintf("%s/%s-%s.trace", dir, filePrefix, time.Now().Format(time.RFC3339))) |
There was a problem hiding this comment.
can we switch this to using filepath.Join to create the path, rather than Sprintf?
command/server/config.go
Outdated
| DisableSSCTokens bool `hcl:"-"` | ||
|
|
||
| EnablePostUnsealTrace bool `hcl:"enable_post_unseal_trace"` | ||
| PostUnsealTraceDir string `hcl:"post_unseal_trace_dir"` |
There was a problem hiding this comment.
let's also add these fields to the (*Config).Merge function
There was a problem hiding this comment.
ahh good catch, wouldn't have noticed this unless I'd tested with multiple files. I wonder why we didn't go for a more generic implementation here (maybe didn't want to use reflect?) but we could at least have some semgrep validation to check that all fields are merged (or explicitly ignored)
| defer stopTrace() | ||
| } | ||
|
|
||
| defer metrics.MeasureSince([]string{"core", "post_unseal"}, time.Now()) |
There was a problem hiding this comment.
not sure what is the best order between this metric and the new trace, but I think it's fair to include the metric in the captured trace even tho that will make the time spend on the tracing setup invisible to the metric
command/server/config.go
Outdated
| DisableSSCTokens bool `hcl:"-"` | ||
|
|
||
| EnablePostUnsealTrace bool `hcl:"enable_post_unseal_trace"` | ||
| PostUnsealTraceDir string `hcl:"post_unseal_trace_dir"` |
There was a problem hiding this comment.
ahh good catch, wouldn't have noticed this unless I'd tested with multiple files. I wonder why we didn't go for a more generic implementation here (maybe didn't want to use reflect?) but we could at least have some semgrep validation to check that all fields are merged (or explicitly ignored)
there were concerns about using the /tmp directory because of permissions, or having a default dir at all, so now it's required to set a dir in order to generate the traces.
sounds like it might be forbidden in Windows and possibly cause problems in some MacOS applications.
|
Build Results: |
| return "", nil, fmt.Errorf("trace directory %q does not exist", dir) | ||
| } | ||
|
|
||
| if !os.IsNotExist(err) && !d.IsDir() { |
There was a problem hiding this comment.
do we need the !os.IsNotExist(err) portion of this condition?
There was a problem hiding this comment.
yes, I think we need it. If os.IsNotExist(err) evaluates to true then we don't want to evaluate d.IsDir() since d would be nil.
The alternative would be to use a nested if-else structure that interleaves error returns with additional operations (creation of default dir), which I don't like as much:
d, err := os.Stat(dir)
if err != nil {
if os.IsNotExist(err) {
if dirMustExist {
return "", nil, fmt.Errorf("trace directory %q does not exist", dir)
} else {
if err := os.Mkdir(dir, 0o700); err != nil {
return "", nil, fmt.Errorf("failed to create trace directory %q: %s", dir, err)
}
}
} else {
return "", nil, fmt.Errorf("failed to stat trace directory %q: %s", dir, err)
}
} else if !d.IsDir() {
return "", nil, fmt.Errorf("trace directory %q is not a directory", dir)
}There was a problem hiding this comment.
makes sense, it's good as is!
CI was complaining about missing comments on the new test function. It feels a bit silly to require this of tests but whatever XD
miagilepner
left a comment
There was a problem hiding this comment.
looks good, though a config test is still failing. once that's passing i'm good to approve it!
| return "", nil, fmt.Errorf("trace directory %q does not exist", dir) | ||
| } | ||
|
|
||
| if !os.IsNotExist(err) && !d.IsDir() { |
There was a problem hiding this comment.
makes sense, it's good as is!
* initial implementation of unseal trace * close file if we fail to start the trace didn't bother to check the error from traceFile.Close() * use reloadable config instead of env var * license * remove leftover * allow setting custom dir and remove new package * bring back StartDebugTrace after talking to Kuba it sounds like it's a good idea to try to move stuff out of core, so even if there's no immediate need for a generic debug trace function it's still fair to add it * track postUnseal instead of unsealInternal also some usability improvements from manual testing * address PR comments * address security review there were concerns about using the /tmp directory because of permissions, or having a default dir at all, so now it's required to set a dir in order to generate the traces. * add unit tests to StartDebugTrace * move back to default dir * document new parameters * add tiny integration test * avoid column in trace filename sounds like it might be forbidden in Windows and possibly cause problems in some MacOS applications. * address PR feedback * add go doc to test CI was complaining about missing comments on the new test function. It feels a bit silly to require this of tests but whatever XD * fix tests
Description
This PR adds 2 new settings to the top-level Vault server configuration:
enable_post_unseal_tracethat when enabled will cause Vault to generate a Go trace during the execution ofcore.postUnsealand save it to disk under the directory configured inpost_unseal_trace_dir, defaulting to/tmp/vault-traces/instead if the field isn't set. This file can be inspected using thego tool tracecommand and will help us debug cases where the leadership transfer operation takes too long.Most of the feature was split into a reusable
StartDebugTracefunction in a separate package. It could be used for similar debug purposes in the future but most importantly for now it avoids adding more code and dependencies to the already-bloatedcorepackage.Manually tested the functionality to ensure that it was possible to generate the traces by updating the new fields in the config, then sending a
SIGHUPsignal to the running Vault process and triggering a leadership election viavault operator step-down. Disabling the generation of the traces via the sameSIGHUPprocess also works. Post-unseal only runs on the active node so restarting a follower won't generate the post-unseal traces.Jira: VAULT-31409
TODO only if you're a HashiCorp employee
to N, N-1, and N-2, using the
backport/ent/x.x.x+entlabels. If this PR is in the CE repo, you should only backport to N, using thebackport/x.x.xlabel, not the enterprise labels.of a public function, even if that change is in a CE file, double check that
applying the patch for this PR to the ENT repo and running tests doesn't
break any tests. Sometimes ENT only tests rely on public functions in CE
files.
in the PR description, commit message, or branch name.
description. Also, make sure the changelog is in this PR, not in your ENT PR.