[1.13.1] fix init race#56
Merged
kolyshkin merged 6 commits intoNov 23, 2021
Merged
Conversation
So we can extract more than the start time with a single read. Signed-off-by: W. Trevor King <wking@tremily.us> (cherry picked from commit 439eaa3) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
And convert the various start-time properties from strings to uint64s. This removes all internal consumers of the deprecated GetProcessStartTime function. Signed-off-by: W. Trevor King <wking@tremily.us> (cherry picked from commit 75d98b2) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
And Stat_t.PID and Stat_t.Name while we're at it. Then use the new .State property in runType to distinguish between running and zombie/dead processes, since kill(2) does not [1]. With this change we no longer claim Running status for zombie/dead processes. I've also removed the kill(2) call from runType. It was originally added in 13841ef (new-api: return the Running state only if the init process is alive, 2014-12-23), but we've been accessing /proc/[pid]/stat since 14e95b2 (Make state detection precise, 2016-07-05, #930), and with the /stat access the kill(2) check is redundant. I also don't see much point to the previously-separate doesInitProcessExist, so I've inlined that logic in runType. It would be nice to distinguish between "/proc/[pid]/stat doesn't exist" and errors parsing its contents, but I've skipped that for the moment. The Running -> Stopped change in checkpoint_test.go is because the post-checkpoint process is a zombie, and with this commit zombie processes are Stopped (and no longer Running). [1]: opencontainers/runc#1483 (comment) Signed-off-by: W. Trevor King <wking@tremily.us> (cherry picked from commit 2bea4c8) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When starting a container with `runc start` or `runc run`, the stub process (runc[2:INIT]) opens a fifo for writing. Its parent runc process will open the same fifo for reading. In this way, they synchronize. If the stub process exits at the wrong time, the parent runc process will block forever. This can happen when racing 2 runc operations against each other: `runc run/start`, and `runc delete`. It could also happen for other reasons, e.g. the kernel's OOM killer may select the stub process. This commit resolves this race by racing the opening of the exec fifo from the runc parent process against the stub process exiting. If the stub process exits before we open the fifo, we return an error. Another solution is to wait on the stub process. However, it seems it would require more refactoring to avoid calling wait multiple times on the same process, which is an error. Signed-off-by: Craig Furman <cfurman@pivotal.io> (cherry picked from commit 8d3e6c9) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Craig Furman <cfurman@pivotal.io> (cherry picked from commit 5c0af14) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Jordan Liggitt <liggitt@google.com> (cherry picked from commit 8541d9c) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
a407289 to
9471a96
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This cherry-picks commits from the following upstream PRs (modulo unrelated test fix):
to docker-1.13.1-rhel branch, in an attemp to fix https://bugzilla.redhat.com/show_bug.cgi?id=2000782
There was a single conflict while cherry-picking commit 2bea4c8, which is caused by missing commit 7335635 (possibly a few more). Resolved manually rather than adding more backported PRs.
Compile-tested only.
The fix needs to be vendored into docker-1.13.1 in extras-rhel-7.9.