
Fix systemd.Apply() to check for DBus error before waiting on a channel. #1772

Merged
crosbymichael merged 1 commit into opencontainers:master from filbranden:systemd1 on Apr 10, 2018

@filbranden
Contributor

The channel was introduced in #1683 to work around a race condition. However, the error check after StartTransientUnit ignores the error for an already existing unit, and in that case there will be no notification from DBus, so waiting on the channel will make it hang.

Later PR #1754 added a timeout, which worked around the issue, but we can fix this correctly by only waiting on the channel when there is no error. Fix the code to do so.

The timeout handling was kept, since there might be other cases where this situation occurs. The bug entry at Red Hat's bugzilla mentions calling this code from inside a container; it's unclear whether an existing container was in use, so I'm not sure whether this change would have fixed that bug as well.

/assign @mrunalp @hqhq -> Please review, as you reviewed the original PRs too.
/cc @cyphar -> Code review on the original PRs.
/cc @vikaschoudhary16 -> Authored the original PRs.
/cc @derekwaynecarr @sjenning -> Cc'd on original PRs and comments on Red Hat's bugzilla entry.

The channel was introduced in #1683 to work around a race condition.
However, the check for error in StartTransientUnit ignores the error for
an already existing unit, and in that case there will be no notification
from DBus, so waiting on the channel will make it hang.

Later PR #1754 added a timeout, which worked around the issue, but we
can fix this correctly by only waiting on the channel when there is no
error. Fix the code to do so.

The timeout handling was kept, since there might be other cases where
this situation occurs. https://bugzilla.redhat.com/show_bug.cgi?id=1548358
mentions calling this code from inside a container; it's unclear whether
an existing container was in use, so it's not certain whether this change
would have fixed that bug as well.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
@filbranden
Contributor Author

Please give this one some attention... I'd say it's fixing an obvious bug (once you see it), and in my testing it did fix the hangs on Kubelet startup.

Thanks!
Filipe

@mrunalp
Contributor

mrunalp commented Apr 10, 2018

LGTM

Approved with PullApprove

@crosbymichael
Member

crosbymichael commented Apr 10, 2018

LGTM

Approved with PullApprove

@crosbymichael crosbymichael merged commit 3cbb2fa into opencontainers:master Apr 10, 2018
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Apr 25, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update libcontainer to include PRs with fixes to systemd cgroup driver

**What this PR does / why we need it**:

PR opencontainers/runc#1754 works around an issue in manager.Apply(-1) that makes Kubelet startup hang when using the systemd cgroup driver (by adding a timeout), and the follow-up PR opencontainers/runc#1772 fixes that bug by checking the proper error status before waiting on the channel.

PR opencontainers/runc#1776 checks whether Delegate works in slices, which keeps the libcontainer systemd cgroup driver working on systemd v237+.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #61474

**Special notes for your reviewer**:
/assign @derekwaynecarr
cc @vikaschoudhary16 @sjenning @adelton @mrunalp 

**Release note**:

```release-note
NONE
```
mrunalp added a commit to projectatomic/runc that referenced this pull request Jun 12, 2018
mrunalp added a commit to projectatomic/runc that referenced this pull request Jun 12, 2018
@filbranden filbranden deleted the systemd1 branch February 7, 2019 01:46

3 participants