This repository was archived by the owner on Jan 30, 2020. It is now read-only.

unit with lengthy deactivation procedure remains inactive (No such file or directory bug) #1158

@bcwaldon

I'm scheduling the following two units, bar.service and baz.service, to a single-node cluster:

core@core-01 ~ $ cat bar.service
[Service]
ExecStart=/usr/bin/sleep infinity

core@core-01 ~ $ cat baz.service
[Unit]
After=bar.service
BindsTo=bar.service
[Service]
ExecStart=/usr/bin/sleep infinity
ExecStop=/usr/bin/sleep 20

First, start the two units:

core@core-01 ~ $ fleetctl start --no-block bar baz
Triggered unit bar.service start
Triggered unit baz.service start

Check the status of fleet and systemd:

core@core-01 ~ $ fleetctl list-unit-files && fleetctl list-units && systemctl status bar baz
UNIT        HASH    DSTATE      STATE       TARGET
bar.service 40ea664 launched    launched    a84622dd.../172.17.8.101
baz.service 221b757 launched    launched    a84622dd.../172.17.8.101
UNIT        MACHINE             ACTIVE  SUB
bar.service a84622dd.../172.17.8.101    active  running
baz.service a84622dd.../172.17.8.101    active  running
● bar.service
   Loaded: loaded (/run/fleet/units/bar.service; linked-runtime; vendor preset: disabled)
   Active: active (running) since Fri 2015-03-20 03:21:50 UTC; 1s ago
 Main PID: 2193 (sleep)
   CGroup: /system.slice/bar.service
           └─2193 /usr/bin/sleep infinity

Mar 20 03:21:50 core-01 systemd[1]: Starting bar.service...
Mar 20 03:21:50 core-01 systemd[1]: Started bar.service.

● baz.service
   Loaded: loaded (/run/fleet/units/baz.service; linked-runtime; vendor preset: disabled)
   Active: active (running) since Fri 2015-03-20 03:21:50 UTC; 1s ago
 Main PID: 2194 (sleep)
   CGroup: /system.slice/baz.service
           └─2194 /usr/bin/sleep infinity

Mar 20 03:21:50 core-01 systemd[1]: Starting baz.service...
Mar 20 03:21:50 core-01 systemd[1]: Started baz.service.

Everything is OK. Now unload the units:

core@core-01 ~ $ fleetctl unload --no-block bar baz
Triggered unit bar.service unload
Triggered unit baz.service unload
core@core-01 ~ $ fleetctl list-unit-files && fleetctl list-units && systemctl status bar baz
UNIT        HASH    DSTATE      STATE       TARGET
bar.service 40ea664 inactive    inactive    -
baz.service 221b757 inactive    inactive    -
UNIT    MACHINE ACTIVE  SUB
● bar.service
   Loaded: loaded (/run/fleet/units/bar.service; linked-runtime; vendor preset: disabled)
   Active: active (running) since Fri 2015-03-20 03:21:50 UTC; 12s ago
 Main PID: 2193 (sleep)
   CGroup: /system.slice/bar.service
           └─2193 /usr/bin/sleep infinity

Mar 20 03:21:50 core-01 systemd[1]: Starting bar.service...
Mar 20 03:21:50 core-01 systemd[1]: Started bar.service.

Warning: Unit file changed on disk, 'systemctl daemon-reload' recommended.

● baz.service
   Loaded: loaded (/run/fleet/units/baz.service; linked-runtime; vendor preset: disabled)
   Active: deactivating (stop) since Fri 2015-03-20 03:22:00 UTC; 2s ago
 Main PID: 2194 (sleep);         : 2215 (sleep)
   CGroup: /system.slice/baz.service
           ├─2194 /usr/bin/sleep infinity
           └─control
             └─2215 /usr/bin/sleep 20

Mar 20 03:21:50 core-01 systemd[1]: Starting baz.service...
Mar 20 03:21:50 core-01 systemd[1]: Started baz.service.
Mar 20 03:22:00 core-01 systemd[1]: Stopping baz.service...

Warning: Unit file changed on disk, 'systemctl daemon-reload' recommended.

fleetctl stops reporting state for the units immediately, but baz.service is still deactivating. Now start the two units again before baz.service finishes its ExecStop:

core@core-01 ~ $ fleetctl start --no-block bar baz
Triggered unit bar.service start
Triggered unit baz.service start

Check the status of fleet and systemd immediately:

core@core-01 ~ $ fleetctl list-unit-files && fleetctl list-units && systemctl status bar baz
UNIT        HASH    DSTATE      STATE       TARGET
bar.service 40ea664 launched    launched    a84622dd.../172.17.8.101
baz.service 221b757 launched    launched    a84622dd.../172.17.8.101
UNIT        MACHINE             ACTIVE      SUB
bar.service a84622dd.../172.17.8.101    active      running
baz.service a84622dd.../172.17.8.101    deactivating    stop
● bar.service
   Loaded: loaded (/run/fleet/units/bar.service; linked-runtime; vendor preset: disabled)
   Active: active (running) since Fri 2015-03-20 03:22:15 UTC; 2s ago
 Main PID: 2269 (sleep)
   CGroup: /system.slice/bar.service
           └─2269 /usr/bin/sleep infinity

Mar 20 03:22:15 core-01 systemd[1]: Started bar.service.

● baz.service
   Loaded: not-found (Reason: No such file or directory)
   Active: deactivating (stop) since Fri 2015-03-20 03:22:00 UTC; 17s ago
 Main PID: 2194 (sleep);         : 2215 (sleep)
   CGroup: /system.slice/baz.service
           ├─2194 /usr/bin/sleep infinity
           └─control
             └─2215 /usr/bin/sleep 20

Mar 20 03:21:50 core-01 systemd[1]: Starting baz.service...
Mar 20 03:21:50 core-01 systemd[1]: Started baz.service.
Mar 20 03:22:00 core-01 systemd[1]: Stopping baz.service...
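
The combination in this status output (load state not-found while the active state is still deactivating) can also be read in script-friendly form. This is a minimal sketch, assuming `systemctl show -p` is available on the host; the helper names `unit_prop` and `stale_while_stopping` are hypothetical and are not part of fleet or systemd.

```shell
# Print the value of one systemd property of a unit,
# for example LoadState or ActiveState.
# unit_prop is a hypothetical helper name.
unit_prop() {
    # "systemctl show -p NAME UNIT" prints a single line: NAME=value
    line=$(systemctl show -p "$2" "$1") || return 1
    # Strip everything up to and including the first "=".
    printf '%s\n' "${line#*=}"
}

# Succeed when the unit is in the state seen in this report:
# still stopping, but its unit file is no longer visible to systemd.
stale_while_stopping() {
    [ "$(unit_prop "$1" ActiveState)" = "deactivating" ] &&
        [ "$(unit_prop "$1" LoadState)" = "not-found" ]
}
```

Running `stale_while_stopping baz.service` at this point in the reproduction would be expected to succeed, given the status shown above.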

baz.service is still not done deactivating, but oddly enough, systemd now reports it as not-found. Checking the status after deactivation is complete:

core@core-01 ~ $ fleetctl list-unit-files && fleetctl list-units && systemctl status bar baz
UNIT        HASH    DSTATE      STATE       TARGET
bar.service 40ea664 launched    launched    a84622dd.../172.17.8.101
baz.service 221b757 launched    launched    a84622dd.../172.17.8.101
UNIT        MACHINE             ACTIVE      SUB
bar.service a84622dd.../172.17.8.101    active      running
baz.service a84622dd.../172.17.8.101    inactive    dead
● bar.service
   Loaded: loaded (/run/fleet/units/bar.service; linked-runtime; vendor preset: disabled)
   Active: active (running) since Fri 2015-03-20 03:22:15 UTC; 10s ago
 Main PID: 2269 (sleep)
   CGroup: /system.slice/bar.service
           └─2269 /usr/bin/sleep infinity

Mar 20 03:22:15 core-01 systemd[1]: Started bar.service.

● baz.service
   Loaded: loaded (/run/fleet/units/baz.service; linked-runtime; vendor preset: disabled)
   Active: inactive (dead)

Mar 20 03:20:33 core-01 systemd[1]: Starting baz.service...
Mar 20 03:20:38 core-01 systemd[1]: Starting baz.service...
Mar 20 03:20:38 core-01 systemd[1]: Starting baz.service...
Mar 20 03:20:38 core-01 systemd[1]: Started baz.service.
Mar 20 03:21:18 core-01 systemd[1]: Stopping baz.service...
Mar 20 03:21:38 core-01 systemd[1]: Stopped baz.service.
Mar 20 03:21:50 core-01 systemd[1]: Starting baz.service...
Mar 20 03:21:50 core-01 systemd[1]: Started baz.service.
Mar 20 03:22:00 core-01 systemd[1]: Stopping baz.service...
Mar 20 03:22:20 core-01 systemd[1]: Stopped baz.service.

Now baz.service is inactive. Given that I just called fleetctl start on it, though, I would expect it to be active. Checking the logs, I see the dreaded No such file or directory error (fifth from the bottom):

Mar 20 03:21:49 core-01 fleetd[963]: INFO engine.go:272: Scheduled Unit(bar.service) to Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:21:49 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: AttemptScheduleUnit, JobName: bar.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state launched and unit not scheduled"}
Mar 20 03:21:49 core-01 fleetd[963]: INFO engine.go:272: Scheduled Unit(baz.service) to Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:21:49 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: AttemptScheduleUnit, JobName: baz.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state launched and unit not scheduled"}
Mar 20 03:21:50 core-01 fleetd[963]: INFO manager.go:262: Writing systemd unit bar.service (44b)
Mar 20 03:21:50 core-01 fleetd[963]: INFO manager.go:262: Writing systemd unit baz.service (117b)
Mar 20 03:21:50 core-01 fleetd[963]: INFO manager.go:134: Triggered systemd unit bar.service start: job=7560
Mar 20 03:21:50 core-01 fleetd[963]: INFO manager.go:134: Triggered systemd unit baz.service start: job=7640
Mar 20 03:21:50 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=LoadUnit job=bar.service reason="unit scheduled here but not loaded"
Mar 20 03:21:50 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=LoadUnit job=baz.service reason="unit scheduled here but not loaded"
Mar 20 03:21:50 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=StartUnit job=bar.service reason="unit currently loaded but desired state is launched"
Mar 20 03:21:50 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=StartUnit job=baz.service reason="unit currently loaded but desired state is launched"
Mar 20 03:22:00 core-01 fleetd[963]: INFO manager.go:145: Triggered systemd unit bar.service stop: job=7721
Mar 20 03:22:00 core-01 fleetd[963]: INFO manager.go:275: Removing systemd unit bar.service
Mar 20 03:22:00 core-01 fleetd[963]: INFO manager.go:145: Triggered systemd unit baz.service stop: job=7722
Mar 20 03:22:00 core-01 fleetd[963]: INFO manager.go:275: Removing systemd unit baz.service
Mar 20 03:22:00 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=UnloadUnit job=bar.service reason="unit loaded but not scheduled here"
Mar 20 03:22:00 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=UnloadUnit job=baz.service reason="unit loaded but not scheduled here"
Mar 20 03:22:00 core-01 fleetd[963]: INFO engine.go:257: Unscheduled Job(bar.service) from Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:22:00 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: UnscheduleUnit, JobName: bar.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state inactive"}
Mar 20 03:22:00 core-01 fleetd[963]: INFO engine.go:257: Unscheduled Job(baz.service) from Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:22:00 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: UnscheduleUnit, JobName: baz.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state inactive"}
Mar 20 03:22:14 core-01 fleetd[963]: INFO engine.go:272: Scheduled Unit(bar.service) to Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:22:14 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: AttemptScheduleUnit, JobName: bar.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state launched and unit not scheduled"}
Mar 20 03:22:14 core-01 fleetd[963]: INFO engine.go:272: Scheduled Unit(baz.service) to Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:22:14 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: AttemptScheduleUnit, JobName: baz.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state launched and unit not scheduled"}
Mar 20 03:22:15 core-01 fleetd[963]: INFO manager.go:262: Writing systemd unit bar.service (44b)
Mar 20 03:22:15 core-01 fleetd[963]: INFO manager.go:198: Instructing systemd to reload units
Mar 20 03:22:15 core-01 fleetd[963]: INFO manager.go:262: Writing systemd unit baz.service (117b)
Mar 20 03:22:15 core-01 fleetd[963]: INFO manager.go:134: Triggered systemd unit bar.service start: job=7806
Mar 20 03:22:15 core-01 fleetd[963]: ERROR manager.go:136: Failed to trigger systemd unit baz.service start: Unit baz.service failed to load: No such file or directory.
Mar 20 03:22:15 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=LoadUnit job=bar.service reason="unit scheduled here but not loaded"
Mar 20 03:22:15 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=LoadUnit job=baz.service reason="unit scheduled here but not loaded"
Mar 20 03:22:15 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=StartUnit job=bar.service reason="unit currently loaded but desired state is launched"
Mar 20 03:22:15 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=StartUnit job=baz.service reason="unit currently loaded but desired state is launched"
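
Until this is fixed, one way to avoid hitting the window might be to wait for the slow unit to finish its ExecStop before starting it again. The sketch below is a workaround idea, not a fix, and it rests on assumptions: it runs on the machine the unit was scheduled to, `wait_until_stopped` is a hypothetical helper name, and the 60-second default timeout is an arbitrary choice.

```shell
# Block until the unit is no longer deactivating, or until the timeout
# (in seconds, default 60) expires. Returns non-zero on timeout.
# wait_until_stopped is a hypothetical helper name.
wait_until_stopped() {
    unit=$1
    timeout=${2:-60}
    waited=0
    # "systemctl is-active" prints the unit's active state
    # (active, inactive, deactivating, failed, ...).
    while [ "$(systemctl is-active "$unit")" = "deactivating" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $unit to stop" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

In the reproduction above, this would be used between the unload and the second start, for example `wait_until_stopped baz.service && fleetctl start --no-block bar baz`. Whether that fully avoids the error has not been verified here.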
