unit with lengthy deactivation procedure remains inactive (No such file or directory bug) #1158
Description
I'm scheduling the following two units, bar.service and baz.service, to a single-node cluster:
core@core-01 ~ $ cat bar.service
[Service]
ExecStart=/usr/bin/sleep infinity
core@core-01 ~ $ cat baz.service
[Unit]
After=bar.service
BindsTo=bar.service
[Service]
ExecStart=/usr/bin/sleep infinity
ExecStop=/usr/bin/sleep 20
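For anyone who wants to reproduce this, the two unit files above can be recreated with something like the following. This is only a convenience sketch: it writes the files into the current directory, which is an assumption about where you run fleetctl from, and the contents are copied verbatim from the listing above.

```shell
# Recreate bar.service exactly as listed above.
cat > bar.service <<'EOF'
[Service]
ExecStart=/usr/bin/sleep infinity
EOF

# Recreate baz.service: ordered after and bound to bar.service,
# with an ExecStop that takes 20 seconds to finish.
cat > baz.service <<'EOF'
[Unit]
After=bar.service
BindsTo=bar.service
[Service]
ExecStart=/usr/bin/sleep infinity
ExecStop=/usr/bin/sleep 20
EOF
```

The 20-second ExecStop is what keeps baz.service in the deactivating state long enough to hit the problem described below.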
First, start the two units:
core@core-01 ~ $ fleetctl start --no-block bar baz
Triggered unit bar.service start
Triggered unit baz.service start
Check the status of fleet and systemd:
core@core-01 ~ $ fleetctl list-unit-files && fleetctl list-units && systemctl status bar baz
UNIT HASH DSTATE STATE TARGET
bar.service 40ea664 launched launched a84622dd.../172.17.8.101
baz.service 221b757 launched launched a84622dd.../172.17.8.101
UNIT MACHINE ACTIVE SUB
bar.service a84622dd.../172.17.8.101 active running
baz.service a84622dd.../172.17.8.101 active running
● bar.service
Loaded: loaded (/run/fleet/units/bar.service; linked-runtime; vendor preset: disabled)
Active: active (running) since Fri 2015-03-20 03:21:50 UTC; 1s ago
Main PID: 2193 (sleep)
CGroup: /system.slice/bar.service
└─2193 /usr/bin/sleep infinity
Mar 20 03:21:50 core-01 systemd[1]: Starting bar.service...
Mar 20 03:21:50 core-01 systemd[1]: Started bar.service.
● baz.service
Loaded: loaded (/run/fleet/units/baz.service; linked-runtime; vendor preset: disabled)
Active: active (running) since Fri 2015-03-20 03:21:50 UTC; 1s ago
Main PID: 2194 (sleep)
CGroup: /system.slice/baz.service
└─2194 /usr/bin/sleep infinity
Mar 20 03:21:50 core-01 systemd[1]: Starting baz.service...
Mar 20 03:21:50 core-01 systemd[1]: Started baz.service.
Everything is OK. Now unload the units:
core@core-01 ~ $ fleetctl unload --no-block bar baz
Triggered unit bar.service unload
Triggered unit baz.service unload
core@core-01 ~ $ fleetctl list-unit-files && fleetctl list-units && systemctl status bar baz
UNIT HASH DSTATE STATE TARGET
bar.service 40ea664 inactive inactive -
baz.service 221b757 inactive inactive -
UNIT MACHINE ACTIVE SUB
● bar.service
Loaded: loaded (/run/fleet/units/bar.service; linked-runtime; vendor preset: disabled)
Active: active (running) since Fri 2015-03-20 03:21:50 UTC; 12s ago
Main PID: 2193 (sleep)
CGroup: /system.slice/bar.service
└─2193 /usr/bin/sleep infinity
Mar 20 03:21:50 core-01 systemd[1]: Starting bar.service...
Mar 20 03:21:50 core-01 systemd[1]: Started bar.service.
Warning: Unit file changed on disk, 'systemctl daemon-reload' recommended.
● baz.service
Loaded: loaded (/run/fleet/units/baz.service; linked-runtime; vendor preset: disabled)
Active: deactivating (stop) since Fri 2015-03-20 03:22:00 UTC; 2s ago
Main PID: 2194 (sleep); : 2215 (sleep)
CGroup: /system.slice/baz.service
├─2194 /usr/bin/sleep infinity
└─control
└─2215 /usr/bin/sleep 20
Mar 20 03:21:50 core-01 systemd[1]: Starting baz.service...
Mar 20 03:21:50 core-01 systemd[1]: Started baz.service.
Mar 20 03:22:00 core-01 systemd[1]: Stopping baz.service...
Warning: Unit file changed on disk, 'systemctl daemon-reload' recommended.
fleetctl stops reporting state for both units immediately, but systemd shows that baz.service is still deactivating, because its ExecStop (sleep 20) has not finished. Now start the two units again before baz.service finishes its ExecStop:
core@core-01 ~ $ fleetctl start --no-block bar baz
Triggered unit bar.service start
Triggered unit baz.service start
Check the status of fleet and systemd immediately:
core@core-01 ~ $ fleetctl list-unit-files && fleetctl list-units && systemctl status bar baz
UNIT HASH DSTATE STATE TARGET
bar.service 40ea664 launched launched a84622dd.../172.17.8.101
baz.service 221b757 launched launched a84622dd.../172.17.8.101
UNIT MACHINE ACTIVE SUB
bar.service a84622dd.../172.17.8.101 active running
baz.service a84622dd.../172.17.8.101 deactivating stop
● bar.service
Loaded: loaded (/run/fleet/units/bar.service; linked-runtime; vendor preset: disabled)
Active: active (running) since Fri 2015-03-20 03:22:15 UTC; 2s ago
Main PID: 2269 (sleep)
CGroup: /system.slice/bar.service
└─2269 /usr/bin/sleep infinity
Mar 20 03:22:15 core-01 systemd[1]: Started bar.service.
● baz.service
Loaded: not-found (Reason: No such file or directory)
Active: deactivating (stop) since Fri 2015-03-20 03:22:00 UTC; 17s ago
Main PID: 2194 (sleep); : 2215 (sleep)
CGroup: /system.slice/baz.service
├─2194 /usr/bin/sleep infinity
└─control
└─2215 /usr/bin/sleep 20
Mar 20 03:21:50 core-01 systemd[1]: Starting baz.service...
Mar 20 03:21:50 core-01 systemd[1]: Started baz.service.
Mar 20 03:22:00 core-01 systemd[1]: Stopping baz.service...
baz.service has still not finished deactivating, but oddly enough, systemd now reports it as not-found. Checking the status after deactivation is complete:
core@core-01 ~ $ fleetctl list-unit-files && fleetctl list-units && systemctl status bar baz
UNIT HASH DSTATE STATE TARGET
bar.service 40ea664 launched launched a84622dd.../172.17.8.101
baz.service 221b757 launched launched a84622dd.../172.17.8.101
UNIT MACHINE ACTIVE SUB
bar.service a84622dd.../172.17.8.101 active running
baz.service a84622dd.../172.17.8.101 inactive dead
● bar.service
Loaded: loaded (/run/fleet/units/bar.service; linked-runtime; vendor preset: disabled)
Active: active (running) since Fri 2015-03-20 03:22:15 UTC; 10s ago
Main PID: 2269 (sleep)
CGroup: /system.slice/bar.service
└─2269 /usr/bin/sleep infinity
Mar 20 03:22:15 core-01 systemd[1]: Started bar.service.
● baz.service
Loaded: loaded (/run/fleet/units/baz.service; linked-runtime; vendor preset: disabled)
Active: inactive (dead)
Mar 20 03:20:33 core-01 systemd[1]: Starting baz.service...
Mar 20 03:20:38 core-01 systemd[1]: Starting baz.service...
Mar 20 03:20:38 core-01 systemd[1]: Starting baz.service...
Mar 20 03:20:38 core-01 systemd[1]: Started baz.service.
Mar 20 03:21:18 core-01 systemd[1]: Stopping baz.service...
Mar 20 03:21:38 core-01 systemd[1]: Stopped baz.service.
Mar 20 03:21:50 core-01 systemd[1]: Starting baz.service...
Mar 20 03:21:50 core-01 systemd[1]: Started baz.service.
Mar 20 03:22:00 core-01 systemd[1]: Stopping baz.service...
Mar 20 03:22:20 core-01 systemd[1]: Stopped baz.service.
Now baz.service is inactive. Given that I just called fleetctl start on it, though, I would expect it to be active. Checking the fleetd logs, I see the dreaded No such file or directory error (fifth from the bottom):
Mar 20 03:21:49 core-01 fleetd[963]: INFO engine.go:272: Scheduled Unit(bar.service) to Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:21:49 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: AttemptScheduleUnit, JobName: bar.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state launched and unit not scheduled"}
Mar 20 03:21:49 core-01 fleetd[963]: INFO engine.go:272: Scheduled Unit(baz.service) to Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:21:49 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: AttemptScheduleUnit, JobName: baz.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state launched and unit not scheduled"}
Mar 20 03:21:50 core-01 fleetd[963]: INFO manager.go:262: Writing systemd unit bar.service (44b)
Mar 20 03:21:50 core-01 fleetd[963]: INFO manager.go:262: Writing systemd unit baz.service (117b)
Mar 20 03:21:50 core-01 fleetd[963]: INFO manager.go:134: Triggered systemd unit bar.service start: job=7560
Mar 20 03:21:50 core-01 fleetd[963]: INFO manager.go:134: Triggered systemd unit baz.service start: job=7640
Mar 20 03:21:50 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=LoadUnit job=bar.service reason="unit scheduled here but not loaded"
Mar 20 03:21:50 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=LoadUnit job=baz.service reason="unit scheduled here but not loaded"
Mar 20 03:21:50 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=StartUnit job=bar.service reason="unit currently loaded but desired state is launched"
Mar 20 03:21:50 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=StartUnit job=baz.service reason="unit currently loaded but desired state is launched"
Mar 20 03:22:00 core-01 fleetd[963]: INFO manager.go:145: Triggered systemd unit bar.service stop: job=7721
Mar 20 03:22:00 core-01 fleetd[963]: INFO manager.go:275: Removing systemd unit bar.service
Mar 20 03:22:00 core-01 fleetd[963]: INFO manager.go:145: Triggered systemd unit baz.service stop: job=7722
Mar 20 03:22:00 core-01 fleetd[963]: INFO manager.go:275: Removing systemd unit baz.service
Mar 20 03:22:00 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=UnloadUnit job=bar.service reason="unit loaded but not scheduled here"
Mar 20 03:22:00 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=UnloadUnit job=baz.service reason="unit loaded but not scheduled here"
Mar 20 03:22:00 core-01 fleetd[963]: INFO engine.go:257: Unscheduled Job(bar.service) from Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:22:00 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: UnscheduleUnit, JobName: bar.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state inactive"}
Mar 20 03:22:00 core-01 fleetd[963]: INFO engine.go:257: Unscheduled Job(baz.service) from Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:22:00 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: UnscheduleUnit, JobName: baz.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state inactive"}
Mar 20 03:22:14 core-01 fleetd[963]: INFO engine.go:272: Scheduled Unit(bar.service) to Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:22:14 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: AttemptScheduleUnit, JobName: bar.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state launched and unit not scheduled"}
Mar 20 03:22:14 core-01 fleetd[963]: INFO engine.go:272: Scheduled Unit(baz.service) to Machine(a84622dda07549d0b4d855ca2b78948c)
Mar 20 03:22:14 core-01 fleetd[963]: INFO reconciler.go:163: EngineReconciler completed task: {Type: AttemptScheduleUnit, JobName: baz.service, MachineID: a84622dda07549d0b4d855ca2b78948c, Reason: "target state launched and unit not scheduled"}
Mar 20 03:22:15 core-01 fleetd[963]: INFO manager.go:262: Writing systemd unit bar.service (44b)
Mar 20 03:22:15 core-01 fleetd[963]: INFO manager.go:198: Instructing systemd to reload units
Mar 20 03:22:15 core-01 fleetd[963]: INFO manager.go:262: Writing systemd unit baz.service (117b)
Mar 20 03:22:15 core-01 fleetd[963]: INFO manager.go:134: Triggered systemd unit bar.service start: job=7806
Mar 20 03:22:15 core-01 fleetd[963]: ERROR manager.go:136: Failed to trigger systemd unit baz.service start: Unit baz.service failed to load: No such file or directory.
Mar 20 03:22:15 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=LoadUnit job=bar.service reason="unit scheduled here but not loaded"
Mar 20 03:22:15 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=LoadUnit job=baz.service reason="unit scheduled here but not loaded"
Mar 20 03:22:15 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=StartUnit job=bar.service reason="unit currently loaded but desired state is launched"
Mar 20 03:22:15 core-01 fleetd[963]: INFO reconcile.go:311: AgentReconciler completed task: type=StartUnit job=baz.service reason="unit currently loaded but desired state is launched"
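Until this is addressed, one possible workaround is to wait for systemd to finish deactivating the unit before calling fleetctl start again. The following is a minimal sketch, not something fleet does itself. The helper name wait_until_stopped and the SYSTEMCTL override variable are hypothetical, introduced only for illustration. It relies on `systemctl is-active` printing the unit's active state (for example `active`, `deactivating`, or `inactive`).

```shell
# wait_until_stopped UNIT [TIMEOUT_SECONDS]
# Polls the unit's active state once per second and returns 0 as soon as
# the unit is no longer active, activating, or deactivating.
# Returns 1 if the timeout (default 60 seconds) expires first.
# SYSTEMCTL is a hypothetical override so the command can be swapped out.
wait_until_stopped() {
  unit=$1
  timeout=${2:-60}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # is-active exits non-zero for anything but "active", so ignore the
    # exit status and look only at the printed state.
    state=$(${SYSTEMCTL:-systemctl} is-active "$unit" 2>/dev/null || true)
    case "$state" in
      active|activating|deactivating)
        sleep 1
        elapsed=$((elapsed + 1))
        ;;
      *)
        return 0
        ;;
    esac
  done
  return 1
}

# Example usage on the node running the units (not executed here):
#   fleetctl unload --no-block bar baz
#   wait_until_stopped baz.service 30 && fleetctl start --no-block bar baz
```

This only avoids the window shown in the logs above, where the start request arrives while baz.service is still running its ExecStop. It does not change how fleet itself handles a unit that is still deactivating.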