Surfaced during the #125 / #129 chaos-test work.
Context (intended behaviour)
A safe scope's dispose() / cancel() does not force-cancel an
already-started child — it marks the child a zombie
(coroutine.c, async_coroutine_cancel, the is_safely branch) and drops
it from the active coroutine count. This was investigated and is by design:
a safe scope lets a running child finish gracefully rather than tearing it
down mid-flight. asNotSafely() opts into forced cancellation.
The problem
A zombie parked in a long delay() (or any timer await) keeps its libuv
timer armed. An armed timer keeps the event loop alive until the timer
naturally expires — even when nothing else is left to run.
Consequently Scope::disposeAfterTimeout(), whose whole point is a
bounded cleanup, can still hang the loop for the full remaining sleep
duration of a zombie child. The "timeout" is effectively ignored: the
process cannot exit until the arbitrary delay() elapses on its own.
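The effect can be illustrated with a small self-contained model. This is only a sketch: it uses virtual time instead of libuv, and `model_timer` and `model_loop_run` are hypothetical names that exist neither in libuv nor in the extension. It assumes, as described above, that the loop may exit only once no timer is armed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model type; not a libuv or extension structure. */
typedef struct {
    long deadline_ms; /* virtual time at which the timer would fire */
    bool armed;       /* an armed timer counts as pending work */
} model_timer;

/*
 * Runs a virtual-time loop until no timer is armed and returns the
 * virtual time at which the loop was able to exit.
 */
long model_loop_run(model_timer *timers, size_t n)
{
    assert(timers != NULL || n == 0);
    long now = 0;

    for (;;) {
        bool found = false;
        long next = 0;

        /* Pick the earliest deadline among the timers still armed. */
        for (size_t i = 0; i < n; i++) {
            if (timers[i].armed && (!found || timers[i].deadline_ms < next)) {
                next = timers[i].deadline_ms;
                found = true;
            }
        }

        if (!found) {
            return now; /* nothing armed: the loop may exit */
        }

        if (next > now) {
            now = next; /* "sleep" until that deadline */
        }

        /* Fire, and thereby disarm, every timer that is now due. */
        for (size_t i = 0; i < n; i++) {
            if (timers[i].armed && timers[i].deadline_ms <= now) {
                timers[i].armed = false;
            }
        }
    }
}
```

With one timer at 100 ms standing in for the dispose timeout and one at 60000 ms standing in for the zombie's delay(), `model_loop_run` returns 60000. If the zombie's timer is disarmed beforehand, it returns 100.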
Reproduction sketch
- Open a safe scope, spawn a child that does
Async\delay(<long>).
- disposeAfterTimeout(<short>) the scope.
- The child becomes a zombie at the short timeout, but the process keeps
running until the long delay() expires, not the short timeout.
Proposed idea
When only zombies remain (active coroutine count is 0 and the sole
remaining work is zombie coroutines), deliver a cancellation to them so the
process can exit. A cancellation is still graceful — the zombie's
finally / catch blocks unwind normally — so it does not violate the
"let it finish gracefully" contract; it only skips the arbitrary sleep that
nothing is waiting on anymore.
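A minimal sketch of what such a step might look like, written against a toy model rather than the real scheduler. Every name here (`model_state`, `model_coroutine`, `model_cancel_zombies_if_idle`) is hypothetical, and the real data structures in scheduler.c and coroutine.c may well differ. The sketch assumes that delivering a cancellation also disarms the zombie's pending timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names come from the extension. */
typedef enum { MODEL_ACTIVE, MODEL_ZOMBIE, MODEL_FINISHED } model_state;

typedef struct {
    model_state state;
    bool timer_armed;      /* parked in a timer await such as delay() */
    bool cancel_requested; /* cancellation delivered; cleanup still runs */
} model_coroutine;

/*
 * If no active coroutine remains, requests cancellation of every zombie
 * that has not been cancelled yet and disarms its timer (an assumption of
 * this model). Returns the number of zombies newly cancelled; 0 means the
 * step did nothing.
 */
size_t model_cancel_zombies_if_idle(model_coroutine *cos, size_t n)
{
    assert(cos != NULL || n == 0);

    /* Active work still exists: leave the zombies alone. */
    for (size_t i = 0; i < n; i++) {
        if (cos[i].state == MODEL_ACTIVE) {
            return 0;
        }
    }

    size_t cancelled = 0;
    for (size_t i = 0; i < n; i++) {
        if (cos[i].state == MODEL_ZOMBIE && !cos[i].cancel_requested) {
            cos[i].cancel_requested = true; /* finally / catch still unwind */
            cos[i].timer_armed = false;     /* nothing left to hold the loop */
            cancelled++;
        }
    }
    return cancelled;
}
```

The check is deliberately conservative: a single active coroutine suppresses the step entirely, so zombies keep their "finish gracefully" behaviour for as long as anything else is still running. Calling it a second time is a no-op because zombies that were already cancelled are skipped.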
Where to investigate
scheduler.c shutdown path:
- the loop stop condition / the
active_coroutines > real_coroutines check
- start_graceful_shutdown
Determine whether zombie timers are already drained anywhere, and whether a
"cancel zombies when only zombies remain" step is the right fix or whether
the stop condition itself should treat zombie-only state as quiescent.