Zombie coroutine in a long delay() keeps the event loop alive until the timer expires #132

@EdmondDantes

Description

Surfaced during the #125 / #129 chaos-test work.

Context (intended behaviour)

A safe scope's dispose() / cancel() does not force-cancel an
already-started child — it marks the child a zombie
(coroutine.c, async_coroutine_cancel, the is_safely branch) and drops
it from the active coroutine count. This was investigated and is by design:
a safe scope lets a running child finish gracefully rather than tearing it
down mid-flight. asNotSafely() opts into forced cancellation.

The problem

A zombie parked in a long delay() (or any timer await) keeps its libuv
timer armed. An armed timer keeps the event loop alive until the timer
naturally expires — even when nothing else is left to run.

Consequently, Scope::disposeAfterTimeout(), whose whole point is a
bounded cleanup, can still hang the loop for the full remaining sleep
duration of a zombie child. The "timeout" is effectively ignored: the
process cannot exit until the arbitrary delay() elapses on its own.

Reproduction sketch

  • Open a safe scope and spawn a child that calls Async\delay(<long>).
  • Call disposeAfterTimeout(<short>) on the scope.
  • The child becomes a zombie at the short timeout, but the process keeps
    running until the long delay() expires rather than exiting at the short
    timeout.
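
The timing in this sketch can be modelled in a few lines of C. This is a toy model, not code from the extension: the `model_timer` struct and `model_loop_exit_ms` function are hypothetical, and the 100 ms and 60000 ms values used in the note below are stand-ins for `<short>` and `<long>`. It only encodes the rule described above, namely that the loop cannot stop while any timer is still armed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one timer handle; not a libuv or extension type. */
typedef struct {
    uint64_t deadline_ms; /* absolute time at which the timer fires */
    bool     armed;       /* true while the timer still keeps the loop alive */
} model_timer;

/* Earliest time the loop can stop: the latest deadline among armed timers.
 * Returns 0 when no timer is armed. */
static uint64_t model_loop_exit_ms(const model_timer *timers, size_t count)
{
    assert(timers != NULL || count == 0);

    uint64_t exit_ms = 0;
    for (size_t i = 0; i < count; i++) {
        if (timers[i].armed && timers[i].deadline_ms > exit_ms) {
            exit_ms = timers[i].deadline_ms;
        }
    }
    return exit_ms;
}
```

With a dispose-timeout timer at 100 ms and the child's delay() timer at 60000 ms both armed, the model reports 60000 ms. If the zombie's timer is disarmed, it reports 100 ms, which is the bounded behaviour disposeAfterTimeout() is expected to give.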

Proposed idea

When only zombies remain (active coroutine count is 0 and the sole
remaining work is zombie coroutines), deliver a cancellation to them so the
process can exit. A cancellation is still graceful — the zombie's
finally / catch blocks unwind normally — so it does not violate the
"let it finish gracefully" contract; it only skips the arbitrary sleep that
nothing is waiting on anymore.
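
A minimal sketch of this step, assuming a flat array of coroutine records. The `sketch_coroutine` struct, its field names, and `sketch_cancel_zombies_if_idle` are hypothetical and do not mirror the real structures in coroutine.c or scheduler.c. The sketch only marks zombies for cancellation; how the cancellation is actually delivered and unwound is left to the existing machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical coroutine record; field names are illustrative only. */
typedef struct {
    bool finished;         /* coroutine has completed */
    bool is_zombie;        /* left running by a safe scope's dispose()/cancel() */
    bool cancel_requested; /* a cancellation has been requested */
} sketch_coroutine;

/* If no unfinished non-zombie coroutine remains, request cancellation of
 * every unfinished zombie. Returns the number of zombies newly marked.
 * Returns 0 and touches nothing while real work is still pending. */
static size_t sketch_cancel_zombies_if_idle(sketch_coroutine *cos, size_t count)
{
    assert(cos != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (!cos[i].finished && !cos[i].is_zombie) {
            return 0; /* an active coroutine remains: leave zombies alone */
        }
    }

    size_t cancelled = 0;
    for (size_t i = 0; i < count; i++) {
        if (!cos[i].finished && cos[i].is_zombie && !cos[i].cancel_requested) {
            cos[i].cancel_requested = true;
            cancelled++;
        }
    }
    return cancelled;
}
```

The function is idempotent: a second call returns 0 because every remaining zombie is already marked, so it would be safe to invoke on each pass of a shutdown check.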

Where to investigate

scheduler.c shutdown path:

  • the loop stop condition / the active_coroutines > real_coroutines check
  • start_graceful_shutdown

Determine whether zombie timers are already drained anywhere, and whether a
"cancel zombies when only zombies remain" step is the right fix or whether
the stop condition itself should treat zombie-only state as quiescent.
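
To make the second option concrete, here is a rough model of the two stop conditions side by side. Everything in it is an assumption for illustration: the `quiesce_state` struct, its counters, and both function names are hypothetical and are not taken from scheduler.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of scheduler state; not the real scheduler.c layout. */
typedef struct {
    size_t active_coroutines;   /* running coroutines, zombies excluded */
    size_t armed_zombie_timers; /* timers owned by zombie coroutines */
    size_t armed_other_handles; /* every other handle keeping the loop alive */
} quiesce_state;

/* Behaviour described in this issue: any armed timer keeps the loop alive. */
static bool quiesce_stop_strict(const quiesce_state *s)
{
    assert(s != NULL);
    return s->active_coroutines == 0
        && s->armed_other_handles == 0
        && s->armed_zombie_timers == 0;
}

/* Alternative: a zombie-only state counts as quiescent, so timers owned by
 * zombies no longer block the stop decision. */
static bool quiesce_stop_ignoring_zombies(const quiesce_state *s)
{
    assert(s != NULL);
    return s->active_coroutines == 0 && s->armed_other_handles == 0;
}
```

One difference worth weighing: if the loop simply stops in a zombie-only state, the zombies' finally / catch blocks may never run unless the shutdown path cancels them separately, whereas delivering a cancellation lets them unwind first.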
