
Worker lifecycle hooks #3300

Description

@jacobtomlinson

Sometimes workers are killed, their memory is lost, and tasks need to be run again. Many cloud providers offer a cheaper compute option (for example AWS Spot Instances or Google Cloud preemptible VMs) that can be reclaimed at any time in exchange for a discount, and this happens regularly when using these services.

Most of these services give a warning before the machines are reclaimed. It would be nice to take advantage of this warning to stop workers from executing new tasks and ask them to move the data they hold in memory to other workers.

In Kubernetes a node can be cordoned (stop accepting new work) and drained (move existing work to other nodes) via API calls. Similar functionality would be useful here.
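The cordon/drain idea can be sketched with a toy, in-memory model. Everything here (`ToyWorker`, `ToyCluster`, `cordon`, `drain`) is hypothetical and is not part of Dask's API. The "scheduler" places each new task result on the accepting worker holding the least data; draining first cordons a worker, then moves each of its stored results to another accepting worker.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorker:
    """Hypothetical stand-in for a worker that keeps task results in memory."""

    name: str
    accepting: bool = True
    data: dict = field(default_factory=dict)

    def cordon(self) -> None:
        # Stop accepting new tasks; results already held stay where they are.
        self.accepting = False


class ToyCluster:
    """Hypothetical scheduler that only places work on accepting workers."""

    def __init__(self, workers):
        self.workers = {w.name: w for w in workers}

    def _least_loaded(self):
        # Accepting worker holding the fewest results (ties: first registered).
        candidates = [w for w in self.workers.values() if w.accepting]
        if not candidates:
            raise RuntimeError("no workers are accepting new work")
        return min(candidates, key=lambda w: len(w.data))

    def submit(self, key, value):
        # Store a (pretend) task result on the least-loaded accepting worker.
        target = self._least_loaded()
        target.data[key] = value
        return target.name

    def drain(self, name):
        # Cordon first so nothing new lands on this worker, then move each
        # of its stored results to another accepting worker.
        worker = self.workers[name]
        worker.cordon()
        moved = []
        for key in list(worker.data):
            target = self._least_loaded()
            target.data[key] = worker.data.pop(key)
            moved.append((key, target.name))
        return moved


# Usage: spread six results over three workers, then drain "a" as if the
# provider had just warned that its machine is about to be reclaimed.
a, b, c = ToyWorker("a"), ToyWorker("b"), ToyWorker("c")
cluster = ToyCluster([a, b, c])
for i in range(6):
    cluster.submit(f"x-{i}", i)
moved = cluster.drain("a")
```

Cordoning before moving anything matters: otherwise the scheduler could keep handing new work to the worker that is being emptied.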

A couple of questions:

  • Is it currently possible to tell a worker not to pick up new tasks?
  • Can workers currently be drained of tasks and memory via an RPC call?
  • Given that this logic would be cloud-provider specific, could we implement worker hooks that automatically run another process alongside the worker (perhaps discovered via entrypoints?) and communicate with the worker via RPC to manage these kinds of lifecycle events?
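A sidecar hook of the kind described in the last question might look roughly like the sketch below. It assumes a provider-specific `check_notice` coroutine (for example, one that polls the provider's instance metadata service) and an `on_notice` coroutine standing in for the RPC that asks the worker to cordon and drain; `watch_for_preemption`, `check_notice`, and `on_notice` are all hypothetical names, and the fakes in the usage part only simulate a notice arriving.

```python
import asyncio
from typing import Awaitable, Callable


async def watch_for_preemption(
    check_notice: Callable[[], Awaitable[bool]],
    on_notice: Callable[[], Awaitable[None]],
    interval: float = 5.0,
) -> None:
    """Poll a (hypothetical) provider-specific termination check and fire the
    lifecycle callback once when a notice appears."""
    while not await check_notice():
        await asyncio.sleep(interval)
    # In a real hook this would be an RPC asking the worker to cordon and drain.
    await on_notice()


async def main():
    polls = 0
    events = []

    async def fake_check_notice() -> bool:
        # Simulated provider: reports a termination notice on the third poll.
        nonlocal polls
        polls += 1
        return polls >= 3

    async def fake_cordon_and_drain() -> None:
        # Simulated RPC to the worker: record what would have been requested.
        events.append("cordon")
        events.append("drain")

    await watch_for_preemption(fake_check_notice, fake_cordon_and_drain, interval=0.01)
    return polls, events


polls, events = asyncio.run(main())
```

Keeping the polling in a separate coroutine or process and talking to the worker only through a narrow callback is one way to keep the cloud-specific logic pluggable, which is what the hook question is after.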
