Worker lifecycle hooks #3300

Sometimes workers get killed, their memory is lost, and tasks need to be run again. Many cloud providers offer a cheaper compute option where, in exchange for a discount, machines can be removed at any time. In practice this happens regularly when using these services.

Most of these services give a warning before the machines are pulled. It would be nice to take advantage of this warning to stop workers from executing new tasks and ask them to move their memory to other workers.

_In Kubernetes a node can be **cordoned** (do not accept new work) and **drained** (move existing work to another node) via API calls. This is the kind of functionality that would be useful here also._

A couple of questions:
- Is it currently possible to tell a worker not to pick up new tasks?
- Can workers currently be drained of tasks and memory via an RPC call?
- Given that this logic would be cloud-provider specific, could we implement worker hooks which run another process alongside the worker automatically (via entrypoints?) and communicate with the worker via RPC to manage these kinds of lifecycle events?
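
To make the third question more concrete, here is a rough sketch of what such a companion process might look like. It only shows the control flow: poll for a termination notice, then drain the worker once. Everything named here (`watch_for_preemption`, `notice_received`, `drain_worker`) is hypothetical and made up for illustration, not an existing hook API. The provider-specific part would live in `notice_received`, and the drain step is left as a callback because whether workers can be drained over RPC is exactly the open question above.

```python
import time
from typing import Callable, Optional


def watch_for_preemption(
    notice_received: Callable[[], bool],
    drain_worker: Callable[[], None],
    poll_interval: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll for a termination notice and drain the worker once it arrives.

    notice_received: hypothetical provider-specific check, e.g. a query to
        the cloud provider's metadata service. Returns True once the
        machine has been scheduled for removal.
    drain_worker: hypothetical callback that stops the worker from taking
        new tasks and moves its data elsewhere. In Dask this might wrap
        something like Client.retire_workers, but that is an assumption.
    max_polls: optional upper bound on polls; None means poll forever.

    Returns True if the worker was drained, False if polling stopped first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if notice_received():
            # The warning window is short, so drain immediately and stop.
            drain_worker()
            return True
        polls += 1
        sleep(poll_interval)
    return False
```

The `sleep` parameter is only there so the loop can be exercised without real waiting. A provider-specific package could ship its own `notice_received` and be discovered via entrypoints, as suggested in the question.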
