Skip to content

Node failure timeout fix#1774

Merged
sanimej merged 1 commit into
moby:masterfrom
fcrisciani:node-leave
May 23, 2017
Merged

Node failure timeout fix#1774
sanimej merged 1 commit into
moby:masterfrom
fcrisciani:node-leave

Conversation

@fcrisciani
Copy link
Copy Markdown

The time to keep a node failed into the failed node list
was originally supposed to be 24h.

If a node leaves explicitly it will be removed from the list of nodes
and put into the leftNodes list. This way the NotifyLeave event won't
insert it into the retry list.
NOTE: if the event is lost instead the behavior will be the same as a failed node.

If a node fails, the NotifyLeave will insert it into the failedNodes
list with a reapTime of 24h. This means that the node will be checked
for 24h before being completely forgot. The current check time is every
1 second and is done by the reconnectNode function.
The failed node list is updated every 2h instead.

Signed-off-by: Flavio Crisciani flavio.crisciani@docker.com

The time to keep a node failed into the failed node list
was originally supposed to be 24h.

If a node leaves explicitly it will be removed from the list of nodes
and put into the leftNodes list. This way the NotifyLeave event won't
insert it into the retry list.
NOTE: if the event is lost instead the behavior will be the same as a failed node.

If a node fails, the NotifyLeave will insert it into the failedNodes
list with a reapTime of 24h. This means that the node will be checked
for 24h before being completely forgot. The current check time is every
1 second and is done by the reconnectNode function.
The failed node list is updated every 2h instead.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
@sanimej
Copy link
Copy Markdown

sanimej commented May 23, 2017

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants