Description
In certain scenarios, the Nakamoto miner implementation can submit a tenure extension "too soon" after a prior extend. This can lead to a stall of 90 seconds to 3 minutes, as happened at block 7,099,944.
I think what happens is roughly:
- The miner proposes a mid-tenure Block A with a tenure extend, because its local view of the signer set (the most recent timestamp from each signer) shows that all signers would accept a tenure extend.
- The signers accept Block A and broadcast it.
- The miner sees Block A in its staging DB before it receives the latest wave of signer messages.
- The miner exits the signer_coordinator, so it never processes that latest wave of signer messages.
- The miner assembles Block B and decides to attempt a tenure extend. It checks the signer_coordinator to see whether the signer set would allow one, but because the signer_coordinator still holds stale messages, the extend looks allowable, even though much less than the required idle time has elapsed.
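The race above can be illustrated with a minimal sketch. It assumes the miner keeps a per-signer cache holding each signer's weight and the time after which that signer would accept a tenure extend, and that an extend is considered allowed once enough weight has passed its timestamp. The names `SignerView`, `extend_allowed_at`, and `extend_allowed` are hypothetical and are not the actual signer_coordinator API.

```rust
use std::collections::HashMap;

/// Hypothetical cached view of one signer, as held by the miner.
#[derive(Clone, Copy, Debug)]
struct SignerView {
    /// Voting weight of this signer.
    weight: u32,
    /// UNIX time (seconds) after which this signer says it would accept a tenure extend.
    extend_allowed_at: u64,
}

/// Returns true if signers holding at least `threshold` weight would accept
/// a tenure extend at time `now`, according to the (possibly stale) cache.
fn extend_allowed(cache: &HashMap<u32, SignerView>, now: u64, threshold: u32) -> bool {
    let accepting: u32 = cache
        .values()
        .filter(|v| v.extend_allowed_at <= now)
        .map(|v| v.weight)
        .sum();
    accepting >= threshold
}
```

If the cache still holds the timestamps sent before Block A, `extend_allowed` returns true for Block B. Once the newer wave of messages (with timestamps pushed forward by Block A's extend) is applied to the cache, the same check returns false until the idle time has elapsed.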
As a result, the signers reject Block B (which the miner rebroadcasts each time it hits a retry timeout) until the required idle time has elapsed.
There are two strategies I can immediately think of to remedy this.
The first is to flush the stale messages. I think that means tagging each signer response with the block it corresponds to, and flushing any messages that do not correspond to the parent of the block being assembled.
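A minimal sketch of the first strategy, assuming each signer response can be tagged with a block identifier (modeled here as a plain 32-byte array). `TaggedResponse`, `ResponseCache`, `record`, and `flush_stale` are hypothetical names for illustration, not existing stacks-core types.

```rust
use std::collections::HashMap;

/// Hypothetical block identifier; a 32-byte hash is assumed.
type BlockId = [u8; 32];

/// A signer response tagged with the block it was produced for.
#[derive(Clone, Debug)]
struct TaggedResponse {
    block_id: BlockId,
    weight: u32,
    /// UNIX time (seconds) after which the signer would accept a tenure extend.
    extend_allowed_at: u64,
}

/// Hypothetical miner-side cache: latest tagged response per signer.
#[derive(Default)]
struct ResponseCache {
    by_signer: HashMap<u32, TaggedResponse>,
}

impl ResponseCache {
    /// Store (or overwrite) the latest response from `signer`.
    fn record(&mut self, signer: u32, resp: TaggedResponse) {
        self.by_signer.insert(signer, resp);
    }

    /// Drop every cached response that was not produced for `parent`.
    fn flush_stale(&mut self, parent: &BlockId) {
        self.by_signer.retain(|_, r| &r.block_id == parent);
    }

    /// Same weight check as before, but only over responses that survived the flush.
    fn extend_allowed(&self, now: u64, threshold: u32) -> bool {
        let accepting: u32 = self
            .by_signer
            .values()
            .filter(|r| r.extend_allowed_at <= now)
            .map(|r| r.weight)
            .sum();
        accepting >= threshold
    }
}
```

With this approach, when the miner starts building on Block A before any responses for Block A have arrived, the flush leaves the cache empty and the extend check conservatively fails instead of trusting timestamps from an older block.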
The second strategy would be to make the miner more responsive to this kind of block rejection. That is, when the signers reject an extension due to timing, the miner could check the response to see how far off it is and, if the gap is greater than some threshold, abandon the block.
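A minimal sketch of the second strategy, assuming a timing rejection carries the time at which that signer would accept the extend. It finds the earliest time at which rejecting signers with at least `threshold` weight would accept, and abandons the block if that is more than `max_wait_secs` away. `TimingRejection`, `Decision`, `earliest_acceptable`, `on_extend_rejected`, and `max_wait_secs` are hypothetical; as a simplification, only the weight of the rejecting signers is counted.

```rust
/// Hypothetical timing-based rejection from one signer.
struct TimingRejection {
    weight: u32,
    /// UNIX time (seconds) after which this signer would accept the extend.
    extend_allowed_at: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    /// Keep rebroadcasting on the usual retry timeout.
    KeepRetrying,
    /// Give up on this block; the extend would not be accepted for `wait_secs`.
    Abandon { wait_secs: u64 },
}

/// Earliest time at which rejecting signers with at least `threshold`
/// cumulative weight would accept, or None if they never reach it.
fn earliest_acceptable(rejections: &[TimingRejection], threshold: u32) -> Option<u64> {
    let mut times: Vec<(u64, u32)> = rejections
        .iter()
        .map(|r| (r.extend_allowed_at, r.weight))
        .collect();
    times.sort_unstable();
    let mut acc = 0u32;
    for (t, w) in times {
        acc += w;
        if acc >= threshold {
            return Some(t);
        }
    }
    None
}

/// Decide whether to keep retrying or abandon after a timing rejection.
fn on_extend_rejected(
    rejections: &[TimingRejection],
    now: u64,
    threshold: u32,
    max_wait_secs: u64,
) -> Decision {
    match earliest_acceptable(rejections, threshold) {
        Some(t) => {
            let wait = t.saturating_sub(now);
            if wait > max_wait_secs {
                Decision::Abandon { wait_secs: wait }
            } else {
                Decision::KeepRetrying
            }
        }
        // Not enough information to estimate the wait; fall back to retrying.
        None => Decision::KeepRetrying,
    }
}
```

The fallback to `KeepRetrying` when no estimate is possible keeps today's retry behavior in that case; abandoning only happens when the rejections themselves show the gap is large.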