This repository was archived by the owner on Oct 28, 2021. It is now read-only.

Propagate blocks after PoW validation #5713

Merged: halfalicious merged 9 commits into master from propagate-blocks-pow on Aug 23, 2019

Conversation

halfalicious (Contributor) commented Aug 3, 2019

Fix #5371

Propagate new blocks after they've been imported into the block queue and validated. This gets blocks sent out sooner than waiting until they've been imported into the blockchain, which reduces the uncle rate.

halfalicious force-pushed the propagate-blocks-pow branch 2 times, most recently from f74ec50 to 7093a31, on August 3, 2019 21:18
codecov-io commented Aug 3, 2019

Codecov Report

Merging #5713 into master will decrease coverage by <.01%.
The diff coverage is 27.82%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5713      +/-   ##
==========================================
- Coverage   63.23%   63.23%   -0.01%     
==========================================
  Files         353      353              
  Lines       30245    30276      +31     
  Branches     3387     3390       +3     
==========================================
+ Hits        19126    19144      +18     
- Misses       9887     9898      +11     
- Partials     1232     1234       +2

halfalicious force-pushed the propagate-blocks-pow branch from 1e30b99 to db277ea on August 3, 2019 22:26
halfalicious (author) commented:

Currently we propagate all blocks that are imported into the block queue and pass validation; we should only propagate new blocks.

halfalicious force-pushed the propagate-blocks-pow branch 2 times, most recently from b28a8b6 to e390d3c, on August 3, 2019 22:56
halfalicious (author) commented Aug 3, 2019

> Currently we propagate all blocks that are imported into the block queue and pass validation; we should only propagate new blocks.

At the very least, we should check whether we're currently syncing in EthereumCapability::propagateBlocks and return early if we are (the presumption being that if we're syncing, we're still trying to reach the chain tip and haven't yet started receiving new blocks or mining):

void EthereumCapability::propagateBlocks(std::shared_ptr<VerifiedBlocks const> const& _blocks)
{
    if (_blocks->empty())
        return;

Note that we have similar logic in EthereumCapability::doBackgroundWork:

void EthereumCapability::doBackgroundWork()
{
    ensureInitialised();
    auto h = m_chain.currentHash();
    // If we've finished our initial sync (including getting all the blocks into the chain so as to
    // reduce invalid transactions), start trading transactions & block hashes
    if (!isSyncing() && m_chain.isKnown(m_latestBlockHashSent))
    {

halfalicious force-pushed the propagate-blocks-pow branch 2 times, most recently from 5d2ac04 to 9ab9964, on August 4, 2019 23:58
halfalicious (author) commented:

I had to do some manual reformatting of the code since I haven't gotten git clang-format working again via the command line after reinstalling my OS, so apologies if there are some unnecessary formatting changes in this PR.

Timer t;
tie(ir, m_syncBlockQueue, count) = bc().sync(m_bq, m_stateDB, m_syncAmount);

shared_ptr<VerifiedBlocks> verifiedBlocks = make_shared<VerifiedBlocks>();
halfalicious (author) commented Aug 5, 2019:
shared_ptr is required here since we propagate the blocks on another thread (the network thread), so we need to ensure they're kept alive until they can be sent out. We can't just create a copy of the blocks since the copy ctor is deleted.

halfalicious force-pushed the propagate-blocks-pow branch 4 times, most recently from f7f9b9e to c8d5b0c, on August 5, 2019 00:49
halfalicious (author) commented Aug 13, 2019

Received some internal RLPX handshake errors during local testing:

aleth, a C++ Ethereum client
INFO  08-12 20:41:00 main net    Id: ##90a0431e…
INFO  08-12 20:41:00 main net    ENR: [ seq=1 id=v4 key=0290a043… tcp=30303 udp=30303 ]
aleth 1.7.0-alpha.1-20+commit.b4e4fb1a.dirty
INFO  08-12 20:41:41 p2p  info   UPnP device not found.
INFO  08-12 20:41:41 p2p  net    Active peer count: 0
INFO  08-12 20:41:41 p2p  net    Looking for peers...
Node ID: enode://90a0431e893c8fa9a0697b51e5876668f099e0fd7fd268bd0021e8d9125c1805bfac7d0e44613c7fce8275406b8126a4f299990ec8df9b1859e6b7f4e123352a@127.0.0.1:30303
INFO  08-12 20:41:41 main rpc    JSON-RPC socket path: \\.\pipe\\geth.ipc
JSONRPC Admin Session Key: hUlrVuzBFoo=
INFO  08-12 20:41:41 p2p  net    ENR updated: [ seq=2 id=v4 key=0290a043… ip=71.227.153.208 tcp=30303 udp=30303 ]
INFO  08-12 20:41:42 p2p  sync   Starting full sync
ERROR 08-12 20:41:54 p2p  rlpx   egress Internal error in handshake: RLPXFrameCoder disappeared (##5a85cb59…@35.181.105.86:30303)
ERROR 08-12 20:41:54 p2p  rlpx   egress Internal error in handshake: RLPXFrameCoder disappeared (##e31b2dbe…@40.115.25.33:30303)
ERROR 08-12 20:41:54 p2p  rlpx   egress Internal error in handshake: RLPXFrameCoder disappeared (##5fb5c839…@188.214.30.221:30415)                                                                                                            

These appear to be transient; I'll dig deeper to see whether they're related to my changes (I don't think they are, since I didn't touch anything related to handshakes, but I haven't seen these errors before).

[Edit] I haven't been able to replicate this - @gumb0 : Have you seen this before?

halfalicious (author) commented:

Rebased

halfalicious (author) commented:

> Received some internal RLPX handshake errors during local testing: […]
>
> These appear to be transient, will dig deeper to see if they are related to my changes.
>
> [Edit] I haven't been able to replicate this - @gumb0: Have you seen this before?

I investigated and I'm fairly confident that my changes aren't causing this issue. It looks like this error can happen if writing the hello packet during the RLPX handshake fails. We use the RLPXFrameCoder to read and write frames after exchanging the auths/acks with the key material, and we start using it to queue a write of our hello packet, followed by queueing a read of the incoming hello packet. If the write fails, we call transition with a boost error code, which calls into error, which in turn calls into cancel, destroying the RLPXFrameCoder instance. When the read handler is finally executed - assuming it has no boost error code - the if (!m_io) check is true, which results in the error message being logged:

ba::async_read(m_socket->ref(),
    boost::asio::buffer(m_handshakeInBuffer, handshakeSizeBytes),
    [this, self](boost::system::error_code ec, std::size_t) {
        if (ec)
            transition(ec);
        else
        {
            if (!m_io)
            {
                LOG(m_errorLogger)
                    << "Internal error in handshake: RLPXFrameCoder disappeared ("
                    << m_remote << ")";
                m_failureReason = HandshakeFailureReason::InternalError;
                m_nextState = Error;
                transition();
                return;
            }

halfalicious changed the title from "[WIP] Propagate blocks after PoW validation" to "Propagate blocks after PoW validation" on Aug 14, 2019
halfalicious requested review from chfast and gumb0, and removed the request for gumb0, on August 14, 2019 03:34
halfalicious (author) commented Aug 14, 2019

Potential optimization: EthereumCapability::propagateNewBlocks assumes that if a peer doesn't know about the most recent block in the set of new blocks, it doesn't know about any of them, which results in the whole set being sent to that peer (this is existing logic):

auto const peersWithoutBlock = selectPeers(
    [&](EthereumPeer const& _peer) { return !_peer.isBlockKnown(latestHash); });
auto const peersToSendNumber =
    std::max<std::size_t>(c_minBlockBroadcastPeers, std::sqrt(m_peers.size()));
std::vector<NodeID> const peersToSend = randomPeers(peersWithoutBlock, peersToSendNumber);
for (NodeID const& peerID : peersToSend)
    for (auto const& b : *_newBlocks)
    {
        RLPStream ts;
        m_host->prep(peerID, name(), ts, NewBlockPacket, 2)
            .appendRaw(b.blockData, 1)
            .append(b.verified.info.difficulty());
        auto itPeer = m_peers.find(peerID);
        if (itPeer != m_peers.end())
        {
            m_host->sealAndSend(peerID, ts);
            // We don't want to send new block hashes to these same peers
            itPeer->second.markBlockAsKnown(b.verified.info.hash());
        }
    }

I think this assumption is incorrect - we should check if a peer knows about a block on a block-by-block basis and only send new blocks to peers who don't know about them.

@gumb0 / @chfast : Thoughts?

halfalicious (author) commented:

Rebased (to address CHANGELOG conflict) and addressed PR feedback

halfalicious force-pushed the propagate-blocks-pow branch 2 times, most recently from 2eb39cb to c66fa7f, on August 17, 2019 21:15
halfalicious (author) commented:

cc @gumb0

Timer t;
tie(ir, m_syncBlockQueue, count) = bc().sync(m_bq, m_stateDB, m_syncAmount);

std::shared_ptr<VerifiedBlocks> verifiedBlocks = std::make_shared<VerifiedBlocks>();
A reviewer (Member) commented:

Could you add a comment explaining why we need a shared_ptr (we need to keep these blocks around until they're later processed on the network thread)?

halfalicious (author) replied:
Ah, good suggestion - I had already left a comment in this PR, but it would be nice to also have a comment in the code.

Propagate new blocks after we've validated their PoW. PoW (and other
checks) are done as part of importation into the block queue, so we
remove blocks from the queue once they've been validated, propagate
them to the appropriate peers, then import them into the blockchain.
Don't send new blocks if we're syncing or we're too far behind. This
new code follows the logic in EthereumCapability::maintainBlockHashes.

Also make some global constants constexpr and rename some constants for
consistency reasons.
We perform this check on the network thread because it turns out that std::atomic only supports types with noexcept ctors (i.e. not h256), and using a mutex to synchronize access to the latest-block-sent info seems like overkill.
Fix bug in randomPeers function
Fix comment formatting, return bad block hashes in BlockChain::sync() via existing tuple
rather than via out parameter, remove unused
EthereumCapability::randomPartitionPeers, optimize
EthereumCapability::randomPeers
halfalicious (author) commented:

Rebased

Fix some bugs:
* Only shuffle the peers list in randomPeers if a subset of
  them is being returned
* Update the hash of the latest block sent in EthereumCapability if
  we're too far behind - otherwise we'll always be behind and therefore
  never send any new blocks or new block hashes.

Also, change the CHANGELOG entry from "added" to "changed" (since this
isn't a new feature but a modification of existing behavior)
gumb0 (Member) commented Aug 23, 2019

> I think this assumption is incorrect - we should check if a peer knows about a block on a block-by-block basis and only send new blocks to peers who don't know about them.

Yeah, I think that would be reasonable, but for now I'd avoid further complicating this area of the code. It's difficult to test, and I'm not even sure whether the current logic works as expected.

@halfalicious halfalicious merged commit 5f3d000 into master Aug 23, 2019
Linked issue: Propagate new blocks after PoW check

3 participants