Garbage collect incompatible peers in Host::run() by halfalicious · Pull Request #5624 · ethereum/aleth

halfalicious · 2019-06-12T03:51:30Z

Garbage collect peers in Host::run() which are deemed incompatible. An incompatible peer is one with which we will never be able to peer successfully (e.g. it is on a different chain or network, or is not running any capabilities).

libp2p/Host.cpp

gumb0 · 2019-06-12T16:04:34Z

libp2p/Peer.cpp

+    case UserReason:
+        return numeric_limits<unsigned>::max();
    case TooManyPeers:
        return 25 * (m_failedAttempts + 1);


m_failedAttempts seems to affect only the value of fallbackSeconds currently. Maybe we should use it now to retry several times and then go to "critical error, disconnect" state.
(at least for some cases of failures)

@gumb0 Good idea - what do you think of this being taken care of in another PR? I'd like to limit the amount of changes I make to the peer gc logic in this PR so if something ends up breaking it will be easier to debug.

Ok for another PR, but it seems to be just a matter of adding a condition like m_failedAttempts >= 20 to the uselessPeer function.

An additional thought is that we could make this whole change more conservative if we leave fallbackSeconds() as it was before (so that we do the same reconnects as before) and have just this check for the failed attempts count in uselessPeer (plus the new check for handshake failures).
This way it would reconnect with the same intervals for each case as before, but stop after a limited number of attempts.

But I'm fine with it if you think it's better to immediately stop in some cases.
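To make the conservative variant concrete, here is a minimal sketch of what "same intervals as before, but stop after a limited number of attempts" could look like. The `sketch` namespace, the free functions, and the `retrySchedule` helper are hypothetical and only illustrate the idea; the threshold of 20 is the value floated in the comment above, and the interval formula is the TooManyPeers case from the diff.

```cpp
#include <vector>

namespace sketch
{
// Delay before the next attempt, using the TooManyPeers formula from the diff.
inline unsigned fallbackSeconds(unsigned failedAttempts)
{
    return 25 * (failedAttempts + 1);
}

// Hypothetical retry budget, taken from the "m_failedAttempts >= 20" suggestion.
constexpr unsigned c_maxFailedAttempts = 20;

// A peer is given up on once the retry budget is spent.
inline bool uselessPeer(unsigned failedAttempts)
{
    return failedAttempts >= c_maxFailedAttempts;
}

// Delays waited after each failure, up to the point where the peer becomes useless.
inline std::vector<unsigned> retrySchedule()
{
    std::vector<unsigned> delays;
    for (unsigned failed = 1; !uselessPeer(failed); ++failed)
        delays.push_back(fallbackSeconds(failed));
    return delays;
}
}  // namespace sketch
```

Under these assumptions the peer is retried at growing intervals after each of the first 19 failures and dropped on the 20th, so the reconnect behaviour stays unchanged until the budget runs out.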

libp2p/Host.cpp

Rather than use a magic number to indicate a useless peer (numeric_limits<unsigned>::max()), add a function to Peer (Peer::uselessPeer) which can be called to detect this. Also update Peer::fallbackSeconds() to return a number which is large enough to prevent us from realistically ever connecting to a useless peer but which is small enough to not overflow when added to Peer::m_lastAttempted. Also, log a message with error verbosity in Host::handshakeFailed() if a peer is not found rather than asserting, since while we never expect users to hit this case, the EIP8 handshake tests can.
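As a rough illustration of the fallbackSeconds() point in this commit message, the sketch below uses a large but bounded interval instead of numeric_limits<unsigned>::max(). The ten-year constant, the `sketch` namespace, and the `shouldReconnect` helper are assumptions made for the example, not values or functions taken from the PR.

```cpp
#include <chrono>
#include <limits>

namespace sketch
{
using Clock = std::chrono::system_clock;

// Hypothetical "effectively never" interval: ten years, in seconds.
constexpr unsigned c_neverRetrySeconds = 10u * 365u * 24u * 60u * 60u;

// The magic number this commit moves away from, kept only for comparison.
constexpr unsigned c_oldSentinel = std::numeric_limits<unsigned>::max();

static_assert(c_neverRetrySeconds < c_oldSentinel, "interval must stay well below the old sentinel");

// True once the fallback interval since the last attempt has elapsed.
inline bool shouldReconnect(
    Clock::time_point lastAttempted, unsigned fallbackSeconds, Clock::time_point now)
{
    return now > lastAttempted + std::chrono::seconds(fallbackSeconds);
}
}  // namespace sketch
```

The comparison stays a plain time addition, and whether a peer is useless is answered by a separate predicate rather than by inspecting the interval.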

Update Peer::uselessPeer() to check peer type and return false if it's a required peer, since in this case the user obviously wants to try to stay connected to this peer.

codecov-io · 2019-06-14T04:18:32Z

Codecov Report

Merging #5624 into master will increase coverage by <.01%.
The diff coverage is 59.42%.

@@            Coverage Diff             @@
##           master    #5624      +/-   ##
==========================================
+ Coverage   62.62%   62.63%   +<.01%     
==========================================
  Files         350      350              
  Lines       29690    29742      +52     
  Branches     3344     3350       +6     
==========================================
+ Hits        18593    18628      +35     
- Misses       9889     9901      +12     
- Partials     1208     1213       +5

gumb0

Some minor comments and a small bug in the loop iterating over peers.

libp2p/RLPxHandshake.h

libp2p/RLPxHandshake.cpp

gumb0 · 2019-06-14T10:29:48Z

libp2p/Host.h

+    std::shared_ptr<Peer> peer(NodeID const& _n) const;
+
+    /// Set a handshake failure reason for a peer
+    void handshakeFailed(NodeID const& _n, HandshakeFailureReason _r);


Maybe better onHandshakeFailed

Also I would make it public and declare close to startPeerSession

@gumb0 : Why make this public, does it make sense to expose the concept of a handshake to consumers of Host?

Well it looks to me like a callback similar to startPeerSession, the callback called by RLPXHandshake when handshake is finished. One is for success another one is for failure.

It works as private, because RLPXHandshake is a friend of Host, but ideally we should get rid of these friend declarations at some point.

Exposing it to the clients of Host is of course not great, but the proper way to deal with it could be to create a separate interface with these callbacks only, which is not exposed to Host clients and is passed only to RLPXHandshake. That's a somewhat complicated change, and in any case not one for this PR.

(In other words, we won't make it much worse, because startPeerSession is already public)

Ah that makes sense, thank you for clarifying! 😄 I'll make the change before merging.
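For reference, the separate callback interface mentioned in this thread could be sketched roughly as follows. All names here (`HandshakeListener`, `HostSketch`, `HandshakeSketch`, `simulateHandshake`) and the enum values are hypothetical; this is not the aleth API, only one way to let the handshake report success or failure without friend access and without adding public methods to the host.

```cpp
#include <string>
#include <utility>

namespace sketch
{
// Hypothetical subset of failure reasons, for illustration only.
enum class HandshakeFailureReason { NoFailure, TCPError, ProtocolError };

// Narrow interface that only the handshake object gets to see.
class HandshakeListener
{
public:
    virtual ~HandshakeListener() = default;
    virtual void startPeerSession(std::string const& nodeId) = 0;
    virtual void onHandshakeFailed(std::string const& nodeId, HandshakeFailureReason reason) = 0;
};

// Stand-in for the handshake: reports its outcome through the listener.
class HandshakeSketch
{
public:
    HandshakeSketch(HandshakeListener& listener, std::string nodeId)
      : m_listener(listener), m_nodeId(std::move(nodeId))
    {}

    void finish(HandshakeFailureReason reason)
    {
        if (reason == HandshakeFailureReason::NoFailure)
            m_listener.startPeerSession(m_nodeId);
        else
            m_listener.onHandshakeFailed(m_nodeId, reason);
    }

private:
    HandshakeListener& m_listener;
    std::string m_nodeId;
};

// Stand-in for Host: private inheritance hides the callbacks from ordinary clients.
class HostSketch : private HandshakeListener
{
public:
    // Simulates one connection attempt with a predetermined outcome.
    void simulateHandshake(std::string const& nodeId, HandshakeFailureReason outcome)
    {
        HandshakeSketch handshake(*this, nodeId);  // conversion to the private base is allowed here
        handshake.finish(outcome);
    }

    unsigned sessions() const { return m_sessions; }
    unsigned failures() const { return m_failures; }

private:
    void startPeerSession(std::string const&) override { ++m_sessions; }
    void onHandshakeFailed(std::string const&, HandshakeFailureReason) override { ++m_failures; }

    unsigned m_sessions = 0;
    unsigned m_failures = 0;
};
}  // namespace sketch
```

In this arrangement the success and failure callbacks remain a matched pair, and clients of the host cannot call either of them directly.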

libp2p/Host.cpp

libp2p/Peer.h

gumb0 · 2019-06-14T10:45:46Z

libp2p/Peer.cpp

+    case UselessPeer:
+    case IncompatibleProtocol:
+    case UnexpectedIdentity:
+    case UserReason:


Not sure about UserReason - this could in theory be any kind of reason specific to subprotocol, including some temporary reasons, maybe it makes sense to try to reconnect

I chose to treat UserReason as terminal because we only use it in 2 cases - if status validation fails:

aleth/libethereum/BlockChainSync.cpp

Lines 190 to 202 in 505aead

    std::string disconnectReason;
    if (peerSessionInfo->clientVersion.find("/v0.7.0/") != string::npos)
        disconnectReason = "Blacklisted client version.";
    else
        disconnectReason = _peer.validate(
            host().chain().genesisHash(), host().protocolVersion(), host().networkId());

    if (!disconnectReason.empty())
    {
        LOG(m_logger) << "Peer " << _peer.id() << " not suitable for sync: " << disconnectReason;
        m_host.capabilityHost().disconnect(_peer.id(), p2p::UserReason);
        return;
    }

Here's where we perform the actual validation in EthereumPeer::validate:

aleth/libethereum/EthereumPeer.cpp

Lines 58 to 68 in 505aead

    if (m_networkId != _hostNetworkId)
        error << "Network identifier mismatch. Host network id: " << _hostNetworkId
              << ", peer network id: " << m_networkId;
    else if (m_protocolVersion != _hostProtocolVersion)
        error << "Protocol version mismatch. Host protocol version: " << _hostProtocolVersion
              << ", peer protocol version: " << m_protocolVersion;
    else if (m_genesisHash != _hostGenesisHash)
        error << "Genesis hash mismatch. Host genesis hash: " << _hostGenesisHash.abridged()
              << ", peer genesis hash: " << m_genesisHash.abridged();
    else if (m_asking != Asking::State && m_asking != Asking::Nothing)
        error << "Peer banned for unexpected status message.";

The second case is if there's a bug in our network code and Session read validation fails:

aleth/libp2p/Session.cpp

Lines 409 to 418 in 505aead

    else if (_length != _expected)
    {
        // with static m_data-sized buffer this shouldn't happen unless there's a regression
        // sec recommends checking anyways (instead of assert)
        LOG(m_netLoggerError)
            << "Error reading - TCP read buffer length differs from expected frame size ("
            << _length << " != " << _expected << ")";
        disconnect(UserReason);
        return false;
    }

You bring up a good point though, technically it can be any reason specific to a subprotocol and other clients can also choose to send it to us for reasons specific to their implementation. I think that rather than treating it as immediately critical we should use it in some sort of reconnection count threshold.

@gumb0 I've decided to change UserReason -> UselessPeer when status message validation fails for a peer (since if that happens the peer is effectively useless to us i.e. we can't sync with it). I've also added in logic in Peer::isUseless to take the number of failed connection attempts into account when the last disconnect isn't critical.
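A minimal sketch of the resulting check, assuming the pieces described in this thread: critical disconnect reasons are terminal, required peers are never useless, and otherwise the failed-attempt count decides. The type names, the set of enum values, and the threshold of 10 are placeholders for illustration; the actual Peer::isUseless in the PR may differ.

```cpp
namespace sketch
{
// Hypothetical subset of disconnect reasons.
enum class DisconnectReason
{
    NoDisconnect,
    TooManyPeers,
    UselessPeer,
    IncompatibleProtocol,
    UnexpectedIdentity,
    UserReason
};

enum class PeerType { Optional, Required };

struct PeerSketch
{
    PeerType type = PeerType::Optional;
    DisconnectReason lastDisconnect = DisconnectReason::NoDisconnect;
    unsigned failedAttempts = 0;

    // Assumed retry budget; the real threshold is not stated in this thread.
    static constexpr unsigned c_maxFailedAttempts = 10;

    bool isUseless() const
    {
        // The user explicitly asked for this peer, so keep trying.
        if (type == PeerType::Required)
            return false;

        switch (lastDisconnect)
        {
        // Critical reasons: reconnecting cannot succeed.
        case DisconnectReason::UselessPeer:
        case DisconnectReason::IncompatibleProtocol:
        case DisconnectReason::UnexpectedIdentity:
            return true;
        // Anything else, including UserReason, only counts against the retry budget.
        default:
            return failedAttempts >= c_maxFailedAttempts;
        }
    }
};
}  // namespace sketch
```

With this shape, a UserReason disconnect from another client no longer ends the relationship immediately, while a failed status validation (now reported as UselessPeer) still does.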

libp2p/Host.cpp

libp2p/Peer.h

Status validation failing means we have no chance of syncing with the peer (e.g. it's running on a different network) so a UselessPeer disconnect reason makes more sense than UserReason. This also has the benefit of status failure validation being treated as a critical disconnect and these nodes being gc'd in Host::run.
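The garbage collection step itself can be pictured with the sketch below, which removes useless entries from a peer map while iterating. The container type, the `PeerSketch` struct, and the `garbageCollect` function are assumptions for illustration; the PR's actual loop in Host::run is not shown in this conversation.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

namespace sketch
{
// Hypothetical stand-in for a peer record.
struct PeerSketch
{
    bool useless = false;
    bool isUseless() const { return useless; }
};

using Peers = std::map<std::string, std::shared_ptr<PeerSketch>>;

// Removes every useless peer and returns how many entries were erased.
inline std::size_t garbageCollect(Peers& peers)
{
    std::size_t erased = 0;
    for (auto it = peers.begin(); it != peers.end();)
    {
        if (it->second->isUseless())
        {
            it = peers.erase(it);  // erase returns the next valid iterator
            ++erased;
        }
        else
            ++it;
    }
    return erased;
}
}  // namespace sketch
```

Advancing the iterator only in the non-erasing branch is what keeps the loop valid while the container shrinks.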

gumb0

Looks good, feel free to make onHandshakeFailed public, if you agree with my reasoning, but it's not critical

Various minor changes in RLPxHandshake, Host, and Peer classes:
* Initialize RLPxHandshake::m_failureReason in ctor
* Reduce redundant code in RLPxHandshake which sets the last failure reason when TCP errors occur
* Rename Host::handshakeFailed
* Fix m_peers iteration bug in Host
* Make Host::onHandshakeFailed public (we want to avoid using friend classes where possible)
* Rename Peers::uselessPeer
* Update deprecated comment in Peer
* Take # of failed connection attempts into account in Peer function which determines if instance is useless or not
* Move fallback seconds computation for default case to anonymous namespace function

halfalicious added networking in progress labels Jun 12, 2019

halfalicious changed the title ~~Garbage collect incompatible peers in Host::run()~~ [WIP] Garbage collect incompatible peers in Host::run() Jun 12, 2019

halfalicious self-assigned this Jun 12, 2019

halfalicious and others added 2 commits June 11, 2019 22:39

Garbage collect incompatible peers

d7741b1

Garbage collect peers in Host::run() which are deemed incompatible. An incompatible peer is one with which we will never be able to peer successfully (e.g. it is on a different chain or network, or is not running any capabilities).

Update changelog

06b00ce

halfalicious force-pushed the gc-peers branch from 362c17b to 06b00ce Compare June 12, 2019 05:39

gumb0 reviewed Jun 12, 2019

halfalicious added 2 commits June 13, 2019 20:54

Set handshake failure reason in EIP8 code paths

a587575

halfalicious force-pushed the gc-peers branch from f1f5db3 to 64f120c Compare June 14, 2019 03:58

Required peers are never useless

e65197a

Update Peer::uselessPeer() to check peer type and return false if it's a required peer, since in this case the user obviously wants to try to stay connected to this peer.

halfalicious removed the in progress label Jun 14, 2019

halfalicious changed the title ~~[WIP] Garbage collect incompatible peers in Host::run()~~ Garbage collect incompatible peers in Host::run() Jun 14, 2019

halfalicious mentioned this pull request Jun 14, 2019

Don't add official Ethereum bootnodes as required peers #5628

Merged

gumb0 suggested changes Jun 14, 2019

gumb0 reviewed Jun 14, 2019

halfalicious force-pushed the gc-peers branch from 81deb54 to 7b67130 Compare June 16, 2019 22:32

halfalicious requested a review from chfast June 16, 2019 22:48

gumb0 approved these changes Jun 17, 2019

halfalicious force-pushed the gc-peers branch from 7b67130 to b0d5485 Compare June 18, 2019 03:12

halfalicious merged commit b9c49ba into master Jun 18, 2019

halfalicious deleted the gc-peers branch June 18, 2019 03:13
