
proposal: use Happy Eyeballs-like logic for dialing peers #1785

@marten-seemann


Happy Eyeballs, defined in RFC 8305, specifies how to connect to a server for which one has both an IPv4 and an IPv6 address. In order not to overload the network, the IPv6 address should be dialed first; if no connection can be established within 250 ms, a second dial attempt using IPv4 should be started in parallel. The application uses whichever connection is established first.

Why?

We're in the process of adding more transports. As more and more nodes upgrade, we can expect the lists of addresses they advertise to grow, and libp2p/specs#353 will increase the number of addresses further. Dialing all these addresses in parallel puts a lot of load on our node, on the network, and on the remote peer. We need to be smart and dial addresses such that 1. we end up with a connection over a transport that we prefer, and 2. we have a high probability of successfully connecting on the first or second connection attempt.

Differences from RFC 8305

  • In the general case, we’ll start with a list of (multi)addrs that contain IP addresses and/or dnsaddrs (which need to be resolved first)
  • We don’t only need to rank IPv6 against IPv4; first and foremost, we need to rank different transports

Proposed address ranking algorithm

Preprocessing: Bucket addresses into local / internet-wide addresses (⇒ 2 buckets). For every bucket, run:

  1. Filtering: if we have the same IP address or domain name for both QUIC and WebTransport (or for both TCP and WebSocket), remove the WebTransport (or WebSocket) address
  2. Sorting:
    1. a single address of each transport in the order: QUIC > TCP > WebTransport > WebSocket > WebRTC > Circuit
    2. other addresses: we don’t really care, randomize?
  3. Re-run the filtering step

Open Questions

  • What about IPv4 vs. IPv6? Given QUIC (v4 + v6) and TCP (v4 + v6), do we dial
    • QUIC v6, QUIC v4, TCP v6, TCP v4 or
    • QUIC v6, TCP v6, QUIC v4, TCP v4
  • If given multiple IP addresses for the same transport, how do we select the one we dial?
    • this really shouldn’t happen if we had decent address discovery on the sender side, but we don’t…
    • picking one at random seems fine
  • If we already have some multiaddrs containing IP addresses, when (if at all) do we start DNS resolution?
  • Can we tell if IPv6 is not available (not all ISPs provide v6 functionality to their customers)?
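The first question can be made concrete by writing both candidate orderings as sort comparators. The sketch below uses a hypothetical `Addr` type and only illustrates the two options; it does not decide between them.

```go
package main

import (
	"net"
	"sort"
)

// Addr is a hypothetical (host, transport) pair; only "quic" and "tcp"
// are ranked in this sketch.
type Addr struct {
	Host      string
	Transport string
}

var transportRank = map[string]int{"quic": 0, "tcp": 1}

// family returns 0 for IPv6 and 1 for IPv4 so that IPv6 sorts first.
// Hosts that are not IP addresses also map to 0 here.
func family(host string) int {
	if ip := net.ParseIP(host); ip != nil && ip.To4() != nil {
		return 1
	}
	return 0
}

// transportFirst orders QUIC v6, QUIC v4, TCP v6, TCP v4.
func transportFirst(addrs []Addr) {
	sort.SliceStable(addrs, func(i, j int) bool {
		ti, tj := transportRank[addrs[i].Transport], transportRank[addrs[j].Transport]
		if ti != tj {
			return ti < tj
		}
		return family(addrs[i].Host) < family(addrs[j].Host)
	})
}

// familyFirst orders QUIC v6, TCP v6, QUIC v4, TCP v4.
func familyFirst(addrs []Addr) {
	sort.SliceStable(addrs, func(i, j int) bool {
		fi, fj := family(addrs[i].Host), family(addrs[j].Host)
		if fi != fj {
			return fi < fj
		}
		return transportRank[addrs[i].Transport] < transportRank[addrs[j].Transport]
	})
}
```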

Possible optimizations

  • Find out if UDP is blackholed in our network, and disable QUIC in that case. We should re-probe on a regular basis to see if things have changed.
  • Build RTT estimation logic keyed on IP address. For exact matches (most likely re-dials, or dials to Hydra nodes), we can use a value based on the last RTT instead of the fixed 250ms. Using prefix matching, we might also be able to make an informed guess for an IP that's "close" to another IP whose RTT we already know.

Labels

  • P1 (High: likely tackled by core team if no one steps up)
  • effort/weeks (estimated to take multiple weeks)
  • exp/intermediate (prior experience is likely helpful)
  • kind/enhancement (a net-new feature or improvement to an existing feature)
