fix: peers struggle to grow past ~10 connections during bootstrap #3508

@iduartgomez

Problem

Peers have difficulty climbing past ~10 connections during the bootstrap phase. The compounded failure probability across multiple independent steps (routing, acceptance, NAT traversal) makes each connection attempt unlikely to succeed, and the current parameters don't compensate for this.

Root Cause Analysis

Connection growth requires bilateral cooperation — both the initiator and the terminus peer must succeed through several independent steps:

  1. Initiator picks a target location and sends a CONNECT
  2. CONNECT must route successfully (max 10 hops, greedy forwarding)
  3. Terminus peer must accept (probabilistic Kleinberg filter)
  4. NAT traversal must succeed

Each step can fail independently, so the success rate of a single attempt is the product of the per-step success rates. Even moderately reliable steps compound into a low overall rate.
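To make the compounding concrete, here is a minimal sketch. The function names and the per-step probabilities used in the note below are illustrative assumptions, not measurements from the network.

```rust
/// Probability that a single CONNECT attempt succeeds when every step
/// must succeed and the steps fail independently: the product of the
/// per-step success probabilities.
fn compound_success(step_probs: &[f64]) -> f64 {
    step_probs.iter().product()
}

/// Expected number of attempts until the first success, assuming each
/// attempt succeeds independently with probability `p_success`
/// (mean of a geometric distribution).
fn expected_attempts(p_success: f64) -> f64 {
    1.0 / p_success
}
```

With hypothetical per-step rates of 0.8 (routing), 0.75 (acceptance) and 0.7 (NAT traversal), `compound_success(&[0.8, 0.75, 0.7])` comes out at about 0.42, so a peer would need roughly 2.4 attempts per new connection on average. The real rates are unknown here and may be considerably worse.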

Key Throttling Mechanisms

| Mechanism | Current Value | Effect |
| --- | --- | --- |
| KLEINBERG_FILTER_MIN_CONNECTIONS | 3 | Above 3 connections, acceptance becomes probabilistic (50-100%), which is premature for bootstrap |
| MAX_CONCURRENT_CONNECTIONS | 3 | Only 3 outbound attempts at a time, regardless of bootstrap phase |
| Accept-at-terminus rule | Always on | Each CONNECT gets exactly one acceptance chance at the terminus; if rejected, the entire attempt fails |
| Location-based backoff | On failure | Failed CONNECTs trigger exponential backoff per Location, blocking retries to that ring region |

The Kleinberg gap score is not meaningful with <12 connections (too few data points for a useful distribution), yet the probabilistic filter kicks in at just 3 connections.

Proposed Solutions

1. Raise KLEINBERG_FILTER_MIN_CONNECTIONS to min_connections / 2 (~12)

Below ~12 connections, accept everything. Gap-based filtering during early bootstrap is premature optimization, since the gap score isn't meaningful with so few connections.

Impact: Simple parameter change. Peers below 12 connections would accept all inbound CONNECTs, dramatically increasing bilateral success rate during bootstrap.
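A rough sketch of what proposal 1 could look like. This is not the actual `should_accept()` from connection_manager.rs: the function names, the `gap_accept_probability` input, and passing the random roll in as a parameter are all simplifying assumptions made for illustration.

```rust
/// Hypothetical helper: the connection count below which every inbound
/// CONNECT is accepted. Derived from min_connections instead of the
/// current fixed value of 3.
fn kleinberg_filter_threshold(min_connections: usize) -> usize {
    min_connections / 2
}

/// Simplified stand-in for the acceptance decision.
/// `gap_accept_probability` stands for whatever probability the
/// gap-based filter would produce; `roll` is a uniform sample in [0, 1)
/// supplied by the caller so the function stays deterministic.
fn should_accept_sketch(
    current_connections: usize,
    min_connections: usize,
    gap_accept_probability: f64,
    roll: f64,
) -> bool {
    // During early bootstrap, skip the filter entirely.
    if current_connections < kleinberg_filter_threshold(min_connections) {
        return true;
    }
    // Otherwise fall back to probabilistic acceptance.
    roll < gap_accept_probability
}
```

Assuming `min_connections` is 24, a peer with 5 connections accepts regardless of the roll, while a peer with 12 connections goes through the probabilistic path as before. Whether the boundary should be `<` or `<=` is a detail this sketch does not try to settle.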

2. Increase MAX_CONCURRENT_CONNECTIONS during bootstrap

Currently hardcoded to 3 regardless of phase. Below min_connections, allowing 5–8 concurrent outbound attempts would increase the rate of successful connections since each attempt is independent.

Impact: Simple parameter change. More attempts per tick = faster growth through the early phase.
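One possible shape for a phase-dependent limit, as a sketch only. The constant names and the bootstrap value of 6 are assumptions chosen from within the proposed 5-8 range, and the real limit in ring.rs may be applied differently.

```rust
/// Current limit, per the table above.
const STEADY_STATE_CONCURRENT: usize = 3;
/// Hypothetical bootstrap limit, picked from the proposed 5-8 range.
const BOOTSTRAP_CONCURRENT: usize = 6;

/// Outbound attempt limit that depends on whether the peer is still
/// below min_connections.
fn max_concurrent_connections(current_connections: usize, min_connections: usize) -> usize {
    if current_connections < min_connections {
        BOOTSTRAP_CONCURRENT
    } else {
        STEADY_STATE_CONCURRENT
    }
}

/// Expected number of successful connections per round when `concurrent`
/// independent attempts each succeed with probability `p_success`.
fn expected_successes_per_round(concurrent: usize, p_success: f64) -> f64 {
    concurrent as f64 * p_success
}
```

With an assumed per-attempt success rate of 0.25, going from 3 to 6 concurrent attempts doubles the expected successes per round from 0.75 to 1.5. This relies on the attempts really being independent, which may not hold if several of them target the same ring region and hit the same backoff.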

3. (More invasive) Allow non-terminus acceptance during bootstrap

If a relay peer is below min_connections / 2, it could accept the joiner even if it can forward closer. This breaks the strict accept-at-terminus rule but only during early growth, meaning each CONNECT has multiple acceptance chances rather than a single shot at the terminus.

Impact: Protocol-level change, most impactful but needs careful evaluation of topology quality tradeoffs.
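A sketch of the relay-side decision under proposal 3. The `ConnectAction` enum and `relay_action` function are hypothetical and do not mirror the actual logic in connect.rs; in particular, this version stops forwarding once a relay accepts, which is only one of the possible designs.

```rust
/// Hypothetical outcome of handling a CONNECT at an intermediate peer.
#[derive(Debug, PartialEq, Eq)]
enum ConnectAction {
    /// Accept the joiner at this relay (new bootstrap-only behaviour).
    AcceptHere,
    /// Forward the CONNECT to a peer closer to the target location.
    ForwardCloser,
    /// No closer peer exists: this peer is the terminus and runs the
    /// normal acceptance check.
    EvaluateAsTerminus,
}

/// Decide what to do with a CONNECT passing through this peer.
fn relay_action(
    own_connections: usize,
    min_connections: usize,
    can_forward_closer: bool,
) -> ConnectAction {
    if !can_forward_closer {
        return ConnectAction::EvaluateAsTerminus;
    }
    // Relaxed rule: a relay that is itself in early bootstrap may take
    // the joiner even though a closer peer exists.
    if own_connections < min_connections / 2 {
        ConnectAction::AcceptHere
    } else {
        ConnectAction::ForwardCloser
    }
}
```

Assuming `min_connections` is 24, a relay with 4 connections would accept the joiner directly, while a relay with 15 connections would keep forwarding as it does today. An alternative design would accept and still forward, which gives the joiner more connections per CONNECT but changes message accounting; the topology tradeoff mentioned above applies to either variant.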

Relevant Code

  • crates/core/src/ring/connection_manager.rs:469-648: should_accept() with Kleinberg filter
  • crates/core/src/ring/connection_manager.rs:580: KLEINBERG_FILTER_MIN_CONNECTIONS = 3
  • crates/core/src/ring.rs:1790: MAX_CONCURRENT_CONNECTIONS = 3
  • crates/core/src/ring.rs:1766: connection_maintenance() loop
  • crates/core/src/topology.rs:339: adjust_topology()
  • crates/core/src/operations/connect.rs:452: accept-at-terminus logic

Labels

  • A-networking (Area: Networking, ring protocol, peer discovery)
  • E-medium (Experience needed to fix/implement: Medium / intermediate)
  • P-high (High priority)
  • S-needs-design (Status: Needs architectural design or RFC)
  • T-bug (Type: Something is broken)
