Skip to content

feat: persist peer bans for a configurable duration (A-1157)#23922

Merged
fcarreiro merged 12 commits into
merge-train/spartan-v5from
pw/ban-peers
Jun 10, 2026
Merged

feat: persist peer bans for a configurable duration (A-1157)#23922
fcarreiro merged 12 commits into
merge-train/spartan-v5from
pw/ban-peers

Conversation

@PhilWindle

@PhilWindle PhilWindle commented Jun 6, 2026

Copy link
Copy Markdown
Collaborator

Fixes A-1157. Addresses security advisory GHSA-h4vv-85x5-6hmh.

Problem

Peer scores decay toward zero (~0.9/minute). A peer whose score crossed the ban threshold (MIN_SCORE_BEFORE_BAN = -100) recovered to a healthy score within approximately 1 hour.

Fix

Record a ban when a peer's score drops below the ban threshold and hold it for a configurable duration (default 24h). Bans are kept in memory only and are cleared on restart — a restarted node re-learns bad peers from their behaviour rather than carrying stale bans across runs.

  • PeerScoring records { score, expiry } in an in-memory bannedPeers map, so getScore/getScoreState stay synchronous (required by the peer-manager hot paths, including a .sort() comparator).
  • While banned, getScore returns the ban score regardless of decay, so a peer cannot recover its way out of the ban early — even after decayAllScores cleans up the decayed live-score entry. Once the ban expires it is lifted and the live (decayed) score takes over, letting the peer recover.
  • Expired bans are lifted lazily on the next score query (getActiveBanScore) and swept proactively each heartbeat via pruneExpiredBans(), so a banned peer that disconnects and is never queried again does not linger in the map.

Configuration

New P2P_PEER_BAN_DURATION_SECONDS (config field peerBanDurationSeconds), default 86400 (24h). Registered in foundation env vars and the P2P config mappings.

Tests

peer_scoring.test.ts covers the full lifecycle, asserting both score values and states:

  • ban floor held through banned → recovered-live-score → expiry transitions;
  • peerBanDurationSeconds drives the window (60s case);
  • the advisory regression: after decay cleans up the live-score entry, getScore still returns the -150 ban score (not 0), keeping the peer Banned;
  • a peer whose previous ban has expired can be re-banned;
  • pruneExpiredBans removes expired bans but keeps active ones.

Existing peer_manager and peer_scoring suites pass; the previously existing "returns to Healthy after improving score" assertion was updated to reflect the new intended behaviour (a banned peer stays banned for the full window).

Peer scores decay toward zero over time, so a peer that crossed the ban
threshold recovered to a healthy score within minutes, effectively making
bans toothless (see security advisory GHSA-h4vv-85x5-6hmh).

Persist a ban when a peer's score drops below the ban threshold, holding it
for a configurable duration (P2P_PEER_BAN_DURATION_SECONDS, default 24h):

- PeerScoring records {score, expiry} in an in-memory map (so getScore stays
  synchronous for the peer-manager hot paths) and writes through to a
  dedicated kv-store map for durability across restarts.
- While banned, getScore returns the persisted ban score regardless of decay,
  so the peer cannot recover out of the ban early. Once the ban expires it is
  removed and the live (decayed) score takes over, letting the peer recover.
- PeerScoring.new() restores active bans on startup, pruning expired ones in a
  single transaction.
- PeerManager.stop() flushes pending ban writes so they are durable on
  shutdown.
Update the peer-scoring docs to reflect that bans are now persisted for a
configurable duration rather than decaying away within minutes: clarify the
ban-floor behaviour, add a Ban Persistence section, fix the recovery example
that assumed app-score decay un-bans a peer, and note
P2P_PEER_BAN_DURATION_SECONDS.
Replace the hand-rolled promise-chain persistence queue with a SerialQueue
(created and started when a store is configured). enqueueBanPersistence now
puts onto the queue and flushBanPersistence awaits a syncPoint, matching the
SerialQueue pattern used by the tx pool.
Persisting a ban only mattered if the peer was actually kept out. Banned
peers were disconnected reactively each heartbeat, but the libp2p
connection gater (isNodeAllowedToConnect, used by denyInboundEncryptedConnection)
only checked failed auth handshakes, so a banned peer could reconnect
inbound between heartbeats and resume sending messages.

Reject peers with an active ban in isNodeAllowedToConnect so the gater
refuses the connection during the noise handshake, before any protocol
stream opens, for the full ban duration.
Introduce a single getActiveBanScore helper that returns a peer's persisted
ban score while the ban is active, or undefined otherwise, lazily lifting an
expired ban before returning undefined. getScore and maybeBanPeer now both go
through it.

This removes the duplicated ban-lookup/expiry logic and fixes an edge case:
maybeBanPeer previously checked bannedPeers.has(), so an expired-but-not-yet-
pruned record blocked a fresh ban. It now prunes the stale record and starts a
new window, so a peer that re-offends after its ban expired is re-banned.
…1157)

The ban threshold (MIN_SCORE_BEFORE_BAN) is a hardcoded constant. If it changes
across a software upgrade, a persisted ban score from the old threshold may no
longer cross the new one, which would otherwise pin the peer at a stale floor
for the rest of the ban window. restoreBannedPeers now drops (and prunes) any
ban whose score is no longer below MIN_SCORE_BEFORE_BAN, in addition to expired
bans.
…nts (A-1157)

isNodeAllowedToConnect took `string | PeerId` and was called with either a peer
id or an IP, conflating two keys with different rules (bans are peer-id only).
Replace it with isPeerAllowedToConnect (ban + failed-auth, used by the encrypted
inbound gater and dialing) and isAddressAllowedToConnect (failed-auth only, used
by the raw inbound gater), sharing a private isWithinFailedAuthLimit helper.
private bannedPeers: Map<string, BanRecord> = new Map();
/** The kv-store backing bans, kept so ban pruning can run in a single transaction. */
private readonly kvStore?: AztecAsyncKVStore;
/** Backing store for bans, so they survive restarts. */

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is surviving restarts worth the hassle of a store and keeping things in sync? Assuming that restarts are not very common and that a previously banned peer would get itself banned again.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps not, I did consider it solely being in memory. I added the persistence just for completeness as I agree, it seems like something that will rarely come in use.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd prefer to remove the DB persistence code, just because (1) more code means more chances for mistakes (2) once code gets added, it rarely goes away even if it could.

But I'm approving since I don't feel strongly

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have removed the persistence. It's reduced the changeset significantly.

…157)

Adds the regression test recommended by GHSA-h4vv-85x5-6hmh: a peer banned via
penalties must not be silently restored to Healthy once decayAllScores removes
its decayed score entry. Fails against the pre-fix behaviour (peer reads back
Healthy after ~hours of idle decay); passes now the ban is persisted.
* lifted (removed in memory and in the store) before returning undefined, so callers never see a
* stale ban.
*/
private getActiveBanScore(peerId: string): number | undefined {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only point at which an expired ban is removed from the maps? Could the maps potentially grow indefinitely if we lose track of a peerId and never again ask getActiveBanScore on it?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should now be fixed.

private bannedPeers: Map<string, BanRecord> = new Map();
/** The kv-store backing bans, kept so ban pruning can run in a single transaction. */
private readonly kvStore?: AztecAsyncKVStore;
/** Backing store for bans, so they survive restarts. */

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd prefer to remove the DB persistence code, just because (1) more code means more chances for mistakes (2) once code gets added, it rarely goes away even if it could.

But I'm approving since I don't feel strongly

expect(peerManager.isPeerAllowedToConnect(peerIdStr)).toBe(false);
expect(peerManager.isAddressAllowedToConnect(ipAddress)).toBe(false);
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should/do we have a test for the case where the address is allowed to connect, but then we realize it's a known peer and we don't allow connection?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added one under 'allows the address but denies the banned peer id'.

Per review feedback: persisting bans across restarts isn't worth the extra
code and sync surface for something that will rarely matter — a still-bad
peer simply re-earns its ban. Remove the kv-store wiring, SerialQueue,
restoreBannedPeers/flushBanPersistence and the PeerScoring.new factory; bans
are now held in memory only and cleared on restart. The configurable ban
duration (P2P_PEER_BAN_DURATION_SECONDS, default 24h) and gater enforcement
are unchanged.

Also add a test for the two-stage connection gate: the raw-inbound gate
allows an address (no peer id to match a ban) while the encrypted-inbound
gate denies the banned peer id.
Expired bans were only pruned lazily when a peer's score was next queried, so
a banned peer that disconnects and is never queried again would linger in the
ban map. Add PeerScoring.pruneExpiredBans(), called from PeerManager.heartbeat
alongside decayAllScores, to drop elapsed bans and bound the map's size.

@fcarreiro fcarreiro left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fcarreiro fcarreiro merged commit 024cd37 into merge-train/spartan-v5 Jun 10, 2026
19 of 20 checks passed
@fcarreiro fcarreiro deleted the pw/ban-peers branch June 10, 2026 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants