fix: retry on first connection failure and on failed IRC registration#413
Open
mbologna wants to merge 1 commit intokiwiirc:masterfrom
Open
fix: retry on first connection failure and on failed IRC registration#413mbologna wants to merge 1 commit intokiwiirc:masterfrom
mbologna wants to merge 1 commit intokiwiirc:masterfrom
Conversation
Root cause: the socketClose reconnect gate checks reconnect_attempts > 0 && reconnect_attempts < max_retries But reconnect_attempts is initialised to 0, so the first connection failure always falls through to "give up" — the condition is falsy. Additionally, reconnect_attempts is reset in socketOpen (raw TCP connect), not in registeredSuccessfully. A connection that got a TCP socket but failed IRC registration (auth timeout, ECONNRESET during registration, server-side kill) would reset the counter immediately on the next attempt, exhausting retries before IRC registration ever succeeds. Fix (proposed by mornfall in kiwiirc#211, validated with ~100 users for 2+ years without issues): 1. Initialise reconnect_attempts to 1 so the first failure triggers the retry branch. 2. Remove the premature reset from socketOpen (TCP connected ≠ IRC registered). 3. Reset reconnect_attempts to 1 in registeredSuccessfully so the counter starts fresh after a fully successful registration. Fixes kiwiirc#211
|
I've been running this fix on my instance for years now without issues and the handling of reconnects is much better this way. it's like it should've been from the beginning. so let's add it then. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When an IRC connection fails for the first time (DNS not yet ready, transient network error, ECONNRESET during registration), The Lounge shows:
Even though
auto_reconnectis enabled andauto_reconnect_max_retriesis set to a high value (e.g. 30 in The Lounge). This has been reported in several places:Root cause
The reconnect decision in
socketCloseis:Bug 1 — initial value:
reconnect_attemptsis initialised to0. Since0is falsy in JavaScript, the first branch never fires on a first-ever connection failure.was_connectedis also false, so the second branch doesn't fire either. Result: the very first failure gives up immediately, no matter how highmax_retriesis.Bug 2 — premature reset:
reconnect_attemptsis reset to0insidesocketOpen(raw TCP connect). This means a connection that reached the TCP layer but failed IRC registration (auth timeout, server-sideECONNRESETduring NICK/USER, netsplit during connect) will reset the counter on the next attempt's TCP connect, effectively giving only one IRC-level retry before exhausting the counter.Both bugs can be observed in the log pattern from #4103:
Fix
Three minimal changes, first proposed by @mornfall in this issue:
Initialise
reconnect_attemptsto1— the first failure now satisfiesreconnect_attempts > 0 && reconnect_attempts < max_retries, triggering a retry.Remove the reset from
socketOpen— a raw TCP connection is not a successful IRC session. The counter should not reset until IRC registration completes.Reset to
1inregisteredSuccessfully— after a fully successful registration, the counter resets so the next connection problem gets a fresh set of retries.Validation
@mornfall has been running this exact fix in production for 2+ years with ~100 users without issues (last comment in this thread). The fix has also been independently validated by @tokariu.
Diff