fix(connector): auto-recover when HTTP listener dies #29
Closed
rkdrnf wants to merge 2 commits into
The listener could enter several dead states that required an Editor restart to recover from:
1. Zombie listener: Start() early-returned when s_Listener != null, so if the listener crashed without nulling the reference, every later call became a no-op.
2. ListenLoop exited via exception without clearing s_Listener, leaving the connector permanently dead until the next assembly reload.
3. Initial port-in-use was fatal: once all 10 ports failed at startup, nothing ever retried.
4. A hung command handler held CommandRouter's static semaphore forever; restarting the HTTP server didn't help because the lock lives in CommandRouter.
Changes:
- HttpServer.IsRunning checks the listener actually accepts traffic, and Start() tears down a stale reference before rebinding.
- ListenLoop has a finally block that nulls the listener and marks the heartbeat stopped if it exits unexpectedly.
- ProcessQueue acts as a watchdog: if the listener is down it calls Start() (rate-limited via AUTO_RESTART_INTERVAL), so transient port conflicts and silent crashes recover automatically. Failure logging is rate-limited to avoid console spam.
- CommandRouter.Dispatch captures the semaphore locally so a ResetLock() swap can't make an in-flight call double-release the new semaphore. ResetLock() exposes a recovery hook for future UI/CLI surfaces that need to clear a hung handler without restarting Unity.
- Heartbeat.MarkStopped() lets failure paths flag the heartbeat file as dead so the CLI sees an honest "not responding" state.
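The watchdog behaviour described above can be sketched roughly as follows. This is an illustration in Go (the language of the repo's CLI side), not the connector's actual C# code: the Watchdog type, its fields, and the simulate helper are hypothetical, and only the 1s and 5s intervals come from the PR description.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Interval values are taken from the PR description; all names are hypothetical.
const (
	autoRestartInterval = 1 * time.Second
	failureLogInterval  = 5 * time.Second
)

// Watchdog retries a start function while the listener is down.
type Watchdog struct {
	IsRunning   func() bool
	Start       func() error
	Log         func(msg string)
	lastAttempt time.Time
	lastLog     time.Time
}

// Tick is meant to be called from a frequent polling loop (the PR uses ProcessQueue).
// It reports whether a restart was attempted on this call.
func (w *Watchdog) Tick(now time.Time) bool {
	if w.IsRunning() {
		return false
	}
	if !w.lastAttempt.IsZero() && now.Sub(w.lastAttempt) < autoRestartInterval {
		return false // too soon since the previous attempt
	}
	w.lastAttempt = now
	if err := w.Start(); err != nil {
		// Log at most once per failureLogInterval to avoid console spam.
		if w.lastLog.IsZero() || now.Sub(w.lastLog) >= failureLogInterval {
			w.lastLog = now
			w.Log("restart failed: " + err.Error())
		}
	}
	return true
}

// simulate runs 10 seconds of 100ms ticks while every port stays busy
// and returns how many restarts were attempted and how many were logged.
func simulate() (attempts, logs int) {
	w := &Watchdog{
		IsRunning: func() bool { return false },
		Start: func() error {
			attempts++
			return errors.New("no available port")
		},
		Log: func(string) { logs++ },
	}
	base := time.Unix(1000, 0)
	for i := 0; i <= 100; i++ {
		w.Tick(base.Add(time.Duration(i) * 100 * time.Millisecond))
	}
	return attempts, logs
}

func main() {
	attempts, logs := simulate()
	fmt.Println("attempts:", attempts, "logs:", logs)
}
```

Passing the current time into Tick keeps the sketch deterministic; a real polling loop would presumably read the clock itself.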
MarkStopped() set s_ForcedState = "stopped" and wrote, but the next Tick (~500ms later) cleared s_ForcedState and wrote a live snapshot because Tick only guarded on Port == 0 — a port set during the last successful bind survives a listener crash. Result: the "stopped" state existed for one heartbeat interval, then got overwritten with state="ready" while the listener was actually dead. Switch the guard to !HttpServer.IsRunning so the heartbeat goes silent when the listener is down, letting MarkStopped's write persist until the watchdog restarts it.
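To make the difference between the two guards concrete, here is a small Go model of the heartbeat. It is only a sketch under assumptions: the Heartbeat struct, the Written field standing in for the heartbeat file, and port 8090 are all hypothetical, and the real connector is C#.

```go
package main

import "fmt"

// Heartbeat models the state file the CLI reads. All names are hypothetical.
type Heartbeat struct {
	Port        int
	ForcedState string
	Written     string      // last state written to the simulated heartbeat file
	IsRunning   func() bool // stands in for HttpServer.IsRunning
}

// MarkStopped records a forced "stopped" state and writes it immediately.
func (h *Heartbeat) MarkStopped() {
	h.ForcedState = "stopped"
	h.Written = h.ForcedState
}

// TickOld reproduces the old guard: it only skips when no port was ever bound,
// so a port left over from the last successful bind lets it overwrite "stopped".
func (h *Heartbeat) TickOld() {
	if h.Port == 0 {
		return
	}
	h.ForcedState = ""
	h.Written = "ready"
}

// TickFixed stays silent while the listener is down, so MarkStopped's write persists.
func (h *Heartbeat) TickFixed() {
	if !h.IsRunning() {
		return
	}
	h.ForcedState = ""
	h.Written = "ready"
}

// stateAfterCrash marks the heartbeat stopped, runs one tick with a dead
// listener, and returns what the heartbeat file would contain afterwards.
func stateAfterCrash(fixed bool) string {
	h := &Heartbeat{Port: 8090, IsRunning: func() bool { return false }}
	h.MarkStopped()
	if fixed {
		h.TickFixed()
	} else {
		h.TickOld()
	}
	return h.Written
}

func main() {
	fmt.Println("old guard:", stateAfterCrash(false))
	fmt.Println("new guard:", stateAfterCrash(true))
}
```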
Owner
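Thank you for the detailed report and PR. I am not going to merge this PR as-is, and only the core of the problem
Port persistence is not included this time either. I am still deciding whether to keep explicit port specification at all. Rather than remembering or pinning a port, I am also considering having the user specify only the project and letting the CLI find the connector that is currently alive for that project.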
Contributor
Author
Thank you. I will close this PR.
Summary
The Unity-side HTTP listener has several dead-state scenarios that
currently require an Editor restart to recover from. This PR makes
the connector self-heal in all of them.
Failure scenarios fixed
1. Zombie listener. Start() early-returns on s_Listener != null, so if ListenLoop exits without nulling the reference (e.g. exception path), every subsequent call is a no-op until Unity restarts.
2. ListenLoop exits unexpectedly. Caught exceptions break out of the loop without clearing s_Listener, producing scenario 1.
3. Initial port-in-use is fatal. Once all 10 ports fail at startup, the connector logs an error and gives up forever.
4. A hung command handler holds CommandRouter's semaphore. Because the lock is static readonly, restarting the HTTP server doesn't release it; the only recovery is restarting Unity.
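Scenario 1 and its remedy can be modelled in a few lines. The following Go sketch is an assumption-laden stand-in for the C# connector: the listener and Server types, the busy map that simulates ports held by other processes, and port 8090 are hypothetical; only the 10-port scan comes from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// listener is a stand-in for the real HTTP listener; alive flips to false on a crash.
type listener struct {
	port  int
	alive bool
}

// Server holds the possibly stale listener reference, like s_Listener in the PR.
type Server struct {
	l     *listener
	busy  map[int]bool // ports held by other processes (simulated)
	binds int          // number of successful binds, kept for the demo
}

// IsRunning reports whether the listener is actually alive,
// not merely whether a reference exists.
func (s *Server) IsRunning() bool { return s.l != nil && s.l.alive }

// Start rebinds unless a live listener exists. A non-nil but dead
// reference is discarded instead of causing an early return.
func (s *Server) Start(basePort, tries int) error {
	if s.IsRunning() {
		return nil
	}
	s.l = nil // tear down the stale reference before rebinding
	for p := basePort; p < basePort+tries; p++ {
		if s.busy[p] {
			continue
		}
		s.l = &listener{port: p, alive: true}
		s.binds++
		return nil
	}
	return errors.New("no available port")
}

// crashAndRestart simulates a listener that dies without nulling the reference,
// then calls Start again and reports the resulting state.
func crashAndRestart() (running bool, binds int, port int) {
	s := &Server{busy: map[int]bool{8090: true}}
	if err := s.Start(8090, 10); err != nil {
		return false, s.binds, 0
	}
	s.l.alive = false // crash: the reference stays non-nil
	if err := s.Start(8090, 10); err != nil {
		return false, s.binds, 0
	}
	return s.IsRunning(), s.binds, s.l.port
}

func main() {
	running, binds, port := crashAndRestart()
	fmt.Println(running, binds, port)
}
```

With a guard of the form "return if the reference is non-nil", the second Start call would be a no-op and the server would stay dead, which is the zombie state described in scenario 1.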
Changes
- HttpServer.IsRunning checks the listener actually listens.
- Start() tears down a stale reference before rebinding.
- ListenLoop has a finally block that cleans up state if the loop exits without an explicit cancel.
- ProcessQueue watchdog: when !IsRunning, calls Start() (rate-limited via AUTO_RESTART_INTERVAL = 1s). Failure logging is rate-limited via FAILURE_LOG_INTERVAL = 5s to avoid console spam.
- CommandRouter.Dispatch captures the semaphore in a local so a future ResetLock() swap can't make in-flight calls double-release the new semaphore.
- ResetLock() is exposed for future recovery surfaces.
- Heartbeat.MarkStopped() lets failure paths flag the heartbeat as dead so the CLI sees an honest "not responding" state.
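The reason Dispatch captures the semaphore in a local is easier to see in code. Below is a minimal Go sketch of the idea, not the connector's C# implementation: the Router type, the channel-based semaphore, and the demo function are hypothetical stand-ins for CommandRouter and whatever lock type it uses.

```go
package main

import (
	"fmt"
	"sync"
)

// semaphore is a one-slot lock built on a buffered channel (a stand-in type).
type semaphore chan struct{}

func newSemaphore() semaphore { return make(semaphore, 1) }
func (s semaphore) acquire()  { s <- struct{}{} }
func (s semaphore) release()  { <-s }

// Router serializes command handlers, loosely modelled on CommandRouter.
type Router struct {
	mu  sync.Mutex
	sem semaphore
}

func NewRouter() *Router { return &Router{sem: newSemaphore()} }

func (r *Router) current() semaphore {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.sem
}

// Dispatch captures the semaphore once, so acquire and release always
// pair up on the same instance even if ResetLock swaps it mid-call.
func (r *Router) Dispatch(handler func()) {
	sem := r.current()
	sem.acquire()
	defer sem.release()
	handler()
}

// ResetLock swaps in a fresh semaphore; a hung handler keeps holding the old one.
func (r *Router) ResetLock() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.sem = newSemaphore()
}

// demo hangs one handler, resets the lock, runs a second command, then lets
// the first finish. It returns whether the second command ran and how many
// slots of the new semaphore are still held afterwards.
func demo() (bool, int) {
	r := NewRouter()
	started := make(chan struct{})
	unblock := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		r.Dispatch(func() {
			close(started)
			<-unblock // simulate a hung handler
		})
	}()
	<-started
	r.ResetLock()
	secondRan := false
	r.Dispatch(func() { secondRan = true })
	close(unblock)
	<-done
	return secondRan, len(r.current())
}

func main() {
	ran, held := demo()
	fmt.Println("second command ran:", ran, "new semaphore held slots:", held)
}
```

If Dispatch re-read the shared field at release time instead, the late release from the first handler would land on the new semaphore, which is the double-release the PR guards against.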
Notes
ResetLock() has no caller in this PR. It's the recovery hook for a follow-up that adds a status window / CLI command to purge hung handlers without restarting Unity. Happy to drop it if you'd rather land the API alongside its caller.
so the listener rebinds to the same port across restarts, keeping
any running CLI client's instance reference valid).
Test plan
unity-cli status returns ready.
ListenLoop to throw) and verify [UnityCliConnector] ListenLoop exited unexpectedly log followed by an automatic restart within ~1s.
verify rate-limited "no available port" warning every 5s and
automatic recovery once a port frees up.
go test ./... passes (no Go-side changes).