Skip to content

fix(connector): auto-recover when HTTP listener dies#29

Closed
rkdrnf wants to merge 2 commits into
youngwoocho02:mainfrom
rkdrnf:feat/connector-failure-recovery
Closed

fix(connector): auto-recover when HTTP listener dies#29
rkdrnf wants to merge 2 commits into
youngwoocho02:mainfrom
rkdrnf:feat/connector-failure-recovery

Conversation

@rkdrnf
Copy link
Copy Markdown
Contributor

@rkdrnf rkdrnf commented May 2, 2026

Summary

The Unity-side HTTP listener has several dead-state scenarios that
currently require an Editor restart to recover from. This PR makes
the connector self-heal in all of them.

Failure scenarios fixed

  1. Zombie listener. Start() early-returns on s_Listener != null,
    so if ListenLoop exits without nulling the reference (e.g.
    exception path), every subsequent call is a no-op until Unity
    restarts.
  2. ListenLoop exits unexpectedly. Caught exceptions break out
    of the loop without clearing s_Listener, producing scenario 1.
  3. Initial port-in-use is fatal. If all 10 ports fail at startup,
    the connector logs an error and gives up forever.
  4. Hung command handler holds CommandRouter's semaphore. Because
    the lock is static readonly, restarting the HTTP server doesn't
    release it — the only recovery is restarting Unity.

Changes

  • HttpServer.IsRunning checks the listener actually listens.
    Start() tears down a stale reference before rebinding.
  • ListenLoop has a finally block that cleans up state if the loop
    exits without an explicit cancel.
  • ProcessQueue watchdog: when !IsRunning, calls Start()
    (rate-limited via AUTO_RESTART_INTERVAL = 1s). Failure logging is
    rate-limited via FAILURE_LOG_INTERVAL = 5s to avoid console spam.
  • CommandRouter.Dispatch captures the semaphore in a local so a
    future ResetLock() swap can't make in-flight calls double-release
    the new semaphore. ResetLock() is exposed for future recovery
    surfaces.
  • Heartbeat.MarkStopped() lets failure paths flag the heartbeat as
    dead so the CLI sees an honest "not responding" state.

Notes

  • ResetLock() has no caller in this PR. It's the recovery hook for a
    follow-up that adds a status window / CLI command to purge hung
    handlers without restarting Unity. Happy to drop it if you'd rather
    land the API alongside its caller.
  • A second PR will follow with port persistence (preferred-port file
    so the listener rebinds to the same port across restarts, keeping
    any running CLI client's instance reference valid).

Test plan

  • Open Unity, confirm connector starts and unity-cli status
    returns ready.
  • Force a listener crash (e.g. patch ListenLoop to throw) and
    verify [UnityCliConnector] ListenLoop exited unexpectedly log
    followed by an automatic restart within ~1s.
  • Hold port 8090..8099 from another process before opening Unity;
    verify rate-limited "no available port" warning every 5s and
    automatic recovery once a port frees up.
  • go test ./... passes (no Go-side changes).

rkdrnf added 2 commits May 2, 2026 12:36
The listener could enter several dead states that required an Editor
restart to recover from:

1. Zombie listener: Start() early-returned when s_Listener != null, so
   if the listener crashed without nulling the reference, every later
   call became a no-op.
2. ListenLoop exited via exception without clearing s_Listener, leaving
   the connector permanently dead until the next assembly reload.
3. Initial port-in-use was fatal — once all 10 ports failed at startup,
   nothing ever retried.
4. A hung command handler held CommandRouter's static semaphore
   forever; restarting the HTTP server didn't help because the lock
   lives in CommandRouter.

Changes:
- HttpServer.IsRunning checks the listener actually accepts traffic,
  and Start() tears down a stale reference before rebinding.
- ListenLoop has a finally block that nulls the listener and marks the
  heartbeat stopped if it exits unexpectedly.
- ProcessQueue acts as a watchdog: if the listener is down it calls
  Start() (rate-limited via AUTO_RESTART_INTERVAL), so transient port
  conflicts and silent crashes recover automatically. Failure logging
  is rate-limited to avoid console spam.
- CommandRouter.Dispatch captures the semaphore locally so a
  ResetLock() swap can't make an in-flight call double-release the new
  semaphore. ResetLock() exposes a recovery hook for future UI/CLI
  surfaces that need to clear a hung handler without restarting Unity.
- Heartbeat.MarkStopped() lets failure paths flag the heartbeat file
  as dead so the CLI sees an honest "not responding" state.
MarkStopped() set s_ForcedState = "stopped" and wrote, but the next
Tick (~500ms later) cleared s_ForcedState and wrote a live snapshot
because Tick only guarded on Port == 0 — a port set during the
last successful bind survives a listener crash. Result: the
"stopped" state existed for one heartbeat interval, then got
overwritten with state="ready" while the listener was actually dead.

Switch the guard to !HttpServer.IsRunning so the heartbeat goes
silent when the listener is down, letting MarkStopped's write
persist until the watchdog restarts it.
@youngwoocho02
Copy link
Copy Markdown
Owner

youngwoocho02 commented May 3, 2026

상세한 리포트와 PR 감사합니다.

이 PR을 그대로 머지하지는 않고, 문제의 핵심만 main에 직접 반영했습니다. 리스너가 죽었는데 s_Listener 참조만 남아 있어서 서버가 살아 있다고 오판하던 문제는 v0.3.18로 릴리즈했습니다. 이제는 s_Listener.IsListening까지 확인해서 죽은 리스너 참조가 남아 있으면 정리 후 다시 시작합니다. 예상치 못한 리스너 종료 시에는 heartbeat를 stopped로 표시하고, 재시작 예약은 에디터 메인 루프에서 처리합니다.

CommandRouter semaphore reset / hung handler 복구 훅은 이번에는 포함하지 않았습니다. 이 부분은 조금 더 고민이 필요합니다. 이 프로젝트는 커넥터를 단순하게 유지하는 방향이고, 해당 훅은 복구 경로로 동작할 수는 있지만 상태와 복잡도도 같이 늘어납니다. 서버 복구 쪽에도 이미 여러 처리가 붙고 있어서, 재시작 처리는 우선 제한적인 형태로만 반영했습니다.

port persistence도 이번에는 포함하지 않았습니다. 명시적 포트 지정 자체를 계속 유지할지 고민 중입니다. 포트를 기억하거나 고정하기보다, 사용자는 프로젝트만 지정하고 CLI가 해당 프로젝트의 현재 살아 있는 커넥터를 찾는 방향도 검토하고 있습니다.

@rkdrnf
Copy link
Copy Markdown
Contributor Author

rkdrnf commented May 5, 2026

감사합니다. 이 PR은 닫도록 하겠습니다.

@rkdrnf rkdrnf closed this May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants