Transient reconnect leaves thread stuck in error and can orphan pending user turns #313

@johntheyoung

Summary

On Linux/Wayland, I hit a case where a transient Reconnecting... 1/5 provider event was persisted as a sticky thread error, even though the Codex session recovered and continued producing assistant/tool activity normally.

In the same thread, a follow-up user message also left an orphan pending turn, which made the thread feel unresponsive in the UI.

Reproduction / Observed Behavior

I do not have a minimal deterministic repro yet, but this is the sequence I observed on March 7, 2026:

  1. A Codex-backed thread was running normally.
  2. T3 showed a red Reconnecting... 1/5 banner.
  3. The same thread then continued producing reasoning updates, assistant messages, and tool activity for more than 20 minutes.
  4. T3 continued to show the thread as errored / unhealthy.
  5. A later user message (go) did not become the active turn cleanly, and the thread felt partially stuck / unresponsive.

Expected

  • A transient reconnect should not leave the thread in a sticky error state if provider activity resumes.
  • Once the provider resumes streaming normal events, the thread session should move back to running or ready.
  • A later user message should either become a real turn or fail cleanly, not remain as an orphan pending turn.

Actual

  • Reconnecting... 1/5 was persisted as a hard thread session error.
  • The provider recovered and continued sending normal activity, but the session error was never cleared.
  • A later user message created a pending turn entry that was never promoted/cleared, while the provider kept streaming on the previous turn.

Evidence

For the affected thread:

  • At 2026-03-07T07:23:49.663Z, T3 persisted:

    • thread.session-set
    • status = error
    • lastError = "Reconnecting... 1/5"
  • But later in the same thread, normal activity continued up to at least 2026-03-07T07:46:23.391Z, including:

    • thread.message-sent
    • thread.activity-appended
    • thread.turn-diff-completed
  • A later user message at 2026-03-07T07:43:22.399Z created:

    • thread.turn-start-requested
    • a pending turn row with turn_id = NULL
  • That pending row never became a real turn, while subsequent assistant/tool activity still attached to the prior turn.
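
The pattern in this evidence (a persisted session error followed by ordinary activity, with no later session update) can be checked mechanically. The following is only a diagnostic sketch: the event shapes and the `findStickyError` helper are hypothetical and are not taken from the T3 Code source; only the event type names come from the log above.

```typescript
// Hypothetical event shapes for illustration; the real persisted schema may differ.
interface SessionSetEvent {
  type: "thread.session-set";
  at: string; // ISO 8601 timestamp
  status: "running" | "ready" | "error";
  lastError?: string;
}

interface ActivityEvent {
  type: "thread.message-sent" | "thread.activity-appended" | "thread.turn-diff-completed";
  at: string; // ISO 8601 timestamp
}

type ThreadEvent = SessionSetEvent | ActivityEvent;

// Returns the most recent session-set event if it is an error that was
// followed by normal activity and never superseded by a later session-set.
function findStickyError(events: ThreadEvent[]): SessionSetEvent | undefined {
  const sorted = [...events].sort((a, b) => Date.parse(a.at) - Date.parse(b.at));
  let lastSession: SessionSetEvent | undefined;
  let activityAfter = false;
  for (const e of sorted) {
    if (e.type === "thread.session-set") {
      lastSession = e;
      activityAfter = false; // reset: only activity after the latest session-set counts
    } else if (lastSession !== undefined) {
      activityAfter = true;
    }
  }
  if (lastSession !== undefined && lastSession.status === "error" && activityAfter) {
    return lastSession;
  }
  return undefined;
}
```

Run against the affected thread's events, a check like this would flag the `Reconnecting... 1/5` session row, because activity continued after it and no later session update replaced it.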

Suspected Cause

From local inspection, this looks like two related state-management problems:

  1. runtime.error is converted into a hard thread.session.set(status="error"), but subsequent normal provider activity does not necessarily clear that state.
  2. thread.turn-start-requested creates a pending turn placeholder, but if no later session update arrives with a new active turn ID, that pending row can be left orphaned.

The result is:

  • the thread stays visually errored after recovery
  • a later user message can appear queued/stuck even though the provider is still alive
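
For the second problem, one possible shape for a fix is to give every pending placeholder exactly two exits: promotion when a new active turn ID arrives, or a clean failure after a timeout. This is a minimal sketch under assumptions, not the project's implementation: `PendingTurn`, `reconcilePendingTurn`, and the 30-second default are all hypothetical.

```typescript
// Hypothetical placeholder row; turnId stays null until the provider
// reports a new active turn.
interface PendingTurn {
  requestedAt: number; // epoch milliseconds
  turnId: string | null;
  status: "pending" | "active" | "failed";
}

// Decide what to do with a pending placeholder given the latest session info.
// previousTurnId is the turn that was active when the user message was sent.
function reconcilePendingTurn(
  pending: PendingTurn,
  activeTurnId: string | null,
  previousTurnId: string | null,
  now: number,
  timeoutMs: number = 30000, // assumed value, not from the report
): PendingTurn {
  if (pending.status !== "pending") {
    return pending; // already resolved one way or the other
  }
  // A new active turn ID means the provider accepted the request: promote.
  if (activeTurnId !== null && activeTurnId !== previousTurnId) {
    return { ...pending, turnId: activeTurnId, status: "active" };
  }
  // No new turn within the timeout: fail cleanly instead of leaving an orphan.
  if (now - pending.requestedAt >= timeoutMs) {
    return { ...pending, status: "failed" };
  }
  return pending; // still within the grace period
}
```

With a rule like this, a placeholder could not stay at `turn_id = NULL` indefinitely while the provider keeps streaming on the previous turn. It would eventually be marked failed and could be surfaced to the user.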

Local Workaround

I was able to recover locally by:

  • clearing the stale thread session error
  • deleting the orphan pending turn row

I also confirmed a local code workaround: clearing reconnect errors once fresh provider activity resumes (for example, after task.progress or tool events) prevented the sticky error state from persisting.
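
The idea behind the code workaround can be sketched roughly as follows. This is not the actual patch: the state and event shapes, the `tool.event` kind, the `reduceSession` function, and the message pattern are all assumptions made for illustration.

```typescript
// Hypothetical session state and provider event shapes.
type SessionStatus = "running" | "ready" | "error";

interface SessionState {
  status: SessionStatus;
  lastError: string | null;
}

type ProviderEvent =
  | { kind: "runtime.error"; message: string }
  | { kind: "task.progress" }
  | { kind: "tool.event" }; // placeholder name for tool activity

// Matches messages such as "Reconnecting... 1/5" (assumed format).
const RECONNECT_PATTERN = /^Reconnecting\.\.\. \d+\/\d+$/;

function isTransientReconnect(message: string | null): boolean {
  return message !== null && RECONNECT_PATTERN.test(message);
}

function reduceSession(state: SessionState, event: ProviderEvent): SessionState {
  if (event.kind === "runtime.error") {
    return { status: "error", lastError: event.message };
  }
  // Any other event is fresh provider activity. If the stored error was only
  // a reconnect notice, the provider has evidently recovered, so clear it.
  if (state.status === "error" && isTransientReconnect(state.lastError)) {
    return { status: "running", lastError: null };
  }
  return state; // non-reconnect errors stay sticky
}
```

Only reconnect-style errors are cleared here, so a genuine failure would still leave the thread in the error state until it is handled explicitly.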

Environment

  • Arch Linux
  • Hyprland
  • Wayland
  • T3 Code 0.0.3
  • installed via AUR
