Transient reconnect leaves thread stuck in error and can orphan pending user turns

## Summary

On Linux/Wayland, I hit a case where a transient `Reconnecting... 1/5` provider event was persisted as a sticky thread error, even though the Codex session recovered and continued producing assistant/tool activity normally.

In the same thread, a follow-up user message also left an orphan `pending` turn, which made the thread feel unresponsive in the UI.

## Reproduction / Observed Behavior

I do not have a minimal deterministic repro yet, but this is the sequence I observed on March 7, 2026:

1. A Codex-backed thread was running normally.
2. T3 showed a red `Reconnecting... 1/5` banner.
3. The same thread then continued producing reasoning updates, assistant messages, and tool activity for more than 20 minutes (see the timestamps under Evidence).
4. T3 continued to show the thread as errored / unhealthy.
5. A later user message (`go`) did not become the active turn cleanly, and the thread felt partially stuck / unresponsive.

## Expected

- A transient reconnect should not leave the thread in a sticky `error` state if provider activity resumes.
- Once the provider resumes streaming normal events, the thread session should move back to `running` or `ready`.
- A later user message should either become a real turn or fail cleanly, not remain as an orphan pending turn.

## Actual

- `Reconnecting... 1/5` was persisted as a hard thread session error.
- The provider recovered and continued sending normal activity, but the session error was never cleared.
- A later user message created a `pending` turn entry that was never promoted/cleared, while the provider kept streaming on the previous turn.

## Evidence

For the affected thread:

- At `2026-03-07T07:23:49.663Z`, T3 persisted:
  - `thread.session-set`
  - `status = error`
  - `lastError = "Reconnecting... 1/5"`

- But later in the same thread, normal activity continued up to at least `2026-03-07T07:46:23.391Z`, including:
  - `thread.message-sent`
  - `thread.activity-appended`
  - `thread.turn-diff-completed`

- A later user message at `2026-03-07T07:43:22.399Z` created:
  - `thread.turn-start-requested`
  - a `pending` turn row with `turn_id = NULL`

- That pending row never became a real turn, while subsequent assistant/tool activity still attached to the prior turn.
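To make the last point checkable, here is a minimal sketch of how an orphaned pending turn could be detected. Only the `pending` state and `turn_id = NULL` come from the data above; the `TurnRow` shape, the field names, and `findOrphanPendingTurns` are hypothetical and do not reflect T3's actual schema.

```typescript
// Hypothetical row shape: only the `pending` state and the null turn ID
// are taken from the report; the remaining fields are assumptions.
interface TurnRow {
  rowId: number;
  state: "pending" | "active" | "completed";
  turnId: string | null;
  requestedAt: string; // ISO 8601 timestamp
}

// Return pending rows that have no turn ID and are older than `maxAgeMs`
// at time `now`. These are the candidates for "orphan" placeholders.
function findOrphanPendingTurns(
  rows: TurnRow[],
  now: Date,
  maxAgeMs: number,
): TurnRow[] {
  return rows.filter(
    (row) =>
      row.state === "pending" &&
      row.turnId === null &&
      now.getTime() - Date.parse(row.requestedAt) > maxAgeMs,
  );
}
```

With the timestamps above, a row requested at `07:43:22.399Z` and still pending at `07:46:23.391Z` would be flagged for any threshold shorter than about three minutes.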

## Suspected Cause

From local inspection, this looks like two related state-management problems:

1. `runtime.error` is converted into a hard `thread.session.set(status="error")`, but subsequent normal provider activity does not necessarily clear that state.
2. `thread.turn-start-requested` creates a pending turn placeholder, but if no later session update arrives with a new active turn ID, that pending row can be left orphaned.

The result is:

- the thread stays visually errored after recovery
- a later user message can appear queued/stuck even though the provider is still alive
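For the second problem, one possible way to guarantee that a pending placeholder always settles is sketched below. This is an illustration under assumptions, not T3's code: `PendingTurn`, `Settled`, `settlePendingTurn`, and the timeout parameter are all hypothetical names.

```typescript
// Hypothetical types: a placeholder created on `thread.turn-start-requested`
// and the outcomes it can settle into.
type PendingTurn = { requestedAt: number }; // epoch milliseconds

type Settled =
  | { kind: "promoted"; turnId: string }
  | { kind: "failed"; reason: string }
  | { kind: "waiting" };

// Decide what happens to a pending placeholder on each session update.
// `previousTurnId` is the turn that was active when the user message arrived.
function settlePendingTurn(
  pending: PendingTurn,
  activeTurnId: string | null,
  previousTurnId: string | null,
  now: number,
  timeoutMs: number,
): Settled {
  // A new active turn ID means the provider accepted the message.
  if (activeTurnId !== null && activeTurnId !== previousTurnId) {
    return { kind: "promoted", turnId: activeTurnId };
  }
  // No new turn within the timeout: fail cleanly instead of leaving an orphan.
  if (now - pending.requestedAt >= timeoutMs) {
    return { kind: "failed", reason: "no new turn started before timeout" };
  }
  return { kind: "waiting" };
}
```

The point of the sketch is that the placeholder has a bounded lifetime: it either becomes a real turn or ends in an explicit failure that the UI can show.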

## Local Workaround

I was able to recover locally by:

- clearing the stale thread session error
- deleting the orphan pending turn row

I also confirmed a local code workaround: clearing reconnect errors once fresh provider activity resumes (for example, after `task.progress` or tool events) prevented the sticky error state from persisting.
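The idea behind the code workaround can be sketched roughly as follows. This is a simplified illustration, not the actual patch: the `ThreadSession` shape, the `tool.activity` event name, and `applyProviderEvent` are assumptions, and the reconnect pattern is inferred from the single `Reconnecting... 1/5` message seen here.

```typescript
type SessionStatus = "ready" | "running" | "error";

interface ThreadSession {
  status: SessionStatus;
  lastError: string | null;
}

// `runtime.error` and `task.progress` appear in the report;
// `tool.activity` is a hypothetical stand-in for tool events.
type ProviderEvent =
  | { type: "runtime.error"; message: string }
  | { type: "task.progress" }
  | { type: "tool.activity" };

// Matches messages such as "Reconnecting... 1/5" (pattern is an assumption).
const RECONNECT_PATTERN = /^Reconnecting\.\.\. \d+\/\d+$/;

function applyProviderEvent(
  session: ThreadSession,
  event: ProviderEvent,
): ThreadSession {
  if (event.type === "runtime.error") {
    return { status: "error", lastError: event.message };
  }
  // Fresh activity clears a transient reconnect error, but leaves
  // any other error in place.
  const isStaleReconnect =
    session.status === "error" &&
    session.lastError !== null &&
    RECONNECT_PATTERN.test(session.lastError);
  return isStaleReconnect ? { status: "running", lastError: null } : session;
}
```

Matching on the message text is fragile; a dedicated "transient" flag on the error event would be more robust if the provider exposes one.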

## Environment

- Arch Linux
- Hyprland
- Wayland
- T3 Code `0.0.3`
- installed via AUR



Transient reconnect leaves thread stuck in error and can orphan pending user turns #313
