🤖 feat: add OpenAI WebSocket transport opt-in #3241
Conversation
Document the shared glossary and PRD for adding an opt-in OpenAI WebSocket transport setting.
---
_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `high` • Cost: `$7.29`_
<!-- mux-attribution: model=openai:gpt-5.5 thinking=high costs=7.29 -->
/coder-agents-review
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b496deeb9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
/coder-agents-review
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c121a726c
Addressed Codex docs feedback by removing the root-level
/coder-agents-review
@codex review
Pushed the docs cleanup in 37db4a4 (root
/coder-agents-review
@codex review
1 similar comment
@codex review
/coder-agents-review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 37db4a4259
/coder-agents-review
@codex review
Solid architecture. The cleanup-symbol approach is the right design for attaching transport lifecycle to provider-created models without changing the factory API. The pre-filter that routes non-eligible requests to baseFetch before the upstream package can fall through to globalThis.fetch is correct and well-tested. moveLanguageModelCleanup after wrapLanguageModel is the right placement for DevTools compatibility.
1 P1, 4 P2, 8 P3 findings. The P1 is a broken test suite (10 pre-existing tests fail). The P2s are: proxy bypass on custom base URLs, missing error/cancellation cleanup tests (plan quality gate), missing failure/retry title cleanup tests (plan quality gate), and missing dogfooding evidence.
"The question is never 'is this worth fixing now vs. later?' The question is 'is this worth fixing vs. never?'" — Meruem
Process note: the PR description claims dogfooding with screenshots and live verification, but no artifacts are attached. The implementation plan required screenshots for four UI states and a video recording.
src/node/services/tools/advisor.ts:183
P3 [DEREM-7] The advisor creates a model via runtime.createModel(advisorModelString) but never calls runLanguageModelCleanup. When webSocketTransportEnabled: true and wire format is Responses, the factory attaches a cleanup obligation to the model that is never fulfilled.
Currently safe: generateText sends a non-streaming POST (stream: true absent), so isStreamingResponsesRequest returns false and baseFetch handles it. No WebSocket opens. The risk: if the SDK changes doGenerate to stream internally, or if the advisor switches to streamText, the socket opens and cleanup is never called.
The PR established a cleanup contract and audited two call sites (streamManager, workspaceTitleGenerator) but missed this one.
Fix: wrap the generateText block in try/finally { runLanguageModelCleanup(model) }. The call is idempotent and safe even when no WebSocket was connected. (Knov P2, Melody P3, Killua Note, Zoro Note)
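A minimal sketch of that fix, assuming the call shape described above (`generateText` from the AI SDK, the model from `runtime.createModel`, and this PR's `runLanguageModelCleanup`; the import path and argument names are illustrative):

```ts
import { generateText, type LanguageModel } from "ai";
// Helper introduced by this PR; the import path shown here is illustrative.
import { runLanguageModelCleanup } from "@/node/services/languageModelCleanup";

async function runAdvisor(model: LanguageModel, prompt: string): Promise<string> {
  try {
    const { text } = await generateText({ model, prompt });
    return text;
  } finally {
    // Idempotent; safe even when no WebSocket was ever opened for this model.
    runLanguageModelCleanup(model);
  }
}
```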
🤖 This review was automatically generated with Coder Agents.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d9567fedc8
Review blocked: 15 of 16 open findings from Round 1 have no author response or code change.
DEREM-10 (Codex OAuth + WebSocket) was addressed in 1c121a7. The remaining 15 findings are silent:
- DEREM-1 (P1): `runLanguageModelCleanup(streamInfo.request.model)` breaks 10 tests
- DEREM-2 (P2): WebSocket activates for custom base URLs, routing to hardcoded `wss://api.openai.com`
- DEREM-3 (P2): Stream cleanup error/cancellation paths untested
- DEREM-4 (P2): Title generator failure/retry cleanup paths untested
- DEREM-24 (P2): Dogfooding claims without evidence
- DEREM-5 (P3): UI copy missing "Unsupported endpoints may fail."
- DEREM-6 (P3): `endsWith("/responses")` diverges from sibling regex
- DEREM-7 (P3): Advisor missing `runLanguageModelCleanup`
- DEREM-8 (P3): `attachLanguageModelCleanup` missing double-attach assertion
- DEREM-9 (P3): POST `/responses` with `stream: false` untested
- DEREM-11 (P3): DevTools wrapping path for cleanup not tested
- DEREM-12 (P3): WebSocket leak on cancellation during handshake
- DEREM-14 (Nit): `openAIWireFormat` scoped to per-provider loop
- DEREM-15 (Nit): `attachLanguageModelCleanup` return type should be void
- DEREM-23 (Note): `Reflect.get` private access in cleanup test
Further panel review is blocked until the author responds to or pushes fixes for the open findings. The P1 (DEREM-1) is the most urgent: 10 tests fail on the current head.
🤖 This review was automatically generated with Coder Agents.
/coder-agents-review
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 33d3cba833
Review blocked (Round 3): all 15 open findings from Round 1 remain without author response.
The only commit since Round 2 (33d3cba, "Strip DevTools headers for OpenAI WebSocket transport") is a minor test fix unrelated to the open findings.
Priority items needing attention:
- DEREM-1 (P1): 10 tests broken by `streamInfo.request.model` access. This blocks CI.
- DEREM-2 (P2): Custom base URL proxy bypass. One-line guard (see the sketch below).
- DEREM-3 (P2): Stream cleanup error/cancellation tests missing.
- DEREM-4 (P2): Title generator failure/retry tests missing.
- DEREM-24 (P2): Dogfooding evidence missing.
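For DEREM-2, the one-line eligibility guard could look roughly like the following — a sketch only, since the factory code isn't shown in this thread and the `baseURL` and wire-format literals are assumptions:

```ts
// Sketch: keep the WebSocket transport scoped to the default OpenAI endpoint;
// custom base URLs (proxies, gateways) stay on the existing HTTP fetch path.
const DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1";

function isWebSocketEligible(opts: {
  webSocketTransportEnabled?: boolean;
  effectiveWireFormat: "responses" | "chatCompletions";
  baseURL?: string;
}): boolean {
  return (
    opts.webSocketTransportEnabled === true &&
    opts.effectiveWireFormat === "responses" &&
    (opts.baseURL === undefined || opts.baseURL === DEFAULT_OPENAI_BASE_URL)
  );
}
```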
The panel will review once the author pushes fixes or responds to the findings.
🤖 This review was automatically generated with Coder Agents.
Codex Review: Didn't find any major issues. Delightful!
Review blocked (Round 8): DEREM-34 fixed (close retry now guarded). 5 P3 findings remain without substantive response.
Note: the author-agent appears to have posted replies to the wrong threads. Each reply addresses a different finding:
- DEREM-29 thread (Codex OAuth) received DEREM-30's response (Chat Completions hiding)
- DEREM-30 thread received a hiding-as-fix response that doesn't engage with the disable-vs-hide distinction
- DEREM-31 thread (toggle disable test) received DEREM-32's response (StreamManager tests)
- DEREM-32 thread (multi-step test) received DEREM-33's response (title generator tests)
- DEREM-33 thread (single-candidate test) received DEREM-34's response (race-recovery close)
None of these 5 threads have a response that engages with the actual finding. All are P3 quality gaps, not correctness bugs. The core feature is solid. A human decision on whether these are worth addressing would unblock the review.
🤖 This review was automatically generated with Coder Agents.
Addressed the remaining coder-agents-review Round 8 items in
Validation run after these fixes:
/coder-agents-review
@codex review
Pushed a static-check fix for the multi-step cleanup test async generator (
/coder-agents-review
@codex review
Codex Review: Didn't find any major issues. You're on a roll.
All findings resolved. 34 findings tracked across 9 rounds; 27 fixed, 2 accepted, 1 contested and closed by panel vote (3/3), 7 dropped by orchestrator.
The cleanup-symbol architecture is structurally sound. Lazy WebSocket creation elegantly eliminates the class of bugs where non-streaming callers hold unclosed resources. The pre-filter preserves Mux's fetch chain for all non-eligible requests. UI and factory gating conditions are consistent across all ineligibility scenarios (Chat Completions, custom base URL, Codex OAuth). All plan quality gates are now covered by tests: completion, error, cancellation, multi-step, multi-candidate retry, and pre-registration failure.
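As a rough illustration of the lazy-creation and cancellation-race behavior summarized here (and in Kite's quote below): the `closeRequested` flag and post-fetch retry close come from the review's description, while the surrounding structure and the `createWebSocketFetch` return shape are assumptions:

```ts
// Shape assumed for this sketch; the real package API may differ.
declare function createWebSocketFetch(): { fetch: typeof fetch; close: () => void };

let ws: ReturnType<typeof createWebSocketFetch> | undefined;
let closeRequested = false;

export function close(): void {
  closeRequested = true;
  ws?.close(); // no-op if the socket was never created
}

export async function fetchEligibleStreamingRequest(
  input: RequestInfo | URL,
  init?: RequestInit,
): Promise<Response> {
  // Lazy: only the first eligible streaming Responses request creates the socket,
  // so non-streaming callers never hold an unclosed resource.
  ws ??= createWebSocketFetch();
  try {
    return await ws.fetch(input, init);
  } finally {
    // Post-fetch retry close: if close() raced the in-flight request, honor it
    // now without masking the response already returned to the caller.
    if (closeRequested) ws.close();
  }
}
```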
"The WebSocket fetch is only created on the first eligible streaming Responses request, not at factory time. This means non-streaming callers never open a socket. The closeRequested flag plus post-fetch retry close handles the cancellation race without leaking connections or masking responses." — Kite
🤖 This review was automatically generated with Coder Agents.
moveLanguageModelCleanup and runLanguageModelCleanup both implemented the same single-shot "read attached cleanup, delete the slot, return it" sequence inline (the same as LanguageModelWithCleanup cast + symbol read + delete). The new file from #3241 introduced both call sites with the duplication baked in. Extract a private detachLanguageModelCleanup() helper so the two surviving public functions read as their intent (move = re-attach to target, run = invoke + swallow) instead of repeating the slot-management plumbing. The behaviour is identical: detach is the only path that mutates the slot, callers take the same return-on-undefined branch they did before, and runLanguageModelCleanup keeps its existing try/catch around the invocation. Pure refactor — emitted JS, the symbol's single-shot semantics, and the existing 6-test languageModelCleanup suite are all unchanged.
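A sketch of that extraction, assuming the symbol-slot shape implied by `LanguageModelWithCleanup` (symbol and type names here are illustrative; the real module also exports `attachLanguageModelCleanup`, which asserts the slot is empty):

```ts
// languageModelCleanup.ts (sketch): single-shot symbol slot plus a private
// detach helper, so move/run stop repeating the cast + read + delete plumbing.
const CLEANUP_SLOT = Symbol("languageModelCleanup");

type Cleanup = () => void;
type LanguageModelWithCleanup = object & { [CLEANUP_SLOT]?: Cleanup };

// The only path that mutates the slot: pop the attached cleanup, if any.
function detachLanguageModelCleanup(model: object): Cleanup | undefined {
  const slot = model as LanguageModelWithCleanup;
  const cleanup = slot[CLEANUP_SLOT];
  delete slot[CLEANUP_SLOT];
  return cleanup;
}

export function moveLanguageModelCleanup(source: object, target: object): void {
  const cleanup = detachLanguageModelCleanup(source);
  if (cleanup === undefined) return;
  (target as LanguageModelWithCleanup)[CLEANUP_SLOT] = cleanup;
}

export function runLanguageModelCleanup(model: object): void {
  const cleanup = detachLanguageModelCleanup(model);
  if (cleanup === undefined) return;
  try {
    cleanup(); // invoke-and-swallow: cleanup failures must not mask stream results
  } catch {
    // the real module logs here
  }
}
```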
The OpenAI Responses WebSocket transport (added in #3241) attaches a `webSocketTransport.close` cleanup hook to every model returned by `providerModelFactory`. `workspaceTitleGenerator` already drains it via `runLanguageModelCleanup` in its finally block, but the new periodic `AgentStatusService` path was leaking transports for every successful or failed candidate, every tick, every workspace. Mirror the title-generator pattern with a finally block so cleanup runs whether the candidate returns a result, throws, or falls through to the next retry.
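A compact sketch of the mirrored pattern — only `runLanguageModelCleanup` and the finally placement come from the description above; the candidate-loop shape and function names are assumed:

```ts
import type { LanguageModel } from "ai";
// Helper from this PR; import path illustrative.
import { runLanguageModelCleanup } from "@/node/services/languageModelCleanup";

async function tryCandidates(
  candidates: string[],
  createModel: (id: string) => LanguageModel,
  generateStatus: (model: LanguageModel) => Promise<string | undefined>,
): Promise<string | undefined> {
  for (const id of candidates) {
    const model = createModel(id);
    try {
      const status = await generateStatus(model);
      if (status !== undefined) return status;
    } catch {
      // fall through to the next candidate
    } finally {
      // Runs whether the candidate succeeded, threw, or fell through —
      // no leaked transports per tick, per workspace.
      runLanguageModelCleanup(model);
    }
  }
  return undefined;
}
```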
Summary
Adds an opt-in OpenAI WebSocket transport setting for the built-in OpenAI provider. When `webSocketTransportEnabled` is true and the effective OpenAI wire format is Responses, eligible streaming Responses API requests use `@vercel/ai-sdk-openai-websocket-fetch`; existing HTTP behavior remains the default.
Background
OpenAI's Responses WebSocket transport can reduce setup overhead for streaming, multi-step workflows, but Mux previously had no first-class provider-level opt-in. This keeps the feature scoped to the built-in OpenAI provider and preserves the saved preference when users temporarily switch to Chat Completions.
Implementation
Adds `webSocketTransportEnabled` to provider config/status schemas and OpenAI provider settings.
Validation
- `make static-check`
- `agent-browser` for default/off, enabled, Chat Completions hidden, and Responses restored states.
- `wss://api.openai.com/v1/responses`.
Risks
The main risk is provider transport composition regressions. The implementation pre-filters non-eligible requests so Mux's existing fetch behavior remains responsible for non-WebSocket HTTP paths, and cleanup is scoped per model/run to avoid process-wide socket lifetime complexity.
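The pre-filter mentioned here can be pictured as a thin wrapper around the provider's fetch. This is a sketch under stated assumptions: `createWebSocketFetch` comes from `@vercel/ai-sdk-openai-websocket-fetch` (per the plan below), while the predicate name `isStreamingResponsesRequest`, the `baseFetch` option, and the return shape mirror the review discussion rather than the exact shipped code:

```ts
// Shape assumed for this sketch.
declare function createWebSocketFetch(): typeof fetch & { close(): void };

function isStreamingResponsesRequest(input: RequestInfo | URL, init?: RequestInit): boolean {
  const url =
    typeof input === "string" ? input : input instanceof URL ? input.toString() : input.url;
  if ((init?.method ?? "GET").toUpperCase() !== "POST" || !url.endsWith("/responses")) return false;
  try {
    const body = typeof init?.body === "string" ? (JSON.parse(init.body) as { stream?: boolean }) : {};
    return body.stream === true;
  } catch {
    return false;
  }
}

export function composeOpenAIFetch(opts: { enabled: boolean; baseFetch: typeof fetch }) {
  let ws: (typeof fetch & { close(): void }) | undefined;
  const fetchImpl = async (input: RequestInfo | URL, init?: RequestInit): Promise<Response> => {
    // Non-eligible requests never touch the socket and stay on Mux's fetch chain.
    if (!opts.enabled || !isStreamingResponsesRequest(input, init)) {
      return opts.baseFetch(input, init);
    }
    ws ??= createWebSocketFetch(); // lazy: created on the first eligible streaming request
    return ws(input, init);
  };
  return { fetch: fetchImpl, close: () => ws?.close() };
}
```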
📋 Implementation Plan
Implementation Plan: OpenAI WebSocket Transport Opt-In
Goal
Add a non-breaking, optional OpenAI WebSocket Transport setting for the Built-in OpenAI Provider. When `webSocketTransportEnabled` is persisted as `true` and the effective OpenAI wire format is Responses, eligible streaming Responses API requests use the published OpenAI WebSocket fetch transport. Existing HTTP behavior remains the default.
Verified context and constraints
- `CONTEXT.md` and `PRD.md`: `webSocketTransportEnabled`
- Use `@vercel/ai-sdk-openai-websocket-fetch`; do not implement the WebSocket protocol locally
- `serviceTier`, `wireFormat`, and `store` in provider config/status/UI
- `ProvidersSection.tsx` already has adjacent OpenAI settings for Service tier, Wire format, and Response storage
- `providerModelFactory.ts` creates OpenAI models through `createOpenAI({ ..., fetch })`
- `streamManager.ts` owns the main guaranteed stream cleanup `finally` path
- `workspaceTitleGenerator.ts` is another `streamText` owner using `AIService.createModel()` models
- `fetch`: `createWebSocketFetch()` is passed to `createOpenAI({ fetch })`, the package exposes `.close()`, and only streaming `POST /responses` requests use WebSocket while other requests fall through to standard fetch.
Recommended approach
Approach A: Provider-config opt-in + small WebSocket fetch composition module + language-model cleanup symbol
Net product-code LoC estimate: ~230–360 LoC
Estimated product-code breakdown:
Why this approach:
- Keeps the `createModel()` return API stable
Rejected alternatives:
- Changing `createModel()` to return `{ model, cleanup }`: explicit but high-churn across call sites and tests. Product-code estimate: ~120–220 LoC plus broad type/test churn.
Implementation phases
Phase 0 — Documentation alignment
- Keep `CONTEXT.md` as the canonical glossary and decision summary for this feature.
- `webSocketTransportEnabled`.
- Update `CONTEXT.md` in the same change set rather than leaving the glossary stale.
- Keep `PRD.md` aligned with the implemented scope.
Quality gate after Phase 0:
- `CONTEXT.md` and `PRD.md` mention the current package name, `@vercel/ai-sdk-openai-websocket-fetch`, before implementation begins.
Phase 1 — Dependency and schema/status plumbing
- Add `@vercel/ai-sdk-openai-websocket-fetch` using Bun: `bun add @vercel/ai-sdk-openai-websocket-fetch` so `package.json` and lockfile remain consistent.
- Add `webSocketTransportEnabled: z.boolean().optional()` to the Built-in OpenAI Provider config schema, next to `serviceTier`, `defaultModel`, `apiVersion`, and other persisted OpenAI settings (sketch below).
- Add `webSocketTransportEnabled?: boolean` to provider-status/oRPC schema output, next to `wireFormat` and `store`, because the settings UI consumes these together.
- Follow the `store` boolean pattern: only copy the value into provider status when `typeof config.webSocketTransportEnabled === "boolean"`.
Quality gate after Phase 1:
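A minimal sketch of the Phase 1 schema change (Zod), assuming the existing Built-in OpenAI Provider config schema is a `z.object` with the sibling fields named above; the real schema and field names in the repo may differ:

```ts
import { z } from "zod";

const openAIProviderConfigSchema = z.object({
  serviceTier: z.string().optional(),
  defaultModel: z.string().optional(),
  apiVersion: z.string().optional(),
  store: z.boolean().optional(),
  // New opt-in flag; optional so existing persisted configs stay valid.
  webSocketTransportEnabled: z.boolean().optional(),
});

type OpenAIProviderConfig = z.infer<typeof openAIProviderConfigSchema>;

// Status plumbing follows the existing `store` boolean pattern: only copy the
// value into provider status when it is actually a boolean in the config.
function webSocketTransportStatus(
  config: OpenAIProviderConfig,
): { webSocketTransportEnabled?: boolean } {
  return typeof config.webSocketTransportEnabled === "boolean"
    ? { webSocketTransportEnabled: config.webSocketTransportEnabled }
    : {};
}
```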
Phase 2 — Settings UI control
- Toggle on: call the provider config mutation with `keyPath: ["webSocketTransportEnabled"]`, `value: true` (sketch below).
- Toggle off: use `value: ""` to remove the field if existing provider config mutation semantics treat empty string as delete; otherwise set `false` only if that is the established boolean-toggle convention. Verify the current `setConfig` behavior before implementing this detail.
- Preserve the saved `webSocketTransportEnabled` value while disabled.
Quality gate after Phase 2:
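A hypothetical sketch of the Phase 2 toggle handler, using the `keyPath`/`value` mutation shape quoted above; the real `setProviderConfig` signature and the empty-string-deletes-field convention must be verified in the repo first:

```ts
type SetProviderConfig = (args: {
  providerId: string;
  keyPath: string[];
  value: unknown;
}) => Promise<void>;

async function onWebSocketTransportToggle(
  setProviderConfig: SetProviderConfig,
  enabled: boolean,
): Promise<void> {
  await setProviderConfig({
    providerId: "openai",
    keyPath: ["webSocketTransportEnabled"],
    // value: "" removes the field if empty string is the established delete
    // convention; otherwise persist an explicit false instead.
    value: enabled ? true : "",
  });
}
```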
- `setProviderConfig`
- `wireFormat === "chatCompletions"`
Phase 3 — Deep module: OpenAI WebSocket fetch composition
Create a small node-side helper module for WebSocket transport composition.
Responsibilities:
- Accept an `enabled` boolean that has already applied runtime eligibility (`webSocketTransportEnabled === true` and effective wire format is Responses).
- When enabled, call `createWebSocketFetch()` and return:
  - a `fetch` for `createOpenAI({ fetch })`
  - a close hook that calls `.close()` exactly once
globalThis.fetchfor non-WebSocket requests. If using it directly would bypass Mux's base fetch for HTTP fallthrough, compose a wrapper so non-eligible requests still call Mux's base fetch. Keep this wrapper simple and test it with mocked fetches.Suggested public interface shape:
createOpenAIWebSocketTransportFetch({ enabled, baseFetch }): { fetch: typeof fetch; close: () => void }closeis callable when enabled and should make cleanup idempotent.Quality gate after Phase 3:
@vercel/ai-sdk-openai-websocket-fetchpackage.Phase 4 — Deep module: language-model cleanup helper
Create a Mux-owned cleanup helper for provider-created language models.
Responsibilities:
Suggested public interface shape:
attachLanguageModelCleanup(model, cleanup): LanguageModelrunLanguageModelCleanup(model): voidQuality gate after Phase 4:
Phase 5 — Provider model factory integration
webSocketTransportEnabled === trueserviceTier,wireFormat, andstoreunchanged.fetchtocreateOpenAI.fetchWithOpenAICodexNormalizationbehavior.provider.responses(modelId)orprovider.chat(modelId)), attach the close hook only when the helper created an active WebSocket cleanup.wrapLanguageModel, verify whether wrapping preserves object identity/metadata.Quality gate after Phase 5:
Phase 6 — Stream owner cleanup integration
streamManager): callrunLanguageModelCleanup(streamInfo.request.model)or equivalent model reference in the existing guaranteed cleanupfinallyblock.LanguageModelobject, not the model string.streamTextattempt intry/finallyand call cleanup for that candidate's model.streamTextortoolResultsthrows and the loop tries the next candidate.streamTextowners using provider-created models before finalizing.Quality gate after Phase 6:
Phase 7 — Validation and full static checks
Run validation in increasing scope:
Suggested commands:
bun test src/common/config/schemas/providersConfig.test.tsbun test src/common/orpc/schemas/api.test.tsbun test src/node/services/providerService.test.tsbun test src/node/services/providerModelFactory.test.tsbun test src/node/services/streamManager.test.tsbun test src/browser/features/Settings/Sections/ProvidersSection.test.tsxmake typecheckmake lintmake static-checkUse
run_and_reportwhen running multiple validation steps in one shell call, per repo guidance.Dogfooding plan
Dogfooding is required before claiming the feature is ready. Live OpenAI runtime dogfooding is optional if credentials/endpoints are unavailable, but UI dogfooding should still run.
Dogfood setup
make dev-server-sandboxfor web/settings dogfooding so the run uses an isolatedMUX_ROOTand free ports instead of the defaultmake devstate.make dev-desktop-sandboxonly if Electron-specific desktop behavior must be verified.agent-browseror the repo's Electron automation helper.Dogfood scenarios
Dogfood artifacts
Attach or save:
Acceptance criteria
- `webSocketTransportEnabled` is explicitly set true.
- `webSocketTransportEnabled` for the Built-in OpenAI Provider.
- `webSocketTransportEnabled === true` and effective Responses wire format.
Risks and mitigations
- Cleanup runs in `finally`, not inside fetch response completion per step.
- Audit `streamText` call sites that use provider-created models and add a helper usage pattern; consider a short code comment at the helper call explaining the invariant.
Handoff notes for implementation
- No `muxProviderOptions.openai.webSocketTransportEnabled` support in this iteration.
Generated with
`mux` • Model: `openai:gpt-5.5` • Thinking: `high` • Cost: $71.27