5 changes: 5 additions & 0 deletions packages/opencode/src/session/message-v2.ts
@@ -711,6 +711,11 @@ export function fromError(
},
{ cause: e },
).toObject()
// Convert APIError class instances thrown via `Effect.fail(new APIError(...))`
// to their wire form so the TUI receives the structured message and metadata
// instead of being wrapped by the generic Error fallback below.
case APIError.isInstance(e):
return e instanceof Error ? e.toObject() : e
case e instanceof Error:
return new NamedError.Unknown({ message: errorMessage(e) }, { cause: e }).toObject()
default:
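The `APIError.isInstance(e)` case accepts two shapes: a class instance raised with `Effect.fail(new APIError(...))`, and an object already in wire form. The sketch below illustrates that branch order in plain TypeScript. It makes two hypothetical assumptions: that `isInstance` matches on the `name` field, and that `toObject()` returns `{ name, data }`. The real `namedSchemaError` helpers may validate more than this.

```typescript
// Hypothetical stand-ins for a namedSchemaError-style error and its wire form.
interface APIErrorData {
  message: string
  isRetryable: boolean
  metadata?: Record<string, string>
}

interface APIErrorWire {
  name: "APIError"
  data: APIErrorData
}

interface UnknownErrorWire {
  name: "UnknownError"
  data: { message: string }
}

class APIError extends Error {
  readonly data: APIErrorData
  constructor(data: APIErrorData) {
    super(data.message)
    this.name = "APIError"
    this.data = data
  }

  // Matches class instances and plain wire objects alike by their name tag.
  static isInstance(value: unknown): value is APIError | APIErrorWire {
    return typeof value === "object" && value !== null && (value as { name?: unknown }).name === "APIError"
  }

  toObject(): APIErrorWire {
    return { name: "APIError", data: this.data }
  }
}

// The APIError check runs before the generic Error check; otherwise a class
// instance would be wrapped as UnknownError and lose its metadata.
function fromError(e: unknown): APIErrorWire | UnknownErrorWire {
  if (APIError.isInstance(e)) {
    // Class instances are serialized; wire objects pass through unchanged.
    return e instanceof APIError ? e.toObject() : e
  }
  if (e instanceof Error) {
    return { name: "UnknownError", data: { message: e.message } }
  }
  return { name: "UnknownError", data: { message: String(e) } }
}
```

For example, `fromError(new APIError({ ... }))` returns a plain `{ name: "APIError", data }` object. Passing that object back in returns it unchanged.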
57 changes: 55 additions & 2 deletions packages/opencode/src/session/processor.ts
@@ -86,6 +86,10 @@ interface ProcessorContext extends Input {
currentTextID: string | undefined
reasoningMap: Record<string, SessionV1.ReasoningPart>
v2AssistantMessageID: SessionMessage.ID | undefined
// Part id created just before the current attempt begins; parts with a
// greater id were produced by the attempt and are discarded when it is
// retried after a stream truncation.
partFloor: PartID
}

type StreamEvent = LLMEvent
@@ -128,6 +132,7 @@ export const layer = Layer.effect(
currentTextID: undefined,
reasoningMap: {},
v2AssistantMessageID: undefined,
partFloor: PartID.ascending(),
}
const mirrorAssistant = flags.experimentalEventSystem && !input.assistantMessage.summary
let aborted = false
@@ -395,7 +400,7 @@ export const layer = Layer.effect(
time: { start: Date.now() },
metadata: value.providerMetadata,
}
yield* session.updatePart(ctx.reasoningMap[value.id])
yield* session.updatePart(ctx.reasoningMap[value.id])
return

case "reasoning-delta":
@@ -701,6 +706,20 @@ export const layer = Layer.effect(
usage: value.usage ?? new Usage({}),
metadata: value.providerMetadata,
})
// Detect stream truncation: the AI SDK reports the unmapped
// fallback reason when the upstream provider stream ends without a
// proper stop_reason. No usage and no output means the connection
// was cut mid-generation, which is a transient failure that should
// be retried.
if (value.reason === "unknown" && usage.tokens.output === 0) {
return yield* Effect.fail(

Contributor

Suggestion for the human to decide: this failure happens after stream parts may already have been persisted on the current assistant message. Because Effect.retry(...) wraps the stream before cleanup() runs, a retry will start a new stream on the same message without removing the partial text/reasoning parts from the truncated attempt, so a successful retry can leave the original truncated content plus the retried response in the final assistant message. Consider clearing the in-flight attempt parts before retrying, or moving this detection earlier to a place where no partial parts have been committed.
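Here is a self-contained sketch of the failure mode described above. It uses plain synchronous TypeScript rather than Effect. `attempt`, `retry`, and the `parts` array are hypothetical stand-ins for the stream handler, `Effect.retry(...)`, and the persisted parts of the assistant message.

```typescript
// Hypothetical stand-in for a persisted message part.
interface Part {
  id: number
  text: string
}

// One streaming attempt: the text part is persisted as soon as it starts
// (like text-start calling session.updatePart), before the stream finishes.
function attempt(parts: Part[], n: number): void {
  parts.push({ id: parts.length + 1, text: n === 1 ? "partial first attempt" : "final answer" })
  if (n === 1) {
    // Attempt 1 is truncated: the failure arrives after the part was stored.
    throw Object.assign(new Error("stream ended without a stop reason"), { code: "EmptyOther" })
  }
}

// Naive retry: re-runs the attempt but never removes what a failed attempt stored.
function retry(parts: Part[], maxAttempts: number): void {
  for (let n = 1; ; n++) {
    try {
      attempt(parts, n)
      return
    } catch (e) {
      const retryable = (e as { code?: unknown }).code === "EmptyOther"
      if (!retryable || n >= maxAttempts) throw e
    }
  }
}
```

Calling `retry(parts, 3)` on an empty array leaves both `"partial first attempt"` and `"final answer"` in `parts`. That is the duplicated content the comment warns about.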

Contributor Author

Good catch; I verified that the concern is real:

  • Effect.ensuring(cleanup()) wraps the retry, so cleanup() only runs at the very end of the whole chain.
  • ctx.currentText / ctx.reasoningMap persist across retry attempts (closure-captured).
  • text-start and reasoning-start call session.updatePart(...) immediately, so partial parts are already in SQLite by the time finish-step fires.

Pushed a fix in 0a09591b2:

  • Track partIDs created during each attempt on ctx.attemptParts (pushed in text-start / reasoning-start).
  • New discardAttempt() helper deletes those parts via session.removePart(...) and resets currentText / reasoningMap / snapshot.
  • Hooked into the retry policy's set callback so it fires only when a retry will actually happen. Terminal failures (no retry) route through halt and keep the partial content as user-visible context.

Note this is a pre-existing issue affecting all retryable mid-stream errors (ECONNRESET, ZlibError, SSE timeout, etc.); the EmptyOther path just makes it more frequent. The fix applies uniformly to all of them.

Added an it.instance regression test (retry discards in-flight parts from the failed attempt) that pushes a truncated reply followed by a clean success and asserts the final message contains only the retried text.
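The fix described in this comment can be sketched in plain TypeScript. The sketch follows the `ctx.attemptParts` approach described here; the final diff later switched to a `partFloor` high-water mark. `Ctx`, `persistPart`, `runAttempt`, `runWithRetry`, and `onRetry` are hypothetical names. `onRetry` plays the role of the retry policy's `set` callback.

```typescript
// Hypothetical stand-ins for persisted parts and the per-process context.
interface Part {
  id: number
  text: string
}

interface Ctx {
  parts: Part[] // parts persisted on the assistant message
  attemptParts: number[] // ids created by the in-flight attempt
  nextId: number
}

// Persist a part and remember that the current attempt created it.
function persistPart(ctx: Ctx, text: string): void {
  const id = ctx.nextId++
  ctx.parts.push({ id, text })
  ctx.attemptParts.push(id)
}

// Remove only the failed attempt's parts, then reset per-attempt state.
function discardAttempt(ctx: Ctx): void {
  const doomed = new Set(ctx.attemptParts)
  ctx.parts = ctx.parts.filter((p) => !doomed.has(p.id))
  ctx.attemptParts = []
}

// Attempt 1 persists partial text and is then truncated; later attempts succeed.
function runAttempt(ctx: Ctx, n: number): void {
  persistPart(ctx, n === 1 ? "partial first attempt" : "final answer")
  if (n === 1) throw Object.assign(new Error("stream ended without a stop reason"), { code: "EmptyOther" })
}

// onRetry fires only when another attempt will follow, so a terminal
// failure (not retryable, or out of attempts) keeps its partial content.
function runWithRetry(ctx: Ctx, maxAttempts: number, onRetry: (ctx: Ctx) => void): void {
  for (let n = 1; ; n++) {
    try {
      runAttempt(ctx, n)
      return
    } catch (e) {
      const retryable = (e as { code?: unknown }).code === "EmptyOther"
      if (!retryable || n >= maxAttempts) throw e
      onRetry(ctx)
    }
  }
}
```

With `runWithRetry(ctx, 3, discardAttempt)`, only `"final answer"` is added to `ctx.parts`. With `maxAttempts` set to 1, the error is rethrown and the partial text stays.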

Contributor

Small style-guide suggestion, optional for the human to decide: in Effect.gen / Effect.fn, this repo prefers yield* new MyError(...) for direct typed-error failures instead of wrapping the error with Effect.fail(...). This branch could be written as return yield* new MessageV2.APIError({ ... }) while preserving the same behavior.

Contributor Author

The yield* new MyError(...) pattern requires Schema.TaggedErrorClass-derived errors (Effect's YieldableError). MessageV2.APIError is built with namedSchemaError (message-v2.ts:51), which extends Error directly without [Symbol.iterator]. The suggested form fails to compile:

src/session/processor.ts: error TS2488:
  Type 'NamedSchemaError' must have a '[Symbol.iterator]()' method that returns an iterator.

All 14 existing yield* new ... sites in src/ use Schema.TaggedErrorClass (UpgradeFailedError, CliError, PhotonUnavailableError, RejectedError, etc.). Migrating MessageV2.APIError and its siblings (AbortedError, OutputLengthError, AuthError, ContextOverflowError) from namedSchemaError to Schema.TaggedErrorClass would change the wire schema from { name, data } to { _tag, ... } and break SDK consumers, which is out of scope for this PR.

Keeping the Effect.fail(new MessageV2.APIError(...)) form.
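The TS2488 error above follows from a language rule: `yield*` only accepts an operand that has a `[Symbol.iterator]()` method. An error class must therefore be iterable before `yield* new E(...)` type-checks. Below is a minimal sketch of that mechanism. `YieldableFailure`, `Program`, `run`, and `divide` are hypothetical names, and this is not how Effect implements `YieldableError`.

```typescript
// A hypothetical iterable error: yielding itself lets a runner observe it.
class YieldableFailure {
  constructor(readonly message: string) {}

  *[Symbol.iterator](): Generator<YieldableFailure, never, unknown> {
    yield this
    // The runner below never resumes a failed program.
    throw new Error("unreachable")
  }
}

type Program<A> = Generator<YieldableFailure, A, unknown>

// Only failures are ever yielded in this sketch, so a yielded value means
// the program failed; a finished program returns its value.
function run<A>(program: Program<A>): { ok: true; value: A } | { ok: false; error: string } {
  const step = program.next()
  return step.done ? { ok: true, value: step.value } : { ok: false, error: step.value.message }
}

function* divide(a: number, b: number): Program<number> {
  // Compiles because YieldableFailure is iterable; `yield* new Error("...")`
  // would be rejected with TS2488 instead.
  if (b === 0) return yield* new YieldableFailure("division by zero")
  return a / b
}
```

A class that only extends `Error` has no iterator, which is why the suggested form does not compile for `namedSchemaError`-based errors.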

new SessionV1.APIError({
message: "Provider stream ended without a stop reason",
isRetryable: true,
metadata: { code: "EmptyOther" },
}),
)
}
Comment on lines +714 to +722

Suggest extending this check to also catch finish_reason: "stop" with zero output tokens.

Hit the same failure shape on Azure-served gpt-5.5 via the OpenAI-compatible adapter:

  • Assistant turn finished cleanly: finish_reason: stop, 0 output tokens, no text or tool parts
  • Every subsequent user message returned the same empty shape
  • Same "session degradation" pattern you describe in this PR

Why the current check misses it: the PR guards on reason === "unknown", which is the AI SDK fallback when the stream ends without a stop_reason. In my case the /chat/completions stream still emitted finish_reason: stop in the final chunk despite carrying no content. My turn slipped through with reason: stop and got persisted.

Suggested extension (reuses ctx.attemptParts from this PR so it doesn't trip on legitimate text-emitting stop turns):

if (
  usage.tokens.output === 0 &&
  ctx.attemptParts.length === 0 &&
  (value.reason === "unknown" || value.reason === "stop")
) {
  return yield* Effect.fail(new MessageV2.APIError({
    message: "Provider returned empty stream",
    isRetryable: true,
    metadata: { code: "EmptyStream" },
  }))
}
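The suggested condition can be isolated as a pure predicate, which makes its edge cases easy to check. In this sketch, `FinishStep` and `isEmptyStream` are hypothetical names, and `partsThisAttempt` stands in for `ctx.attemptParts.length`.

```typescript
// Hypothetical summary of a finish-step event.
interface FinishStep {
  reason: string // finish reason as reported, e.g. "stop" or "unknown"
  outputTokens: number
  partsThisAttempt: number // stand-in for ctx.attemptParts.length
}

// True when the attempt produced nothing: no output tokens, no persisted
// parts, and either no stop reason ("unknown") or a bare "stop".
function isEmptyStream(step: FinishStep): boolean {
  return (
    step.outputTokens === 0 &&
    step.partsThisAttempt === 0 &&
    (step.reason === "unknown" || step.reason === "stop")
  )
}
```

As written, the parts guard also applies to the `"unknown"` branch. A truncated attempt that already persisted partial text with zero output tokens would therefore no longer be retried. That is the scenario the PR's regression test exercises. Applying the parts guard only to the `"stop"` case would preserve the original behavior.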

if (!ctx.assistantMessage.summary) {
// TODO(v2): Temporary dual-write while migrating session messages to v2 events.
if (mirrorAssistant) {
@@ -846,6 +865,27 @@ export const layer = Layer.effect(
}
})

// Discards every part the failed attempt persisted (anything created
// after partFloor) so a successful retry replaces rather than appends to
// the truncated content. The assistant message is created fresh per
// process() call, so the floor scopes removal to this attempt's output.
const discardAttempt = Effect.fn("SessionProcessor.discardAttempt")(function* () {
const existing = yield* MessageV2.parts(ctx.assistantMessage.id).pipe(
Effect.provideService(Database.Service, database),
)
for (const part of existing) {
if (part.id <= ctx.partFloor) continue
yield* session.removePart({
sessionID: ctx.sessionID,
messageID: ctx.assistantMessage.id,
partID: part.id,
})
}
ctx.currentText = undefined
ctx.reasoningMap = {}
ctx.toolcalls = {}
})

const cleanup = Effect.fn("SessionProcessor.cleanup")(function* () {
if (ctx.snapshot) {
const patch = yield* snapshot.patch(ctx.snapshot)
@@ -933,6 +973,11 @@ export const layer = Layer.effect(
yield* events.publish(Session.Event.Error, { sessionID: ctx.sessionID, error })
return
}
// Retries are exhausted: drop the truncated attempt's partial parts so
// the failed message doesn't keep an orphan step-start / partial text.
if (SessionV1.APIError.isInstance(error) && error.data.metadata?.code === "EmptyOther") {
yield* discardAttempt()
}
if (!ctx.assistantMessage.summary) {
// TODO(v2): Temporary dual-write while migrating session messages to v2 events.
if (mirrorAssistant) {
@@ -959,6 +1004,9 @@ export const layer = Layer.effect(
slog.info("process")
ctx.needsCompaction = false
ctx.shouldBreak = (yield* config.get()).experimental?.continue_loop_on_deny !== true
// Record the high-water mark before any attempt persists parts so a
// truncation retry can discard exactly this call's output.
ctx.partFloor = PartID.ascending()

return yield* Effect.gen(function* () {
yield* Effect.gen(function* () {
@@ -1003,7 +1051,12 @@ export const layer = Layer.effect(
timestamp: DateTime.makeUnsafe(Date.now()),
})
: Effect.void
return flushV2Fragments().pipe(
// Only stream truncations leave partial parts worth discarding;
// other retryable errors (rate limits, 5xx) retry untouched.
const truncated =
SessionV1.APIError.isInstance(info.error) && info.error.data.metadata?.code === "EmptyOther"
return (truncated ? discardAttempt() : Effect.void).pipe(
Effect.andThen(flushV2Fragments()),
Effect.andThen(event),
Effect.andThen(
status.set(ctx.sessionID, {
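The `partFloor` change in this diff replaces per-attempt bookkeeping with a high-water mark. An ID is generated before the attempt starts, and every part's ID is compared against it. The sketch below assumes, hypothetically, that IDs sort lexicographically in creation order. `ascendingId` is a zero-padded counter standing in for `PartID.ascending()`, whose actual format may differ. `discardAfter` is likewise a hypothetical name.

```typescript
interface Part {
  id: string
  type: string
}

// Hypothetical ascending id: zero padding makes string order match creation order.
let counter = 0
function ascendingId(): string {
  counter += 1
  return "prt_" + ("000000000000" + counter).slice(-12)
}

// Mirrors `if (part.id <= ctx.partFloor) continue`: keep parts at or below
// the floor, drop everything created after it.
function discardAfter(parts: Part[], floor: string): Part[] {
  return parts.filter((p) => p.id <= floor)
}
```

Because the floor is compared against every part on the message, it also removes parts that are not created in text-start or reasoning-start. One example is the attempt's step-start, which the prompt test checks is not left orphaned.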
20 changes: 19 additions & 1 deletion packages/opencode/src/session/retry.ts
@@ -176,13 +176,30 @@ function parseJSON(value: unknown) {
export function policy(opts: {
provider: string
parse: (error: unknown) => Err
set: (input: { attempt: number; message: string; action?: Retryable["action"]; next: number }) => Effect.Effect<void>
set: (input: {
attempt: number
message: string
action?: Retryable["action"]
next: number
error: Err
}) => Effect.Effect<void>
}) {
return Schedule.fromStepWithMetadata(
Effect.succeed((meta: Schedule.InputMetadata<unknown>) => {
const error = opts.parse(meta.input)
const retry = retryable(error, opts.provider)
if (!retry) return Cause.done(meta.attempt)
// Cap empty-other stream-truncation retries to avoid infinite loops if a
// provider keeps closing streams without a stop_reason. Other retryable
// classifications (rate limits, 5xx, ZlibError, etc.) keep their existing
// unbounded behaviour.
if (
SessionV1.APIError.isInstance(error) &&
error.data.metadata?.code === "EmptyOther" &&
meta.attempt >= 3
) {
return Cause.done(meta.attempt)
}
return Effect.gen(function* () {
const wait = delay(meta.attempt, SessionV1.APIError.isInstance(error) ? error : undefined)
const now = yield* Clock.currentTimeMillis
@@ -191,6 +208,7 @@ export function policy(opts: {
message: retry.message,
action: retry.action,
next: now + wait,
error,
})
return [meta.attempt, Duration.millis(wait)] as [number, Duration.Duration]
})
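The cap added to `policy` applies to only one classification. The stop decision can be sketched as a pure function. `Classified`, `shouldStop`, and `retriesScheduled` are hypothetical names, and the sketch ignores delays and the `set` callback.

```typescript
// Hypothetical result of parsing and classifying an error.
interface Classified {
  retryable: boolean
  code?: string // e.g. "EmptyOther" from error.data.metadata.code
}

// Mirrors `meta.attempt >= 3` in the diff.
const EMPTY_OTHER_CAP = 3

// True when the schedule should end instead of scheduling another retry.
function shouldStop(error: Classified, attempt: number): boolean {
  if (!error.retryable) return true
  if (error.code === "EmptyOther" && attempt >= EMPTY_OTHER_CAP) return true
  return false // other retryable classifications stay uncapped
}

// Count retries scheduled for a run of identical failures; `limit` bounds
// the loop so the uncapped case terminates in this sketch.
function retriesScheduled(error: Classified, limit: number): number {
  let retries = 0
  for (let attempt = 1; attempt <= limit; attempt++) {
    if (shouldStop(error, attempt)) break
    retries++
  }
  return retries
}
```

With repeated EmptyOther failures, attempts 1 and 2 schedule retries and attempt 3 stops. That matches what the `policy stops retrying EmptyOther after 3 attempts` test in retry.test.ts asserts.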
32 changes: 32 additions & 0 deletions packages/opencode/test/session/prompt.test.ts
@@ -2316,3 +2316,35 @@ noLLMServer.instance(
}),
30_000,
)

it.instance("retry discards in-flight parts from the failed attempt", () =>
Effect.gen(function* () {
const { llm } = yield* useServerConfig(providerCfg)
const prompt = yield* SessionPrompt.Service
const sessions = yield* Session.Service
const chat = yield* sessions.create({
title: "Discard test",
permission: [{ permission: "*", pattern: "*", action: "allow" }],
})
yield* prompt.prompt({
sessionID: chat.id,
agent: "build",
noReply: true,
parts: [{ type: "text", text: "hello" }],
})
// Attempt 1: emit partial text but never a finish_reason. The AI SDK
// flushes with finishReason="other" and usage.outputTokens=0, which the
// processor catches as EmptyOther and triggers a retry.
yield* llm.push(reply().text("partial first attempt").item())
yield* llm.push(reply().text("final answer").stop().item())

const result = yield* prompt.loop({ sessionID: chat.id })

expect(yield* llm.hits).toHaveLength(2)
const texts = result.parts.filter((p) => p.type === "text").map((p) => (p as SessionV1.TextPart).text)
expect(texts).toEqual(["final answer"])
// The discarded attempt's step-start must be removed too, otherwise the
// message keeps an orphan step-start per retry.
expect(result.parts.filter((p) => p.type === "step-start")).toHaveLength(1)
}),
)
77 changes: 77 additions & 0 deletions packages/opencode/test/session/retry.test.ts
@@ -11,6 +11,7 @@ import { ProviderError } from "../../src/provider/error"
import { SessionID } from "../../src/session/schema"
import { SessionStatus } from "../../src/session/status"
import { testEffect } from "../lib/effect"
import { provideTmpdirInstance } from "../fixture/fixture"
import { ProviderV2 } from "@opencode-ai/core/provider"

const providerID = ProviderV2.ID.make("test")
@@ -343,6 +344,63 @@ describe("session.retry.retryable", () => {
"Usage limit reached. It will reset in 15 minutes. To continue using this model now, enable usage from your available balance",
)
})

test("retries EmptyOther stream truncation failures", () => {
const error = new SessionV1.APIError({
message: "Provider stream ended without a stop reason",
isRetryable: true,
metadata: { code: "EmptyOther" },
}).toObject() as SessionV1.APIError

expect(SessionRetry.retryable(error, retryProvider)).toEqual({
message: "Provider stream ended without a stop reason",
})
})

it.live("policy stops retrying EmptyOther after 3 attempts", () =>
provideTmpdirInstance(() =>
Effect.gen(function* () {
const sessionID = SessionID.make("session-empty-other-test")
// retry-after-ms=0 keeps the test fast; the cap is driven by metadata.code.
const error = new SessionV1.APIError({
message: "Provider stream ended without a stop reason",
isRetryable: true,
metadata: { code: "EmptyOther" },
responseHeaders: { "retry-after-ms": "0" },
}).toObject() as SessionV1.APIError
const status = yield* SessionStatus.Service

const step = yield* Schedule.toStepWithMetadata(
SessionRetry.policy({
provider: retryProvider,
parse: (err) => err as SessionV1.APIError,
set: (info) =>
status.set(sessionID, {
type: "retry",
attempt: info.attempt,
message: info.message,
next: info.next,
}),
}),
)
// attempt=1 and attempt=2 run normally and update status.
yield* step(error)
yield* step(error)
// attempt=3 hits the EmptyOther cap and signals Cause.done.
// Effect.exit captures the schedule termination so it doesn't
// leak as an unhandled failure.
const thirdExit = yield* Effect.exit(step(error))

expect(thirdExit._tag).toBe("Failure")

expect(yield* status.get(sessionID)).toMatchObject({
type: "retry",
attempt: 2,
message: "Provider stream ended without a stop reason",
})
}),
),
)
})

describe("session.message-v2.fromError", () => {
@@ -397,6 +455,25 @@ describe("session.message-v2.fromError", () => {
expect(retryable).toEqual({ message: "Connection reset by server" })
})

test("converts APIError class instances to wire form for storage", () => {
// The processor fails via `Effect.fail(new SessionV1.APIError(...))`; fromError
// must convert the class instance to its wire form so the TUI renders the
// structured message and metadata rather than a JSON-stringified
// UnknownError wrapper.
const thrown = new SessionV1.APIError({
message: "Provider stream ended without a stop reason",
isRetryable: true,
metadata: { code: "EmptyOther" },
})

const result = MessageV2.fromError(thrown, { providerID })

expect(SessionV1.APIError.isInstance(result)).toBe(true)
expect((result as SessionV1.APIError).data.message).toBe("Provider stream ended without a stop reason")
expect((result as SessionV1.APIError).data.metadata?.code).toBe("EmptyOther")
expect((result as { name: string }).name).toBe("APIError")
})

test("marks OpenAI 404 status codes as retryable", () => {
const error = new APICallError({
message: "boom",