Add realtime speech append control#27917

Merged
guinness-oai merged 13 commits into main from guinness/realtime-handoff-append-control
Jun 15, 2026

@guinness-oai (Collaborator) commented Jun 12, 2026

Why

Realtime voice harness tuning needs app-side control over what backend Codex text is spoken. Backend orchestrator text is written for a reading UI, so automatically speaking every preamble, progress update, or final assistant message can make the realtime voice model too chatty.

For experimentation, clients need two simple controls: keep app/client text-item injection on the existing item-create path, and add an explicit speakable path that app code can call only when it wants realtime to speak. Automatic Codex output also needs an opt-in way to switch from the protocol's default speakable path to regular realtime items, with a caller-provided prefix so prompt wording can be tuned outside core.

The default remains unchanged: if a client omits the new start fields and never calls appendSpeech, automatic backend output continues down the existing speakable path for the selected realtime protocol.

What Changed

  • Adds experimental thread/realtime/appendSpeech for app-provided speakable text.
  • Keeps existing thread/realtime/appendText as the item-create API for app-provided realtime text items.
  • Adds codexResponsesAsItems / codex_responses_as_items on thread/realtime/start to send automatic Codex responses with conversation.item.create instead of the protocol's default speakable output path.
  • Adds codexResponseItemPrefix / codex_response_item_prefix so clients can prepend experiment instructions to those automatic Codex response items.
  • Keeps literal conversation.handoff.append routing scoped to the v1 speakable path; v2 default speech uses its item/function-output plus response.create behavior.
  • Removes the earlier public silent-context API and hardcoded silent-context prefix.
  • Updates realtime tests to cover default automatic speakable behavior, opt-in automatic item-create behavior, and explicit appendSpeech behavior.
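The routing rules listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `Protocol`, `Outbound`, `StartOptions`, and `route_codex_output` are hypothetical names, and only the two field names and the wire event names come from the description. It also assumes the prefix is concatenated directly in front of the response text, which the description does not specify.

```rust
// Hypothetical types for illustration; they are not taken from codex-core.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Protocol {
    V1,
    V2,
}

#[derive(Debug, PartialEq, Eq)]
enum Outbound {
    /// v1 default speakable path: literal `conversation.handoff.append`.
    HandoffAppend(String),
    /// v2 default speakable path: item/function-output followed by `response.create`.
    FunctionOutputThenResponse(String),
    /// Opt-in path: a regular `conversation.item.create` item.
    ItemCreate(String),
}

/// Mirrors the two new start fields; omitted fields fall back to the old behavior.
#[derive(Debug, Default)]
struct StartOptions {
    codex_responses_as_items: bool,
    codex_response_item_prefix: Option<String>,
}

/// Picks the outbound path for one automatic Codex response.
fn route_codex_output(protocol: Protocol, opts: &StartOptions, text: &str) -> Outbound {
    if opts.codex_responses_as_items {
        // Assumption: the prefix is joined with no separator.
        let body = match opts.codex_response_item_prefix.as_deref() {
            Some(prefix) => format!("{prefix}{text}"),
            None => text.to_string(),
        };
        return Outbound::ItemCreate(body);
    }
    // Default: unchanged speakable path for the selected protocol.
    match protocol {
        Protocol::V1 => Outbound::HandoffAppend(text.to_string()),
        Protocol::V2 => Outbound::FunctionOutputThenResponse(text.to_string()),
    }
}
```

With `StartOptions::default()`, both protocols keep their existing speakable path; only a client that sets `codex_responses_as_items` sees `ItemCreate`.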

Validation

  • cargo check -p codex-core -p codex-app-server -p codex-api
  • just test -p codex-app-server realtime_conversation
  • just test -p codex-core realtime_conversation (50/51 passed in the filtered parallel run; the lone failure passed when rerun in isolation)
  • just test -p codex-core conversation_mirrors_assistant_message_text_to_realtime_handoff
  • just test -p codex-api e2e_connect_and_exchange_events_against_mock_ws_server
  • just fix -p codex-core
  • just fix -p codex-app-server
  • cargo build -p codex-cli

@guinness-oai guinness-oai changed the title Add realtime handoff append controls Use explicit realtime handoff speech Jun 12, 2026
…doff-append-control

# Conflicts:
#	codex-rs/app-server-protocol/src/protocol/common.rs
#	codex-rs/app-server-protocol/src/protocol/v2/realtime.rs
#	codex-rs/app-server/README.md
#	codex-rs/app-server/src/request_processors/turn_processor.rs
#	codex-rs/app-server/tests/suite/v2/experimental_api.rs
#	codex-rs/app-server/tests/suite/v2/realtime_conversation.rs
#	codex-rs/codex-api/src/endpoint/realtime_websocket/methods_v1.rs
#	codex-rs/codex-api/src/endpoint/realtime_websocket/methods_v2.rs
#	codex-rs/codex-api/src/endpoint/realtime_websocket/protocol.rs
#	codex-rs/core/src/realtime_conversation.rs
#	codex-rs/core/tests/suite/compact_remote.rs
#	codex-rs/core/tests/suite/realtime_conversation.rs
#	codex-rs/protocol/src/protocol.rs

@jiayuhuang-openai jiayuhuang-openai left a comment


Clean

@guinness-oai guinness-oai changed the title Use explicit realtime handoff speech Add explicit realtime speech and silent context APIs Jun 14, 2026
@guinness-oai guinness-oai changed the title Add explicit realtime speech and silent context APIs Add realtime speech append control Jun 15, 2026
@guinness-oai

Collaborator Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab5d0d0282

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

pub architecture: Option<RealtimeConversationArchitecture>,
/// Sends automatic Codex responses as realtime conversation items instead of handoff appends.
#[ts(optional = nullable)]
pub codex_responses_as_items: Option<bool>,


P2: Model omission-false flag as a bool

This flag is collapsed with unwrap_or(false), so omission and null mean the same thing while clients see a nullable tri-state. The app-server API guidance says omission-means-false booleans should be defaulted bool; please avoid the meaningless null case (guidance).
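A small sketch of the point being made, using hypothetical stand-in structs rather than the real params type. It shows that `Option<bool>` collapsed with `unwrap_or(false)` cannot distinguish omission from an explicit `false`, so a plain `bool` with a default carries the same information. The comment about `#[serde(default)]` assumes the params are deserialized with serde, which this snippet does not depend on.

```rust
/// Hypothetical tri-state shape: omitted, null, and false all end up as `false`.
#[derive(Debug, Default, PartialEq, Eq)]
struct StartParamsTriState {
    codex_responses_as_items: Option<bool>,
}

/// Hypothetical defaulted shape suggested by the review.
#[derive(Debug, Default, PartialEq, Eq)]
struct StartParamsDefaulted {
    // Assumption: with serde this field would carry `#[serde(default)]`
    // so that an omitted field deserializes to `false`.
    codex_responses_as_items: bool,
}

/// How the tri-state flag is read when collapsed with `unwrap_or(false)`.
fn as_items_tri_state(params: &StartParamsTriState) -> bool {
    params.codex_responses_as_items.unwrap_or(false)
}

/// The defaulted flag needs no collapsing step.
fn as_items_defaulted(params: &StartParamsDefaulted) -> bool {
    params.codex_responses_as_items
}
```

Since `None` and `Some(false)` read identically, the extra state only widens the client-facing type without adding behavior.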


Comment on lines +1428 to +1429
writer
.send_conversation_handoff_append(handoff_id, output_text)


P2: Avoid appending the final handoff text twice

When a V2 background_agent handoff completes in the default mode, the final assistant text has already been sent from handoff_out as a ProgressUpdate; TurnComplete then calls handoff_complete, and this arm appends the same last_output_text again. For delegated turns with any assistant output, realtime receives duplicate speakable backend text, so complete the handoff without re-appending the text or send only a true delta.
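One way to read the "send only a true delta" suggestion is sketched below. `HandoffSpeechState`, `record_sent`, and `pending_on_complete` are hypothetical names, not the PR's actual bookkeeping, and the sketch assumes the text already sent as a progress update is a prefix of the final text whenever the two overlap.

```rust
/// Hypothetical per-handoff state tracking what realtime has already received.
#[derive(Debug, Default)]
struct HandoffSpeechState {
    already_sent: String,
}

impl HandoffSpeechState {
    /// Records the text most recently sent as a progress update.
    fn record_sent(&mut self, text: &str) {
        self.already_sent = text.to_string();
    }

    /// Returns what still needs to be appended when the handoff completes:
    /// `None` if the final text was already sent, the unsent suffix if the
    /// final text extends it, or the full text if the two do not overlap.
    fn pending_on_complete<'a>(&self, last_output_text: &'a str) -> Option<&'a str> {
        match last_output_text.strip_prefix(self.already_sent.as_str()) {
            Some("") => None,
            Some(delta) => Some(delta),
            None => Some(last_output_text),
        }
    }
}
```

A completion handler built on this would skip the append when `pending_on_complete` returns `None` and still complete the handoff, which avoids the duplicate speakable text described above.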


@guinness-oai guinness-oai marked this pull request as ready for review June 15, 2026 22:44
@guinness-oai guinness-oai requested a review from a team as a code owner June 15, 2026 22:44
@chatgpt-codex-connector


💡 Codex Review

pub struct ThreadRealtimeAppendSpeechParams {

P2: Regenerate app-server schema fixtures

New v2 realtime API shapes are added, but app-server-protocol/schema has no appendSpeech/ThreadRealtimeAppendSpeech entries, so generated schema consumers will miss the method. Regenerate the app-server schema fixtures for this API shape change as required (guidance).


@guinness-oai guinness-oai merged commit 1d8ff89 into main Jun 15, 2026
31 checks passed
@guinness-oai guinness-oai deleted the guinness/realtime-handoff-append-control branch June 15, 2026 23:16
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 15, 2026
3 participants