Add realtime speech append control #27917
…doff-append-control
# Conflicts:
#	codex-rs/app-server-protocol/src/protocol/common.rs
#	codex-rs/app-server-protocol/src/protocol/v2/realtime.rs
#	codex-rs/app-server/README.md
#	codex-rs/app-server/src/request_processors/turn_processor.rs
#	codex-rs/app-server/tests/suite/v2/experimental_api.rs
#	codex-rs/app-server/tests/suite/v2/realtime_conversation.rs
#	codex-rs/codex-api/src/endpoint/realtime_websocket/methods_v1.rs
#	codex-rs/codex-api/src/endpoint/realtime_websocket/methods_v2.rs
#	codex-rs/codex-api/src/endpoint/realtime_websocket/protocol.rs
#	codex-rs/core/src/realtime_conversation.rs
#	codex-rs/core/tests/suite/compact_remote.rs
#	codex-rs/core/tests/suite/realtime_conversation.rs
#	codex-rs/protocol/src/protocol.rs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab5d0d0282
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    pub architecture: Option<RealtimeConversationArchitecture>,
    /// Sends automatic Codex responses as realtime conversation items instead of handoff appends.
    #[ts(optional = nullable)]
    pub codex_responses_as_items: Option<bool>,
Model omission-false flag as a bool
This flag is collapsed with `unwrap_or(false)`, so omission and `null` mean the same thing while clients see a nullable tri-state. The app-server API guidance says omission-means-false booleans should be modeled as a defaulted `bool`; please avoid the meaningless `null` case.
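The point of this comment can be illustrated with a minimal, std-only sketch. The struct and function names below are hypothetical and are not the PR's actual types; only the field name `codex_responses_as_items` and the `unwrap_or(false)` collapse come from the review. In real protocol code the defaulted shape would presumably be expressed with serde's `#[serde(default)]` on a plain `bool`, which is an assumption here.

```rust
/// Hypothetical nullable shape: clients see three states (omitted, null, bool).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct StartParamsNullable {
    pub codex_responses_as_items: Option<bool>,
}

/// Hypothetical defaulted shape: the field is always a concrete bool.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct StartParamsDefaulted {
    pub codex_responses_as_items: bool,
}

/// Mirrors the `unwrap_or(false)` collapse the review points at:
/// `None` and `Some(false)` are indistinguishable to the server.
pub fn effective_flag(params: &StartParamsNullable) -> bool {
    params.codex_responses_as_items.unwrap_or(false)
}

// Converting to the defaulted shape loses nothing the server ever used.
impl From<StartParamsNullable> for StartParamsDefaulted {
    fn from(p: StartParamsNullable) -> Self {
        Self {
            codex_responses_as_items: p.codex_responses_as_items.unwrap_or(false),
        }
    }
}
```

Because `effective_flag` returns the same value for `None` and `Some(false)`, the extra state only adds surface area to the client-facing schema.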
    writer
        .send_conversation_handoff_append(handoff_id, output_text)
Avoid appending the final handoff text twice
When a V2 background_agent handoff completes in the default mode, the final assistant text has already been sent from handoff_out as a ProgressUpdate; TurnComplete then calls handoff_complete, and this arm appends the same last_output_text again. For delegated turns with any assistant output, realtime receives duplicate speakable backend text, so complete the handoff without re-appending the text or send only a true delta.
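One way to "send only a true delta", as suggested above, is to remember what has already been forwarded as progress updates and strip that prefix from the final text. The sketch below is a hypothetical helper under that assumption; `HandoffSpeechTracker` and its methods do not appear in the PR, and the fallback when the final text does not start with the already-sent text (send it whole) is a design guess.

```rust
/// Hypothetical tracker for text already sent as progress updates in one handoff.
#[derive(Debug, Default)]
pub struct HandoffSpeechTracker {
    sent: String,
}

impl HandoffSpeechTracker {
    /// Record a progress update that was already forwarded to realtime.
    pub fn record_progress(&mut self, text: &str) {
        self.sent.push_str(text);
    }

    /// Return only the part of `final_text` that has not been sent yet,
    /// or `None` when completing the handoff needs no extra append.
    /// Assumption: if `final_text` does not start with the sent text,
    /// it is treated as entirely new.
    pub fn unsent_delta<'a>(&self, final_text: &'a str) -> Option<&'a str> {
        let delta = final_text
            .strip_prefix(self.sent.as_str())
            .unwrap_or(final_text);
        if delta.is_empty() {
            None
        } else {
            Some(delta)
        }
    }
}
```

With this shape, the completion path would append only when `unsent_delta` returns `Some`, which covers the duplicate case the review describes.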
💡 Codex Review
New v2 realtime API shapes are added, but
Why
Realtime voice harness tuning needs app-side control over what backend Codex text is spoken. Backend orchestrator text is written for a reading UI, so automatically speaking every preamble, progress update, or final assistant message can make the realtime voice model too chatty.
For experimentation, clients need two simple controls: keep app/client text-item injection on the existing item-create path, and add an explicit speakable path that app code can call only when it wants realtime to speak. Automatic Codex output also needs an opt-in way to switch from the protocol's default speakable path to regular realtime items, with a caller-provided prefix so prompt wording can be tuned outside core.
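The opt-in routing for automatic Codex output can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `AutoOutputOptions`, `AutoOutputRoute`, and `route_auto_output` are hypothetical names, the two fields reuse the start-field names listed under What Changed, and joining the prefix by plain concatenation is a guess.

```rust
/// Hypothetical subset of the realtime start options.
#[derive(Debug, Default, Clone)]
pub struct AutoOutputOptions {
    pub codex_responses_as_items: bool,
    pub codex_response_item_prefix: Option<String>,
}

/// Where one piece of automatic Codex output should go.
#[derive(Debug, PartialEq)]
pub enum AutoOutputRoute {
    /// The protocol's default speakable path (the unchanged default).
    Speakable(String),
    /// A regular realtime item, with the caller-provided prefix applied.
    Item(String),
}

/// Pick a route for automatic backend text.
/// Assumption: the prefix is prepended verbatim, with no separator added.
pub fn route_auto_output(opts: &AutoOutputOptions, text: &str) -> AutoOutputRoute {
    if !opts.codex_responses_as_items {
        return AutoOutputRoute::Speakable(text.to_string());
    }
    match opts.codex_response_item_prefix.as_deref() {
        Some(prefix) if !prefix.is_empty() => {
            AutoOutputRoute::Item(format!("{prefix}{text}"))
        }
        _ => AutoOutputRoute::Item(text.to_string()),
    }
}
```

Keeping the prefix as caller-provided data rather than a constant in core matches the stated goal of tuning prompt wording from the app side.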
The default remains unchanged: if a client omits the new start fields and never calls `appendSpeech`, automatic backend output continues down the existing speakable path for the selected realtime protocol.
What Changed
- `thread/realtime/appendSpeech` for app-provided speakable text.
- `thread/realtime/appendText` as the item-create API for app-provided realtime text items.
- `codexResponsesAsItems` / `codex_responses_as_items` on `thread/realtime/start` to send automatic Codex responses with `conversation.item.create` instead of the protocol's default speakable output path.
- `codexResponseItemPrefix` / `codex_response_item_prefix` so clients can prepend experiment instructions to those automatic Codex response items.
- `conversation.handoff.append` routing scoped to the v1 speakable path; v2 default speech uses its item/function-output plus `response.create` behavior.
- `appendSpeech` behavior.
Validation
- `cargo check -p codex-core -p codex-app-server -p codex-api`
- `just test -p codex-app-server realtime_conversation`
- `just test -p codex-core realtime_conversation` (50/51 passed in the filtered parallel run; the lone failure passed when rerun in isolation)
- `just test -p codex-core conversation_mirrors_assistant_message_text_to_realtime_handoff`
- `just test -p codex-api e2e_connect_and_exchange_events_against_mock_ws_server`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
- `cargo build -p codex-cli`