fix(voice): recover buffered hotkey events after select! race (#527)#545
Conversation
publish_transcription now returns the number of active TRANSCRIPTION_BUS subscribers that received the message. When no receivers are connected (e.g. Socket.IO bridge not yet subscribed), the function logs a warning instead of silently discarding the broadcast. This makes it possible to diagnose "transcription produced but never delivered" scenarios from logs alone. Closes tinyhumansai#527 (partial) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On warm CPAL init (second+ recording), audio_capture::start_recording() completes fast enough that the pending_ready branch and the hotkey_rx branch are both ready simultaneously inside tokio::select!. select! picks one pseudo-randomly — when it picks pending_ready, the Released event sits unprocessed in hotkey_rx. The recording is stored as "live" with no deferred stop, then the next loop iteration processes the buffered Released and stops the recording almost immediately, producing a near-zero-length clip that the duration gate silently drops. Fix: after pending_ready resolves, call hotkey_rx.try_recv() to check for a buffered stop event that lost the race. If found, apply the deferred stop mechanism (MIN_RECORDING_AFTER_SETUP = 1500ms) instead of treating the recording as live. Also adds pipeline_id (UUID prefix) to all process_recording_bg log lines for end-to-end correlation, and labels each pipeline stage (stop_recording, gate_duration, gate_silence, transcribe, deliver) so dropped recordings are diagnosable from logs. Closes tinyhumansai#527 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe changes modify the transcription publishing mechanism to return receiver count alongside added structured pipeline logging, identifier tracking, and buffered hotkey recovery during recording setup completion. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openhuman/voice/server.rs (1)
646-656:⚠️ Potential issue | 🟠 MajorRedact transcription text from pipeline logs.
These logs now emit recognized text on both the normal and drop paths. Even truncated snippets can contain user secrets or PII, and this code runs for every recording. Please log metadata only here.
Suggested change
info!( - "{LOG_PREFIX} [pipeline={pipeline_id}] stage=transcription_result text='{}' chars={} elapsed_ms={}", - truncate_for_log(text, 80), + "{LOG_PREFIX} [pipeline={pipeline_id}] stage=transcription_result chars={} elapsed_ms={}", text.len(), transcribe_elapsed.as_millis() ); @@ warn!( - "{LOG_PREFIX} [pipeline={pipeline_id}] stage=gate_hallucination DROPPED text='{}'", - truncate_for_log(text, 60) + "{LOG_PREFIX} [pipeline={pipeline_id}] stage=gate_hallucination DROPPED chars={}", + text.len() );As per coding guidelines "Never log secrets, raw JWTs, API keys, or full PII in debug logs; redact or omit sensitive fields".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/voice/server.rs` around lines 646 - 656, The logs currently emit recognized transcription text (via truncate_for_log(text, ...)) in both the normal transcription_result and the gate_hallucination warn paths; remove/raw text from these logs and replace with metadata-only fields (e.g., pipeline_id, stage, chars=text.len(), elapsed_ms=..., and a redaction marker like "<REDACTED>"). Update the LOG_PREFIX/transcription_result logging call and the warn! in the is_hallucinated_output branch so they do not include or interpolate the transcription string (use truncate_for_log only for debugging counters if absolutely required — better: omit it) and keep use of is_hallucinated_output, pipeline_id, text.len(), and transcribe_elapsed to provide non-sensitive context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/voice/server.rs`:
- Around line 301-326: The buffered hotkey read via hotkey_rx.try_recv() is
currently consumed but not rebroadcast, so
dictation_listener::publish_dictation_event(...) is never called for buffered
HotkeyEvent::Released/Pressed; before setting pending_stop = true in the
try_recv() match arm, call the same publish path used in the main
hotkey_rx.recv() branch (i.e., forward the buffered HotkeyEvent to
dictation_listener::publish_dictation_event or the helper that publishes
dictation:toggle events), preserving the event variant (HotkeyEvent::Released or
HotkeyEvent::Pressed) so subscribers see the recovered stop/toggle, then mark
pending_stop.
---
Outside diff comments:
In `@src/openhuman/voice/server.rs`:
- Around line 646-656: The logs currently emit recognized transcription text
(via truncate_for_log(text, ...)) in both the normal transcription_result and
the gate_hallucination warn paths; remove/raw text from these logs and replace
with metadata-only fields (e.g., pipeline_id, stage, chars=text.len(),
elapsed_ms=..., and a redaction marker like "<REDACTED>"). Update the
LOG_PREFIX/transcription_result logging call and the warn! in the
is_hallucinated_output branch so they do not include or interpolate the
transcription string (use truncate_for_log only for debugging counters if
absolutely required — better: omit it) and keep use of is_hallucinated_output,
pipeline_id, text.len(), and transcribe_elapsed to provide non-sensitive
context.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a1e611f2-ec73-4d4a-bf20-eaf3b13ddf5a
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.lockapp/src-tauri/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
src/openhuman/voice/dictation_listener.rssrc/openhuman/voice/server.rs
| // Check for a buffered stop event that lost the | ||
| // select! race against pending_ready. On warm CPAL | ||
| // init both branches may be ready simultaneously; | ||
| // select! picks one pseudo-randomly, so a Released | ||
| // event can sit unprocessed in hotkey_rx. | ||
| let had_pending_stop = pending_stop; | ||
| if !pending_stop { | ||
| if let Ok(buffered) = hotkey_rx.try_recv() { | ||
| match buffered { | ||
| HotkeyEvent::Released => { | ||
| info!( | ||
| "{LOG_PREFIX} recording handle ready — found buffered Released in hotkey_rx (select! race recovered)" | ||
| ); | ||
| pending_stop = true; | ||
| } | ||
| HotkeyEvent::Pressed => { | ||
| // A second Pressed while pending means | ||
| // user wants to stop (tap-style). Treat | ||
| // the same as a stop intent. | ||
| info!( | ||
| "{LOG_PREFIX} recording handle ready — found buffered Pressed in hotkey_rx (treating as stop intent)" | ||
| ); | ||
| pending_stop = true; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Rebroadcast the recovered hotkey event before consuming it.
hotkey_rx.try_recv() removes the buffered Released/Pressed, but only the main hotkey_rx.recv() branch publishes dictation_listener::publish_dictation_event(...). In this raced path, dictation:toggle subscribers never see the recovered stop/toggle event, so the frontend can stay out of sync with the actual recording state. Please route buffered through the same publish path before turning it into pending_stop.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/voice/server.rs` around lines 301 - 326, The buffered hotkey
read via hotkey_rx.try_recv() is currently consumed but not rebroadcast, so
dictation_listener::publish_dictation_event(...) is never called for buffered
HotkeyEvent::Released/Pressed; before setting pending_stop = true in the
try_recv() match arm, call the same publish path used in the main
hotkey_rx.recv() branch (i.e., forward the buffered HotkeyEvent to
dictation_listener::publish_dictation_event or the helper that publishes
dictation:toggle events), preserving the event variant (HotkeyEvent::Released or
HotkeyEvent::Pressed) so subscribers see the recovered stop/toggle, then mark
pending_stop.
Summary
tokio::select!race condition that caused voice transcripts to be silently dropped on second+ recordingstry_recv()check for buffered hotkey events when recording setup completes, recovering stop intents that lost the racepublish_transcriptionreturn receiver count and log warnings when no subscribers are connectedProblem
After the first successful voice recording, subsequent recordings would silently fail to produce transcriptions. The root cause is a race condition in the hotkey event loop's
tokio::select!:On warm CPAL init (second+ recording),
audio_capture::start_recording()completes fast enough that bothpending_readyandhotkey_rxbranches are simultaneously ready.select!picks pseudo-randomly — when it pickspending_ready, theReleasedevent sits unprocessed inhotkey_rx. The recording is stored as "live" (no deferred stop), then the next loop iteration immediately processes the bufferedReleased, stopping the recording after near-zero duration. The duration gate silently drops the clip with only awarn!log.Additionally, multiple pipeline stages (duration gate, silence gate, hallucination filter, empty transcription) all silently discard recordings without structured logging, making diagnosis difficult.
Solution
Core fix (
server.rs:295-344): Afterpending_readyresolves successfully withpending_stop == false, callhotkey_rx.try_recv()to check for a buffered stop event that lost theselect!race. If aReleased(orPressedtoggle-stop) is found, apply the existing deferred stop mechanism (MIN_RECORDING_AFTER_SETUP = 1500ms) instead of treating the recording as live.Diagnostic logging (
server.rs): Addedpipeline_id(UUID v4 prefix) tospawn_process_recordingand threaded it throughprocess_recording_bg. Every pipeline stage now logs with[pipeline=<id>] stage=<name>format for end-to-end correlation. Drop points explicitly logDROPPEDwith the gate that triggered.Receiver visibility (
dictation_listener.rs):publish_transcriptionnow returnsusize(receiver count) and logs a warning when the broadcast has zero subscribers, making "delivered but nobody listening" scenarios visible.Submission Checklist
process_recording_bgtests updated for newpipeline_idparameter;publish_transcriptiontests still pass with new return typepublish_transcriptiondocumented with return value semanticsImpact
publish_transcriptionreturn type changed from()tousizebut all call sites either ignore or now use the valuetry_recv()(non-blocking) per recording setup completionRelated
Summary by CodeRabbit
New Features
Improvements