fix(voice): recover buffered hotkey events after select! race (#527) #545

Merged
senamakel merged 2 commits into tinyhumansai:main from oxoxDev:fix/527-voice-dictation-drops-transcripts on Apr 13, 2026

Conversation

Contributor

@oxoxDev oxoxDev commented Apr 13, 2026

Summary

  • Fix tokio::select! race condition that caused voice transcripts to be silently dropped on second+ recordings
  • Add try_recv() check for buffered hotkey events when recording setup completes, recovering stop intents that lost the race
  • Add pipeline-level diagnostic logging with UUID correlation IDs across all processing stages
  • Make publish_transcription return receiver count and log warnings when no subscribers are connected

Problem

After the first successful voice recording, subsequent recordings would silently fail to produce transcriptions. The root cause is a race condition in the hotkey event loop's tokio::select!:

On warm CPAL init (second+ recording), audio_capture::start_recording() completes fast enough that both pending_ready and hotkey_rx branches are simultaneously ready. select! picks pseudo-randomly — when it picks pending_ready, the Released event sits unprocessed in hotkey_rx. The recording is stored as "live" (no deferred stop), then the next loop iteration immediately processes the buffered Released, stopping the recording after near-zero duration. The duration gate silently drops the clip with only a warn! log.
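The failure sequence can be traced with a small deterministic model. This is only a sketch, not the code in server.rs: `Outcome`, `run_without_recovery`, the `loop_latency_ms` parameter, and the 300 ms `MIN_DURATION_MS` threshold are hypothetical, and a `VecDeque` stands in for the hotkey channel. It models only the case where `select!` happens to pick the ready-handle branch first.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq)]
enum HotkeyEvent {
    Pressed,
    Released,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Transcribed { duration_ms: u64 },
    DroppedByDurationGate { duration_ms: u64 },
    StillRecording,
}

// Hypothetical gate threshold, chosen only for illustration.
const MIN_DURATION_MS: u64 = 300;

/// Models two loop iterations when the ready-handle branch wins the race
/// and no recovery step exists.
fn run_without_recovery(
    buffered: &mut VecDeque<HotkeyEvent>,
    loop_latency_ms: u64,
) -> Outcome {
    // Iteration 1: the ready branch wins, so the recording goes live at t = 0
    // with no deferred stop, even though a stop event is already queued.
    let started_at_ms = 0;

    // Iteration 2: the queued event is handled one loop turn later.
    let now_ms = started_at_ms + loop_latency_ms;
    match buffered.pop_front() {
        Some(HotkeyEvent::Released) | Some(HotkeyEvent::Pressed) => {
            let duration_ms = now_ms - started_at_ms;
            if duration_ms < MIN_DURATION_MS {
                Outcome::DroppedByDurationGate { duration_ms }
            } else {
                Outcome::Transcribed { duration_ms }
            }
        }
        None => Outcome::StillRecording,
    }
}
```

With a queued `Released` and a loop latency of a millisecond or so, the model returns `DroppedByDurationGate`, which matches the symptom described above.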

Additionally, multiple pipeline stages (duration gate, silence gate, hallucination filter, empty transcription) all silently discard recordings without structured logging, making diagnosis difficult.

Solution

Core fix (server.rs:295-344): After pending_ready resolves successfully with pending_stop == false, call hotkey_rx.try_recv() to check for a buffered stop event that lost the select! race. If a Released (or Pressed toggle-stop) is found, apply the existing deferred stop mechanism (MIN_RECORDING_AFTER_SETUP = 1500ms) instead of treating the recording as live.
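The recovery step can be sketched in isolation. The version below is an illustration under assumptions, not the merged code: it uses `std::sync::mpsc` in place of the async channel so it runs without a runtime, and `ReadyAction` and `on_recording_ready` are hypothetical names. Only `try_recv()`, `pending_stop`, the two `HotkeyEvent` variants, and the 1500 ms `MIN_RECORDING_AFTER_SETUP` value come from the description above.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum HotkeyEvent {
    Pressed,
    Released,
}

#[derive(Debug, PartialEq)]
enum ReadyAction {
    /// Keep recording until a later stop event arrives.
    Live,
    /// Stop once the minimum post-setup recording time has elapsed.
    DeferredStop(Duration),
}

const MIN_RECORDING_AFTER_SETUP: Duration = Duration::from_millis(1500);

/// Decides what to do when recording setup completes.
/// `try_recv()` never blocks: it returns an event only if one is already
/// queued, which is exactly the event that lost the race.
fn on_recording_ready(hotkey_rx: &Receiver<HotkeyEvent>, pending_stop: bool) -> ReadyAction {
    if pending_stop {
        return ReadyAction::DeferredStop(MIN_RECORDING_AFTER_SETUP);
    }
    match hotkey_rx.try_recv() {
        // Released, or a second Pressed in tap/toggle style, both mean "stop".
        Ok(HotkeyEvent::Released) | Ok(HotkeyEvent::Pressed) => {
            ReadyAction::DeferredStop(MIN_RECORDING_AFTER_SETUP)
        }
        // Nothing buffered (or sender gone): treat the recording as live.
        Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => ReadyAction::Live,
    }
}
```

Because the check consumes at most one queued event and never waits, it adds no latency to the normal path where the user is still holding the hotkey.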

Diagnostic logging (server.rs): Added pipeline_id (UUID v4 prefix) to spawn_process_recording and threaded it through process_recording_bg. Every pipeline stage now logs with [pipeline=<id>] stage=<name> format for end-to-end correlation. Drop points explicitly log DROPPED with the gate that triggered.
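A minimal sketch of the log line shape, assuming the ID is the first hyphen-separated group of a UUID string (the description says only "UUID v4 prefix", so the 8-character length is a guess). The helper names `pipeline_prefix`, `stage_line`, and `dropped_line` are hypothetical, and the real code also prepends `LOG_PREFIX`, which is omitted here.

```rust
/// Shortens a UUID string to its first group, e.g. "3f2a9c1e".
/// The prefix length is an assumption made for this sketch.
fn pipeline_prefix(uuid: &str) -> &str {
    uuid.split('-').next().unwrap_or(uuid)
}

/// Formats a normal stage line: `[pipeline=<id>] stage=<name> <detail>`.
fn stage_line(pipeline_id: &str, stage: &str, detail: &str) -> String {
    format!("[pipeline={pipeline_id}] stage={stage} {detail}")
}

/// Formats a drop line, naming the gate that discarded the recording.
fn dropped_line(pipeline_id: &str, gate: &str, detail: &str) -> String {
    format!("[pipeline={pipeline_id}] stage={gate} DROPPED {detail}")
}
```

Filtering logs on a single `[pipeline=<id>]` value then yields every stage one recording went through, including the gate that dropped it.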

Receiver visibility (dictation_listener.rs): publish_transcription now returns usize (receiver count) and logs a warning when the broadcast has zero subscribers, making "delivered but nobody listening" scenarios visible.
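The return-value semantics can be illustrated with a toy bus. This is a sketch under assumptions: `TranscriptionBus` here is a hypothetical struct built on `std::sync::mpsc`, not the project's `TRANSCRIPTION_BUS`, and the warning goes to stderr instead of a logging macro. If the real bus is a `tokio::sync::broadcast` channel, its `Sender::send` already reports the receiver count on success and returns an error when there are no receivers.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// Toy fan-out bus: one std channel per subscriber.
struct TranscriptionBus {
    subscribers: Vec<Sender<String>>,
}

impl TranscriptionBus {
    fn new() -> Self {
        Self { subscribers: Vec::new() }
    }

    fn subscribe(&mut self) -> Receiver<String> {
        let (tx, rx) = std::sync::mpsc::channel();
        self.subscribers.push(tx);
        rx
    }

    /// Sends `text` to every subscriber and returns how many accepted it.
    fn publish_transcription(&mut self, text: &str) -> usize {
        // A failed send means the receiving end was dropped; forget it.
        self.subscribers.retain(|tx| tx.send(text.to_owned()).is_ok());
        let delivered = self.subscribers.len();
        if delivered == 0 {
            eprintln!("warn: transcription produced but no subscribers are connected");
        }
        delivered
    }
}
```

Callers that do not care about delivery can ignore the count, while the pipeline can log it alongside the `deliver` stage.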

Submission Checklist

  • Unit tests — existing process_recording_bg tests updated for new pipeline_id parameter; publish_transcription tests still pass with new return type
  • E2E / integration — voice dictation tested manually on macOS debug build; automated E2E for hotkey flows not yet available
  • Doc comments — publish_transcription documented with return value semantics
  • Inline comments — select! race recovery logic documented with explanation of the race condition

Impact

  • Desktop only — voice dictation is desktop-only (cpal + rdev)
  • No breaking API changes — publish_transcription return type changed from () to usize, but all call sites either ignore or now use the value
  • Performance — negligible; one try_recv() (non-blocking) per recording setup completion
  • Known limitation — an alternating delivery pattern (transcription arrives every other recording) was observed during testing and appears to be a pre-existing issue unrelated to this fix

Related

Summary by CodeRabbit

  • New Features

    • Added hotkey buffering during recording setup for improved stop signal handling.
  • Improvements

    • Transcription delivery now tracks and reports successful message routing count.
    • Enhanced logging with pipeline tracking and detailed stage information for better diagnostics.

oxoxDev and others added 2 commits April 14, 2026 02:01
publish_transcription now returns the number of active TRANSCRIPTION_BUS
subscribers that received the message. When no receivers are connected
(e.g. Socket.IO bridge not yet subscribed), the function logs a warning
instead of silently discarding the broadcast.

This makes it possible to diagnose "transcription produced but never
delivered" scenarios from logs alone.

Closes tinyhumansai#527 (partial)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

On warm CPAL init (second+ recording), audio_capture::start_recording()
completes fast enough that the pending_ready branch and the hotkey_rx
branch are both ready simultaneously inside tokio::select!. select!
picks one pseudo-randomly — when it picks pending_ready, the Released
event sits unprocessed in hotkey_rx. The recording is stored as "live"
with no deferred stop, then the next loop iteration processes the
buffered Released and stops the recording almost immediately, producing
a near-zero-length clip that the duration gate silently drops.

Fix: after pending_ready resolves, call hotkey_rx.try_recv() to check
for a buffered stop event that lost the race. If found, apply the
deferred stop mechanism (MIN_RECORDING_AFTER_SETUP = 1500ms) instead
of treating the recording as live.

Also adds pipeline_id (UUID prefix) to all process_recording_bg log
lines for end-to-end correlation, and labels each pipeline stage
(stop_recording, gate_duration, gate_silence, transcribe, deliver)
so dropped recordings are diagnosable from logs.

Closes tinyhumansai#527

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

coderabbitai Bot commented Apr 13, 2026

📝 Walkthrough

The changes make the transcription publishing function return the receiver count, and add structured pipeline logging, per-recording identifier tracking, and recovery of buffered hotkey events when recording setup completes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Transcription Publishing (src/openhuman/voice/dictation_listener.rs) | Modified publish_transcription to return usize (receiver count) instead of (). Added receiver count check, logging with delivery status, and error handling for failed sends. |
| Recording Pipeline Infrastructure (src/openhuman/voice/server.rs) | Introduced per-recording pipeline_id parameter (UUID-based tracking). Added structured stage-based logging across recording setup, processing, and completion. Implemented buffered hotkey recovery during setup completion. Updated process_recording_bg signature to accept pipeline_id and capture transcription receiver count. Updated all test invocations accordingly. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested reviewers

  • senamakel

Poem

🐰 A pipeline traced with glowing ID,
Hotkeys buffered, recovery set free,
Transcription counts now returned with care,
Structured stages logged with flair! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main fix: recovering buffered hotkey events after a select! race condition, which is the core issue addressed in this PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openhuman/voice/server.rs (1)

646-656: ⚠️ Potential issue | 🟠 Major

Redact transcription text from pipeline logs.

These logs now emit recognized text on both the normal and drop paths. Even truncated snippets can contain user secrets or PII, and this code runs for every recording. Please log metadata only here.

Suggested change
                     info!(
-                        "{LOG_PREFIX} [pipeline={pipeline_id}] stage=transcription_result text='{}' chars={} elapsed_ms={}",
-                        truncate_for_log(text, 80),
+                        "{LOG_PREFIX} [pipeline={pipeline_id}] stage=transcription_result chars={} elapsed_ms={}",
                         text.len(),
                         transcribe_elapsed.as_millis()
                     );
@@
                         warn!(
-                            "{LOG_PREFIX} [pipeline={pipeline_id}] stage=gate_hallucination DROPPED text='{}'",
-                            truncate_for_log(text, 60)
+                            "{LOG_PREFIX} [pipeline={pipeline_id}] stage=gate_hallucination DROPPED chars={}",
+                            text.len()
                         );

As per coding guidelines "Never log secrets, raw JWTs, API keys, or full PII in debug logs; redact or omit sensitive fields".
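A metadata-only log line could look like the sketch below. The helper name `transcription_result_line` is hypothetical and `LOG_PREFIX` is left out. One detail worth noting: in Rust, `text.len()` is the length in bytes, so this sketch uses `text.chars().count()` to make the `chars=` field literally true; whether that matters for the real logs is a judgment call.

```rust
/// Builds the transcription_result line without including the recognized text.
fn transcription_result_line(pipeline_id: &str, text: &str, elapsed_ms: u128) -> String {
    format!(
        "[pipeline={pipeline_id}] stage=transcription_result chars={} elapsed_ms={elapsed_ms}",
        // Count Unicode scalar values; `text.len()` would count UTF-8 bytes.
        text.chars().count()
    )
}
```

The line keeps enough context to correlate and time the stage while never interpolating user speech into the log.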

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/voice/server.rs` around lines 646 - 656, The logs currently
emit recognized transcription text (via truncate_for_log(text, ...)) in both the
normal transcription_result and the gate_hallucination warn paths; remove the raw
text from these logs and replace
stage, chars=text.len(), elapsed_ms=..., and a redaction marker like
"<REDACTED>"). Update the LOG_PREFIX/transcription_result logging call and the
warn! in the is_hallucinated_output branch so they do not include or interpolate
the transcription string (use truncate_for_log only for debugging counters if
absolutely required — better: omit it) and keep use of is_hallucinated_output,
pipeline_id, text.len(), and transcribe_elapsed to provide non-sensitive
context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/voice/server.rs`:
- Around line 301-326: The buffered hotkey read via hotkey_rx.try_recv() is
currently consumed but not rebroadcast, so
dictation_listener::publish_dictation_event(...) is never called for buffered
HotkeyEvent::Released/Pressed; before setting pending_stop = true in the
try_recv() match arm, call the same publish path used in the main
hotkey_rx.recv() branch (i.e., forward the buffered HotkeyEvent to
dictation_listener::publish_dictation_event or the helper that publishes
dictation:toggle events), preserving the event variant (HotkeyEvent::Released or
HotkeyEvent::Pressed) so subscribers see the recovered stop/toggle, then mark
pending_stop.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a1e611f2-ec73-4d4a-bf20-eaf3b13ddf5a

📥 Commits

Reviewing files that changed from the base of the PR and between 57a8bed and 0e3aec5.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • src/openhuman/voice/dictation_listener.rs
  • src/openhuman/voice/server.rs

Comment on lines +301 to +326
// Check for a buffered stop event that lost the
// select! race against pending_ready. On warm CPAL
// init both branches may be ready simultaneously;
// select! picks one pseudo-randomly, so a Released
// event can sit unprocessed in hotkey_rx.
let had_pending_stop = pending_stop;
if !pending_stop {
    if let Ok(buffered) = hotkey_rx.try_recv() {
        match buffered {
            HotkeyEvent::Released => {
                info!(
                    "{LOG_PREFIX} recording handle ready — found buffered Released in hotkey_rx (select! race recovered)"
                );
                pending_stop = true;
            }
            HotkeyEvent::Pressed => {
                // A second Pressed while pending means
                // user wants to stop (tap-style). Treat
                // the same as a stop intent.
                info!(
                    "{LOG_PREFIX} recording handle ready — found buffered Pressed in hotkey_rx (treating as stop intent)"
                );
                pending_stop = true;
            }
        }
    }
⚠️ Potential issue | 🟠 Major

Rebroadcast the recovered hotkey event before consuming it.

hotkey_rx.try_recv() removes the buffered Released/Pressed, but only the main hotkey_rx.recv() branch publishes dictation_listener::publish_dictation_event(...). In this raced path, dictation:toggle subscribers never see the recovered stop/toggle event, so the frontend can stay out of sync with the actual recording state. Please route buffered through the same publish path before turning it into pending_stop.
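One way to read this suggestion is sketched below, with loud assumptions: `recover_and_publish` is a hypothetical helper, `std::sync::mpsc` replaces the async channel, and the `publish` closure stands in for `dictation_listener::publish_dictation_event(...)`, whose real signature is not shown in this PR page.

```rust
use std::sync::mpsc::Receiver;

#[derive(Debug, Clone, Copy, PartialEq)]
enum HotkeyEvent {
    Pressed,
    Released,
}

/// Takes at most one buffered event, forwards it to `publish` unchanged,
/// and returns whether a stop is now pending.
fn recover_and_publish<F>(
    hotkey_rx: &Receiver<HotkeyEvent>,
    pending_stop: bool,
    mut publish: F,
) -> bool
where
    F: FnMut(HotkeyEvent),
{
    if pending_stop {
        return true;
    }
    match hotkey_rx.try_recv() {
        Ok(event) => {
            // Publish first so subscribers see the same event they would
            // have seen on the normal recv() path, then mark the stop.
            publish(event);
            true
        }
        Err(_) => false,
    }
}
```

Passing the publish path in as a closure keeps the recovered event variant intact and lets both the raced path and the normal path share one notification routine.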

@senamakel senamakel merged commit 08d9fd2 into tinyhumansai:main Apr 13, 2026
8 of 9 checks passed