fix(voice): reduce dictation hallucinations and improve Fn/focus reliability (#385) #409
Conversation
…inyhumansai#385) Reject whisper segments with avg token log-probability below -0.7 or entropy above 2.4. Return TranscriptionResult with confidence metadata instead of plain String. Update callers in speech.rs and streaming.rs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…#385) Base model produces significantly fewer hallucinations than tiny, especially in noisy/quiet conditions. User can still override via config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ai#385) Gate sustained silence (>500ms) from being sent to whisper to prevent hallucinations. Maintain 100ms look-ahead ring buffer so speech onset after pauses is not clipped. Thresholds adapt to source sample rate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nyhumansai#385) start_recording() blocks 1-7s on cpal device init but macOS fires Fn Release almost immediately, causing skipped cycles. Move recording start to spawn_blocking so the event loop stays responsive. Buffer Release events during setup and ensure minimum 1.5s recording duration when release arrives before recording handle is ready. Also includes: capture focused app on hotkey press, pass through pipeline for focus validation before paste. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ss (tinyhumansai#385) Add expected_app parameter to insert_text(). Before Cmd+V, validate focus via accessibility API and restore via AppleScript if shifted. Don't abort paste on focus validation failure — attempt insertion regardless so text is never silently lost. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
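The confidence gating described in the first commit can be sketched as follows. This is a rough stand-in, not the actual `whisper_engine.rs` code: `Segment`, `filter_segments`, and the entropy proxy are assumed names, and only the -0.7 / 2.4 thresholds come from the commit message.

```rust
// Sketch of per-segment confidence filtering, assuming whisper exposes
// per-token natural-log probabilities. Types are illustrative only.

const MIN_AVG_LOGPROB: f32 = -0.7;
const MAX_ENTROPY: f32 = 2.4;

struct Segment {
    text: String,
    token_logprobs: Vec<f32>,
}

struct TranscriptionResult {
    text: String,
    avg_logprob: f32,
    segments_kept: usize,
    segments_dropped: usize,
}

fn avg_logprob(lps: &[f32]) -> f32 {
    lps.iter().sum::<f32>() / lps.len().max(1) as f32
}

/// Rough entropy proxy computed from the chosen-token probabilities only;
/// the real metric would use the full predictive distribution per step.
fn entropy_proxy(lps: &[f32]) -> f32 {
    lps.iter().map(|lp| -lp.exp() * lp).sum()
}

fn filter_segments(segments: Vec<Segment>) -> TranscriptionResult {
    let mut kept_text = Vec::new();
    let mut kept_lps: Vec<f32> = Vec::new();
    let mut dropped = 0;
    for seg in segments {
        let lp = avg_logprob(&seg.token_logprobs);
        if lp < MIN_AVG_LOGPROB || entropy_proxy(&seg.token_logprobs) > MAX_ENTROPY {
            dropped += 1; // likely hallucination: low confidence or high spread
        } else {
            kept_lps.extend(seg.token_logprobs.iter());
            kept_text.push(seg.text);
        }
    }
    TranscriptionResult {
        text: kept_text.join(" "),
        avg_logprob: avg_logprob(&kept_lps),
        segments_kept: kept_text.len(),
        segments_dropped: dropped,
    }
}
```

Returning the metadata instead of a plain `String` is what lets callers in `speech.rs` and `streaming.rs` log confidence or suppress low-quality partials.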
📝 Walkthrough

Unwrapped voice RPC return shapes in frontend, changed frontend state reads to use refs, updated tests and tauri RPC wrappers; introduced per-segment transcription results and confidence metrics; added RMS-based silence gating and buffering in audio capture; altered hotkey/recording control flow (macOS focus capture and deferred stop); updated default STT model.

Changes
Sequence Diagram(s)

sequenceDiagram
participant User
participant Hotkey as Hotkey Loop
participant Capture as Audio Capture
participant Gate as Silence Gate
participant Stream as Streaming Engine
participant Whisper as Whisper Engine
participant Server as Voice Server
participant TextInput as Text Input
User->>Hotkey: Press hotkey
Hotkey->>Hotkey: capture_expected_app_name (macOS)
Hotkey->>Server: request start recording (spawn blocking task)
Server->>Capture: start_recording (blocking thread)
Capture->>Capture: CPAL collects chunk
Capture->>Gate: process(mono_chunk)
Gate-->>Capture: gated_chunk (or drop)
Capture->>Stream: append audio to buffers (audio_buf + full_audio_buf)
Stream->>Whisper: trigger partial transcription (on revision change)
Whisper-->>Stream: TranscriptionResult { text, avg_logprob, segments_* }
Stream->>Stream: log avg_logprob, maybe show partial
User->>Hotkey: Release hotkey
Hotkey->>Server: signal stop (may be buffered if setup pending)
Server->>Stream: finalize using full_audio_buf
Stream->>Whisper: final transcription
Whisper-->>Stream: final TranscriptionResult
Stream->>TextInput: insert_text(result.text, expected_app)
TextInput->>TextInput: validate/restore app focus (macOS) then paste
TextInput-->>User: Text inserted
sequenceDiagram
participant CPAL as CPAL Input
participant Gate as SilenceGate
participant Lookahead as Lookahead Buffer
participant Buffer as Recording Buffer
loop each audio chunk
CPAL->>Gate: process(chunk)
Gate->>Gate: compute chunk_rms()
alt above threshold or within lookahead
Gate->>Lookahead: preserve chunk
alt lookahead full
Lookahead->>Buffer: flush preserved audio
end
Gate-->>CPAL: output non-empty
else sustained silence
Gate->>Gate: increment silence duration
alt silence > SILENCE_GATE_MS
Gate-->>CPAL: drop chunk (suppress)
else
Gate-->>CPAL: output from lookahead
end
end
end
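The gate modeled in the diagram above can be sketched roughly as below, assuming the 500 ms gate and 100 ms look-ahead from the commit messages. `SilenceGate`'s real fields in `audio_capture.rs` may differ; the RMS threshold is taken as a constructor parameter here, as one review comment suggests.

```rust
const SILENCE_GATE_MS: usize = 500; // sustained silence before suppression
const LOOKAHEAD_MS: usize = 100;    // pre-speech audio replayed on onset

fn chunk_rms(chunk: &[f32]) -> f32 {
    if chunk.is_empty() {
        return 0.0;
    }
    (chunk.iter().map(|s| s * s).sum::<f32>() / chunk.len() as f32).sqrt()
}

struct SilenceGate {
    threshold: f32,
    gate_samples: usize,   // silent samples required before dropping audio
    silent_samples: usize, // running count of consecutive silent samples
    lookahead: Vec<f32>,   // bounded tail of suppressed audio
    lookahead_cap: usize,
}

impl SilenceGate {
    fn new(sample_rate: u32, threshold: f32) -> Self {
        Self {
            threshold,
            gate_samples: (sample_rate as usize * SILENCE_GATE_MS / 1000).max(1),
            silent_samples: 0,
            lookahead: Vec::new(),
            lookahead_cap: (sample_rate as usize * LOOKAHEAD_MS / 1000).max(1),
        }
    }

    /// Returns the audio to forward to whisper; empty when suppressed.
    fn process(&mut self, chunk: &[f32]) -> Vec<f32> {
        if chunk_rms(chunk) >= self.threshold {
            // Speech: flush the look-ahead so onset after a pause isn't clipped.
            self.silent_samples = 0;
            let mut out = std::mem::take(&mut self.lookahead);
            out.extend_from_slice(chunk);
            out
        } else {
            self.silent_samples += chunk.len();
            if self.silent_samples >= self.gate_samples {
                // Sustained silence: keep only a bounded look-ahead tail.
                self.lookahead.extend_from_slice(chunk);
                let overflow = self.lookahead.len().saturating_sub(self.lookahead_cap);
                self.lookahead.drain(..overflow);
                Vec::new()
            } else {
                chunk.to_vec() // brief pause: pass through
            }
        }
    }
}
```

Deriving `gate_samples` and `lookahead_cap` from the source sample rate is what makes the thresholds adapt when the input device is not 16 kHz.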
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/src/components/settings/panels/VoicePanel.tsx (1)
38-57: ⚠️ Potential issue | 🟠 Major: The polling guard still uses mount-time state.

`loadData()` closes over `settings` and `savedSettings`, but the interval is created once in `useEffect([])`. After the first render, the timer keeps calling the stale closure, so the guard on Lines 49-53 can still overwrite unsaved edits every 2 seconds.

Also applies to: 69-75
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 38 - 57, loadData currently closes over mount-time settings and savedSettings so the polling interval created in useEffect([]) uses stale values and can clobber edits; update loadData to read the latest values via refs or by moving the interval creation so it reads current state: create React refs (e.g., settingsRef and savedSettingsRef), update them whenever setSettings/setSavedSettings run, and in loadData (used by the 2s timer) check settingsRef.current and savedSettingsRef.current instead of the closed-over settings/savedSettings; alternatively, define the interval inside a useEffect that depends on settings/savedSettings so the closure is fresh — ensure references to loadData, setSettings, setSavedSettings, settings, and savedSettings are updated accordingly.
🧹 Nitpick comments (1)
app/src/utils/tauriCommands/voice.ts (1)
63-82: Prefer arrow exports for these updated RPC helpers.

The return-type change looks fine, but the touched helpers still use function declarations. Converting them to `const ... = async () =>` keeps this module aligned with the repo's TypeScript style.

♻️ Suggested refactor

```diff
-export async function openhumanVoiceServerStatus(): Promise<VoiceServerStatus> {
+export const openhumanVoiceServerStatus = async (): Promise<VoiceServerStatus> => {
   return await callCoreRpc<VoiceServerStatus>({
     method: 'openhuman.voice_server_status',
     params: {},
   });
-}
+};

-export async function openhumanVoiceServerStart(params?: {
+export const openhumanVoiceServerStart = async (params?: {
   hotkey?: string;
   activation_mode?: 'tap' | 'push';
   skip_cleanup?: boolean;
-}): Promise<VoiceServerStatus> {
+}): Promise<VoiceServerStatus> => {
   return await callCoreRpc<VoiceServerStatus>({
     method: 'openhuman.voice_server_start',
     params: params ?? {},
   });
-}
+};

-export async function openhumanVoiceServerStop(): Promise<VoiceServerStatus> {
+export const openhumanVoiceServerStop = async (): Promise<VoiceServerStatus> => {
   return await callCoreRpc<VoiceServerStatus>({
     method: 'openhuman.voice_server_stop',
     params: {},
   });
-}
+};
```

As per coding guidelines, `**/*.{js,jsx,ts,tsx}`: Prefer arrow functions over function declarations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/utils/tauriCommands/voice.ts` around lines 63 - 82, Convert the three exported RPC helper functions to arrow-function exports: change the function declarations openhumanVoiceServerStatus, openhumanVoiceServerStart, and openhumanVoiceServerStop into const <name> = async () => style (exported), preserving their signatures, return types (Promise<VoiceServerStatus>), and the existing callCoreRpc calls/params; ensure exports remain named and behavior is unchanged so the module follows the repo's TypeScript arrow-function style.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/local_ai/model_ids.rs`:
- Around line 49-52: The effective_stt_model_id() fallback was changed to
"ggml-base-q5_1.bin" but the schema defaults for stt_model_id and
stt_download_url still set "ggml-tiny-q5_1.bin", causing new/empty configs to be
inconsistent; update the defaults in the local_ai schema (the stt_model_id and
corresponding stt_download_url default values) to match the new fallback
("ggml-base-q5_1.bin") so downloader config and effective_stt_model_id() are
consistent for fresh and empty-ID configs.
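A minimal sketch of the fallback under discussion, assuming `effective_stt_model_id` substitutes the default only when the configured ID is empty (the real signature in `model_ids.rs` may differ). The point of the finding is that this constant and the schema defaults must name the same model.

```rust
// Assumed default; must agree with the schema defaults for stt_model_id
// and stt_download_url, or fresh configs download one model and load another.
const DEFAULT_STT_MODEL: &str = "ggml-base-q5_1.bin";

fn effective_stt_model_id(configured: Option<&str>) -> String {
    match configured {
        Some(id) if !id.trim().is_empty() => id.to_string(),
        _ => DEFAULT_STT_MODEL.to_string(),
    }
}
```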
In `@src/openhuman/voice/audio_capture.rs`:
- Around line 21-22: The gate currently uses a hardcoded SILENCE_RMS_THRESHOLD
(const) while the UI exposes a configurable silence_threshold; update the code
to accept and pass that configured value into the live gate: add a threshold:
f32 parameter to SilenceGate::new (and any SilenceGate constructors), remove or
keep the const as a default only, and thread that parameter through
start_recording (and any call sites that construct SilenceGate) so the saved
silence_threshold from the UI is passed when creating the SilenceGate instance
instead of comparing against 0.002.
In `@src/openhuman/voice/server.rs`:
- Around line 253-273: The sleep that enforces MIN_RECORDING_AFTER_SETUP must be
removed from the main hotkey select! loop to avoid blocking cancellation and
hotkey handling; instead, when pending_release is observed set recording and
recording_expected_app as you do, then spawn a detached async task (or create a
deadline branch) that awaits tokio::time::sleep(MIN_RECORDING_AFTER_SETUP) and
after the delay calls self.spawn_process_recording(handle, app_config,
expected_app) with the moved handle/expected_app (or fails fast if cancelled via
a shared cancellation channel/flag). Apply the same pattern for the similar
block around lines 297-300 so both deferred-stop timers run off-thread and do
not block the hotkey/select! loop.
- Around line 192-205: The hotkey handler currently drops stop/toggle presses
when recording_pending_rx.is_some(), losing real stop requests; change it to
record the stop intent and apply it once the pending start completes: introduce
a small flag or channel (e.g., recording_stop_pending boolean or
recording_stop_tx) that you set when a hotkey arrives while
recording_pending_rx.is_some(), and modify the start_recording flow (where the
new Recording handle and recording_expected_app are installed) to check that
flag and immediately call spawn_process_recording(handle, app_config,
recording_expected_app.take()) if set; do the same fix for the analogous block
at lines 235-239 so pending stop presses are honored.
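The deferred-stop bookkeeping in the two server.rs findings above can be modeled with a small state struct. The real code uses channels inside a `select!` loop, so `stop_pending` and `on_recording_ready` are illustrative names only.

```rust
/// Minimal model of the hotkey/recording handshake: a release that arrives
/// while the blocking start is still pending is remembered, not dropped.
#[derive(Default)]
struct HotkeyState {
    recording_pending: bool, // start_recording() still initializing
    recording: bool,
    stop_pending: bool, // release arrived during initialization
}

impl HotkeyState {
    fn on_press(&mut self) {
        self.recording_pending = true;
    }

    fn on_release(&mut self) {
        if self.recording_pending {
            // Device init not finished yet: buffer the intent.
            self.stop_pending = true;
        } else {
            self.recording = false;
        }
    }

    /// Called once start_recording() hands back its handle.
    /// Returns true if the recording should be finalized immediately
    /// (after the minimum post-setup capture window elapses).
    fn on_recording_ready(&mut self) -> bool {
        self.recording_pending = false;
        self.recording = true;
        if self.stop_pending {
            self.stop_pending = false;
            self.recording = false;
            return true; // honor the buffered release
        }
        false
    }
}
```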
In `@src/openhuman/voice/streaming.rs`:
- Around line 132-143: The current sliding-window logic trims audio_buf
(protected by audio_buf.lock()) to MAX_STREAM_BUFFER_SAMPLES which is then later
used to produce the final "final" transcript, causing the start of long
dictations to be lost; fix it by keeping two separate buffers: keep the existing
capped buffer (audio_buf or sliding_buffer) for interim/partial results and add
a persistent full_audio buffer (e.g., full_audio_vec protected by its own mutex
or appended before any drain) that accumulates all incoming samples without
trimming; update producers/consumers so interim logic uses
audio_buf/sliding_buffer and the final-result generation (the code that
currently reads audio_buf to produce "final") reads from full_audio_vec and
audio_revision handling remains consistent.
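The two-buffer fix for streaming.rs reads roughly like this. `MAX_STREAM_BUFFER_SAMPLES` is assumed to be a ~30 s cap at 16 kHz, and the struct is a simplified stand-in for the real locking around `audio_buf` and `full_audio_buf`.

```rust
const MAX_STREAM_BUFFER_SAMPLES: usize = 16_000 * 30; // assumed cap (~30 s @ 16 kHz)

/// Interim transcription works on a capped sliding window, while the
/// final transcript is produced from an untrimmed copy of all audio.
#[derive(Default)]
struct StreamBuffers {
    sliding: Vec<f32>, // trimmed to MAX_STREAM_BUFFER_SAMPLES
    full: Vec<f32>,    // never trimmed; read by the final pass
}

impl StreamBuffers {
    fn push(&mut self, chunk: &[f32]) {
        self.full.extend_from_slice(chunk);
        self.sliding.extend_from_slice(chunk);
        // Drop the oldest samples so partial decodes stay cheap.
        let overflow = self.sliding.len().saturating_sub(MAX_STREAM_BUFFER_SAMPLES);
        self.sliding.drain(..overflow);
    }

    fn interim_window(&self) -> &[f32] {
        &self.sliding
    }

    fn final_audio(&self) -> &[f32] {
        &self.full
    }
}
```

Trimming only the sliding copy is what keeps the start of long dictations out of the failure mode the comment describes.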
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4d6b247f-ab83-4284-aa7e-9cca9d6a4314
⛔ Files ignored due to path filters (1)
`app/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- app/src/components/settings/panels/VoicePanel.tsx
- app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
- app/src/utils/tauriCommands/voice.ts
- src/openhuman/local_ai/model_ids.rs
- src/openhuman/local_ai/service/speech.rs
- src/openhuman/local_ai/service/whisper_engine.rs
- src/openhuman/voice/audio_capture.rs
- src/openhuman/voice/server.rs
- src/openhuman/voice/streaming.rs
- src/openhuman/voice/text_input.rs
```rust
/// RMS threshold below which audio is considered silence.
const SILENCE_RMS_THRESHOLD: f32 = 0.002;
```
Wire the configured silence threshold into the live gate.
The new gate compares against a fixed 0.002 on Lines 21-22/65-66, but SilenceGate::new() and start_recording() take no threshold input. The UI still exposes a configurable silence threshold on Lines 300-318 of app/src/components/settings/panels/VoicePanel.tsx, so quieter users can lower the setting and still have speech discarded here before Whisper ever sees it.
💡 Minimal shape for making the gate configurable

```diff
-const SILENCE_RMS_THRESHOLD: f32 = 0.002;
-
 struct SilenceGate {
+    threshold: f32,
     source_sample_rate: u32,
@@
-    fn new(source_sample_rate: u32) -> Self {
+    fn new(source_sample_rate: u32, threshold: f32) -> Self {
         let gate_samples = ((source_sample_rate as usize * SILENCE_GATE_MS) / 1000).max(1);
         let lookahead_samples = ((source_sample_rate as usize * LOOKAHEAD_MS) / 1000).max(1);
         Self {
+            threshold,
             source_sample_rate,
@@
-        let is_silent = rms < SILENCE_RMS_THRESHOLD;
+        let is_silent = rms < self.threshold;
```

Then pass the saved silence_threshold down when constructing SilenceGate.
Also applies to: 48-66, 223-229
🧹 Nitpick comments (1)
src/openhuman/voice/server.rs (1)
460-469: Consider extracting `process_recording_bg` arguments into a struct.

The `#[allow(clippy::too_many_arguments)]` suppress is acceptable for now, but bundling `state`, `transcription_count`, `last_error`, `recent_transcripts`, and `expected_app` into a context struct would improve maintainability if more parameters are added later.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/voice/server.rs` around lines 460 - 469, Extract the clustered arguments into a context struct (e.g., RecordingContext or ProcessRecordingContext) and replace the multiple parameters in async fn process_recording_bg(...) with a single context parameter; include fields state: Arc<Mutex<ServerState>>, transcription_count: Arc<AtomicU64>, last_error: Arc<Mutex<Option<String>>>, recent_transcripts: Arc<Mutex<Vec<String>>>, and expected_app: Option<String> in that struct, update all call sites that invoke process_recording_bg to construct and pass the new context instance, and adjust any code inside process_recording_bg that referenced the old parameter names to use the struct field access instead.
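As a sketch, the suggested context struct might look like this. The `state: Arc<Mutex<ServerState>>` field is omitted because `ServerState` is project-specific, and `record_success` is a hypothetical helper showing how call sites shrink once the handles travel together.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Bundles the shared handles previously passed as separate arguments
/// to process_recording_bg (field names per the review; layout assumed).
struct RecordingContext {
    transcription_count: Arc<AtomicU64>,
    last_error: Arc<Mutex<Option<String>>>,
    recent_transcripts: Arc<Mutex<Vec<String>>>,
    expected_app: Option<String>,
}

/// Hypothetical helper: one success path mutates everything via the context.
fn record_success(ctx: &RecordingContext, transcript: String) {
    ctx.transcription_count.fetch_add(1, Ordering::SeqCst);
    ctx.recent_transcripts.lock().unwrap().push(transcript);
    *ctx.last_error.lock().unwrap() = None;
}
```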
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2231ca11-e488-48eb-b260-17973bde5683
⛔ Files ignored due to path filters (1)
`app/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (4)
- app/src/components/settings/panels/VoicePanel.tsx
- src/openhuman/config/schema/local_ai.rs
- src/openhuman/voice/server.rs
- src/openhuman/voice/streaming.rs
✅ Files skipped from review due to trivial changes (1)
- src/openhuman/config/schema/local_ai.rs
Summary
#385: voice dictation reliability issues across STT quality, hotkey timing, and insertion target behavior.

Problem
- `start_recording()` could block long enough that `Released` arrived before recording became active, causing skipped or too-short captures.
- Frontend used `CommandResponse<T>` for server status/start/stop while the backend returns a flat `VoiceServerStatus`, breaking UI voice server controls.

Solution
- Per-segment confidence filtering in `whisper_engine`.
- Moved recording start off the event loop (`spawn_blocking` + pending handle channel).
- Buffered `Released` events and applied deferred-stop handling with minimum post-setup capture window.
- Updated `voice.ts`, `VoicePanel.tsx`, and `VoicePanel.test.tsx` for flat `VoiceServerStatus` responses.

Submission Checklist
- … (`app/`) and/or `cargo test` (core) for logic you add or change
- … (`app/test/e2e`, mock backend, `tests/json_rpc_e2e.rs` as appropriate)
- … `///` / `//!` (Rust), JSDoc or brief file/module headers (TS) on public APIs and non-obvious modules
- (Any feature related checklist can go in here)
- `cargo check`
- `cargo fmt --check`
- `cargo test --lib openhuman::voice::text_input::tests`
- `cargo test --lib openhuman::voice::server::tests`
- `cargo test --lib openhuman::local_ai::model_ids::tests::stt_tts_and_quantization_defaults_are_applied`
- `cargo run voice` debugging (Fn race + focus/paste paths)
- `cargo test --lib` clean run in this environment (unrelated long-running `screen_intelligence` noise/failures observed)
Related
Summary by CodeRabbit
New Features
Bug Fixes
Improvements