
fix(voice): reduce dictation hallucinations and improve Fn/focus reliability (#385)#409

Merged
senamakel merged 9 commits into tinyhumansai:main from oxoxDev:fix/385-voice-hallucinations
Apr 7, 2026
Conversation

oxoxDev (Contributor) commented Apr 7, 2026

Summary

  • Fixes #385 voice dictation reliability issues across STT quality, hotkey timing, and insertion target behavior.
  • Reduces hallucinated output by combining per-segment confidence filtering, upgraded default STT model, and real-time silence gating.
  • Fixes Fn push-to-talk race conditions by making recording startup non-blocking and buffering release events during setup.
  • Improves insertion targeting by capturing focus context on press, validating/restoring app focus before paste, and avoiding transcript drops when restore fails.
  • Aligns frontend voice server RPC typing/handling with actual response shape so Start/Status flows work correctly in settings.

Problem

  • Voice dictation produced frequent hallucinations and unstable transcripts in silence/noise conditions.
  • Fn push-to-talk had a timing race: start_recording() could block long enough that Released arrived before recording became active, causing skipped or too-short captures.
  • Text insertion sometimes went to the wrong place or was dropped when focus validation failed.
  • Frontend expected wrapped CommandResponse<T> for server status/start/stop while backend returns flat VoiceServerStatus, breaking UI voice server controls.

Solution

  • STT hardening:
    • Added per-segment confidence filtering in whisper_engine.
    • Changed default STT model from tiny to base.
    • Added audio silence gate with look-ahead buffering.
    • Added 15s sliding-window cap for streaming dictation buffer.
  • Hotkey race fix:
    • Moved recording startup off the hotkey event loop (spawn_blocking + pending handle channel).
    • Buffered early Released events and applied deferred-stop handling with minimum post-setup capture window.
  • Focus/insertion fix:
    • Captures expected app on hotkey press and propagates through processing.
    • Validates/restores focus before paste.
    • If restore fails, still attempts paste (non-fatal fallback) to avoid dropping transcript text.
  • Frontend parity:
    • Updated voice.ts, VoicePanel.tsx, and VoicePanel.test.tsx for flat VoiceServerStatus responses.

Submission Checklist

  • Unit tests — Vitest (app/) and/or cargo test (core) for logic you add or change
  • E2E / integration — Where behavior is user-visible or crosses UI → Tauri → sidecar → JSON-RPC; use existing harnesses (app/test/e2e, mock backend, tests/json_rpc_e2e.rs as appropriate)
  • N/A — If truly not applicable, say why (e.g. change is documentation-only)
  • Doc comments — /// and //! (Rust), JSDoc or brief file/module headers (TS) on public APIs and non-obvious modules
  • Inline comments — Where logic, invariants, or edge cases aren’t clear from names alone (keep them grep-friendly; avoid restating the code)


  • cargo check
  • cargo fmt --check
  • Targeted core tests:
    • cargo test --lib openhuman::voice::text_input::tests
    • cargo test --lib openhuman::voice::server::tests
    • cargo test --lib openhuman::local_ai::model_ids::tests::stt_tts_and_quantization_defaults_are_applied
  • Manual local validation during cargo run voice debugging (Fn race + focus/paste paths)
  • Full cargo test --lib run in this environment was not fully clean (unrelated long-running screen_intelligence noise/failures observed)

Impact

  • Platform/runtime:
    • Affects desktop voice flow (Rust core + Tauri frontend settings panel).
  • Behavioral impact:
    • More reliable push-to-talk capture on Fn.
    • Lower chance of silence/noise hallucinations.
    • Better paste targeting behavior with focus restoration and non-fatal fallback.
  • Performance/tradeoffs:
    • Default STT model upgrade (tiny → base) improves quality at increased inference cost.
    • Focus restore may briefly foreground target app before paste.

Related

Summary by CodeRabbit

  • New Features

    • Silence gating to drop sustained silence from recordings.
    • Transcription now filters low-quality segments for clearer results.
    • macOS: transcribed text insertion validates and can restore app focus.
  • Bug Fixes

    • More reliable hotkey/recording behavior on macOS during setup.
  • Improvements

    • Upgraded default speech-to-text model for better accuracy.
    • Settings synchronization and server-status handling made more robust.

oxoxDev and others added 7 commits April 7, 2026 17:47
…inyhumansai#385)

Reject whisper segments with avg token log-probability below -0.7 or
entropy above 2.4. Return TranscriptionResult with confidence metadata
instead of plain String. Update callers in speech.rs and streaming.rs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
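The per-segment gate described above can be sketched as follows. The -0.7 avg-logprob and 2.4 entropy cutoffs are quoted from this commit; `Segment`, `TranscriptionResult`, and `filter_segments` are illustrative names, not the actual whisper_engine API:

```rust
// Hypothetical sketch of per-segment confidence filtering.
const MIN_AVG_LOGPROB: f32 = -0.7;
const MAX_ENTROPY: f32 = 2.4;

struct Segment {
    text: String,
    avg_logprob: f32, // mean token log-probability reported by whisper
    entropy: f32,     // token distribution entropy for the segment
}

struct TranscriptionResult {
    text: String,
    avg_logprob: f32,
    segments_accepted: usize,
    segments_total: usize,
}

fn filter_segments(segments: Vec<Segment>) -> TranscriptionResult {
    let total = segments.len();
    // Reject segments the model itself is unsure about: low mean token
    // log-probability or high entropy correlates with hallucinated text.
    let accepted: Vec<Segment> = segments
        .into_iter()
        .filter(|s| s.avg_logprob >= MIN_AVG_LOGPROB && s.entropy <= MAX_ENTROPY)
        .collect();
    let avg_logprob = if accepted.is_empty() {
        f32::NEG_INFINITY
    } else {
        accepted.iter().map(|s| s.avg_logprob).sum::<f32>() / accepted.len() as f32
    };
    TranscriptionResult {
        text: accepted
            .iter()
            .map(|s| s.text.as_str())
            .collect::<Vec<_>>()
            .join(" "),
        avg_logprob,
        segments_accepted: accepted.len(),
        segments_total: total,
    }
}
```

Returning a structured result instead of a plain String is what lets callers in speech.rs and streaming.rs log avg_logprob and accepted-segment counts.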
…#385)

Base model produces significantly fewer hallucinations than tiny,
especially in noisy/quiet conditions. User can still override via config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ai#385)

Gate sustained silence (>500ms) from being sent to whisper to prevent
hallucinations. Maintain 100ms look-ahead ring buffer so speech onset
after pauses is not clipped. Thresholds adapt to source sample rate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
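A minimal shape for this gate, assuming the 0.002 RMS threshold, 500 ms gate, and 100 ms look-ahead quoted in this PR; the struct layout and method names are illustrative rather than copied from audio_capture.rs:

```rust
// Illustrative RMS-based silence gate with a look-ahead buffer.
const SILENCE_RMS_THRESHOLD: f32 = 0.002;
const SILENCE_GATE_MS: usize = 500; // suppress audio after this much sustained silence
const LOOKAHEAD_MS: usize = 100;    // retained so speech onset after a pause isn't clipped

struct SilenceGate {
    gate_samples: usize,      // silent samples before suppression kicks in
    lookahead: Vec<f32>,      // trailing window emitted on speech onset
    lookahead_samples: usize,
    silent_samples: usize,    // consecutive silent samples seen so far
}

impl SilenceGate {
    fn new(sample_rate: u32) -> Self {
        Self {
            gate_samples: (sample_rate as usize * SILENCE_GATE_MS / 1000).max(1),
            lookahead: Vec::new(),
            lookahead_samples: (sample_rate as usize * LOOKAHEAD_MS / 1000).max(1),
            silent_samples: 0,
        }
    }

    fn chunk_rms(chunk: &[f32]) -> f32 {
        if chunk.is_empty() {
            return 0.0;
        }
        (chunk.iter().map(|s| s * s).sum::<f32>() / chunk.len() as f32).sqrt()
    }

    /// Returns the samples to forward to the STT pipeline; empty when the
    /// chunk falls inside a sustained-silence region.
    fn process(&mut self, chunk: &[f32]) -> Vec<f32> {
        if Self::chunk_rms(chunk) >= SILENCE_RMS_THRESHOLD {
            // Speech: flush the look-ahead first so onset isn't clipped.
            let mut out = std::mem::take(&mut self.lookahead);
            self.silent_samples = 0;
            out.extend_from_slice(chunk);
            return out;
        }
        self.silent_samples += chunk.len();
        if self.silent_samples > self.gate_samples {
            // Sustained silence: keep only the trailing look-ahead window.
            self.lookahead.extend_from_slice(chunk);
            let overflow = self.lookahead.len().saturating_sub(self.lookahead_samples);
            self.lookahead.drain(..overflow);
            Vec::new()
        } else {
            chunk.to_vec() // brief pause: still forwarded
        }
    }
}
```

Flushing the look-ahead on speech onset is what keeps the first ~100 ms of a word after a pause from being lost to the gate.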
…nyhumansai#385)

start_recording() blocks 1-7s on cpal device init but macOS fires Fn
Release almost immediately, causing skipped cycles. Move recording
start to spawn_blocking so the event loop stays responsive. Buffer
Release events during setup and ensure minimum 1.5s recording duration
when release arrives before recording handle is ready.

Also includes: capture focused app on hotkey press, pass through
pipeline for focus validation before paste.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
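The timing fix above can be illustrated with a simplified stdlib analogue. The real code uses tokio's spawn_blocking and a pending-handle channel inside the hotkey event loop, so the names and durations here are stand-ins:

```rust
// Simplified sketch: start recording off the event loop, buffer an early
// Release, and enforce a minimum capture window once setup completes.
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

struct RecordingHandle {
    id: u32,
}

fn slow_start_recording() -> RecordingHandle {
    thread::sleep(Duration::from_millis(50)); // stands in for the 1-7 s cpal init
    RecordingHandle { id: 1 }
}

fn main_loop() -> u32 {
    // Recording setup runs on its own thread so the event loop never blocks
    // and an Fn Release during setup is never missed.
    let (handle_tx, handle_rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = handle_tx.send(slow_start_recording());
    });

    // Simulate: Released fires while setup is still pending, so the event
    // is buffered instead of being dropped.
    let release_pending = true;

    // Once the handle lands, honor the buffered release, but keep recording
    // for a minimum window so the capture isn't too short.
    let handle = handle_rx.recv().expect("recording setup failed");
    if release_pending {
        thread::sleep(Duration::from_millis(10)); // MIN_RECORDING_AFTER_SETUP (1.5 s in the PR)
    }
    handle.id // hand off to processing
}
```

The key property is that the loop never awaits device init directly, so a Release that races setup is recorded and applied afterward rather than causing a skipped cycle.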
…ss (tinyhumansai#385)

Add expected_app parameter to insert_text(). Before Cmd+V, validate
focus via accessibility API and restore via AppleScript if shifted.
Don't abort paste on focus validation failure — attempt insertion
regardless so text is never silently lost.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
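A hedged sketch of the focus-restore step, assuming an osascript-based activation as this commit describes; the actual text_input.rs helpers and signatures may differ:

```rust
// Hypothetical focus-restore helpers. focus_restore_script builds the
// AppleScript; restore_focus shells out to osascript (macOS only).
use std::process::Command;

fn focus_restore_script(expected_app: &str) -> String {
    // Escape backslashes and quotes so an app name can't break the literal.
    let escaped = expected_app.replace('\\', "\\\\").replace('"', "\\\"");
    format!("tell application \"{}\" to activate", escaped)
}

fn restore_focus(expected_app: &str) -> bool {
    Command::new("osascript")
        .arg("-e")
        .arg(focus_restore_script(expected_app))
        .status()
        .map(|s| s.success())
        .unwrap_or(false) // non-fatal: the caller still attempts the paste
}
```

Per the commit message, a false return from the restore step is treated as non-fatal: insertion is attempted anyway so transcript text is never silently lost.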
coderabbitai Bot (Contributor) commented Apr 7, 2026

📝 Walkthrough

Walkthrough

Unwrapped voice RPC return shapes in frontend, changed frontend state reads to use refs, updated tests and tauri RPC wrappers; introduced per-segment transcription results and confidence metrics; added RMS-based silence gating and buffering in audio capture; altered hotkey/recording control flow (macOS focus capture and deferred stop); updated default STT model.

Changes

Cohort / File(s) / Summary

  • Frontend RPC & UI (app/src/utils/tauriCommands/voice.ts, app/src/components/settings/panels/VoicePanel.tsx, app/src/components/settings/panels/__tests__/VoicePanel.test.tsx): Removed the CommandResponse<...> wrapper from voice RPCs so functions return VoiceServerStatus directly; VoicePanel now mirrors state into refs and reads from refs when deciding overwrites; tests updated to match the new unwrapped shape.
  • STT Default Model (src/openhuman/config/schema/local_ai.rs, src/openhuman/local_ai/model_ids.rs): Changed the default STT artifact from ggml-tiny-q5_1.bin to ggml-base-q5_1.bin and updated the corresponding test and default URL.
  • Whisper Engine & Transcription Result (src/openhuman/local_ai/service/whisper_engine.rs, src/openhuman/local_ai/service/speech.rs): Added TranscriptionResult (text, avg_logprob, segments_accepted/total); transcription functions now return structured results and propagate/log avg_logprob and accepted-segment counts; the speech path handles the new result type.
  • Audio Capture & Silence Gate (src/openhuman/voice/audio_capture.rs): Added an RMS-based SilenceGate with lookahead and thresholds; CPAL input handlers call gate.process and drop suppressed silence; adjusted config selection logic (deref best).
  • Hotkey Loop & Recording Flow, macOS focus handling (src/openhuman/voice/server.rs): Decoupled recording setup onto a blocking thread; introduced pending-stop buffering and a deferred-stop deadline after setup; captures the expected app name on hotkey press and threads it into recording processing; explicit idle transitions on setup failure.
  • Streaming Partial & Final Inference (src/openhuman/voice/streaming.rs): Replaced length-delta gating with an atomic revision counter for partial inference; added 15s sliding-window buffer trimming; skips partials until MIN_PARTIAL_SAMPLES; consumes TranscriptionResult (using avg_logprob and text).
  • Text Insertion & macOS Focus Restore (src/openhuman/voice/text_input.rs): insert_text now accepts expected_app: Option<&str>; on macOS, validates/restores the focused app via AppleScript before pasting; added focus-restore helpers and updated tests to pass None.
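The 15 s sliding-window cap from the streaming changes can be sketched as below, assuming whisper's 16 kHz mono input; MAX_STREAM_BUFFER_SAMPLES and push_with_window are illustrative names, not the streaming.rs API:

```rust
// Sketch of the partial-inference buffer cap: interim transcription only
// ever sees the most recent 15 s of audio.
const STREAM_SAMPLE_RATE: usize = 16_000; // whisper's expected mono rate
const MAX_STREAM_SECS: usize = 15;
const MAX_STREAM_BUFFER_SAMPLES: usize = STREAM_SAMPLE_RATE * MAX_STREAM_SECS;

/// Append new audio and trim the front so the partial-inference buffer
/// never exceeds the sliding window. The final transcript must be backed
/// by a separate, untrimmed buffer, or the start of long dictations is lost.
fn push_with_window(buf: &mut Vec<f32>, chunk: &[f32]) {
    buf.extend_from_slice(chunk);
    if buf.len() > MAX_STREAM_BUFFER_SAMPLES {
        let excess = buf.len() - MAX_STREAM_BUFFER_SAMPLES;
        buf.drain(..excess);
    }
}
```

The window bounds partial-inference latency and cost; the review comment on streaming.rs below flags exactly the untrimmed-final-buffer requirement noted in the doc comment.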

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Hotkey as Hotkey Loop
    participant Capture as Audio Capture
    participant Gate as Silence Gate
    participant Stream as Streaming Engine
    participant Whisper as Whisper Engine
    participant Server as Voice Server
    participant TextInput as Text Input

    User->>Hotkey: Press hotkey
    Hotkey->>Hotkey: capture_expected_app_name (macOS)
    Hotkey->>Server: request start recording (spawn blocking task)
    Server->>Capture: start_recording (blocking thread)
    Capture->>Capture: CPAL collects chunk
    Capture->>Gate: process(mono_chunk)
    Gate-->>Capture: gated_chunk (or drop)
    Capture->>Stream: append audio to buffers (audio_buf + full_audio_buf)
    Stream->>Whisper: trigger partial transcription (on revision change)
    Whisper-->>Stream: TranscriptionResult { text, avg_logprob, segments_* }
    Stream->>Stream: log avg_logprob, maybe show partial
    User->>Hotkey: Release hotkey
    Hotkey->>Server: signal stop (may be buffered if setup pending)
    Server->>Stream: finalize using full_audio_buf
    Stream->>Whisper: final transcription
    Whisper-->>Stream: final TranscriptionResult
    Stream->>TextInput: insert_text(result.text, expected_app)
    TextInput->>TextInput: validate/restore app focus (macOS) then paste
    TextInput-->>User: Text inserted
sequenceDiagram
    participant CPAL as CPAL Input
    participant Gate as SilenceGate
    participant Lookahead as Lookahead Buffer
    participant Buffer as Recording Buffer

    loop each audio chunk
        CPAL->>Gate: process(chunk)
        Gate->>Gate: compute chunk_rms()
        alt above threshold or within lookahead
            Gate->>Lookahead: preserve chunk
            alt lookahead full
                Lookahead->>Buffer: flush preserved audio
            end
            Gate-->>CPAL: output non-empty
        else sustained silence
            Gate->>Gate: increment silence duration
            alt silence > SILENCE_GATE_MS
                Gate-->>CPAL: drop chunk (suppress)
            else
                Gate-->>CPAL: output from lookahead
            end
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • graycyrus
  • senamakel

Poem

🐰 Soft thumps on keys and quiet queues,
I gate the silence, keep the news.
Segments counted, confidence shown,
From tiny to base the model's grown.
A rabbit's cheer for clearer tones! 🎙️🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Linked Issues check: ❓ Inconclusive. The PR implements key improvements for #385, including per-segment confidence filtering, an upgraded STT model, silence gating, hotkey timing fixes, and app focus validation. However, some acceptance criteria remain incomplete: there are no documented recommended settings for noisy environments, regression tests with golden audio fixtures are mentioned but limited, and low-confidence UX is not fully surfaced in the frontend. Resolution: add documented recommended settings for noisy environments and implement visible low-confidence feedback in the UI to fully satisfy #385's acceptance criteria.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The PR title accurately reflects the main objectives: reducing STT hallucinations and improving hotkey/focus reliability are the primary focuses across the changeset.
  • Out of Scope Changes check: ✅ Passed. All changes directly support the linked issue #385 objectives: STT model/filtering improvements, silence gating, hotkey race fixes, focus validation, and frontend/backend parity are all within scope.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


coderabbitai Bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/src/components/settings/panels/VoicePanel.tsx (1)

38-57: ⚠️ Potential issue | 🟠 Major

The polling guard still uses mount-time state.

loadData() closes over settings and savedSettings, but the interval is created once in useEffect([]). After the first render, the timer keeps calling the stale closure, so the guard on Lines 49-53 can still overwrite unsaved edits every 2 seconds.

Also applies to: 69-75

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 38 - 57,
loadData currently closes over mount-time settings and savedSettings so the
polling interval created in useEffect([]) uses stale values and can clobber
edits; update loadData to read the latest values via refs or by moving the
interval creation so it reads current state: create React refs (e.g.,
settingsRef and savedSettingsRef), update them whenever
setSettings/setSavedSettings run, and in loadData (used by the 2s timer) check
settingsRef.current and savedSettingsRef.current instead of the closed-over
settings/savedSettings; alternatively, define the interval inside a useEffect
that depends on settings/savedSettings so the closure is fresh — ensure
references to loadData, setSettings, setSavedSettings, settings, and
savedSettings are updated accordingly.
🧹 Nitpick comments (1)
app/src/utils/tauriCommands/voice.ts (1)

63-82: Prefer arrow exports for these updated RPC helpers.

The return-type change looks fine, but the touched helpers still use function declarations. Converting them to const ... = async () => keeps this module aligned with the repo’s TypeScript style.

♻️ Suggested refactor
-export async function openhumanVoiceServerStatus(): Promise<VoiceServerStatus> {
+export const openhumanVoiceServerStatus = async (): Promise<VoiceServerStatus> => {
   return await callCoreRpc<VoiceServerStatus>({
     method: 'openhuman.voice_server_status',
     params: {},
   });
-}
+};

-export async function openhumanVoiceServerStart(params?: {
+export const openhumanVoiceServerStart = async (params?: {
   hotkey?: string;
   activation_mode?: 'tap' | 'push';
   skip_cleanup?: boolean;
-}): Promise<VoiceServerStatus> {
+}): Promise<VoiceServerStatus> => {
   return await callCoreRpc<VoiceServerStatus>({
     method: 'openhuman.voice_server_start',
     params: params ?? {},
   });
-}
+};

-export async function openhumanVoiceServerStop(): Promise<VoiceServerStatus> {
+export const openhumanVoiceServerStop = async (): Promise<VoiceServerStatus> => {
   return await callCoreRpc<VoiceServerStatus>({
     method: 'openhuman.voice_server_stop',
     params: {},
   });
-}
+};

As per coding guidelines, **/*.{js,jsx,ts,tsx}: Prefer arrow functions over function declarations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/utils/tauriCommands/voice.ts` around lines 63 - 82, Convert the three
exported RPC helper functions to arrow-function exports: change the function
declarations openhumanVoiceServerStatus, openhumanVoiceServerStart, and
openhumanVoiceServerStop into const <name> = async () => style (exported),
preserving their signatures, return types (Promise<VoiceServerStatus>), and the
existing callCoreRpc calls/params; ensure exports remain named and behavior is
unchanged so the module follows the repo's TypeScript arrow-function style.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/local_ai/model_ids.rs`:
- Around line 49-52: The effective_stt_model_id() fallback was changed to
"ggml-base-q5_1.bin" but the schema defaults for stt_model_id and
stt_download_url still set "ggml-tiny-q5_1.bin", causing new/empty configs to be
inconsistent; update the defaults in the local_ai schema (the stt_model_id and
corresponding stt_download_url default values) to match the new fallback
("ggml-base-q5_1.bin") so downloader config and effective_stt_model_id() are
consistent for fresh and empty-ID configs.

In `@src/openhuman/voice/audio_capture.rs`:
- Around line 21-22: The gate currently uses a hardcoded SILENCE_RMS_THRESHOLD
(const) while the UI exposes a configurable silence_threshold; update the code
to accept and pass that configured value into the live gate: add a threshold:
f32 parameter to SilenceGate::new (and any SilenceGate constructors), remove or
keep the const as a default only, and thread that parameter through
start_recording (and any call sites that construct SilenceGate) so the saved
silence_threshold from the UI is passed when creating the SilenceGate instance
instead of comparing against 0.002.

In `@src/openhuman/voice/server.rs`:
- Around line 253-273: The sleep that enforces MIN_RECORDING_AFTER_SETUP must be
removed from the main hotkey select! loop to avoid blocking cancellation and
hotkey handling; instead, when pending_release is observed set recording and
recording_expected_app as you do, then spawn a detached async task (or create a
deadline branch) that awaits tokio::time::sleep(MIN_RECORDING_AFTER_SETUP) and
after the delay calls self.spawn_process_recording(handle, app_config,
expected_app) with the moved handle/expected_app (or fails fast if cancelled via
a shared cancellation channel/flag). Apply the same pattern for the similar
block around lines 297-300 so both deferred-stop timers run off-thread and do
not block the hotkey/select! loop.
- Around line 192-205: The hotkey handler currently drops stop/toggle presses
when recording_pending_rx.is_some(), losing real stop requests; change it to
record the stop intent and apply it once the pending start completes: introduce
a small flag or channel (e.g., recording_stop_pending boolean or
recording_stop_tx) that you set when a hotkey arrives while
recording_pending_rx.is_some(), and modify the start_recording flow (where the
new Recording handle and recording_expected_app are installed) to check that
flag and immediately call spawn_process_recording(handle, app_config,
recording_expected_app.take()) if set; do the same fix for the analogous block
at lines 235-239 so pending stop presses are honored.

In `@src/openhuman/voice/streaming.rs`:
- Around line 132-143: The current sliding-window logic trims audio_buf
(protected by audio_buf.lock()) to MAX_STREAM_BUFFER_SAMPLES which is then later
used to produce the final "final" transcript, causing the start of long
dictations to be lost; fix it by keeping two separate buffers: keep the existing
capped buffer (audio_buf or sliding_buffer) for interim/partial results and add
a persistent full_audio buffer (e.g., full_audio_vec protected by its own mutex
or appended before any drain) that accumulates all incoming samples without
trimming; update producers/consumers so interim logic uses
audio_buf/sliding_buffer and the final-result generation (the code that
currently reads audio_buf to produce "final") reads from full_audio_vec and
audio_revision handling remains consistent.

---

Outside diff comments:
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 38-57: loadData currently closes over mount-time settings and
savedSettings so the polling interval created in useEffect([]) uses stale values
and can clobber edits; update loadData to read the latest values via refs or by
moving the interval creation so it reads current state: create React refs (e.g.,
settingsRef and savedSettingsRef), update them whenever
setSettings/setSavedSettings run, and in loadData (used by the 2s timer) check
settingsRef.current and savedSettingsRef.current instead of the closed-over
settings/savedSettings; alternatively, define the interval inside a useEffect
that depends on settings/savedSettings so the closure is fresh — ensure
references to loadData, setSettings, setSavedSettings, settings, and
savedSettings are updated accordingly.

---

Nitpick comments:
In `@app/src/utils/tauriCommands/voice.ts`:
- Around line 63-82: Convert the three exported RPC helper functions to
arrow-function exports: change the function declarations
openhumanVoiceServerStatus, openhumanVoiceServerStart, and
openhumanVoiceServerStop into const <name> = async () => style (exported),
preserving their signatures, return types (Promise<VoiceServerStatus>), and the
existing callCoreRpc calls/params; ensure exports remain named and behavior is
unchanged so the module follows the repo's TypeScript arrow-function style.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4d6b247f-ab83-4284-aa7e-9cca9d6a4314

📥 Commits

Reviewing files that changed from the base of the PR and between db7eeee and 6c881b6.

⛔ Files ignored due to path filters (1)
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • app/src/components/settings/panels/VoicePanel.tsx
  • app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
  • app/src/utils/tauriCommands/voice.ts
  • src/openhuman/local_ai/model_ids.rs
  • src/openhuman/local_ai/service/speech.rs
  • src/openhuman/local_ai/service/whisper_engine.rs
  • src/openhuman/voice/audio_capture.rs
  • src/openhuman/voice/server.rs
  • src/openhuman/voice/streaming.rs
  • src/openhuman/voice/text_input.rs

Comment thread src/openhuman/voice/audio_capture.rs
Comment on lines +21 to +22
/// RMS threshold below which audio is considered silence.
const SILENCE_RMS_THRESHOLD: f32 = 0.002;
⚠️ Potential issue | 🟠 Major

Wire the configured silence threshold into the live gate.

The new gate compares against a fixed 0.002 on Lines 21-22/65-66, but SilenceGate::new() and start_recording() take no threshold input. The UI still exposes a configurable silence threshold on Lines 300-318 of app/src/components/settings/panels/VoicePanel.tsx, so quieter users can lower the setting and still have speech discarded here before Whisper ever sees it.

💡 Minimal shape for making the gate configurable
-const SILENCE_RMS_THRESHOLD: f32 = 0.002;
-
 struct SilenceGate {
+    threshold: f32,
     source_sample_rate: u32,
@@
-    fn new(source_sample_rate: u32) -> Self {
+    fn new(source_sample_rate: u32, threshold: f32) -> Self {
         let gate_samples = ((source_sample_rate as usize * SILENCE_GATE_MS) / 1000).max(1);
         let lookahead_samples = ((source_sample_rate as usize * LOOKAHEAD_MS) / 1000).max(1);
         Self {
+            threshold,
             source_sample_rate,
@@
-        let is_silent = rms < SILENCE_RMS_THRESHOLD;
+        let is_silent = rms < self.threshold;

Then pass the saved silence_threshold down when constructing SilenceGate.

Also applies to: 48-66, 223-229

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/voice/audio_capture.rs` around lines 21 - 22, The gate
currently uses a hardcoded SILENCE_RMS_THRESHOLD (const) while the UI exposes a
configurable silence_threshold; update the code to accept and pass that
configured value into the live gate: add a threshold: f32 parameter to
SilenceGate::new (and any SilenceGate constructors), remove or keep the const as
a default only, and thread that parameter through start_recording (and any call
sites that construct SilenceGate) so the saved silence_threshold from the UI is
passed when creating the SilenceGate instance instead of comparing against
0.002.

Comment thread src/openhuman/voice/server.rs
Comment thread src/openhuman/voice/server.rs Outdated
Comment thread src/openhuman/voice/streaming.rs
coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/openhuman/voice/server.rs (1)

460-469: Consider extracting process_recording_bg arguments into a struct.

The #[allow(clippy::too_many_arguments)] suppress is acceptable for now, but bundling state, transcription_count, last_error, recent_transcripts, and expected_app into a context struct would improve maintainability if more parameters are added later.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/voice/server.rs` around lines 460 - 469, Extract the clustered
arguments into a context struct (e.g., RecordingContext or
ProcessRecordingContext) and replace the multiple parameters in async fn
process_recording_bg(...) with a single context parameter; include fields state:
Arc<Mutex<ServerState>>, transcription_count: Arc<AtomicU64>, last_error:
Arc<Mutex<Option<String>>>, recent_transcripts: Arc<Mutex<Vec<String>>>, and
expected_app: Option<String> in that struct, update all call sites that invoke
process_recording_bg to construct and pass the new context instance, and adjust
any code inside process_recording_bg that referenced the old parameter names to
use the struct field access instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/openhuman/voice/server.rs`:
- Around line 460-469: Extract the clustered arguments into a context struct
(e.g., RecordingContext or ProcessRecordingContext) and replace the multiple
parameters in async fn process_recording_bg(...) with a single context
parameter; include fields state: Arc<Mutex<ServerState>>, transcription_count:
Arc<AtomicU64>, last_error: Arc<Mutex<Option<String>>>, recent_transcripts:
Arc<Mutex<Vec<String>>>, and expected_app: Option<String> in that struct, update
all call sites that invoke process_recording_bg to construct and pass the new
context instance, and adjust any code inside process_recording_bg that
referenced the old parameter names to use the struct field access instead.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2231ca11-e488-48eb-b260-17973bde5683

📥 Commits

Reviewing files that changed from the base of the PR and between 6c881b6 and 652ff26.

⛔ Files ignored due to path filters (1)
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • app/src/components/settings/panels/VoicePanel.tsx
  • src/openhuman/config/schema/local_ai.rs
  • src/openhuman/voice/server.rs
  • src/openhuman/voice/streaming.rs
✅ Files skipped from review due to trivial changes (1)
  • src/openhuman/config/schema/local_ai.rs

