
feat(voice): dictation config, hotkey lifecycle, and WebSocket streaming (#332) #371

Closed
oxoxDev wants to merge 3 commits into tinyhumansai:main from oxoxDev:feat/332-voice-dictation

Conversation

@oxoxDev
Contributor

@oxoxDev oxoxDev commented Apr 6, 2026

Summary

Foundational infrastructure for voice dictation (EPIC #332, Issue #333):

  • DictationConfig schema — new config module with serde defaults and env var overrides (enabled, hotkey, activation_mode, llm_refinement, streaming, streaming_interval_ms); a sketch of the shape follows this list
  • RPC surface — config_get_dictation_settings / config_update_dictation_settings controllers following the existing registry pattern
  • WebSocket streaming endpoint — /ws/dictation accepts PCM16 LE audio chunks (16 kHz mono), runs periodic Whisper inference on the accumulated buffer, and sends partial/final transcription results as JSON
  • Frontend hotkey lifecycle — useDictationHotkey hook fetches config from core RPC, auto-registers the global hotkey via the Tauri shell, and listens for dictation://toggle events; the DictationHotkeyManager headless component is mounted in App.tsx
  • Voice RPC type fix — voice handlers return flat results (no {result, logs} wrapper); removed the incorrect CommandResponse<T> wrapping from openhumanVoiceStatus, openhumanVoiceTranscribe, openhumanVoiceTranscribeBytes, and openhumanVoiceTts that caused the "Could not check voice availability" error
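A minimal sketch of the schema described above, assuming serde defaults; the field names and the toggle/push activation modes come from this PR, while the derive set and the concrete default values are illustrative:

```rust
use serde::{Deserialize, Serialize};

/// Illustrative only — the real definitions live in
/// src/openhuman/config/schema/dictation.rs.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum DictationActivationMode {
    Toggle, // press once to start, press again to stop
    Push,   // hold to dictate
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DictationConfig {
    #[serde(default)]
    pub enabled: bool,
    #[serde(default = "default_hotkey")]
    pub hotkey: String,
    #[serde(default = "default_activation_mode")]
    pub activation_mode: DictationActivationMode,
    #[serde(default)]
    pub llm_refinement: bool,
    #[serde(default)]
    pub streaming: bool,
    #[serde(default = "default_streaming_interval_ms")]
    pub streaming_interval_ms: u64,
}

fn default_hotkey() -> String {
    // Assumed placeholder; the real default hotkey is set in the PR.
    "CmdOrCtrl+Shift+D".into()
}

fn default_activation_mode() -> DictationActivationMode {
    DictationActivationMode::Toggle
}

fn default_streaming_interval_ms() -> u64 {
    // Assumed; the review thread below suggests valid values around 100..=60000 ms.
    1000
}
```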

Files changed

| File | What |
| --- | --- |
| `src/openhuman/config/schema/dictation.rs` | New DictationConfig + DictationActivationMode |
| `src/openhuman/config/schema/{mod,types,load}.rs` | Wire dictation into Config struct + env overrides |
| `src/openhuman/config/{mod,ops,schemas}.rs` | RPC get/update handlers for dictation settings |
| `src/openhuman/voice/streaming.rs` | WebSocket streaming transcription handler |
| `src/openhuman/voice/mod.rs` | Export streaming module |
| `src/core/jsonrpc.rs` | Add /ws/dictation route + WebSocket upgrade handler |
| `app/src/hooks/useDictationHotkey.ts` | Frontend hotkey hook |
| `app/src/components/DictationHotkeyManager.tsx` | Headless hotkey manager component |
| `app/src/App.tsx` | Mount DictationHotkeyManager |
| `app/src/utils/tauriCommands.ts` | Fix voice RPC return types |
| `app/src/pages/Conversations.tsx` | Fix voice status/transcribe/tts consumers |

Test plan

  • cargo check — compiles clean
  • cargo fmt --check — no formatting issues
  • yarn lint — 0 errors (6 pre-existing warnings)
  • yarn typecheck (tsc --noEmit) — passes
  • Manual: verify dictation hotkey registration in DevTools console ([dictation] logs)
  • Manual: test the WebSocket endpoint ws://127.0.0.1:7788/ws/dictation (a throwaway client sketch follows this list)
  • Manual: verify voice status shows "Ready" instead of "Could not check voice availability"
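For the manual WebSocket check, a hedged client sketch using the tokio-tungstenite and futures-util crates (a testing convenience, not part of this PR); the endpoint URL and the partial/final/stop message shapes come from the PR description and review diagrams, and the exact Message constructors vary slightly across tungstenite versions:

```rust
// Cargo.toml (assumed): tokio = { version = "1", features = ["macros", "rt-multi-thread"] },
// tokio-tungstenite = "0.21", futures-util = "0.3"
use futures_util::{SinkExt, StreamExt};
use tokio_tungstenite::{connect_async, tungstenite::Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (mut ws, _) = connect_async("ws://127.0.0.1:7788/ws/dictation").await?;

    // Send one second of silence as PCM16 LE, 16 kHz mono (16000 samples * 2 bytes).
    let silence = vec![0u8; 16000 * 2];
    ws.send(Message::Binary(silence.into())).await?;

    // Ask the server to finalize the transcription.
    ws.send(Message::Text(r#"{"type":"stop"}"#.into())).await?;

    // Print partial/final JSON results until the final message or close.
    while let Some(msg) = ws.next().await {
        match msg? {
            Message::Text(txt) => {
                println!("{txt}"); // e.g. {"type":"partial","text":"..."}
                if txt.contains(r#""type":"final""#) {
                    break;
                }
            }
            Message::Close(_) => break,
            _ => {}
        }
    }
    Ok(())
}
```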

Closes #333

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added dictation hotkey support for quick activation via configurable keyboard shortcuts.
    • Introduced WebSocket-based streaming transcription for real-time dictation feedback.
    • Added dictation configuration options including activation mode (toggle/push), hotkey customization, and optional LLM-based transcription refinement.
    • Enabled environment variable configuration for dictation settings.
  • Improvements

    • Simplified voice API response handling for cleaner integration.

oxoxDev and others added 3 commits April 7, 2026 00:28
…ing (tinyhumansai#332)

Add the foundational infrastructure for voice dictation (EPIC tinyhumansai#332):

**Rust core:**
- New `DictationConfig` schema with serde defaults and env var overrides
  (enabled, hotkey, activation_mode, llm_refinement, streaming, interval)
- RPC controllers: `config_get_dictation_settings` / `config_update_dictation_settings`
- WebSocket endpoint `/ws/dictation` for streaming PCM16 transcription
  with periodic partial inference and final LLM refinement
- Microphone permission declaration (`NSMicrophoneUsageDescription`) in
  Tauri macOS bundle config

**Frontend:**
- `useDictationHotkey` hook: fetches config from core RPC, auto-registers
  global hotkey, listens for `dictation://toggle` events
- `DictationHotkeyManager` headless component mounted in App.tsx
- Fix voice RPC response type mismatch: voice handlers return flat results
  (no `{result, logs}` wrapper), so remove incorrect `CommandResponse<T>`
  wrapping from `openhumanVoiceStatus`, `openhumanVoiceTranscribe`,
  `openhumanVoiceTranscribeBytes`, and `openhumanVoiceTts`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `infoPlist` field in tauri.conf.json expects a string path, not an
inline object. Remove it for now — microphone permission will be added
via a proper Info.plist supplement in the production build pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Apr 6, 2026

📝 Walkthrough

This PR introduces voice dictation with global hotkey support across the full stack. Frontend changes add a dictation hotkey manager component and hook that register/unregister hotkeys and listen for toggle events. Backend adds dictation configuration schema, RPC endpoints for reading/updating settings, and a WebSocket streaming transcription handler. Voice RPC wrappers are refactored to return unwrapped payloads instead of envelope types.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Frontend Dictation Management**<br>`app/src/App.tsx`, `app/src/components/DictationHotkeyManager.tsx`, `app/src/hooks/useDictationHotkey.ts` | Added headless DictationHotkeyManager component and useDictationHotkey hook to manage global hotkey registration, event listening for toggle events, and state synchronization with core dictation settings. |
| **Voice RPC Type Unwrapping**<br>`app/src/utils/tauriCommands.ts`, `app/src/pages/Conversations.tsx` | Updated voice function signatures (status, transcribe, transcribeBytes, TTS) to return unwrapped payloads instead of CommandResponse envelopes; adjusted call sites to use unwrapped result fields. |
| **Backend Dictation Configuration Schema**<br>`src/openhuman/config/schema/dictation.rs`, `src/openhuman/config/schema/mod.rs`, `src/openhuman/config/schema/types.rs`, `src/openhuman/config/schema/load.rs` | Added DictationActivationMode and DictationConfig schema types with serde/JSON-schema support; integrated into Config struct with environment variable override handling for all dictation settings. |
| **Configuration API & Persistence**<br>`src/openhuman/config/mod.rs`, `src/openhuman/config/ops.rs`, `src/openhuman/config/schemas.rs` | Exposed public re-exports for dictation types; implemented get/update RPC endpoints for dictation settings; registered controller handlers for config.get_dictation_settings and config.update_dictation_settings. |
| **WebSocket Streaming Transcription**<br>`src/core/jsonrpc.rs`, `src/openhuman/voice/mod.rs`, `src/openhuman/voice/streaming.rs` | Added WebSocket route handler and new streaming module; implemented handle_dictation_ws to accumulate PCM16 audio frames, run periodic interim transcription at configured intervals, and send final results with optional LLM refinement. |
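The partial/final payloads shown in the diagrams below imply a tagged JSON shape; a hedged sketch of what the server-to-client message type could look like (only the field names appear in this PR, the enum itself is illustrative):

```rust
use serde::Serialize;

/// Illustrative shape for /ws/dictation server->client messages.
#[derive(Serialize)]
#[serde(tag = "type", rename_all = "lowercase")]
enum DictationServerMsg {
    /// Interim transcription of the buffered audio so far.
    Partial { text: String },
    /// Final transcription; `raw_text` holds the pre-refinement output.
    Final { text: String, raw_text: String },
}
// Serializes to {"type":"partial","text":"..."} and
// {"type":"final","text":"...","raw_text":"..."}.
```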

Sequence Diagrams

sequenceDiagram
    actor User
    participant App as App.tsx
    participant HotkeyMgr as DictationHotkeyManager
    participant Hook as useDictationHotkey
    participant Tauri as Tauri RPC
    participant Core as Core Backend

    User->>App: Launch application
    App->>HotkeyMgr: Mount component
    HotkeyMgr->>Hook: Call useDictationHotkey()
    Hook->>Tauri: callCoreRpc(config_get_dictation_settings)
    Tauri->>Core: openhuman.config_get_dictation_settings
    Core-->>Tauri: { dictationEnabled, hotkey, activationMode }
    Tauri-->>Hook: Settings payload
    Hook->>Hook: Update state (enabled, hotkey)
    
    alt Dictation enabled and hotkey configured
        Hook->>Tauri: registerDictationHotkey(hotkey)
        Tauri->>Core: Register global hotkey
        Core-->>Tauri: Success
        Tauri-->>Hook: hotkeyRegistered = true
    end
    
    Hook->>Tauri: listen('dictation://toggle')
    Tauri-->>Hook: Event listener registered
    HotkeyMgr-->>App: Component mounted (renders null)
    
    User->>User: Press hotkey
    Tauri->>Hook: dictation://toggle event
    Hook->>Hook: Increment toggleCount
    HotkeyMgr->>HotkeyMgr: Log state change
sequenceDiagram
    participant Client as WebSocket Client
    participant Handler as handle_dictation_ws
    participant Buffer as Audio Buffer
    participant Inference as STT Engine
    participant Refine as LLM Refinement
    participant Client2 as Client (response)

    Client->>Handler: Upgrade WebSocket connection
    Handler->>Handler: Load config, create buffer
    Handler->>Handler: Spawn periodic inference task
    
    loop Stream Audio Frames
        Client->>Handler: Send binary PCM16 frame
        Handler->>Buffer: Validate & append samples
        Buffer->>Buffer: Accumulate audio
    end
    
    par Periodic Inference
        Handler->>Handler: Timer fires
        Handler->>Inference: Transcribe buffered audio (partial)
        Inference-->>Handler: Interim result
        Handler->>Client2: Send {"type":"partial","text":"..."}
    and Main Loop Continues
        Client->>Handler: Send {"type":"stop"} text frame
        Handler->>Handler: Trigger finalization
    end
    
    Handler->>Inference: Transcribe full buffer (final)
    Inference-->>Handler: Full transcription result
    
    alt LLM refinement enabled
        Handler->>Refine: cleanup_transcription(text)
        Refine-->>Handler: Refined text
        Handler->>Client2: Send {"type":"final","text":"refined","raw_text":"original"}
    else No refinement
        Handler->>Client2: Send {"type":"final","text":"...","raw_text":"..."}
    end
    
    Client->>Handler: Close or disconnect
    Handler->>Handler: Abort background tasks, cleanup

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Possibly related PRs

  • PR #278: Introduces overlapping dictation hotkey registration/unregistration logic and frontend dictation components that are foundational to this PR's DictationHotkeyManager and hook architecture.
  • PR #178: Earlier voice/dictation streaming feature that establishes the baseline voice pipeline and RPC wrappers extended by this PR's WebSocket streaming endpoint and type unwrapping changes.

Suggested reviewers

  • YellowSnnowmann
  • graycyrus

Poem

🐰 Hop, hop! A hotkey is born,
Dictation calls at dawn,
WebSockets sing in streaming tune,
Voice flows like drops of moon. 🎤✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.26%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title clearly and concisely describes the main feature additions: dictation config, hotkey lifecycle, and WebSocket streaming capabilities. |
| Linked Issues check | ✅ Passed | PR addresses all coding requirements from #333: configurable hotkey via useDictationHotkey hook, hotkey registration/cleanup lifecycle, dictation enabled/disabled state management, and WebSocket streaming endpoint for voice input. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to dictation infrastructure: config schema, RPC handlers, WebSocket endpoint, frontend hooks/components, and voice RPC type fixes. No unrelated refactoring or feature creep detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (6)
app/src/components/DictationHotkeyManager.tsx (1)

12-12: Prefer arrow component declaration for consistency.

Line 12 uses a function declaration; convert to a const arrow component to match project style guidelines.

♻️ Suggested refactor
-export default function DictationHotkeyManager() {
+const DictationHotkeyManager = () => {
   const { dictationEnabled, hotkeyRegistered, toggleCount, hotkey } = useDictationHotkey();
@@
   return null;
-}
+};
+
+export default DictationHotkeyManager;

Aligns with project guideline for **/*.{js,jsx,ts,tsx}: "Prefer arrow functions over function declarations".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/DictationHotkeyManager.tsx` at line 12, Convert the
function declaration for DictationHotkeyManager into a const arrow component to
match project style: replace "export default function DictationHotkeyManager()"
with a const arrow assignment like "const DictationHotkeyManager = () => { ...
}" and keep the default export (either export default on the const or export
default DictationHotkeyManager at the end); preserve all existing props, return
value and internal logic inside the new arrow function and retain any
TypeScript/React typings if present.
src/openhuman/config/ops.rs (1)

539-546: Missing derive attributes on DictationSettingsPatch.

Other patch structs in this file (e.g., ModelSettingsPatch, MemorySettingsPatch) derive Debug, Clone, and Default. This struct is missing those attributes, which may cause issues if debugging or cloning is needed.

♻️ Proposed fix to add consistent derives
+#[derive(Debug, Clone, Default)]
 pub struct DictationSettingsPatch {
     pub enabled: Option<bool>,
     pub hotkey: Option<String>,
     pub activation_mode: Option<String>,
     pub llm_refinement: Option<bool>,
     pub streaming: Option<bool>,
     pub streaming_interval_ms: Option<u64>,
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/config/ops.rs` around lines 539 - 546, The struct
DictationSettingsPatch is missing the consistent derive attributes used by other
patch structs; update DictationSettingsPatch to derive Debug, Clone, and Default
(matching ModelSettingsPatch and MemorySettingsPatch) so it can be
debug-printed, cloned, and default-constructed where expected; locate the
DictationSettingsPatch definition and add the derives above the struct
declaration.
app/src/hooks/useDictationHotkey.ts (3)

116-129: Potential race condition in listener setup.

If the component unmounts before listen() resolves, unlisten will be undefined and the cleanup won't properly unsubscribe. Consider using an async IIFE with a disposed flag check, similar to the first useEffect.

♻️ Proposed fix with disposal tracking
   useEffect(() => {
     if (!isTauri()) return;

-    let unlisten: (() => void) | undefined;
+    let disposed = false;
+    let unlistenFn: (() => void) | undefined;

-    listen('dictation://toggle', () => {
-      console.debug('[dictation] hotkey toggle event received');
-      setToggleCount(c => c + 1);
-    })
-      .then(fn => {
-        unlisten = fn;
-      })
-      .catch(err => {
-        console.warn('[dictation] failed to listen for dictation toggle', err);
-      });
+    (async () => {
+      try {
+        const fn = await listen('dictation://toggle', () => {
+          console.debug('[dictation] hotkey toggle event received');
+          setToggleCount(c => c + 1);
+        });
+        if (disposed) {
+          fn();
+        } else {
+          unlistenFn = fn;
+        }
+      } catch (err) {
+        console.warn('[dictation] failed to listen for dictation toggle', err);
+      }
+    })();

     return () => {
-      unlisten?.();
+      disposed = true;
+      unlistenFn?.();
     };
   }, []);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/hooks/useDictationHotkey.ts` around lines 116 - 129, The listener
setup can race with unmount because listen(...) is async and may resolve after
cleanup; modify the useDictationHotkey effect that calls listen to use an async
IIFE and a disposed flag (e.g., let disposed = false) so that when listen(...)
resolves you only assign unlisten = fn and call setToggleCount if not disposed,
and if disposed is true call fn() immediately to unsubscribe; ensure the cleanup
sets disposed = true and calls unlisten?.() so the listener is always removed
even if listen resolved after unmount (refer to the listen, unlisten and
setToggleCount symbols).

72-74: Type-unsafe RpcOutcome wrapper handling.

The check 'result' in settings and subsequent cast is fragile. If the backend response shape changes, this could silently extract wrong data. Consider defining a more explicit type for the wrapped response or trusting the backend to return consistent shapes.

♻️ Proposed improvement with explicit typing
+interface RpcOutcomeWrapper<T> {
+  result: T;
+}
+
+type DictationSettingsResponse = DictationSettings | RpcOutcomeWrapper<DictationSettings>;
+
 // Inside init():
-        const settings = await callCoreRpc<DictationSettings>({
+        const response = await callCoreRpc<DictationSettingsResponse>({
           method: 'openhuman.config_get_dictation_settings',
         });

         if (disposed) return;

-        if (!settings || typeof settings !== 'object') {
+        if (!response || typeof response !== 'object') {
           console.debug('[dictation] no dictation settings from core');
           return;
         }

         // Handle RpcOutcome wrapper — the result may be nested in .result
-        const s = (
-          'result' in settings ? (settings as Record<string, unknown>).result : settings
-        ) as DictationSettings;
+        const s: DictationSettings =
+          'result' in response && typeof response.result === 'object'
+            ? (response.result as DictationSettings)
+            : (response as DictationSettings);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/hooks/useDictationHotkey.ts` around lines 72 - 74, The current
extraction of DictationSettings using "'result' in settings" is type-unsafe; add
an explicit RpcOutcome<T> interface and a type guard like isRpcOutcome(obj): obj
is RpcOutcome<DictationSettings>, then in useDictationHotkey replace the ad-hoc
check around the local variable settings (and the assignment to s) with a
guarded extraction that returns settings.result when isRpcOutcome(settings) is
true, otherwise treats settings as DictationSettings; reference the RpcOutcome
generic, the isRpcOutcome type guard, the useDictationHotkey function, the
settings variable, and the local variable s to locate and update the logic.

45-45: Prefer arrow function for hook definition.

As per coding guidelines, arrow functions are preferred over function declarations.

♻️ Proposed fix
-export function useDictationHotkey(): DictationHotkeyState {
+export const useDictationHotkey = (): DictationHotkeyState => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/hooks/useDictationHotkey.ts` at line 45, The hook is declared with a
function declaration; convert it to an exported arrow function to follow project
guidelines: replace the declaration export function useDictationHotkey():
DictationHotkeyState { ... } with an exported const arrow form (export const
useDictationHotkey = (): DictationHotkeyState => { ... }) preserving the
existing body and return type, and ensure any internal references (closures,
hooks) remain unchanged.
src/openhuman/voice/streaming.rs (1)

57-65: Buffer clone on each inference pass could be optimized.

Cloning the entire accumulated buffer (guard.clone()) for each partial inference may become expensive as audio accumulates (e.g., 30s of audio = ~960KB). Consider tracking processed sample count and only cloning new data, or accepting this trade-off for simplicity in the initial implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/voice/streaming.rs` around lines 57 - 65, The current loop
clones the entire buffer (guard.clone()) into samples each inference which grows
expensive; instead add a processed offset (e.g., processed_samples) and only
clone the new slice: inside the block that locks buf_clone, compute let new_len
= guard.len(); if new_len <= processed_samples || (new_len - processed_samples)
< 8000 { continue; } let samples: Vec<i16> =
guard[processed_samples..new_len].to_vec(); processed_samples = new_len; thereby
replacing last_len and guard.clone() usage in the samples creation logic (update
any conditions that used last_len to use processed_samples and the new_len
delta).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/core/jsonrpc.rs`:
- Around line 331-344: The handler dictation_ws_handler currently performs the
WebSocket upgrade before loading config; change it to load the config with
crate::openhuman::config::rpc::load_config_with_timeout().await (creating
Arc::new on success) before calling ws.on_upgrade, and if loading fails return
an appropriate non-upgrade Response (e.g., 500 error) instead of performing the
upgrade; once config is loaded, call ws.on_upgrade(move |socket| async move {
crate::openhuman::voice::streaming::handle_dictation_ws(socket, config).await;
}) so handle_dictation_ws receives the preloaded Arc config.

In `@src/openhuman/config/schema/load.rs`:
- Around line 735-738: When parsing OPENHUMAN_DICTATION_STREAMING_INTERVAL_MS in
the load routine, validate the parsed u64 before assigning to
self.dictation.streaming_interval_ms: reject 0 (and optionally cap to a sensible
max) so you don't create a tight loop; update the parsing block in load.rs (the
section that reads OPENHUMAN_DICTATION_STREAMING_INTERVAL_MS and currently sets
self.dictation.streaming_interval_ms = ms) to only assign when ms >= 1 (or
within your chosen min/max bounds) and log or fallback to a safe default when
out of range.

In `@src/openhuman/voice/streaming.rs`:
- Around line 67-85: The call to whisper_engine::transcribe_pcm_i16 obtains a
synchronous parking_lot::Mutex via service.whisper and can block other WebSocket
sessions; update the streaming task to run the blocking inference on a blocking
thread (e.g., tokio::task::spawn_blocking) instead of calling transcribe_pcm_i16
directly on the async path, or alternatively document the limitation on
concurrent sessions. Locate the code around local_ai::global(&config_clone) and
the transcribe_pcm_i16 usage and move the heavy/locking work into spawn_blocking
(await its JoinHandle) so partial_tx.send(trimmed).await runs only after the
blocking call completes without holding up the async reactor.
- Around line 139-164: The code moves inference_handle twice causing an
ownership error; change all places that destructure it inside the loop and after
the loop to take ownership via Option::take() instead of pattern-matching
directly (i.e., replace uses like "if let Some(h) = inference_handle {
h.abort(); }" inside the match arms and the final stop block with "if let
Some(h) = inference_handle.take() { h.abort(); }") so the Option is emptied on
first use and subsequent checks are safe; reference the variable
inference_handle and the match arms handling Message::Close/None and
Some(Err(e)) as well as the final "Stop the periodic inference task" block.

---

Nitpick comments:
In `@app/src/components/DictationHotkeyManager.tsx`:
- Line 12: Convert the function declaration for DictationHotkeyManager into a
const arrow component to match project style: replace "export default function
DictationHotkeyManager()" with a const arrow assignment like "const
DictationHotkeyManager = () => { ... }" and keep the default export (either
export default on the const or export default DictationHotkeyManager at the
end); preserve all existing props, return value and internal logic inside the
new arrow function and retain any TypeScript/React typings if present.

In `@app/src/hooks/useDictationHotkey.ts`:
- Around line 116-129: The listener setup can race with unmount because
listen(...) is async and may resolve after cleanup; modify the
useDictationHotkey effect that calls listen to use an async IIFE and a disposed
flag (e.g., let disposed = false) so that when listen(...) resolves you only
assign unlisten = fn and call setToggleCount if not disposed, and if disposed is
true call fn() immediately to unsubscribe; ensure the cleanup sets disposed =
true and calls unlisten?.() so the listener is always removed even if listen
resolved after unmount (refer to the listen, unlisten and setToggleCount
symbols).
- Around line 72-74: The current extraction of DictationSettings using "'result'
in settings" is type-unsafe; add an explicit RpcOutcome<T> interface and a type
guard like isRpcOutcome(obj): obj is RpcOutcome<DictationSettings>, then in
useDictationHotkey replace the ad-hoc check around the local variable settings
(and the assignment to s) with a guarded extraction that returns settings.result
when isRpcOutcome(settings) is true, otherwise treats settings as
DictationSettings; reference the RpcOutcome generic, the isRpcOutcome type
guard, the useDictationHotkey function, the settings variable, and the local
variable s to locate and update the logic.
- Line 45: The hook is declared with a function declaration; convert it to an
exported arrow function to follow project guidelines: replace the declaration
export function useDictationHotkey(): DictationHotkeyState { ... } with an
exported const arrow form (export const useDictationHotkey = ():
DictationHotkeyState => { ... }) preserving the existing body and return type,
and ensure any internal references (closures, hooks) remain unchanged.

In `@src/openhuman/config/ops.rs`:
- Around line 539-546: The struct DictationSettingsPatch is missing the
consistent derive attributes used by other patch structs; update
DictationSettingsPatch to derive Debug, Clone, and Default (matching
ModelSettingsPatch and MemorySettingsPatch) so it can be debug-printed, cloned,
and default-constructed where expected; locate the DictationSettingsPatch
definition and add the derives above the struct declaration.

In `@src/openhuman/voice/streaming.rs`:
- Around line 57-65: The current loop clones the entire buffer (guard.clone())
into samples each inference which grows expensive; instead add a processed
offset (e.g., processed_samples) and only clone the new slice: inside the block
that locks buf_clone, compute let new_len = guard.len(); if new_len <=
processed_samples || (new_len - processed_samples) < 8000 { continue; } let
samples: Vec<i16> = guard[processed_samples..new_len].to_vec();
processed_samples = new_len; thereby replacing last_len and guard.clone() usage
in the samples creation logic (update any conditions that used last_len to use
processed_samples and the new_len delta).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 423af558-c16f-40a0-81d6-a0ab9794dc15

📥 Commits

Reviewing files that changed from the base of the PR and between 9b73d00 and 1261874.

📒 Files selected for processing (15)
  • app/src/App.tsx
  • app/src/components/DictationHotkeyManager.tsx
  • app/src/hooks/useDictationHotkey.ts
  • app/src/pages/Conversations.tsx
  • app/src/utils/tauriCommands.ts
  • src/core/jsonrpc.rs
  • src/openhuman/config/mod.rs
  • src/openhuman/config/ops.rs
  • src/openhuman/config/schema/dictation.rs
  • src/openhuman/config/schema/load.rs
  • src/openhuman/config/schema/mod.rs
  • src/openhuman/config/schema/types.rs
  • src/openhuman/config/schemas.rs
  • src/openhuman/voice/mod.rs
  • src/openhuman/voice/streaming.rs

Comment thread src/core/jsonrpc.rs
Comment on lines +331 to +344
/// WebSocket upgrade handler for streaming voice dictation.
async fn dictation_ws_handler(ws: WebSocketUpgrade) -> Response {
    log::info!("[ws] dictation WebSocket upgrade requested");
    ws.on_upgrade(|socket| async move {
        let config = match crate::openhuman::config::rpc::load_config_with_timeout().await {
            Ok(c) => Arc::new(c),
            Err(e) => {
                log::error!("[ws] failed to load config for dictation: {e}");
                return;
            }
        };
        crate::openhuman::voice::streaming::handle_dictation_ws(socket, config).await;
    })
}
Contributor

⚠️ Potential issue | 🟠 Major

Load config before upgrading the WebSocket.

At Line [334], the server upgrades first and only then attempts config load (Lines [335]-[340]). On failure, the socket is dropped post-101, which is a misleading handshake success for clients.

🔧 Proposed fix
 async fn dictation_ws_handler(ws: WebSocketUpgrade) -> Response {
     log::info!("[ws] dictation WebSocket upgrade requested");
-    ws.on_upgrade(|socket| async move {
-        let config = match crate::openhuman::config::rpc::load_config_with_timeout().await {
-            Ok(c) => Arc::new(c),
-            Err(e) => {
-                log::error!("[ws] failed to load config for dictation: {e}");
-                return;
-            }
-        };
-        crate::openhuman::voice::streaming::handle_dictation_ws(socket, config).await;
-    })
+    let config = match crate::openhuman::config::rpc::load_config_with_timeout().await {
+        Ok(c) => Arc::new(c),
+        Err(e) => {
+            log::error!("[ws] failed to load config for dictation: {e}");
+            return (
+                StatusCode::SERVICE_UNAVAILABLE,
+                Json(json!({ "ok": false, "error": "dictation_unavailable" })),
+            )
+                .into_response();
+        }
+    };
+    ws.on_upgrade(move |socket| async move {
+        crate::openhuman::voice::streaming::handle_dictation_ws(socket, config).await;
+    })
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/jsonrpc.rs` around lines 331 - 344, The handler dictation_ws_handler
currently performs the WebSocket upgrade before loading config; change it to
load the config with
crate::openhuman::config::rpc::load_config_with_timeout().await (creating
Arc::new on success) before calling ws.on_upgrade, and if loading fails return
an appropriate non-upgrade Response (e.g., 500 error) instead of performing the
upgrade; once config is loaded, call ws.on_upgrade(move |socket| async move {
crate::openhuman::voice::streaming::handle_dictation_ws(socket, config).await;
}) so handle_dictation_ws receives the preloaded Arc config.

Comment on lines +735 to +738
if let Ok(val) = std::env::var("OPENHUMAN_DICTATION_STREAMING_INTERVAL_MS") {
    if let Ok(ms) = val.trim().parse::<u64>() {
        self.dictation.streaming_interval_ms = ms;
    }
Contributor

⚠️ Potential issue | 🟠 Major

Add bounds validation for dictation streaming interval.

Lines [736]-[738] accept any u64, including 0. For a periodic inference interval, that can cause a tight loop and severe CPU pressure.

🔧 Proposed fix
         if let Ok(val) = std::env::var("OPENHUMAN_DICTATION_STREAMING_INTERVAL_MS") {
             if let Ok(ms) = val.trim().parse::<u64>() {
-                self.dictation.streaming_interval_ms = ms;
+                if (100..=60_000).contains(&ms) {
+                    self.dictation.streaming_interval_ms = ms;
+                } else {
+                    tracing::warn!(
+                        interval_ms = ms,
+                        "ignoring invalid OPENHUMAN_DICTATION_STREAMING_INTERVAL_MS (valid: 100..=60000)"
+                    );
+                }
             }
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if let Ok(val) = std::env::var("OPENHUMAN_DICTATION_STREAMING_INTERVAL_MS") {
    if let Ok(ms) = val.trim().parse::<u64>() {
        self.dictation.streaming_interval_ms = ms;
    }

if let Ok(val) = std::env::var("OPENHUMAN_DICTATION_STREAMING_INTERVAL_MS") {
    if let Ok(ms) = val.trim().parse::<u64>() {
        if (100..=60_000).contains(&ms) {
            self.dictation.streaming_interval_ms = ms;
        } else {
            tracing::warn!(
                interval_ms = ms,
                "ignoring invalid OPENHUMAN_DICTATION_STREAMING_INTERVAL_MS (valid: 100..=60000)"
            );
        }
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/config/schema/load.rs` around lines 735 - 738, When parsing
OPENHUMAN_DICTATION_STREAMING_INTERVAL_MS in the load routine, validate the
parsed u64 before assigning to self.dictation.streaming_interval_ms: reject 0
(and optionally cap to a sensible max) so you don't create a tight loop; update
the parsing block in load.rs (the section that reads
OPENHUMAN_DICTATION_STREAMING_INTERVAL_MS and currently sets
self.dictation.streaming_interval_ms = ms) to only assign when ms >= 1 (or
within your chosen min/max bounds) and log or fallback to a safe default when
out of range.

Comment on lines +67 to +85
let service = local_ai::global(&config_clone);
match whisper_engine::transcribe_pcm_i16(&service.whisper, &samples, None) {
    Ok(text) => {
        let trimmed = text.trim().to_string();
        if !trimmed.is_empty() {
            log::debug!(
                "{LOG_PREFIX} partial transcription ({} samples): {}",
                samples.len(),
                &trimmed[..trimmed.len().min(80)]
            );
            if partial_tx.send(trimmed).await.is_err() {
                break; // receiver dropped
            }
        }
    }
    Err(e) => {
        log::warn!("{LOG_PREFIX} partial inference error: {e}");
    }
}
Contributor

⚠️ Potential issue | 🟡 Minor

Synchronous lock contention on shared Whisper engine.

The transcribe_pcm_i16 call uses a parking_lot::Mutex (per whisper_engine.rs), which is a synchronous blocking lock. Multiple concurrent WebSocket sessions will serialize their inference calls, potentially causing one session to block for seconds while another completes. Consider documenting this limitation or using spawn_blocking if inference is CPU-bound.

This may be acceptable for initial implementation since concurrent dictation sessions are unlikely, but worth noting for future scalability.
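A hedged sketch of the spawn_blocking variant suggested here, assuming the samples and config handles can be cloned into the closure (the real ownership details depend on the local_ai/whisper_engine internals):

```rust
// Sketch only: run the mutex-guarded Whisper call on a blocking thread so it
// cannot stall the async reactor. Names mirror the PR; exact types are assumed.
let samples_owned = samples.clone();
let config_for_task = config_clone.clone();
let outcome = tokio::task::spawn_blocking(move || {
    let service = local_ai::global(&config_for_task);
    whisper_engine::transcribe_pcm_i16(&service.whisper, &samples_owned, None)
})
.await;

match outcome {
    Ok(Ok(text)) => {
        // Forward the partial result exactly as the existing code does.
        let trimmed = text.trim().to_string();
        if !trimmed.is_empty() && partial_tx.send(trimmed).await.is_err() {
            break; // receiver dropped
        }
    }
    Ok(Err(e)) => log::warn!("{LOG_PREFIX} partial inference error: {e}"),
    Err(join_err) => log::warn!("{LOG_PREFIX} inference task panicked: {join_err}"),
}
```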

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/voice/streaming.rs` around lines 67 - 85, The call to
whisper_engine::transcribe_pcm_i16 obtains a synchronous parking_lot::Mutex via
service.whisper and can block other WebSocket sessions; update the streaming
task to run the blocking inference on a blocking thread (e.g.,
tokio::task::spawn_blocking) instead of calling transcribe_pcm_i16 directly on
the async path, or alternatively document the limitation on concurrent sessions.
Locate the code around local_ai::global(&config_clone) and the
transcribe_pcm_i16 usage and move the heavy/locking work into spawn_blocking
(await its JoinHandle) so partial_tx.send(trimmed).await runs only after the
blocking call completes without holding up the async reactor.

Comment on lines +139 to +164
                    Some(Ok(Message::Close(_))) | None => {
                        log::info!("{LOG_PREFIX} client disconnected");
                        if let Some(h) = inference_handle {
                            h.abort();
                        }
                        return;
                    }

                    Some(Err(e)) => {
                        log::warn!("{LOG_PREFIX} websocket error: {e}");
                        if let Some(h) = inference_handle {
                            h.abort();
                        }
                        return;
                    }

                    _ => {}
                }
            }
        }
    }

    // Stop the periodic inference task
    if let Some(h) = inference_handle {
        h.abort();
    }
Contributor

⚠️ Potential issue | 🔴 Critical

Ownership error: inference_handle moved twice.

The inference_handle is moved into the match arms at lines 141-143 and 149-151 (inside the loop), but then accessed again at line 162 after the loop breaks via the stop command path. This will fail to compile because the value may have been moved.

🐛 Proposed fix using Option::take()
-    let inference_handle = if do_streaming {
+    let mut inference_handle = if do_streaming {
         let handle = tokio::spawn(async move {
             // ... inference task
         });
         Some(handle)
     } else {
         None
     };

     loop {
         tokio::select! {
             // ... partial_rx branch unchanged ...

             msg = socket.recv() => {
                 match msg {
                     // ... Binary and Text branches unchanged ...

                     Some(Ok(Message::Close(_))) | None => {
                         log::info!("{LOG_PREFIX} client disconnected");
-                        if let Some(h) = inference_handle {
+                        if let Some(h) = inference_handle.take() {
                             h.abort();
                         }
                         return;
                     }

                     Some(Err(e)) => {
                         log::warn!("{LOG_PREFIX} websocket error: {e}");
-                        if let Some(h) = inference_handle {
+                        if let Some(h) = inference_handle.take() {
                             h.abort();
                         }
                         return;
                     }

                     _ => {}
                 }
             }
         }
     }

     // Stop the periodic inference task
-    if let Some(h) = inference_handle {
+    if let Some(h) = inference_handle.take() {
         h.abort();
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/voice/streaming.rs` around lines 139 - 164, The code moves
inference_handle twice causing an ownership error; change all places that
destructure it inside the loop and after the loop to take ownership via
Option::take() instead of pattern-matching directly (i.e., replace uses like "if
let Some(h) = inference_handle { h.abort(); }" inside the match arms and the
final stop block with "if let Some(h) = inference_handle.take() { h.abort(); }")
so the Option is emptied on first use and subsequent checks are safe; reference
the variable inference_handle and the match arms handling Message::Close/None
and Some(Err(e)) as well as the final "Stop the periodic inference task" block.

@senamakel
Member

I'm merging this into #368 thanks bro

@senamakel senamakel closed this Apr 6, 2026
senamakel added a commit to senamakel/openhuman that referenced this pull request Apr 6, 2026
…cycle, WebSocket streaming)

Merge feat/332-voice-dictation into feat/stt to combine:
- Our standalone voice server (hotkey → record → transcribe → insert)
- PR tinyhumansai#371's DictationConfig, WebSocket streaming endpoint, frontend
  hotkey hook, and voice RPC type fixes

Resolved conflict in src/openhuman/voice/mod.rs — kept both server
and streaming modules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

[Feature] Voice dictation: global hotkey and overlay start/stop

2 participants