
Feat/add dictation#278

Merged
graycyrus merged 13 commits intotinyhumansai:mainfrom
YellowSnnowmann:feat/add-dictation
Apr 2, 2026

YellowSnnowmann commented Apr 2, 2026

Summary

  • Adds end-to-end voice dictation: configurable hotkey, floating overlay UI, in-browser audio capture, local Whisper STT, and transcript insertion into any focused field.
  • New voice Rust domain (src/openhuman/voice/) with four RPC endpoints: voice_status, voice_transcribe, voice_transcribe_bytes, and voice_tts — all wired into the shared controller registry.
  • Optional LLM post-processing (voice/postprocess.rs) cleans grammar and filler words via local Ollama with graceful fallback to raw Whisper output.
  • React layer: useDictation hook, DictationOverlay draggable panel, DictationPanel settings page, and dictationSlice Redux store — wired into the existing App/Settings/Store infrastructure.
  • Global hotkey registration via tauri-plugin-global-shortcut; CmdOrCtrl expands to both Cmd and Ctrl variants on macOS automatically.
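The fallback contract described for voice/postprocess.rs is config-gated and always returns the raw Whisper text on failure. The TypeScript sketch below illustrates that contract only. The real module is Rust calling a local Ollama model, and cleanupTranscript, CleanupFn and CleanupOptions are hypothetical names.

```typescript
/** Hypothetical signature for an LLM cleanup call (e.g. a local Ollama request). */
type CleanupFn = (raw: string) => Promise<string>;

interface CleanupOptions {
  /** Stands in for the voice_llm_cleanup_enabled config key (assumed boolean). */
  enabled: boolean;
  cleanup: CleanupFn;
}

/**
 * Optionally run an LLM pass over raw Whisper output. Never rejects: the raw
 * transcript is returned when cleanup is disabled, when the input is blank,
 * when the LLM call throws, or when it returns an empty answer.
 */
async function cleanupTranscript(raw: string, opts: CleanupOptions): Promise<string> {
  // Disabled, or nothing worth sending to the model.
  if (!opts.enabled || raw.trim() === "") {
    return raw;
  }
  try {
    const cleaned = (await opts.cleanup(raw)).trim();
    // Treat an empty model answer as a failure and keep the raw text.
    return cleaned === "" ? raw : cleaned;
  } catch {
    // Graceful fallback: any transport or model error keeps the raw text.
    return raw;
  }
}
```

Because the function can never reject, callers do not need a separate error path for the post-processing step.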

Problem

Users had no way to dictate text into the app without leaving it to use an external tool. Local STT was partially plumbed in local_ai but had no user-facing dictation surface and no UI entry point. Tracked in #187.

Solution

Rust — voice domain

A dedicated src/openhuman/voice/ module owns all STT/TTS logic cleanly separated from the existing local_ai ops surface:

  • ops.rs — business logic for status check, file-path transcription, byte-array transcription, and TTS synthesis; uses normalize_extension() with strict alphanumeric validation to prevent path traversal.
  • postprocess.rs — optional LLM pass over raw Whisper output; config-gated via voice_llm_cleanup_enabled; always returns raw text on failure.
  • schemas.rs — controller schemas + registered handlers; plugs into the shared registry (core/all.rs).
  • types.rs — serializable DTOs (VoiceStatus, VoiceSpeechResult, VoiceTtsResult).
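The following is a TypeScript mirror of the kind of validation described for normalize_extension(). The exact Rust rules are not shown in this PR. The allowlist below (wav, mp3, m4a, ogg, flac) is an assumption taken from the useDictation.ts review comment further down, and normalizeExtension is a hypothetical name.

```typescript
/** Extensions assumed to be accepted by the core (taken from the review thread). */
const SUPPORTED_EXTENSIONS = new Set(["wav", "mp3", "m4a", "ogg", "flac"]);

/**
 * Normalize a caller-supplied audio extension before it is used in a temp-file
 * name. Returns null unless the value is a short, purely alphanumeric,
 * allowlisted extension, so inputs such as "../etc" or "wav/x" are rejected.
 */
function normalizeExtension(input: string): string | null {
  // Accept ".wav" as well as "wav", and ignore case.
  const ext = input.trim().replace(/^\./, "").toLowerCase();
  // Strict alphanumeric check: no dots, slashes, or other separators survive.
  if (!/^[a-z0-9]{1,8}$/.test(ext)) {
    return null;
  }
  return SUPPORTED_EXTENSIONS.has(ext) ? ext : null;
}
```

The regex alone already blocks path traversal. The allowlist additionally rejects containers such as webm that the transcription backend may not decode.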

Tauri shell — global hotkey bridge

register_dictation_hotkey / unregister_dictation_hotkey Tauri commands manage tauri-plugin-global-shortcut. On macOS, CmdOrCtrl+X expands to two shortcuts (Cmd+X and Ctrl+X) since CmdOrCtrl is not a native modifier token there. On press, the shell emits dictation://toggle to all webviews.
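The expansion rule can be sketched as a pure function. This TypeScript version is illustrative only. The real logic is the Rust expand_dictation_shortcuts in app/src-tauri/src/lib.rs. The isMac flag is an assumed input, and passing the string through unchanged on other platforms is an assumption based on the review note that the function "only expands CmdOrCtrl".

```typescript
/**
 * Expand a user-facing accelerator into the concrete shortcuts to register.
 * On macOS, "CmdOrCtrl" becomes two variants (Cmd and Ctrl); otherwise the
 * string is returned unchanged. Token matching is case-insensitive.
 */
function expandDictationShortcuts(shortcut: string, isMac: boolean): string[] {
  const parts = shortcut
    .split("+")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const idx = parts.findIndex((p) => p.toLowerCase() === "cmdorctrl");
  if (!isMac || idx === -1) {
    return [shortcut];
  }
  // Replace the CmdOrCtrl token in place so the modifier order is preserved.
  return ["Cmd", "Ctrl"].map((mod) => {
    const copy = [...parts];
    copy[idx] = mod;
    return copy.join("+");
  });
}
```

Both variants must then be registered and later unregistered as a set, which is why the shell tracks a list of shortcuts rather than a single string.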

React — useDictation hook + overlay

  1. useDictation hooks into dictation://toggle (Tauri event) and a fallback keydown listener (browser). MediaRecorder captures audio; an OfflineAudioContext pipeline downmixes and resamples to 16 kHz mono WAV before sending to openhuman.voice_transcribe_bytes. A session-ID guard prevents stale transcription responses from writing to state.
  2. DictationOverlay is a draggable, viewport-clamped panel rendered directly inside App.tsx. Transcript insertion tries (in order): a custom dictation://insert-text event, direct DOM mutation on the last focused editable element, an accessibility action via openhumanAccessibilityInputAction, and clipboard fallback.
  3. DictationPanel (Settings → Voice Dictation) shows engine status rows, model path guidance, hotkey editor, and the floating-launcher toggle.
  4. dictationSlice manages recording state, transcript, hotkey, and voice_status async thunk; hotkey and launcher preference are persisted to localStorage.
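The session-ID staleness guard from step 1 reduces to a simple pattern. Each recording bumps a counter, and a transcription result is applied only if the counter has not moved since the request was issued. The class and method names below are hypothetical. In the hook, the counter lives in a React ref (sessionIdRef).

```typescript
/** Minimal stand-in for the hook's sessionIdRef plus a transcript setter (hypothetical). */
class DictationSession {
  private currentId = 0;
  transcript = "";

  /** Call when a new recording starts; invalidates any in-flight transcription. */
  begin(): number {
    this.currentId += 1;
    return this.currentId;
  }

  /** Call on cancel or reset; also invalidates in-flight work. */
  cancel(): void {
    this.currentId += 1;
  }

  /**
   * Await a transcription and apply it only if no newer session started in
   * the meantime. Returns true if the result was written.
   */
  async applyTranscription(sessionId: number, pending: Promise<string>): Promise<boolean> {
    const text = (await pending).trim();
    if (sessionId !== this.currentId) {
      return false; // stale response: drop it instead of overwriting newer state
    }
    this.transcript = text;
    return true;
  }
}
```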

Submission Checklist

  • Unit tests: voice/ops.rs covers normalize_extension (6 cases) and voice_status (2 cases); voice/schemas.rs covers schema stability and contract tests; voice/postprocess.rs covers empty, whitespace, and disabled-config edge cases.
  • E2E / integration: app/test/e2e/specs/voice-mode.spec.ts covers the voice status check, recording button visibility, and voice/text mode switching.
  • Doc comments: all public Rust functions carry ///; TypeScript exports have inline comments on non-obvious paths.
  • Inline comments: WAV encoding constants, the session-ID staleness guard, and the CmdOrCtrl expansion logic are annotated.
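For reference, the WAV constants mentioned above correspond to a standard 44-byte RIFF/PCM header. The sketch below encodes mono float samples in [-1, 1] as 16-bit PCM at 16 kHz. The function name encodeWavMono16 is hypothetical and is not taken from useDictation.ts.

```typescript
const SAMPLE_RATE = 16_000; // 16 kHz input for Whisper
const NUM_CHANNELS = 1; // mono
const BITS_PER_SAMPLE = 16; // signed 16-bit little-endian PCM
const HEADER_BYTES = 44;

/** Encode mono Float32 samples as a 16 kHz, 16-bit PCM WAV file. */
function encodeWavMono16(samples: Float32Array): Uint8Array {
  const bytesPerSample = BITS_PER_SAMPLE / 8;
  const dataBytes = samples.length * bytesPerSample * NUM_CHANNELS;
  const buffer = new ArrayBuffer(HEADER_BYTES + dataBytes);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // RIFF chunk size = file size - 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, NUM_CHANNELS, true);
  view.setUint32(24, SAMPLE_RATE, true);
  view.setUint32(28, SAMPLE_RATE * NUM_CHANNELS * bytesPerSample, true); // byte rate
  view.setUint16(32, NUM_CHANNELS * bytesPerSample, true); // block align
  view.setUint16(34, BITS_PER_SAMPLE, true);
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);

  // Clamp each sample, then scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(HEADER_BYTES + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}
```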

Impact

  • Desktop only — overlay and global shortcut silently no-op in browser environments.
  • Additive RPC surface — no existing endpoints modified; voice endpoints are new.
  • Existing local_ai_transcribe_bytes in local_ai/ops.rs is preserved for backward compatibility; voice_transcribe_bytes is the new authoritative path (see follow-up).
  • New config keys: local_ai.whisper_in_process (default false) and local_ai.voice_llm_cleanup_enabled (default false); both are additive with safe defaults.
  • New dependency: tauri-plugin-global-shortcut = "2" in app/src-tauri/Cargo.toml.
  • Microphone permission — handled via browser getUserMedia; macOS will prompt once on first use.

Related

  • Issue(s): Add voice-to-text dictation flow (global hotkey + transcription) inspired by OpenWhispr #187
  • Follow-up PR(s)/TODOs:
    • Consolidate local_ai_transcribe_bytes (local_ai/ops.rs) with voice_transcribe_bytes (voice/ops.rs) — both write a UUID-named temp file and call service.transcribe(); the voice variant is canonical.
    • Add E2E coverage for the full insert-into-field flow (Tauri → overlay → DOM mutation).
    • Surface STT model download inside DictationPanel rather than linking away to Local AI settings.

Summary by CodeRabbit

  • New Features

    • Voice dictation UI with floating launcher and full overlay (Insert/Copy actions)
    • Configurable global hotkey (persisted) to toggle dictation; system-wide registration and fallback behavior
    • Dictation settings panel and new Settings page with availability status, hotkey editor, and model guidance
    • Transcribed text inserts into message inputs when used; clipboard/type fallbacks if direct insert fails
  • Bug Fixes

    • Adjusted Screen Intelligence status checks to avoid incorrect warnings

…ettings panel

- Added DictationOverlay component for real-time speech-to-text functionality, including recording, transcribing, and error handling.
- Introduced useDictation hook to manage audio recording and transcription processes.
- Created DictationPanel for configuring dictation settings, including hotkey registration and floating launcher preferences.
- Updated SettingsHome to include a navigation option for the new dictation feature.
- Integrated dictation state management with Redux, allowing for persistent settings and status checks.
- Enhanced Tauri commands for registering global hotkeys and managing dictation state.

This update provides a comprehensive voice dictation experience, enabling users to transcribe speech to text using local AI.
… on floating launcher state

- Reintroduced the useAppDispatch and useAppSelector hooks for state management.
- Added showFloatingLauncher to the state selection, ensuring the overlay only renders when the floating launcher is active.
- Simplified the return statement for position calculations in the drag handler.
- Improved code readability by formatting JSX elements for better clarity.

This update enhances the user experience by ensuring the dictation overlay behaves correctly based on the application's state.
- Updated the SettingsRoute type to include 'dictation' as a new route.
- Integrated DictationPanel into the Settings page for user configuration.
- Enhanced dictation state management by adding showFloatingLauncher to the dictationSlice, allowing for better control of the dictation overlay.

This update improves the settings navigation and user experience by providing access to dictation features directly from the settings menu.
…tings

- Added DictationOverlay component to the main App for improved user interaction with dictation features.
- Updated DictationPanel to include a new preference for showing a floating launcher, enhancing user control over dictation functionality.
- Enhanced state management by incorporating showFloatingLauncher into the dictationSlice, allowing for better configuration of the dictation experience.

This update improves the overall user experience by providing direct access to dictation features and customizable settings.
…ved text insertion

- Added functionality to insert text into editable elements from the DictationOverlay, improving user interaction with dictation features.
- Implemented event listener in Conversations to handle custom dictation insert events, allowing seamless integration of transcribed text into the input field.
- Updated state management to ensure the correct editable target is used for text insertion, enhancing overall user experience.

This update streamlines the dictation process, making it more intuitive and responsive to user actions.
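The insertion order described in the PR (custom event, DOM mutation, accessibility action, clipboard) is a chain of strategies tried until one reports success. The sketch below shows that shape only. The InsertStrategy type and the strategy names are hypothetical, and the real overlay calls DOM, Tauri, and clipboard APIs rather than plain functions.

```typescript
/** A strategy resolves true only when it is confident the text was delivered. */
type InsertStrategy = {
  name: string;
  tryInsert: (text: string) => Promise<boolean>;
};

/**
 * Try each strategy in order and stop at the first success. A thrown error
 * counts as "not handled", so the next strategy still runs. Returns the name
 * of the strategy that succeeded, or null if all of them failed.
 */
async function insertWithFallback(text: string, strategies: InsertStrategy[]): Promise<string | null> {
  for (const strategy of strategies) {
    try {
      if (await strategy.tryInsert(text)) {
        return strategy.name;
      }
    } catch {
      // Fall through to the next strategy.
    }
  }
  return null;
}
```

The important design point, which the review thread raises for the accessibility action, is that a strategy must report success only when the target actually accepted the text. Otherwise, a silent rejection never reaches the clipboard fallback.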
- Updated the logic for determining STT availability to ensure it only considers the model file as available when both the model file exists and there is a method to run inference (either the in-process engine is loaded or a whisper binary is present).
- This change prevents misleading user experiences by avoiding the display of the overlay when the necessary components for transcription are not available.

This update enhances the reliability of the dictation feature by providing clearer conditions for STT availability.
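The availability rule in this commit reduces to a small predicate. The field names below are hypothetical stand-ins for whatever the core actually reports.

```typescript
/** Hypothetical snapshot of what the core knows about local STT. */
interface SttProbe {
  modelFileExists: boolean;
  inProcessEngineLoaded: boolean;
  whisperBinaryPresent: boolean;
}

/**
 * STT counts as available only when the model file exists AND there is a way
 * to run inference: the in-process engine or an external whisper binary.
 * A model file on its own is not enough to show the overlay.
 */
function isSttAvailable(p: SttProbe): boolean {
  return p.modelFileExists && (p.inProcessEngineLoaded || p.whisperBinaryPresent);
}
```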
coderabbitai Bot commented Apr 2, 2026

📝 Walkthrough


Adds a voice dictation feature: global hotkey management (Tauri plugin + commands), a floating dictation overlay and hook for microphone capture/transcription, Redux state and settings UI, Tauri frontend wrappers, and lazy in-process Whisper engine loading for local STT.

Changes

Cohort / File(s) Summary
Tauri plugin & hotkey commands
app/src-tauri/Cargo.toml, app/src-tauri/src/lib.rs
Added tauri-plugin-global-shortcut; implemented DictationHotkeyState, register_dictation_hotkey / unregister_dictation_hotkey, shortcut expansion/validation, plugin init and event emission.
Frontend overlay & hook
app/src/components/dictation/DictationOverlay.tsx, app/src/components/dictation/useDictation.ts
New draggable DictationOverlay component and useDictation hook: MediaRecorder capture, WAV conversion/resampling, transcription RPC calls, hotkey handling, DOM insertion/copy fallbacks, and session guards.
Redux slice & store wiring
app/src/store/dictationSlice.ts, app/src/store/index.ts
Added dictation slice, types, checkDictationAvailability thunk, persisted hotkey/showFloatingLauncher, and registered reducer in root store.
Settings UI & routing
app/src/components/settings/panels/DictationPanel.tsx, app/src/components/settings/SettingsHome.tsx, app/src/components/settings/hooks/useSettingsNavigation.ts, app/src/pages/Settings.tsx
New DictationPanel settings page, menu entry, route, hotkey save flow with Tauri registration, STT status UI, and floating-launch toggle.
App integration & insertion event
app/src/App.tsx, app/src/pages/Conversations.tsx
Mounted DictationOverlay at app root; Conversations listens for dictation://insert-text events and appends dictated text into the message textarea.
Tauri command wrappers (frontend)
app/src/utils/tauriCommands.ts
Added registerDictationHotkey / unregisterDictationHotkey wrappers with modifier normalization and Tauri no-op guards.
Local STT engine loading & service guards
src/openhuman/local_ai/service/speech.rs, src/openhuman/local_ai/service/bootstrap.rs, src/openhuman/local_ai/service/mod.rs
Added lazy in-process Whisper load attempt guarded by a whisper_load_lock mutex; bootstrap treats degraded like ready to stop repeated auto-retries.
Tests & misc
app/src/components/intelligence/__tests__/MemoryWorkspace.test.tsx, PR_DESCRIPTION.md
Narrowed test selectors; removed PR_DESCRIPTION.md (deleted).
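The hook row above mentions WAV conversion and resampling. The PR does this with an OfflineAudioContext. As a hedged alternative that needs no Web Audio API, the sketch below downmixes by averaging channels and then resamples with linear interpolation. Linear interpolation is a simplification because it applies no anti-aliasing filter, and downmixAndResample is a hypothetical name.

```typescript
/**
 * Downmix planar channels to mono, then resample to targetRate using linear
 * interpolation. Assumes every channel has the same length.
 */
function downmixAndResample(channels: Float32Array[], sourceRate: number, targetRate = 16_000): Float32Array {
  if (channels.length === 0 || channels[0].length === 0) return new Float32Array(0);
  const frames = channels[0].length;

  // Downmix: average all channels sample by sample.
  const mono = new Float32Array(frames);
  for (const ch of channels) {
    for (let i = 0; i < frames; i++) mono[i] += ch[i] / channels.length;
  }
  if (sourceRate === targetRate) return mono;

  // Resample: map each output index back to a fractional source position.
  const outLength = Math.max(1, Math.round((frames * targetRate) / sourceRate));
  const out = new Float32Array(outLength);
  const step = sourceRate / targetRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), frames - 1);
    const right = Math.min(left + 1, frames - 1);
    const frac = pos - Math.floor(pos);
    out[i] = mono[left] * (1 - frac) + mono[right] * frac;
  }
  return out;
}
```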

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Browser as App (Overlay / Hook)
    participant Tauri
    participant Core
    participant Whisper

    User->>Browser: Activate dictation (hotkey or launcher)
    Browser->>Tauri: (on save) register_dictation_hotkey(shortcut)
    Tauri->>Browser: Emit "dictation://toggle" on hotkey press
    Browser->>Browser: useDictation.toggle() -> startRecording()
    Browser->>User: Show recording UI
    User->>Browser: Speak (Microphone -> MediaRecorder)
    Browser->>Browser: Stop & convert to mono 16kHz WAV bytes
    Browser->>Core: openhuman.voice_transcribe_bytes(audio_bytes)
    Core->>Whisper: Transcribe (in-process or subprocess)
    Whisper-->>Core: Transcript / Error
    Core-->>Browser: Transcript response
    Browser->>Browser: setTranscript -> show Insert/Copy
    Browser->>Conversations: dispatch dictation://insert-text event
    Conversations->>Conversations: Append text to message textarea

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


Suggested reviewers

  • senamakel

Poem

🐰
I twitched my whiskers at a tiny key,
A hush, then letters hopped out free.
From mic to text they soft‑footed leap,
Hotkey wakes me from my sleep.
Hooray—my carrot types while I nap, whee!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Feat/add dictation' directly summarizes the main change: adding dictation functionality to the application.


- Removed redundant state management for the position of the dictation overlay, consolidating logic to initialize and reset the position based on the current status.
- Introduced a new `resetLauncherPosition` function to simplify resetting the overlay's position when necessary.
- Updated event handling to ensure the overlay's position is reset appropriately after text insertion actions.

This update enhances the clarity and efficiency of the DictationOverlay component, improving user experience during dictation interactions.
coderabbitai Bot left a comment

Actionable comments posted: 11

🧹 Nitpick comments (2)
src/openhuman/local_ai/service/speech.rs (1)

30-39: Serialize lazy model loading to avoid duplicate heavy allocations.

Line 30 checks is_loaded before spawning a loader, but concurrent requests can all observe “not loaded” and load simultaneously. With the current whisper_engine::load_engine behavior (src/openhuman/local_ai/service/whisper_engine.rs, lines 31-55), that can allocate multiple contexts and discard the earlier ones.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/local_ai/service/speech.rs` around lines 30 - 39, The check in
speech.rs that calls whisper_engine::is_loaded(&self.whisper) then
tokio::task::spawn_blocking to run whisper_engine::load_engine can race so
multiple tasks load duplicate heavy contexts; serialize the lazy load by adding
a shared load guard (e.g., a Mutex/AsyncMutex or singleflight-like token) on the
same struct that holds self.whisper, acquire the guard before spawning the
blocking load, re-check whisper_engine::is_loaded(&self.whisper) after acquiring
the guard to avoid redundant loads, and only call whisper_engine::load_engine
when still needed; ensure the guard is released after load completes so
concurrent callers wait rather than spawn parallel loads.
app/src/utils/tauriCommands.ts (1)

2204-2213: Don't retry with the raw shortcut string.

The first attempt already normalizes aliases like Command, Control, and Option. Retrying shortcut reintroduces those raw tokens, but expand_dictation_shortcuts() in app/src-tauri/src/lib.rs:21-47 only expands CmdOrCtrl and otherwise forwards the string unchanged. This fallback is not actually compatibility-preserving; it just exercises a less supported path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/utils/tauriCommands.ts` around lines 2204 - 2213, The retry using the
raw shortcut string reintroduces unnormalized tokens and is incorrect; in the
try/catch around invoke('register_dictation_hotkey') you should stop retrying
with the original shortcut variable (remove the second invoke call that uses
shortcut), keep the warning/log that normalized registration failed (including
err), and either rethrow the error or handle it as a terminal failure so the
caller knows registration truly failed; reference the normalizedShortcut
variable and the invoke('register_dictation_hotkey') call and note that
expand_dictation_shortcuts in app/src-tauri/src/lib.rs should remain the single
normalization path.
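Combined with the isTauri() comment later in the review, the requested fix is: no-op outside Tauri, normalize once, invoke once, and surface failures to the caller. Several details in the sketch below are assumptions made so it is self-contained and testable, not the actual tauriCommands.ts code: the injected invoke and isTauri dependencies, the alias table, and the { shortcut } argument shape.

```typescript
type Invoke = (cmd: string, args: Record<string, unknown>) => Promise<unknown>;

/** Assumed alias table mapping common spellings to accelerator tokens. */
const MODIFIER_ALIASES: Record<string, string> = {
  command: "Cmd",
  cmd: "Cmd",
  control: "Ctrl",
  ctrl: "Ctrl",
  option: "Alt",
  alt: "Alt",
  shift: "Shift",
  cmdorctrl: "CmdOrCtrl",
};

function normalizeModifiers(shortcut: string): string {
  return shortcut
    .split("+")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => MODIFIER_ALIASES[part.toLowerCase()] ?? part)
    .join("+");
}

/** Register once with the normalized string; never retry with the raw input. */
async function registerDictationHotkey(
  shortcut: string,
  deps: { isTauri: () => boolean; invoke: Invoke },
): Promise<void> {
  if (!deps.isTauri()) return; // browser: silent no-op, no invoke, no retries
  const normalizedShortcut = normalizeModifiers(shortcut);
  try {
    await deps.invoke("register_dictation_hotkey", { shortcut: normalizedShortcut });
  } catch (err) {
    console.warn("[dictation] hotkey registration failed", normalizedShortcut, err);
    throw err; // let the caller (e.g. the settings panel) report the failure
  }
}
```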
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src-tauri/src/lib.rs`:
- Around line 152-199: Do not clear or overwrite DictationHotkeyState until all
unregister/register operations complete successfully: capture the current guard
clone into old_shortcuts, but do not call guard.clear() or assign *guard = ...
yet; attempt to unregister old_shortcuts and register all expanded_shortcuts
(from expand_dictation_shortcuts) while tracking which new variants succeeded,
and if any registration fails roll back by unregistering any newly-registered
variants and leaving the original state intact, returning the error; only after
all registrations succeed, acquire the DictationHotkeyState lock and replace its
contents with expanded_shortcuts. Use the same identifiers
(DictationHotkeyState, old_shortcuts, expanded_shortcuts,
app.global_shortcut().on_shortcut, app_clone) to locate and update the logic.

In `@app/src/components/dictation/DictationOverlay.tsx`:
- Around line 211-219: openhumanAccessibilityInputAction currently treats any
successful RPC response as success and clears the transcript; change this so the
response from openhumanAccessibilityInputAction is awaited into a variable
(response) and you check response.result.accepted (and/or
response.result.blocked) before calling dispatch(resetDictation()). Only call
dispatch(resetDictation()) when accepted === true; if accepted is false (or
blocked), fall back to writing to navigator.clipboard.writeText(transcript) and
then reset state as appropriate. Ensure the existing try/catch still handles
transport errors the same way and that the dispatch/reset occurs only after
confirming acceptance from the response.

In `@app/src/components/dictation/useDictation.ts`:
- Around line 263-276: The RPC response is wrapped in a CommandResponse envelope
so reading response.text causes undefined access; in the
callCoreRpc<TranscribeResult> handling for method
'openhuman.voice_transcribe_bytes' unwrap the envelope and read the
transcription from response.result.text (or use the shared wrapper utility used
elsewhere) before trimming and logging; keep the stale-response guard using
sessionIdRef.current/sessionId unchanged and replace any response.text.trim()
uses with the unwrapped value and a safe null/empty check.
- Around line 330-339: The timeout-based retry in the useEffect that calls
registerDictationHotkey(hotkey) can outlive the current hotkey and re-register a
stale shortcut; fix it by capturing the timer id and clearing it in the effect
cleanup so the delayed retry is cancelled when hotkey changes or the hook
unmounts: store the setTimeout id (e.g., in a local variable or a ref), call
clearTimeout(timerId) in the useEffect return cleanup, and ensure any in-flight
promises don't trigger additional retries after cleanup; update the useEffect
surrounding registerDictationHotkey and its retry to use this cleanup behavior.
- Around line 247-255: The fallback branch in useDictation.ts currently maps
unknown recorder containers (e.g., 'audio/webm') to 'webm', which Rust's
normalize_extension rejects; update the catch block that handles conversionErr
(where blob, bytes, ext and mimeType are set) to only accept supported
extensions ('wav','mp3','m4a','ogg','flac') by mapping known container mime
types to one of those (e.g., map 'webm' -> 'ogg') or throw/abort the fallback
when the blob's mimeType is not in that supported set; ensure you change the
assignment to ext accordingly and keep bytes = Array.from(new
Uint8Array(buffer)) as-is so downstream callers (send/upload functions) receive
a supported extension.

In `@app/src/components/settings/panels/DictationPanel.tsx`:
- Around line 133-141: The UI currently hardcodes a Unix-style path in
DictationPanel.tsx using voiceStatus.stt_model_id which is incorrect on Windows;
update the rendering to derive the model directory from the app/core
configuration or a platform-aware helper (e.g., a provided getModelDirectory or
IPC call to return the correct models path) and then join it with
voiceStatus.stt_model_id (or show both platform-specific examples) instead of
"~/.openhuman/...". Locate the JSX block that references
voiceStatus.stt_model_id and isCheckingStatus and replace the hard-coded path
text with the platform-aware path value sourced from the config/helper so the
message shows the correct path on Windows, macOS, and Linux.
- Around line 62-77: The statusLabel and statusColor helpers currently treat
voiceStatus.stt_model_path as "ready" (green) even when
voiceStatus.stt_available is false; update both functions so only
voiceStatus.stt_available yields the green "Ready (model loaded)"/bg-green-400
state. Add an explicit branch for when voiceStatus.stt_model_path is true but
voiceStatus.stt_available is false (e.g., label "Model found — backend
unavailable" and a neutral/amber color like bg-amber-400), and ensure the checks
reference the existing symbols statusLabel, statusColor, voiceStatus,
stt_available, and stt_model_path.

In `@app/src/pages/Conversations.tsx`:
- Around line 226-242: The ESLint no-undef error is caused by calling the global
requestAnimationFrame directly inside onDictationInsert; update the call to use
window.requestAnimationFrame(...) instead. Locate the onDictationInsert handler
in Conversations.tsx (the function using requestAnimationFrame and
textInputRef.current?.focus()) and replace the bare requestAnimationFrame
invocation with window.requestAnimationFrame to match other files (e.g.,
RotatingTetrahedronCanvas.tsx) and satisfy the app/eslint.config.js globals.

In `@app/src/store/dictationSlice.ts`:
- Around line 34-45: The reducer initialState (initialState, DEFAULT_HOTKEY) is
reading localStorage (impure side effect); remove localStorage access from
dictationSlice initialState and set deterministic defaults (e.g., hotkey =
DEFAULT_HOTKEY, showFloatingLauncher = true/false) so reducers remain pure, then
add persistence for dictation.hotkey and dictation.showFloatingLauncher in the
centralized persistence setup by creating a dictationPersistConfig = { key:
'dictation', storage, whitelist: ['hotkey','showFloatingLauncher'] }, wrapping
dictationReducer with persistReducer(dictationPersistConfig, dictationReducer),
and replacing the plain dictation reducer in the root reducer map with the
persisted reducer.

In `@app/src/utils/tauriCommands.ts`:
- Around line 2186-2205: The function registerDictationHotkey currently attempts
a Tauri invoke even when not running in Tauri; change it to return early when
isTauri() is false (treat browsers as a no-op) to avoid calling invoke and
scheduling retries. Locate registerDictationHotkey and add an early return
before normalization/invoke (or immediately after the isTauri() check) so that
when isTauri() is false the function resolves without calling
invoke('register_dictation_hotkey', ...) while preserving existing debug
logging.

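The statusLabel/statusColor comment for DictationPanel.tsx can be addressed with one function that returns both label and colour, so the two helpers cannot drift apart. Returning them together is a suggested design, not what the panel currently does. The VoiceStatusLike shape, the label wording for the model-without-backend case, and the gray and red classes are assumptions.

```typescript
/** Hypothetical subset of the voice_status payload used by the panel. */
interface VoiceStatusLike {
  stt_available: boolean;
  stt_model_path: string | null;
}

interface StatusBadge {
  label: string;
  color: string; // Tailwind background class
}

/**
 * Only stt_available earns the green "ready" badge. A model path without a
 * usable backend gets its own amber state instead of being reported as ready.
 */
function statusBadge(status: VoiceStatusLike | null): StatusBadge {
  if (status === null) return { label: "Status unknown", color: "bg-gray-400" };
  if (status.stt_available) return { label: "Ready (model loaded)", color: "bg-green-400" };
  if (status.stt_model_path) return { label: "Model found, backend unavailable", color: "bg-amber-400" };
  return { label: "Model not found", color: "bg-red-400" };
}
```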
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eb9f0615-77ec-4c1c-8ce6-c6af86f66ed0

📥 Commits

Reviewing files that changed from the base of the PR and between d930c47 and b4a8fa8.

⛔ Files ignored due to path filters (1)
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • app/src-tauri/Cargo.toml
  • app/src-tauri/src/lib.rs
  • app/src/App.tsx
  • app/src/components/dictation/DictationOverlay.tsx
  • app/src/components/dictation/useDictation.ts
  • app/src/components/settings/SettingsHome.tsx
  • app/src/components/settings/hooks/useSettingsNavigation.ts
  • app/src/components/settings/panels/DictationPanel.tsx
  • app/src/components/settings/panels/ScreenIntelligencePanel.tsx
  • app/src/pages/Conversations.tsx
  • app/src/pages/Settings.tsx
  • app/src/store/dictationSlice.ts
  • app/src/store/index.ts
  • app/src/utils/tauriCommands.ts
  • src/openhuman/local_ai/service/speech.rs

…rocess

- Streamlined the logic for registering and unregistering dictation hotkeys, ensuring that old shortcuts are properly managed before new ones are registered.
- Introduced rollback mechanisms to restore previous shortcuts in case of registration failures, enhancing reliability.
- Simplified error handling and logging for better clarity during the hotkey management process.

This update enhances the robustness of the dictation feature by ensuring a smoother transition between hotkey states.
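The rollback behaviour in this commit, also requested in the lib.rs review comment, follows a transactional pattern. Register every new variant, undo on the first failure, and only then swap the stored state. The sketch below shows that pattern in TypeScript against a hypothetical ShortcutBackend interface. The real implementation is Rust using tauri-plugin-global-shortcut, and its exact ordering may differ.

```typescript
/** Hypothetical backend standing in for the global-shortcut plugin. */
interface ShortcutBackend {
  register(shortcut: string): void; // throws on failure
  unregister(shortcut: string): void;
}

/**
 * Replace the active shortcuts with `next`. State is updated only after every
 * registration succeeds; on failure, partial registrations are undone and the
 * previous shortcuts are restored before the error is rethrown.
 */
function swapShortcuts(backend: ShortcutBackend, state: { active: string[] }, next: string[]): void {
  const previous = [...state.active]; // snapshot; do not clear state yet
  previous.forEach((s) => backend.unregister(s));

  const registered: string[] = [];
  try {
    for (const s of next) {
      backend.register(s);
      registered.push(s);
    }
  } catch (err) {
    // Roll back: drop partial new registrations, then restore the old set.
    registered.forEach((s) => backend.unregister(s));
    previous.forEach((s) => backend.register(s));
    throw err;
  }
  state.active = [...next];
}
```

Restoring the previous shortcuts can itself fail (for example, if another app grabbed them in between). A production version would need to log that case rather than assume the rollback always succeeds.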
…omponents

- Enhanced formatting in DictationOverlay for better clarity in asynchronous action handling.
- Streamlined text extraction logic in useDictation for improved readability.
- Consolidated model directory setting in DictationPanel to a single line for simplicity.
- Improved logging consistency in tauriCommands and speech service files.

These changes enhance the maintainability and readability of the dictation-related components.
… changes

- Introduced a reference to track the previous status of the dictation overlay.
- Updated the effect to reset the overlay's position when transitioning from 'idle' to any other status, enhancing user experience during dictation sessions.

This change improves the responsiveness of the DictationOverlay component to status changes, ensuring a smoother interaction for users.
- Modified test assertions in MemoryWorkspace.test.tsx to include a selector for 'span' elements when checking for text presence.
- This change enhances the specificity of the tests, ensuring they accurately target the intended elements in the rendered component.

These updates improve the reliability of the MemoryWorkspace component tests.
coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
app/src/components/intelligence/__tests__/MemoryWorkspace.test.tsx (1)

74-79: Avoid tag-name-coupled assertions for relation text.

On Line 74 and Line 77–79, { selector: 'span' } makes the test depend on markup details. Prefer scoping to the Memory Graph area and asserting text behavior within that scope.

♻️ Suggested refactor
-import { screen, waitFor } from '@testing-library/react';
+import { screen, waitFor, within } from '@testing-library/react';
...
-    await waitFor(() => {
-      expect(screen.getByText('Alice', { selector: 'span' })).toBeInTheDocument();
-    });
-
-    expect(screen.getByText('AUTHORED', { selector: 'span' })).toBeInTheDocument();
-    expect(screen.getByText('Bob', { selector: 'span' })).toBeInTheDocument();
-    expect(screen.getByText('REVIEWED', { selector: 'span' })).toBeInTheDocument();
+    const memoryGraphHeading = await screen.findByText('Memory Graph');
+    const memoryGraphSection = memoryGraphHeading.closest('section');
+    expect(memoryGraphSection).not.toBeNull();
+    const graph = within(memoryGraphSection as HTMLElement);
+
+    expect(graph.getByText('Alice')).toBeInTheDocument();
+    expect(graph.getByText('AUTHORED')).toBeInTheDocument();
+    expect(graph.getByText('Bob')).toBeInTheDocument();
+    expect(graph.getByText('REVIEWED')).toBeInTheDocument();

As per coding guidelines, "Prefer testing behavior over implementation details; use existing helpers from app/src/test/ (test-utils.tsx, shared mock backend) before adding new harness code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/intelligence/__tests__/MemoryWorkspace.test.tsx` around
lines 74 - 79, The assertions currently couple to markup by using
screen.getByText(..., { selector: 'span' }); update the test in
MemoryWorkspace.test.tsx to scope assertions to the Memory Graph area instead:
locate the Memory Graph container (e.g., via a test-id/role used in the
component or by querying the workspace root returned from your render helper),
call within(container) from `@testing-library/react` and then use
within(container).getByText('Alice'), within(container).getByText('AUTHORED'),
within(container).getByText('Bob'), etc., removing the selector option; also
ensure the test uses your existing test helpers (from test-utils.tsx / shared
mock backend) to render the component so you don't assert implementation/markup
details.
src/openhuman/local_ai/service/bootstrap.rs (1)

105-111: Consider clarifying the local_ai_status spawn condition to align with the new bootstrap behavior.

The early return for "degraded" state in bootstrap is intentional and well-documented. However, local_ai_status (ops.rs:168-177) explicitly checks for "idle" | "degraded" before spawning bootstrap:

if matches!(status.state.as_str(), "idle" | "degraded") {
    // spawns bootstrap...
}

With the new early return, spawning bootstrap when the state is "degraded" becomes a silent no-op: the "degraded" arm of this condition no longer has any effect and could confuse maintainers who see an attempted spawn but no work performed.

Either remove "degraded" from the condition (since automatic retries should not retry on degraded per the documented intent), or add a comment explaining that the spawn is intentionally benign.
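A minimal sketch of the invariant this suggestion restores: every state that triggers a spawn should be one where bootstrap actually does work. `old_should_spawn_bootstrap`, `should_spawn_bootstrap`, `bootstrap_does_work`, and `spawn_is_meaningful` are hypothetical stand-ins, not functions from ops.rs or bootstrap.rs. The model also assumes that "degraded" is the only state bootstrap.rs returns early for.

```rust
/// Hypothetical stand-in for the current `matches!` check in `local_ai_status`:
/// both "idle" and "degraded" trigger a bootstrap spawn.
fn old_should_spawn_bootstrap(state: &str) -> bool {
    matches!(state, "idle" | "degraded")
}

/// Hypothetical stand-in for the suggested condition: only "idle" spawns.
fn should_spawn_bootstrap(state: &str) -> bool {
    matches!(state, "idle")
}

/// Hypothetical model of the early return in `bootstrap.rs`: a "degraded"
/// service is not retried automatically, so bootstrap returns without doing
/// any work. All other states are assumed to proceed.
fn bootstrap_does_work(state: &str) -> bool {
    state != "degraded"
}

/// True when the spawn decision for `state` is not a silent no-op, i.e.
/// either no spawn is requested or the spawned bootstrap does real work.
fn spawn_is_meaningful(spawn: fn(&str) -> bool, state: &str) -> bool {
    !spawn(state) || bootstrap_does_work(state)
}
```

Under the old condition, `spawn_is_meaningful(old_should_spawn_bootstrap, "degraded")` is false, which is exactly the dead branch described above. Keeping "degraded" with an explanatory comment is the other option the review offers; it leaves behavior unchanged and only documents the no-op.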

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/local_ai/service/bootstrap.rs` around lines 105 - 111, The
spawn condition in local_ai_status (ops.rs) still checks for "idle" | "degraded"
while bootstrap.rs now returns early for "degraded", making the "degraded"
branch a no-op; update the check in local_ai_status (the if
matches!(status.state.as_str(), "idle" | "degraded")) to only match "idle"
(remove "degraded") so spawns reflect actual bootstrap behavior, or
alternatively add a clear comment in local_ai_status explaining that spawning on
"degraded" is intentionally benign because bootstrap.rs will early-return for
"degraded"; reference the bootstrap.rs early-return and the local_ai_status
spawn condition when making the change.
app/src/components/settings/panels/DictationPanel.tsx (1)

100-100: Success message auto-dismiss timeout is not cleared on unmount.

The setTimeout on line 100 can fire after the component unmounts if the user navigates away quickly after saving, causing a state update on an unmounted component. Consider tracking this timer in a ref and clearing it in an effect cleanup.

💡 Optional fix using useRef
+  const successTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
+
+  useEffect(() => {
+    return () => {
+      if (successTimerRef.current) {
+        clearTimeout(successTimerRef.current);
+      }
+    };
+  }, []);
+
   // In handleSaveHotkey:
   setHotkeySuccess(true);
-  setTimeout(() => setHotkeySuccess(false), 2000);
+  if (successTimerRef.current) {
+    clearTimeout(successTimerRef.current);
+  }
+  successTimerRef.current = setTimeout(() => setHotkeySuccess(false), 2000);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/settings/panels/DictationPanel.tsx` at line 100, The
setTimeout call that resets setHotkeySuccess in DictationPanel can run after
unmount and cause a state update on an unmounted component; fix it by storing
the timer id in a ref (e.g., hotkeyTimerRef) when calling setTimeout in the
handler that calls setHotkeySuccess, then add a useEffect cleanup in
DictationPanel that clears the timeout (clearTimeout(hotkeyTimerRef.current))
and nulls the ref on unmount, and also clear any existing timer before creating
a new one so multiple saves don’t leak timers.
app/src/components/dictation/DictationOverlay.tsx (1)

111-121: Escape handler dismisses regardless of dictation state.

The Escape key handler calls dismiss() unconditionally, which will reset dictation state and stop any active recording. This is the intended UX, but consider whether dismissing during the transcribing state should show a confirmation or at least log that an in-progress transcription was cancelled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/dictation/DictationOverlay.tsx` around lines 111 - 121,
The Escape key handler inside the useEffect currently calls dismiss()
unconditionally; update the handler to check the dictation state (e.g., a
prop/state like dictationState or isTranscribing) and only call dismiss()
immediately when not in 'transcribing' state; if in 'transcribing' state, either
invoke a confirmation flow (e.g., showConfirmCancel or openCancelModal) before
calling dismiss() or at minimum log that an in-progress transcription was
cancelled (use console.warn or processLogger) so the cancellation is explicit.
Modify the handler function and any related dismissal logic (handler, useEffect,
dismiss, and the confirmation modal trigger) to implement this conditional
behavior.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c08ae132-5245-4017-ba6d-c51bd414fbd4

📥 Commits

Reviewing files that changed from the base of the PR and between b609fec and 555dd2a.

📒 Files selected for processing (12)
  • PR_DESCRIPTION.md
  • app/src-tauri/src/lib.rs
  • app/src/components/dictation/DictationOverlay.tsx
  • app/src/components/dictation/useDictation.ts
  • app/src/components/intelligence/__tests__/MemoryWorkspace.test.tsx
  • app/src/components/settings/panels/DictationPanel.tsx
  • app/src/store/dictationSlice.ts
  • app/src/store/index.ts
  • app/src/utils/tauriCommands.ts
  • src/openhuman/local_ai/service/bootstrap.rs
  • src/openhuman/local_ai/service/mod.rs
  • src/openhuman/local_ai/service/speech.rs
💤 Files with no reviewable changes (1)
  • PR_DESCRIPTION.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • app/src/store/index.ts
  • src/openhuman/local_ai/service/speech.rs
  • app/src/utils/tauriCommands.ts

Comment thread app/src-tauri/src/lib.rs
Comment on lines +239 to +254
    let mut guard = state.0.lock().unwrap();
    if guard.is_empty() {
        log::debug!("[dictation] no shortcut registered — nothing to unregister");
    } else {
        let old_shortcuts = guard.clone();
        guard.clear();
        for old in old_shortcuts {
            log::debug!("[dictation] unregistering shortcut: {old}");
            app.global_shortcut()
                .unregister(old.as_str())
                .map_err(|e| {
                    log::warn!("[dictation] failed to unregister '{old}': {e}");
                    format!("Failed to unregister shortcut '{old}': {e}")
                })?;
            log::info!("[dictation] shortcut unregistered: {old}");
        }

⚠️ Potential issue | 🟡 Minor

State cleared before unregister completes — partial unregister can leak shortcuts.

guard.clear() (line 244) discards the tracked shortcuts before attempting to unregister them. If an unregister call fails mid-loop (line 252 returns Err), the remaining shortcuts stay registered at the OS level but are no longer tracked, so they can never be cleaned up.

Consider clearing state only after all shortcuts are successfully unregistered, mirroring the pattern used in register_dictation_hotkey.

🛠️ Suggested fix
 async fn unregister_dictation_hotkey(app: AppHandle) -> Result<(), String> {
     log::info!("[dictation] unregister_dictation_hotkey: called");
-    let state = app.state::<DictationHotkeyState>();
-    let mut guard = state.0.lock().unwrap();
-    if guard.is_empty() {
+    let old_shortcuts = {
+        let state = app.state::<DictationHotkeyState>();
+        let guard = state.0.lock().unwrap();
+        guard.clone()
+    };
+
+    if old_shortcuts.is_empty() {
         log::debug!("[dictation] no shortcut registered — nothing to unregister");
-    } else {
-        let old_shortcuts = guard.clone();
-        guard.clear();
-        for old in old_shortcuts {
-            log::debug!("[dictation] unregistering shortcut: {old}");
-            app.global_shortcut()
-                .unregister(old.as_str())
-                .map_err(|e| {
-                    log::warn!("[dictation] failed to unregister '{old}': {e}");
-                    format!("Failed to unregister shortcut '{old}': {e}")
-                })?;
-            log::info!("[dictation] shortcut unregistered: {old}");
-        }
+        return Ok(());
     }
+
+    for old in &old_shortcuts {
+        log::debug!("[dictation] unregistering shortcut: {old}");
+        app.global_shortcut()
+            .unregister(old.as_str())
+            .map_err(|e| {
+                log::warn!("[dictation] failed to unregister '{old}': {e}");
+                format!("Failed to unregister shortcut '{old}': {e}")
+            })?;
+        log::info!("[dictation] shortcut unregistered: {old}");
+    }
+
+    // Clear state only after all unregistrations succeed.
+    {
+        let state = app.state::<DictationHotkeyState>();
+        let mut guard = state.0.lock().unwrap();
+        guard.clear();
+    }
+
     Ok(())
 }
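The ordering in this fix can be checked without Tauri by swapping the OS registry for an in-memory fake. In this sketch `FakeOs`, `unregister_all`, and the `Cmd+Shift+D` / `Ctrl+Shift+D` pair (standing in for the two variants `CmdOrCtrl` expands to) are hypothetical; `FakeOs::unregister` only imitates the shape of `app.global_shortcut().unregister`. One assumption to flag: the fake treats unregistering an already-removed shortcut as success, which the real plugin may not do.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for the OS-level shortcut registry.
struct FakeOs {
    registered: HashSet<String>,
    /// Shortcut whose unregister call fails, to simulate an OS error.
    fail_on: Option<String>,
}

impl FakeOs {
    fn unregister(&mut self, shortcut: &str) -> Result<(), String> {
        if self.fail_on.as_deref() == Some(shortcut) {
            return Err(format!("Failed to unregister shortcut '{shortcut}'"));
        }
        // Assumption: removing a shortcut that is already gone is not an error.
        self.registered.remove(shortcut);
        Ok(())
    }
}

/// Mirrors the suggested ordering: snapshot the tracked list, unregister
/// each entry, and clear tracked state only after every call succeeds.
fn unregister_all(tracked: &mut Vec<String>, os: &mut FakeOs) -> Result<(), String> {
    let snapshot = tracked.clone();
    for shortcut in &snapshot {
        // `?` returns early on failure and leaves `tracked` untouched.
        os.unregister(shortcut)?;
    }
    tracked.clear();
    Ok(())
}
```

After a mid-loop failure, every shortcut is still tracked, so a later call can finish the cleanup. Note that on that retry the first shortcut has already been unregistered at the OS level, so this ordering relies on a repeated unregister being harmless.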
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src-tauri/src/lib.rs` around lines 239 - 254, The code clears the tracked
shortcuts (guard.clear()) before attempting to unregister them, risking leaked
OS shortcuts if an unregister fails; change the logic in the unregister loop
(the block using state.0 lock, guard, old_shortcuts and
app.global_shortcut().unregister) so you clone or drain the current shortcuts,
attempt to unregister each via app.global_shortcut().unregister(old.as_str()),
and only clear or update the shared guard after all unregister calls succeed
(mirror the pattern used in register_dictation_hotkey); ensure errors return
without mutating the guard so remaining shortcuts remain tracked on failure.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add voice-to-text dictation flow (global hotkey + transcription) inspired by OpenWhispr

2 participants