Feat/add dictation #278
Conversation
…ettings panel

- Added DictationOverlay component for real-time speech-to-text functionality, including recording, transcribing, and error handling.
- Introduced useDictation hook to manage audio recording and transcription processes.
- Created DictationPanel for configuring dictation settings, including hotkey registration and floating launcher preferences.
- Updated SettingsHome to include a navigation option for the new dictation feature.
- Integrated dictation state management with Redux, allowing for persistent settings and status checks.
- Enhanced Tauri commands for registering global hotkeys and managing dictation state.

This update provides a comprehensive voice dictation experience, enabling users to transcribe speech to text using local AI.
… on floating launcher state

- Reintroduced the useAppDispatch and useAppSelector hooks for state management.
- Added showFloatingLauncher to the state selection, ensuring the overlay only renders when the floating launcher is active.
- Simplified the return statement for position calculations in the drag handler.
- Improved code readability by formatting JSX elements for better clarity.

This update enhances the user experience by ensuring the dictation overlay behaves correctly based on the application's state.
- Updated the SettingsRoute type to include 'dictation' as a new route.
- Integrated DictationPanel into the Settings page for user configuration.
- Enhanced dictation state management by adding showFloatingLauncher to the dictationSlice, allowing for better control of the dictation overlay.

This update improves the settings navigation and user experience by providing access to dictation features directly from the settings menu.
…tings

- Added DictationOverlay component to the main App for improved user interaction with dictation features.
- Updated DictationPanel to include a new preference for showing a floating launcher, enhancing user control over dictation functionality.
- Enhanced state management by incorporating showFloatingLauncher into the dictationSlice, allowing for better configuration of the dictation experience.

This update improves the overall user experience by providing direct access to dictation features and customizable settings.
…ved text insertion

- Added functionality to insert text into editable elements from the DictationOverlay, improving user interaction with dictation features.
- Implemented event listener in Conversations to handle custom dictation insert events, allowing seamless integration of transcribed text into the input field.
- Updated state management to ensure the correct editable target is used for text insertion, enhancing overall user experience.

This update streamlines the dictation process, making it more intuitive and responsive to user actions.
- Updated the logic for determining STT availability: the model file is only considered available when it exists and there is a method to run inference (either the in-process engine is loaded or a whisper binary is present).
- This change prevents misleading user experiences by avoiding the display of the overlay when the necessary components for transcription are not available.

This update enhances the reliability of the dictation feature by providing clearer conditions for STT availability.
📝 Walkthrough

Adds a voice dictation feature: global hotkey management (Tauri plugin + commands), a floating dictation overlay and hook for microphone capture/transcription, Redux state and settings UI, Tauri frontend wrappers, and lazy in-process Whisper engine loading for local STT.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Browser as App (Overlay / Hook)
    participant Tauri
    participant Core
    participant Whisper
    User->>Browser: Activate dictation (hotkey or launcher)
    Browser->>Tauri: (on save) register_dictation_hotkey(shortcut)
    Tauri->>Browser: Emit "dictation://toggle" on hotkey press
    Browser->>Browser: useDictation.toggle() -> startRecording()
    Browser->>User: Show recording UI
    User->>Browser: Speak (Microphone -> MediaRecorder)
    Browser->>Browser: Stop & convert to mono 16kHz WAV bytes
    Browser->>Core: openhuman.voice_transcribe_bytes(audio_bytes)
    Core->>Whisper: Transcribe (in-process or subprocess)
    Whisper-->>Core: Transcript / Error
    Core-->>Browser: Transcript response
    Browser->>App (Conversations): dispatch dictation://insert-text event
    App->>App: Append text to focused textarea
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
- Removed redundant state management for the position of the dictation overlay, consolidating logic to initialize and reset the position based on the current status.
- Introduced a new `resetLauncherPosition` function to simplify resetting the overlay's position when necessary.
- Updated event handling to ensure the overlay's position is reset appropriately after text insertion actions.

This update enhances the clarity and efficiency of the DictationOverlay component, improving user experience during dictation interactions.
Actionable comments posted: 11
🧹 Nitpick comments (2)
src/openhuman/local_ai/service/speech.rs (1)
30-39: Serialize lazy model loading to avoid duplicate heavy allocations.

Line 30 checks `is_loaded` before spawning a loader, but concurrent requests can all observe "not loaded" and load simultaneously. With the current `whisper_engine::load_engine` behavior (src/openhuman/local_ai/service/whisper_engine.rs, lines 31-55), that can allocate multiple contexts and discard earlier ones.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/openhuman/local_ai/service/speech.rs` around lines 30 - 39, The check in speech.rs that calls whisper_engine::is_loaded(&self.whisper) then tokio::task::spawn_blocking to run whisper_engine::load_engine can race so multiple tasks load duplicate heavy contexts; serialize the lazy load by adding a shared load guard (e.g., a Mutex/AsyncMutex or singleflight-like token) on the same struct that holds self.whisper, acquire the guard before spawning the blocking load, re-check whisper_engine::is_loaded(&self.whisper) after acquiring the guard to avoid redundant loads, and only call whisper_engine::load_engine when still needed; ensure the guard is released after load completes so concurrent callers wait rather than spawn parallel loads.

app/src/utils/tauriCommands.ts (1)
2204-2213: Don't retry with the raw shortcut string.

The first attempt already normalizes aliases like `Command`, `Control`, and `Option`. Retrying `shortcut` reintroduces those raw tokens, but `expand_dictation_shortcuts()` in app/src-tauri/src/lib.rs:21-47 only expands `CmdOrCtrl` and otherwise forwards the string unchanged. This fallback is not actually compatibility-preserving; it just exercises a less supported path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/utils/tauriCommands.ts` around lines 2204 - 2213, The retry using the raw shortcut string reintroduces unnormalized tokens and is incorrect; in the try/catch around invoke('register_dictation_hotkey') you should stop retrying with the original shortcut variable (remove the second invoke call that uses shortcut), keep the warning/log that normalized registration failed (including err), and either rethrow the error or handle it as a terminal failure so the caller knows registration truly failed; reference the normalizedShortcut variable and the invoke('register_dictation_hotkey') call and note that expand_dictation_shortcuts in app/src-tauri/src/lib.rs should remain the single normalization path.
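The fix the comment asks for (normalize once, then treat a failed registration as terminal rather than retrying the raw string) can be sketched as follows. This is a minimal sketch, not the app's code: the alias table is an illustrative assumption (the real mapping lives in tauriCommands.ts and `expand_dictation_shortcuts`), and `registerHotkey` is a hypothetical stand-in for the Tauri invoke.

```typescript
// Illustrative alias map; the app's real normalization table may differ.
const TOKEN_ALIASES: Record<string, string> = {
  Command: 'Cmd',
  Control: 'Ctrl',
  Option: 'Alt',
};

// Normalize each '+'-separated token through the alias table.
function normalizeShortcut(shortcut: string): string {
  return shortcut
    .split('+')
    .map((token) => TOKEN_ALIASES[token.trim()] ?? token.trim())
    .join('+');
}

// Register once with the normalized string; on failure, log and rethrow
// instead of retrying with the raw, unnormalized shortcut.
async function registerDictationHotkey(
  shortcut: string,
  registerHotkey: (s: string) => Promise<void>,
): Promise<void> {
  const normalized = normalizeShortcut(shortcut);
  try {
    await registerHotkey(normalized);
  } catch (err) {
    console.warn(`registration failed for '${normalized}':`, err);
    throw err; // terminal failure: no raw-string retry
  }
}
```

Keeping a single normalization path means the Rust side only ever sees tokens it knows how to expand.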
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src-tauri/src/lib.rs`:
- Around line 152-199: Do not clear or overwrite DictationHotkeyState until all
unregister/register operations complete successfully: capture the current guard
clone into old_shortcuts, but do not call guard.clear() or assign *guard = ...
yet; attempt to unregister old_shortcuts and register all expanded_shortcuts
(from expand_dictation_shortcuts) while tracking which new variants succeeded,
and if any registration fails roll back by unregistering any newly-registered
variants and leaving the original state intact, returning the error; only after
all registrations succeed, acquire the DictationHotkeyState lock and replace its
contents with expanded_shortcuts. Use the same identifiers
(DictationHotkeyState, old_shortcuts, expanded_shortcuts,
app.global_shortcut().on_shortcut, app_clone) to locate and update the logic.
In `@app/src/components/dictation/DictationOverlay.tsx`:
- Around line 211-219: openhumanAccessibilityInputAction currently treats any
successful RPC response as success and clears the transcript; change this so the
response from openhumanAccessibilityInputAction is awaited into a variable
(response) and you check response.result.accepted (and/or
response.result.blocked) before calling dispatch(resetDictation()). Only call
dispatch(resetDictation()) when accepted === true; if accepted is false (or
blocked), fall back to writing to navigator.clipboard.writeText(transcript) and
then reset state as appropriate. Ensure the existing try/catch still handles
transport errors the same way and that the dispatch/reset occurs only after
confirming acceptance from the response.
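The acceptance gate described above reduces to a small pure decision. A minimal sketch under assumptions: `InputActionResponse` mirrors the envelope shape the comment describes (`result.accepted`, optional `result.blocked`), and `insertOutcome` is a hypothetical helper rather than the overlay's actual code.

```typescript
// Hypothetical envelope shape; the real CommandResponse type lives in the app.
interface InputActionResponse {
  result: { accepted: boolean; blocked?: boolean };
}

// Only treat the transcript as inserted when the backend explicitly accepted
// the action; otherwise the overlay should fall back to the clipboard.
function insertOutcome(
  response: InputActionResponse,
): 'reset' | 'clipboard-fallback' {
  return response.result.accepted === true ? 'reset' : 'clipboard-fallback';
}
```

In the overlay, `'reset'` would map to `dispatch(resetDictation())` and `'clipboard-fallback'` to `navigator.clipboard.writeText(transcript)` followed by the appropriate state reset.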
In `@app/src/components/dictation/useDictation.ts`:
- Around line 263-276: The RPC response is wrapped in a CommandResponse envelope
so reading response.text causes undefined access; in the
callCoreRpc<TranscribeResult> handling for method
'openhuman.voice_transcribe_bytes' unwrap the envelope and read the
transcription from response.result.text (or use the shared wrapper utility used
elsewhere) before trimming and logging; keep the stale-response guard using
sessionIdRef.current/sessionId unchanged and replace any response.text.trim()
uses with the unwrapped value and a safe null/empty check.
- Around line 330-339: The timeout-based retry in the useEffect that calls
registerDictationHotkey(hotkey) can outlive the current hotkey and re-register a
stale shortcut; fix it by capturing the timer id and clearing it in the effect
cleanup so the delayed retry is cancelled when hotkey changes or the hook
unmounts: store the setTimeout id (e.g., in a local variable or a ref), call
clearTimeout(timerId) in the useEffect return cleanup, and ensure any in-flight
promises don't trigger additional retries after cleanup; update the useEffect
surrounding registerDictationHotkey and its retry to use this cleanup behavior.
- Around line 247-255: The fallback branch in useDictation.ts currently maps
unknown recorder containers (e.g., 'audio/webm') to 'webm', which Rust's
normalize_extension rejects; update the catch block that handles conversionErr
(where blob, bytes, ext and mimeType are set) to only accept supported
extensions ('wav','mp3','m4a','ogg','flac') by mapping known container mime
types to one of those (e.g., map 'webm' -> 'ogg') or throw/abort the fallback
when the blob's mimeType is not in that supported set; ensure you change the
assignment to ext accordingly and keep bytes = Array.from(new
Uint8Array(buffer)) as-is so downstream callers (send/upload functions) receive
a supported extension.
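Two of the useDictation fixes above (unwrapping the response envelope, and restricting fallback extensions) come down to small pure helpers. A sketch under assumptions: the envelope and field names follow the comments, both helpers are hypothetical illustrations, and the `'webm' -> 'ogg'` mapping is the reviewer's suggestion (MediaRecorder webm audio is typically Opus, which an ogg container can also carry).

```typescript
// Assumed envelope shape per the review comment.
interface CommandResponse<T> { result: T }
interface TranscribeResult { text?: string | null }

// Unwrap the envelope and return a trimmed transcript, or '' when the field
// is missing, so callers never dereference undefined.
function extractTranscript(response: CommandResponse<TranscribeResult>): string {
  const text = response.result?.text;
  return typeof text === 'string' ? text.trim() : '';
}

// Extensions the Rust normalize_extension accepts, per the review note.
const SUPPORTED_EXTS = new Set(['wav', 'mp3', 'm4a', 'ogg', 'flac']);

// Map a recorder container mime type to a supported extension, or null when
// the fallback should be aborted instead of sending an unsupported container.
function fallbackExtension(mimeType: string): string | null {
  const subtype = mimeType.split('/')[1]?.split(';')[0] ?? '';
  if (SUPPORTED_EXTS.has(subtype)) return subtype;
  if (subtype === 'webm') return 'ogg';
  return null;
}
```

Returning `null` (rather than a guessed extension) lets the caller abort the fallback path cleanly when the container is unsupported.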
In `@app/src/components/settings/panels/DictationPanel.tsx`:
- Around line 133-141: The UI currently hardcodes a Unix-style path in
DictationPanel.tsx using voiceStatus.stt_model_id which is incorrect on Windows;
update the rendering to derive the model directory from the app/core
configuration or a platform-aware helper (e.g., a provided getModelDirectory or
IPC call to return the correct models path) and then join it with
voiceStatus.stt_model_id (or show both platform-specific examples) instead of
"~/.openhuman/...". Locate the JSX block that references
voiceStatus.stt_model_id and isCheckingStatus and replace the hard-coded path
text with the platform-aware path value sourced from the config/helper so the
message shows the correct path on Windows, macOS, and Linux.
- Around line 62-77: The statusLabel and statusColor helpers currently treat
voiceStatus.stt_model_path as "ready" (green) even when
voiceStatus.stt_available is false; update both functions so only
voiceStatus.stt_available yields the green "Ready (model loaded)"/bg-green-400
state. Add an explicit branch for when voiceStatus.stt_model_path is true but
voiceStatus.stt_available is false (e.g., label "Model found — backend
unavailable" and a neutral/amber color like bg-amber-400), and ensure the checks
reference the existing symbols statusLabel, statusColor, voiceStatus,
stt_available, and stt_model_path.
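The three-state status logic the comment asks for can be sketched as pure helpers. Assumptions are flagged inline: the `VoiceStatus` shape is inferred from the fields named above, and the final "not found" label and red color are illustrative defaults the comment does not specify.

```typescript
// Shape assumed from the fields named in the review comment.
interface VoiceStatus {
  stt_available: boolean;
  stt_model_path: string | null;
}

// Only stt_available yields the green "ready" state; a model file without a
// usable backend gets an explicit amber state instead of looking ready.
function statusLabel(s: VoiceStatus): string {
  if (s.stt_available) return 'Ready (model loaded)';
  if (s.stt_model_path) return 'Model found — backend unavailable';
  return 'Model not found'; // assumed fallback label
}

function statusColor(s: VoiceStatus): string {
  if (s.stt_available) return 'bg-green-400';
  if (s.stt_model_path) return 'bg-amber-400';
  return 'bg-red-400'; // assumed fallback color
}
```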
In `@app/src/pages/Conversations.tsx`:
- Around line 226-242: The ESLint no-undef error is caused by calling the global
requestAnimationFrame directly inside onDictationInsert; update the call to use
window.requestAnimationFrame(...) instead. Locate the onDictationInsert handler
in Conversations.tsx (the function using requestAnimationFrame and
textInputRef.current?.focus()) and replace the bare requestAnimationFrame
invocation with window.requestAnimationFrame to match other files (e.g.,
RotatingTetrahedronCanvas.tsx) and satisfy the app/eslint.config.js globals.
In `@app/src/store/dictationSlice.ts`:
- Around line 34-45: The reducer initialState (initialState, DEFAULT_HOTKEY) is
reading localStorage (impure side effect); remove localStorage access from
dictationSlice initialState and set deterministic defaults (e.g., hotkey =
DEFAULT_HOTKEY, showFloatingLauncher = true/false) so reducers remain pure, then
add persistence for dictation.hotkey and dictation.showFloatingLauncher in the
centralized persistence setup by creating a dictationPersistConfig = { key:
'dictation', storage, whitelist: ['hotkey','showFloatingLauncher'] }, wrapping
dictationReducer with persistReducer(dictationPersistConfig, dictationReducer),
and replacing the plain dictation reducer in the root reducer map with the
persisted reducer.
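The pure-initial-state half of this fix can be sketched directly; the persistence half is shown only in comments because it depends on redux-persist wiring at store setup. The `DEFAULT_HOTKEY` value and the reduced state shape here are illustrative assumptions, not the app's actual values.

```typescript
// Illustrative default; the app's actual DEFAULT_HOTKEY may differ.
const DEFAULT_HOTKEY = 'CmdOrCtrl+Shift+D';

interface DictationState {
  hotkey: string;
  showFloatingLauncher: boolean;
}

// Deterministic initial state: no localStorage reads, so the reducer stays
// pure. Persistence for these two fields is instead declared once at store
// setup, e.g. with redux-persist:
//
//   const dictationPersistConfig = {
//     key: 'dictation',
//     storage,
//     whitelist: ['hotkey', 'showFloatingLauncher'],
//   };
//   persistReducer(dictationPersistConfig, dictationReducer)
function createInitialState(): DictationState {
  return { hotkey: DEFAULT_HOTKEY, showFloatingLauncher: true };
}
```

Moving persistence out of the reducer keeps hydration in one place and makes the slice trivially testable.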
In `@app/src/utils/tauriCommands.ts`:
- Around line 2186-2205: The function registerDictationHotkey currently attempts
a Tauri invoke even when not running in Tauri; change it to return early when
isTauri() is false (treat browsers as a no-op) to avoid calling invoke and
scheduling retries. Locate registerDictationHotkey and add an early return
before normalization/invoke (or immediately after the isTauri() check) so that
when isTauri() is false the function resolves without calling
invoke('register_dictation_hotkey', ...) while preserving existing debug
logging.
---
Nitpick comments:
In `@app/src/utils/tauriCommands.ts`:
- Around line 2204-2213: The retry using the raw shortcut string reintroduces
unnormalized tokens and is incorrect; in the try/catch around
invoke('register_dictation_hotkey') you should stop retrying with the original
shortcut variable (remove the second invoke call that uses shortcut), keep the
warning/log that normalized registration failed (including err), and either
rethrow the error or handle it as a terminal failure so the caller knows
registration truly failed; reference the normalizedShortcut variable and the
invoke('register_dictation_hotkey') call and note that
expand_dictation_shortcuts in app/src-tauri/src/lib.rs should remain the single
normalization path.
In `@src/openhuman/local_ai/service/speech.rs`:
- Around line 30-39: The check in speech.rs that calls
whisper_engine::is_loaded(&self.whisper) then tokio::task::spawn_blocking to run
whisper_engine::load_engine can race so multiple tasks load duplicate heavy
contexts; serialize the lazy load by adding a shared load guard (e.g., a
Mutex/AsyncMutex or singleflight-like token) on the same struct that holds
self.whisper, acquire the guard before spawning the blocking load, re-check
whisper_engine::is_loaded(&self.whisper) after acquiring the guard to avoid
redundant loads, and only call whisper_engine::load_engine when still needed;
ensure the guard is released after load completes so concurrent callers wait
rather than spawn parallel loads.
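The Rust fix serializes the load behind a guard with a re-check; in TypeScript terms the same idea is usually a cached in-flight promise (a minimal "singleflight"). This sketch uses a stand-in `doLoad` instead of the real whisper loader, purely to illustrate the pattern.

```typescript
let loadCount = 0;

// Stand-in for the heavy engine load (the real code allocates a whisper context).
async function doLoad(): Promise<string> {
  loadCount += 1;
  return 'engine';
}

let loadPromise: Promise<string> | null = null;

// Singleflight: concurrent callers share one in-flight load instead of each
// starting their own. This is the promise-cache analogue of the mutex-guarded
// re-check described in the comment above.
function loadEngineOnce(): Promise<string> {
  if (loadPromise === null) {
    loadPromise = doLoad();
  }
  return loadPromise;
}
```

However many callers race on `loadEngineOnce`, only the first triggers `doLoad`; the rest await the same promise.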
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eb9f0615-77ec-4c1c-8ce6-c6af86f66ed0
⛔ Files ignored due to path filters (1)
- `app/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (15)
- app/src-tauri/Cargo.toml
- app/src-tauri/src/lib.rs
- app/src/App.tsx
- app/src/components/dictation/DictationOverlay.tsx
- app/src/components/dictation/useDictation.ts
- app/src/components/settings/SettingsHome.tsx
- app/src/components/settings/hooks/useSettingsNavigation.ts
- app/src/components/settings/panels/DictationPanel.tsx
- app/src/components/settings/panels/ScreenIntelligencePanel.tsx
- app/src/pages/Conversations.tsx
- app/src/pages/Settings.tsx
- app/src/store/dictationSlice.ts
- app/src/store/index.ts
- app/src/utils/tauriCommands.ts
- src/openhuman/local_ai/service/speech.rs
…rocess

- Streamlined the logic for registering and unregistering dictation hotkeys, ensuring that old shortcuts are properly managed before new ones are registered.
- Introduced rollback mechanisms to restore previous shortcuts in case of registration failures, enhancing reliability.
- Simplified error handling and logging for better clarity during the hotkey management process.

This update enhances the robustness of the dictation feature by ensuring a smoother transition between hotkey states.
…omponents

- Enhanced formatting in DictationOverlay for better clarity in asynchronous action handling.
- Streamlined text extraction logic in useDictation for improved readability.
- Consolidated model directory setting in DictationPanel to a single line for simplicity.
- Improved logging consistency in tauriCommands and speech service files.

These changes enhance the maintainability and readability of the dictation-related components.
… changes

- Introduced a reference to track the previous status of the dictation overlay.
- Updated the effect to reset the overlay's position when transitioning from 'idle' to any other status, enhancing user experience during dictation sessions.

This change improves the responsiveness of the DictationOverlay component to status changes, ensuring a smoother interaction for users.
- Modified test assertions in MemoryWorkspace.test.tsx to include a selector for 'span' elements when checking for text presence.
- This change enhances the specificity of the tests, ensuring they accurately target the intended elements in the rendered component.

These updates improve the reliability of the MemoryWorkspace component tests.
Actionable comments posted: 1
🧹 Nitpick comments (4)
app/src/components/intelligence/__tests__/MemoryWorkspace.test.tsx (1)
74-79: Avoid tag-name-coupled assertions for relation text.

On Line 74 and Line 77–79, `{ selector: 'span' }` makes the test depend on markup details. Prefer scoping to the Memory Graph area and asserting text behavior within that scope.

♻️ Suggested refactor

```diff
-import { screen, waitFor } from '@testing-library/react';
+import { screen, waitFor, within } from '@testing-library/react';
 ...
-    await waitFor(() => {
-      expect(screen.getByText('Alice', { selector: 'span' })).toBeInTheDocument();
-    });
-
-    expect(screen.getByText('AUTHORED', { selector: 'span' })).toBeInTheDocument();
-    expect(screen.getByText('Bob', { selector: 'span' })).toBeInTheDocument();
-    expect(screen.getByText('REVIEWED', { selector: 'span' })).toBeInTheDocument();
+    const memoryGraphHeading = await screen.findByText('Memory Graph');
+    const memoryGraphSection = memoryGraphHeading.closest('section');
+    expect(memoryGraphSection).not.toBeNull();
+    const graph = within(memoryGraphSection as HTMLElement);
+
+    expect(graph.getByText('Alice')).toBeInTheDocument();
+    expect(graph.getByText('AUTHORED')).toBeInTheDocument();
+    expect(graph.getByText('Bob')).toBeInTheDocument();
+    expect(graph.getByText('REVIEWED')).toBeInTheDocument();
```

As per coding guidelines, "Prefer testing behavior over implementation details; use existing helpers from app/src/test/ (test-utils.tsx, shared mock backend) before adding new harness code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/intelligence/__tests__/MemoryWorkspace.test.tsx` around lines 74 - 79, The assertions currently couple to markup by using screen.getByText(..., { selector: 'span' }); update the test in MemoryWorkspace.test.tsx to scope assertions to the Memory Graph area instead: locate the Memory Graph container (e.g., via a test-id/role used in the component or by querying the workspace root returned from your render helper), call within(container) from `@testing-library/react` and then use within(container).getByText('Alice'), within(container).getByText('AUTHORED'), within(container).getByText('Bob'), etc., removing the selector option; also ensure the test uses your existing test helpers (from test-utils.tsx / shared mock backend) to render the component so you don't assert implementation/markup details.

src/openhuman/local_ai/service/bootstrap.rs (1)
105-111: Consider clarifying the `local_ai_status` spawn condition to align with the new bootstrap behavior.

The early return for the "degraded" state in bootstrap is intentional and well-documented. However, `local_ai_status` (ops.rs:168-177) explicitly checks for "idle" | "degraded" before spawning bootstrap:

```rust
if matches!(status.state.as_str(), "idle" | "degraded") {
    // spawns bootstrap...
}
```

With the new early return, spawning bootstrap when state is "degraded" becomes a silent no-op—the "degraded" check in this condition is now dead code and could confuse maintainers who see an attempted spawn but no actual work performed.

Either remove "degraded" from the condition (since automatic retries should not retry on degraded per the documented intent), or add a comment explaining that the spawn is intentionally benign.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/local_ai/service/bootstrap.rs` around lines 105 - 111, The spawn condition in local_ai_status (ops.rs) still checks for "idle" | "degraded" while bootstrap.rs now returns early for "degraded", making the "degraded" branch a no-op; update the check in local_ai_status (the if matches!(status.state.as_str(), "idle" | "degraded")) to only match "idle" (remove "degraded") so spawns reflect actual bootstrap behavior, or alternatively add a clear comment in local_ai_status explaining that spawning on "degraded" is intentionally benign because bootstrap.rs will early-return for "degraded"; reference the bootstrap.rs early-return and the local_ai_status spawn condition when making the change.

app/src/components/settings/panels/DictationPanel.tsx (1)
100-100: Success message auto-dismiss timeout is not cleared on unmount.

The `setTimeout` on line 100 can fire after the component unmounts if the user navigates away quickly after saving, causing a state update on an unmounted component. Consider tracking this timer in a ref and clearing it in an effect cleanup.

💡 Optional fix using useRef

```diff
+  const successTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
+
+  useEffect(() => {
+    return () => {
+      if (successTimerRef.current) {
+        clearTimeout(successTimerRef.current);
+      }
+    };
+  }, []);
+
   // In handleSaveHotkey:
   setHotkeySuccess(true);
-  setTimeout(() => setHotkeySuccess(false), 2000);
+  successTimerRef.current = setTimeout(() => setHotkeySuccess(false), 2000);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/settings/panels/DictationPanel.tsx` at line 100, The setTimeout call that resets setHotkeySuccess in DictationPanel can run after unmount and cause a state update on an unmounted component; fix it by storing the timer id in a ref (e.g., hotkeyTimerRef) when calling setTimeout in the handler that calls setHotkeySuccess, then add a useEffect cleanup in DictationPanel that clears the timeout (clearTimeout(hotkeyTimerRef.current)) and nulls the ref on unmount, and also clear any existing timer before creating a new one so multiple saves don’t leak timers.

app/src/components/dictation/DictationOverlay.tsx (1)
111-121: Escape handler dismisses regardless of dictation state.

The `Escape` key handler calls `dismiss()` unconditionally, which will reset dictation state and stop any active recording. This is the intended UX, but consider whether dismissing during the `transcribing` state should show a confirmation or at least log that in-progress transcription was cancelled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/dictation/DictationOverlay.tsx` around lines 111 - 121, The Escape key handler inside the useEffect currently calls dismiss() unconditionally; update the handler to check the dictation state (e.g., a prop/state like dictationState or isTranscribing) and only call dismiss() immediately when not in 'transcribing' state; if in 'transcribing' state, either invoke a confirmation flow (e.g., showConfirmCancel or openCancelModal) before calling dismiss() or at minimum log that an in-progress transcription was cancelled (use console.warn or processLogger) so the cancellation is explicit. Modify the handler function and any related dismissal logic (handler, useEffect, dismiss, and the confirmation modal trigger) to implement this conditional behavior.
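The conditional dismissal above is easiest to keep testable as a pure guard that the key handler consults. A minimal sketch: the `DictationStatus` union and `escapeAction` helper are hypothetical names, not the overlay's actual API.

```typescript
// Hypothetical status union; the real slice may define more states.
type DictationStatus = 'idle' | 'recording' | 'transcribing';

// Dismiss immediately unless a transcription is in flight, in which case the
// handler should confirm (or at least log the cancellation) first.
function escapeAction(status: DictationStatus): 'dismiss' | 'confirm-cancel' {
  return status === 'transcribing' ? 'confirm-cancel' : 'dismiss';
}
```

The key handler then switches on the returned action instead of calling `dismiss()` unconditionally.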
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src-tauri/src/lib.rs`:
- Around line 239-254: The code clears the tracked shortcuts (guard.clear())
before attempting to unregister them, risking leaked OS shortcuts if an
unregister fails; change the logic in the unregister loop (the block using
state.0 lock, guard, old_shortcuts and app.global_shortcut().unregister) so you
clone or drain the current shortcuts, attempt to unregister each via
app.global_shortcut().unregister(old.as_str()), and only clear or update the
shared guard after all unregister calls succeed (mirror the pattern used in
register_dictation_hotkey); ensure errors return without mutating the guard so
remaining shortcuts remain tracked on failure.
---
Nitpick comments:
In `@app/src/components/dictation/DictationOverlay.tsx`:
- Around line 111-121: The Escape key handler inside the useEffect currently
calls dismiss() unconditionally; update the handler to check the dictation state
(e.g., a prop/state like dictationState or isTranscribing) and only call
dismiss() immediately when not in 'transcribing' state; if in 'transcribing'
state, either invoke a confirmation flow (e.g., showConfirmCancel or
openCancelModal) before calling dismiss() or at minimum log that an in-progress
transcription was cancelled (use console.warn or processLogger) so the
cancellation is explicit. Modify the handler function and any related dismissal
logic (handler, useEffect, dismiss, and the confirmation modal trigger) to
implement this conditional behavior.
In `@app/src/components/intelligence/__tests__/MemoryWorkspace.test.tsx`:
- Around line 74-79: The assertions currently couple to markup by using
screen.getByText(..., { selector: 'span' }); update the test in
MemoryWorkspace.test.tsx to scope assertions to the Memory Graph area instead:
locate the Memory Graph container (e.g., via a test-id/role used in the
component or by querying the workspace root returned from your render helper),
call within(container) from `@testing-library/react` and then use
within(container).getByText('Alice'), within(container).getByText('AUTHORED'),
within(container).getByText('Bob'), etc., removing the selector option; also
ensure the test uses your existing test helpers (from test-utils.tsx / shared
mock backend) to render the component so you don't assert implementation/markup
details.
In `@app/src/components/settings/panels/DictationPanel.tsx`:
- Line 100: The setTimeout call that resets setHotkeySuccess in DictationPanel
can run after unmount and cause a state update on an unmounted component; fix it
by storing the timer id in a ref (e.g., hotkeyTimerRef) when calling setTimeout
in the handler that calls setHotkeySuccess, then add a useEffect cleanup in
DictationPanel that clears the timeout (clearTimeout(hotkeyTimerRef.current))
and nulls the ref on unmount, and also clear any existing timer before creating
a new one so multiple saves don’t leak timers.
In `@src/openhuman/local_ai/service/bootstrap.rs`:
- Around line 105-111: The spawn condition in local_ai_status (ops.rs) still
checks for "idle" | "degraded" while bootstrap.rs now returns early for
"degraded", making the "degraded" branch a no-op; update the check in
local_ai_status (the if matches!(status.state.as_str(), "idle" | "degraded")) to
only match "idle" (remove "degraded") so spawns reflect actual bootstrap
behavior, or alternatively add a clear comment in local_ai_status explaining
that spawning on "degraded" is intentionally benign because bootstrap.rs will
early-return for "degraded"; reference the bootstrap.rs early-return and the
local_ai_status spawn condition when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c08ae132-5245-4017-ba6d-c51bd414fbd4
📒 Files selected for processing (12)
- PR_DESCRIPTION.md
- app/src-tauri/src/lib.rs
- app/src/components/dictation/DictationOverlay.tsx
- app/src/components/dictation/useDictation.ts
- app/src/components/intelligence/__tests__/MemoryWorkspace.test.tsx
- app/src/components/settings/panels/DictationPanel.tsx
- app/src/store/dictationSlice.ts
- app/src/store/index.ts
- app/src/utils/tauriCommands.ts
- src/openhuman/local_ai/service/bootstrap.rs
- src/openhuman/local_ai/service/mod.rs
- src/openhuman/local_ai/service/speech.rs
💤 Files with no reviewable changes (1)
- PR_DESCRIPTION.md
🚧 Files skipped from review as they are similar to previous changes (3)
- app/src/store/index.ts
- src/openhuman/local_ai/service/speech.rs
- app/src/utils/tauriCommands.ts
```rust
let mut guard = state.0.lock().unwrap();
if guard.is_empty() {
    log::debug!("[dictation] no shortcut registered — nothing to unregister");
} else {
    let old_shortcuts = guard.clone();
    guard.clear();
    for old in old_shortcuts {
        log::debug!("[dictation] unregistering shortcut: {old}");
        app.global_shortcut()
            .unregister(old.as_str())
            .map_err(|e| {
                log::warn!("[dictation] failed to unregister '{old}': {e}");
                format!("Failed to unregister shortcut '{old}': {e}")
            })?;
        log::info!("[dictation] shortcut unregistered: {old}");
    }
```
State cleared before unregister completes — partial unregister can leak shortcuts.
guard.clear() (line 244) discards the tracked shortcuts before attempting to unregister them. If an unregister call fails mid-loop (line 252 returns Err), the remaining shortcuts stay registered at the OS level but are no longer tracked, so they can never be cleaned up.
Consider clearing state only after all shortcuts are successfully unregistered, mirroring the pattern used in register_dictation_hotkey.
🛠️ Suggested fix
async fn unregister_dictation_hotkey(app: AppHandle) -> Result<(), String> {
log::info!("[dictation] unregister_dictation_hotkey: called");
- let state = app.state::<DictationHotkeyState>();
- let mut guard = state.0.lock().unwrap();
- if guard.is_empty() {
+ let old_shortcuts = {
+ let state = app.state::<DictationHotkeyState>();
+ let guard = state.0.lock().unwrap();
+ guard.clone()
+ };
+
+ if old_shortcuts.is_empty() {
log::debug!("[dictation] no shortcut registered — nothing to unregister");
- } else {
- let old_shortcuts = guard.clone();
- guard.clear();
- for old in old_shortcuts {
- log::debug!("[dictation] unregistering shortcut: {old}");
- app.global_shortcut()
- .unregister(old.as_str())
- .map_err(|e| {
- log::warn!("[dictation] failed to unregister '{old}': {e}");
- format!("Failed to unregister shortcut '{old}': {e}")
- })?;
- log::info!("[dictation] shortcut unregistered: {old}");
- }
+ return Ok(());
}
+
+ for old in &old_shortcuts {
+ log::debug!("[dictation] unregistering shortcut: {old}");
+ app.global_shortcut()
+ .unregister(old.as_str())
+ .map_err(|e| {
+ log::warn!("[dictation] failed to unregister '{old}': {e}");
+ format!("Failed to unregister shortcut '{old}': {e}")
+ })?;
+ log::info!("[dictation] shortcut unregistered: {old}");
+ }
+
+ // Clear state only after all unregistrations succeed.
+ {
+ let state = app.state::<DictationHotkeyState>();
+ let mut guard = state.0.lock().unwrap();
+ guard.clear();
+ }
+
Ok(())
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src-tauri/src/lib.rs` around lines 239 - 254, The code clears the tracked
shortcuts (guard.clear()) before attempting to unregister them, risking leaked
OS shortcuts if an unregister fails; change the logic in the unregister loop
(the block using state.0 lock, guard, old_shortcuts and
app.global_shortcut().unregister) so you clone or drain the current shortcuts,
attempt to unregister each via app.global_shortcut().unregister(old.as_str()),
and only clear or update the shared guard after all unregister calls succeed
(mirror the pattern used in register_dictation_hotkey); ensure errors return
without mutating the guard so remaining shortcuts remain tracked on failure.
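Stripped of the Tauri specifics, the ordering the review asks for reduces to: snapshot the tracked set, attempt every OS-level unregistration, and clear the tracked state only after all of them succeed. A minimal standalone sketch of that invariant — FlakyOs and unregister_all are hypothetical stand-ins, not code from this PR:

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the global-shortcut plugin: unregistering one
// configured shortcut fails, everything else succeeds.
struct FlakyOs {
    fail_on: Option<&'static str>,
}

impl FlakyOs {
    fn unregister(&self, shortcut: &str) -> Result<(), String> {
        match self.fail_on {
            Some(bad) if bad == shortcut => Err(format!("cannot unregister {shortcut}")),
            _ => Ok(()),
        }
    }
}

// Clear the tracked set only after every unregister call succeeds, so a
// mid-loop failure leaves the remaining shortcuts tracked for a retry.
fn unregister_all(os: &FlakyOs, tracked: &mut HashSet<String>) -> Result<(), String> {
    let snapshot: Vec<String> = tracked.iter().cloned().collect();
    for shortcut in &snapshot {
        os.unregister(shortcut)?; // on failure, `tracked` is untouched
    }
    tracked.clear(); // only reached when every unregister succeeded
    Ok(())
}

fn main() {
    let mut tracked: HashSet<String> =
        HashSet::from(["Cmd+D".to_string(), "Ctrl+D".to_string()]);

    // A failing unregister leaves the full tracked set intact.
    let flaky = FlakyOs { fail_on: Some("Cmd+D") };
    assert!(unregister_all(&flaky, &mut tracked).is_err());
    assert_eq!(tracked.len(), 2);

    // Once the OS cooperates, the state is cleared.
    let ok = FlakyOs { fail_on: None };
    assert!(unregister_all(&ok, &mut tracked).is_ok());
    assert!(tracked.is_empty());
}
```

The trade-off (shared with the suggested fix above) is that shortcuts already unregistered before a mid-loop failure stay tracked and will be re-unregistered on retry, which some backends may report as an error.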
Summary
- voice Rust domain (src/openhuman/voice/) with four RPC endpoints: voice_status, voice_transcribe, voice_transcribe_bytes, and voice_tts — all wired into the shared controller registry.
- Post-processing (voice/postprocess.rs) cleans grammar and filler words via local Ollama with graceful fallback to raw Whisper output.
- Frontend: useDictation hook, DictationOverlay draggable panel, DictationPanel settings page, and dictationSlice Redux store — wired into the existing App/Settings/Store infrastructure.
- Global hotkey via tauri-plugin-global-shortcut; CmdOrCtrl expands to both Cmd and Ctrl variants on macOS automatically.
Problem
Users had no way to dictate text into the app without leaving it to use an external tool. Local STT was partially plumbed in local_ai but had no user-facing dictation surface and no UI entry point. Tracked in #187.
Solution
Rust — voice domain
A dedicated src/openhuman/voice/ module owns all STT/TTS logic, cleanly separated from the existing local_ai ops surface:
- ops.rs — business logic for status check, file-path transcription, byte-array transcription, and TTS synthesis; uses normalize_extension() with strict alphanumeric validation to prevent path traversal.
- postprocess.rs — optional LLM pass over raw Whisper output; config-gated via voice_llm_cleanup_enabled; always returns raw text on failure.
- schemas.rs — controller schemas + registered handlers; plugs into the shared registry (core/all.rs).
- types.rs — serializable DTOs (VoiceStatus, VoiceSpeechResult, VoiceTtsResult).
Tauri shell — global hotkey bridge
register_dictation_hotkey / unregister_dictation_hotkey Tauri commands manage tauri-plugin-global-shortcut. On macOS, CmdOrCtrl+X expands to two shortcuts (Cmd+X and Ctrl+X) since CmdOrCtrl is not a native modifier token there. On press, the shell emits dictation://toggle to all webviews.
React — useDictation hook + overlay
- useDictation hooks into dictation://toggle (Tauri event) and a fallback keydown listener (browser). MediaRecorder captures audio; an OfflineAudioContext pipeline downmixes and resamples to 16 kHz mono WAV before sending to openhuman.voice_transcribe_bytes. A session-ID guard prevents stale transcription responses from writing to state.
- DictationOverlay is a draggable, viewport-clamped panel rendered directly inside App.tsx. Transcript insertion tries (in order): a custom dictation://insert-text event, direct DOM mutation on the last focused editable element, an accessibility action via openhumanAccessibilityInputAction, and clipboard fallback.
- DictationPanel (Settings → Voice Dictation) shows engine status rows, model path guidance, hotkey editor, and the floating-launcher toggle.
- dictationSlice manages recording state, transcript, hotkey, and a voice_status async thunk; hotkey and launcher preference are persisted to localStorage.
Submission Checklist
- Unit tests: voice/ops.rs: normalize_extension (6 cases) + voice_status (2 cases); voice/schemas.rs: schema stability + contract tests; voice/postprocess.rs: empty/whitespace/disabled-config edge cases.
- E2E: app/test/e2e/specs/voice-mode.spec.ts covers voice status check, recording button visibility, and voice/text mode switching.
- Docs: public Rust items carry ///; TypeScript exports have inline comments on non-obvious paths. The CmdOrCtrl expansion logic is annotated.
Impact
- local_ai_transcribe_bytes in local_ai/ops.rs is preserved for backward compatibility; voice_transcribe_bytes is the new authoritative path (see follow-up).
- New config keys: local_ai.whisper_in_process (default false) and local_ai.voice_llm_cleanup_enabled (default false); both additive with safe defaults.
- New dependency: tauri-plugin-global-shortcut = "2" in app/src-tauri/Cargo.toml.
- Microphone capture uses getUserMedia; macOS will prompt once on first use.
Related
- Follow-up: replace local_ai_transcribe_bytes (local_ai/ops.rs) with voice_transcribe_bytes (voice/ops.rs) — both write a UUID-named temp file and call service.transcribe(); the voice variant is canonical.
- Model setup guidance lives in DictationPanel rather than linking away to Local AI settings.
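The CmdOrCtrl expansion described in the Tauri shell section is plain string substitution over the accelerator text. A hypothetical sketch of that logic — expand_macos_hotkey is an illustration, not the PR's actual helper:

```rust
/// Expand a "CmdOrCtrl" accelerator into the concrete shortcut strings
/// that would be registered individually on macOS, where CmdOrCtrl is
/// not a native modifier token. Other accelerators pass through as-is.
fn expand_macos_hotkey(hotkey: &str) -> Vec<String> {
    if hotkey.contains("CmdOrCtrl") {
        vec![
            hotkey.replace("CmdOrCtrl", "Cmd"),
            hotkey.replace("CmdOrCtrl", "Ctrl"),
        ]
    } else {
        vec![hotkey.to_string()]
    }
}

fn main() {
    assert_eq!(
        expand_macos_hotkey("CmdOrCtrl+Shift+D"),
        vec!["Cmd+Shift+D", "Ctrl+Shift+D"]
    );
    assert_eq!(expand_macos_hotkey("Alt+Space"), vec!["Alt+Space"]);
}
```

Registering both variants is why the unregister path above iterates over a list of tracked shortcuts rather than a single string.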
New Features
Bug Fixes