fix(voice): anti-hallucination, clipboard paste, Fn key reliability#380
Merged
senamakel merged 18 commits intotinyhumansai:mainfrom Apr 7, 2026
Merged
fix(voice): anti-hallucination, clipboard paste, Fn key reliability#380senamakel merged 18 commits intotinyhumansai:mainfrom
senamakel merged 18 commits intotinyhumansai:mainfrom
Conversation
- Changed the default global hotkey for dictation from "CmdOrCtrl+Shift+D" to "Fn" in both the configuration schema and the associated documentation. - Updated the hotkey parsing function to recognize "Fn" as a valid key, enhancing the flexibility of hotkey configurations. - Added a test case to ensure the "Fn" key can be parsed correctly, improving the robustness of the hotkey handling functionality.
- Changed the default activation mode for voice and dictation from "tap" to "push" in the respective configuration schemas. - Updated the default hotkey for voice commands from "ctrl+shift+space" to "Fn" across various modules and documentation. - Adjusted related tests to reflect the new defaults, ensuring consistency in behavior and expectations.
- Added a new asynchronous function `start_if_enabled` to the voice server module, which initializes the embedded voice server based on configuration settings. - Updated the server run logic to check if the voice server should auto-start, enhancing the startup process for the core application. - Integrated the new server startup function into the main server run logic, ensuring the voice server is launched if enabled in the configuration.
- Introduced a new `VoicePanel` component to handle voice server configurations, including startup options, hotkeys, and runtime controls. - Updated routing in the settings page to include the new voice settings section. - Enhanced the `useSettingsNavigation` hook to support navigation to the voice settings. - Added tests for the `VoicePanel` to ensure functionality and reliability of the voice server management features.
…alization - Revised comments in `DictationHotkeyManager` to clarify the component's mounting process within the app tree. - Removed unused imports and unnecessary state management from `ServiceBlockingGate`, streamlining the component's logic. - Updated tests for `ServiceBlockingGate` to reflect changes in behavior, ensuring accurate rendering of child components based on service status. - Enhanced the `Cargo.lock` file by updating dependencies to their latest versions for improved stability and security.
…l options - Changed the default value of `skip_cleanup` in the voice server configuration from `false` to `true` to improve transcription handling. - Reordered options in the `VoicePanel` component to ensure "Natural cleanup" is displayed alongside "Verbatim transcription" for better user clarity. - Updated tests to reflect the new default settings and ensure proper functionality of the VoicePanel component.
- Introduced a new module `window.ts` containing functions for managing window visibility and state in a Tauri application. - Implemented commands to show, hide, toggle visibility, minimize, maximize, close, and set the title of the main window. - Added checks to ensure commands are only executed in a Tauri environment, enhancing compatibility with web contexts.
- Introduced multiple new modules for Tauri commands, including `accessibility`, `autocomplete`, `config`, `conscious`, `core`, `cron`, `hardware`, `localAi`, and `window`. - Each module contains functions for managing specific functionalities such as accessibility permissions, autocomplete suggestions, configuration settings, and hardware interactions. - Implemented checks to ensure commands are executed only in a Tauri environment, enhancing compatibility and reliability. - This addition significantly expands the command capabilities of the Tauri application, providing a robust framework for future development.
- Added functionality to transcribe audio with an optional initial prompt, allowing for vocabulary bias and improved conversational continuity. - Updated the `transcribe_pcm_f32` and `transcribe_wav_file` functions to accept an `initial_prompt` parameter, enhancing recognition of specific vocabulary. - Implemented peak RMS energy tracking during audio recording for silence detection, ensuring recordings below a defined threshold are skipped. - Enhanced the voice server configuration to include a silence threshold and custom dictionary for better transcription context. - Introduced methods to build and manage recent transcripts for improved continuity across consecutive recordings. - Updated tests to validate new features and ensure proper functionality of the transcription process.
…tom dictionary - Added `silence_threshold` to the `VoiceServerConfig` for improved silence detection, allowing recordings with low RMS energy to be skipped. - Introduced `custom_dictionary` to bias transcription towards specific vocabulary, enhancing recognition of names and technical terms. - Updated the `voice_transcribe` and `voice_transcribe_bytes` functions to utilize the new `initial_prompt` parameter for better context during transcription. - Adjusted the `transcribe_pcm_i16` function to accept additional parameters for improved flexibility in handling audio input.
…VoicePanel - Implemented a new input for setting the silence threshold, allowing recordings with low RMS energy to be skipped. - Added functionality for a custom dictionary, enabling users to add specific vocabulary words to improve transcription accuracy. - Updated the VoiceServerSettings interface and related functions to support the new features, ensuring seamless integration with existing settings. - Enhanced the UI in the VoicePanel to facilitate user interaction with the new settings.
…ce server command - Added `silence_threshold` and `custom_dictionary` parameters to the `run_voice_server_command` function, ensuring these settings are utilized during voice server operations. - Enhanced integration with the existing voice server configuration to support improved transcription accuracy and silence detection.
- Adjusted import paths for `callCoreRpc` in multiple Tauri command modules to ensure correct referencing from the updated directory structure. - This change enhances module organization and maintains consistency across the codebase.
… and versions - Added new dependencies including `arboard`, `fax`, `fax_derive`, `gethostname`, `half`, `quick-error`, `tiff`, and `x11rb` to enhance functionality and support for clipboard operations, transcription improvements, and system interactions. - Updated existing dependencies to their latest versions for better performance and compatibility. - Modified `VoicePanel` to utilize the updated settings and ensure proper handling of voice server configurations. - Enhanced the text input mechanism to use clipboard-paste for improved reliability in text insertion.
- Changed the default value of `skip_cleanup` in `VoiceServerConfig` from `true` to `false` to align with expected behavior and improve transcription handling. - Added detailed logging in the transcription cleanup process to provide better insights into the LLM state and cleanup decisions. - Removed unused functions related to unreliable key releases in hotkey handling to simplify the codebase. - Updated tests to reflect the new default settings for `skip_cleanup` and ensure proper functionality across components.
…l tests - Updated tests for the VoicePanel component to include new parameters: `silence_threshold` and `custom_dictionary`. - Ensured that the tests reflect the latest configuration settings for improved transcription accuracy and functionality.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
[BLANK_AUDIO], "thank you", repeated words, single noise words)initial_promptsupport with custom dictionary and recent transcript context for better accuracy across consecutive recordingstext()) to clipboard-paste (Cmd+V) — fixes garbled/repeated outputskip_cleanup: false) and align cleanup prompt with OpenWhispr's CLEANUP_PROMPTsilence_thresholdandcustom_dictionaryto voice server config (TOML + RPC + UI)Problem
The voice server had several issues making it unreliable for production use:
enigo.text()typed character-by-character, causing macOS to duplicate/reorder characters at speed.KeyReleasefor the Fn key. Theis_activestate machine got stuck, eating every other press.skip_cleanupdefaulted totrue, so the LLM post-processing pass never ran — raw whisper output went straight to the text field.Solution
Anti-hallucination (inspired by OpenWhispr):
peak_rms < silence_threshold(default 0.002, matching OpenWhispr)temperature_inc=0to disable temperature fallback (was retrying with randomness, producing "Thank you")suppress_blank,suppress_nst,single_segmentmode, and explicitno_speech_thold=0.6initial_promptto whisper with custom dictionary words + rolling recent transcriptsClipboard-paste insertion (matching OpenWhispr's approach):
arboardcrate for clipboard access,enigoonly for the single Cmd+V keystrokeFn key fix:
is_activetracks press/release normallyis_active=trueand sends fallbackReleasedtokio::spawn) so the hotkey loop is never blockedLLM cleanup:
skip_cleanupdefault tofalseCLEANUP_PROMPT(handles self-corrections, spoken punctuation, prompt injection protection)info!-level logs showing LLM state so issues are diagnosableConfig & UI:
silence_thresholdandcustom_dictionarytoVoiceServerConfig(TOML schema, RPC, controller schema inputs)Submission Checklist
cargo test -p openhuman -- voice::)Impact
src/openhuman/voice/) and React settings UIarboard = "3"for cross-platform clipboard accessskip_cleanup: truewill keep that value (serde default only applies to missing fields)skip_cleanup: truein settings.Related
/references/openwhispr) — silence detection, clipboard paste, CLEANUP_PROMPT, custom dictionary patterns