fix(voice): anti-hallucination, clipboard paste, Fn key reliability by senamakel · Pull Request #380 · tinyhumansai/openhuman

senamakel · 2026-04-07T01:09:20Z

Summary

Add silence/energy detection (RMS threshold) to skip transcription of silent audio, the #1 cause of whisper hallucinations
Add hallucination output filtering (40+ patterns: [BLANK_AUDIO], "thank you", repeated words, single noise words)
Add whisper initial_prompt support with custom dictionary and recent transcript context for better accuracy across consecutive recordings
Switch text insertion from character-by-character typing (enigo text()) to clipboard-paste (Cmd+V) — fixes garbled/repeated output
Fix Fn key reliability on macOS — missed release events no longer break subsequent presses
Enable LLM cleanup by default (skip_cleanup: false) and align cleanup prompt with OpenWhispr's CLEANUP_PROMPT
Add silence_threshold and custom_dictionary to voice server config (TOML + RPC + UI)

Problem

The voice server had several issues making it unreliable for production use:

Hallucinations: Pressing Fn with no speech produced "Thank you", "Hello world" repeated dozens of times, or other fabricated text. Whisper.cpp hallucinates when fed silent/noisy audio.
Garbled output: Text insertion via enigo.text() typed character-by-character, causing macOS to duplicate/reorder characters at speed.
Missed Fn presses: macOS doesn't reliably deliver KeyRelease for the Fn key. The is_active state machine got stuck, eating every other press.
No LLM cleanup: skip_cleanup defaulted to true, so the LLM post-processing pass never ran — raw whisper output went straight to the text field.
No vocabulary bias: Whisper had no context about what words to expect, making it guess wildly on ambiguous audio.

Solution

Anti-hallucination (inspired by OpenWhispr):

Compute peak RMS energy during recording via lock-free atomic tracking in the audio capture callback
Skip transcription entirely when peak_rms < silence_threshold (default 0.002, matching OpenWhispr)
Filter whisper output against 40+ known hallucination patterns before insertion
Set whisper.cpp temperature_inc=0 to disable temperature fallback (was retrying with randomness, producing "Thank you")
Enable suppress_blank, suppress_nst, single_segment mode, and explicit no_speech_thold=0.6
Pass initial_prompt to whisper with custom dictionary words + rolling recent transcripts
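
The first three items can be sketched as below. This is an illustration only, not the code from the PR: the type and function names (`PeakRms`, `should_transcribe`, `is_hallucination`) are hypothetical, the pattern list is a tiny stand-in for the 40+ patterns mentioned above, and the repeated-word check only covers single-word repetition. The lock-free part relies on the fact that non-negative finite `f32` values order the same way as their bit patterns, so an integer `fetch_max` gives the float maximum.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Peak RMS seen during a recording, stored as f32 bits so the audio
/// callback can update it without taking a lock.
pub struct PeakRms(AtomicU32);

impl PeakRms {
    pub fn new() -> Self {
        PeakRms(AtomicU32::new(0.0f32.to_bits()))
    }

    /// Intended to be called from the capture callback with each buffer
    /// of samples in the range [-1.0, 1.0].
    pub fn observe(&self, samples: &[f32]) {
        if samples.is_empty() {
            return;
        }
        let mean_sq = samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32;
        let rms = mean_sq.sqrt();
        // Skip NaN/inf so the bit-pattern ordering trick stays valid.
        if rms.is_finite() {
            self.0.fetch_max(rms.to_bits(), Ordering::Relaxed);
        }
    }

    pub fn peak(&self) -> f32 {
        f32::from_bits(self.0.load(Ordering::Relaxed))
    }
}

/// Gate applied before transcription: below the threshold, whisper is never called.
pub fn should_transcribe(peak_rms: f32, silence_threshold: f32) -> bool {
    peak_rms >= silence_threshold
}

/// Illustrative subset only; the real list in the PR is much longer.
const KNOWN_PATTERNS: &[&str] = &["[blank_audio]", "thank you"];

/// Gate applied after transcription: returns true when the text should be dropped.
pub fn is_hallucination(text: &str) -> bool {
    let norm = text
        .trim()
        .trim_end_matches(|c: char| matches!(c, '.' | '!' | '?'))
        .to_lowercase();
    if norm.is_empty() || KNOWN_PATTERNS.contains(&norm.as_str()) {
        return true;
    }
    // The same word repeated several times is treated as noise.
    let words: Vec<&str> = norm.split_whitespace().collect();
    words.len() >= 3 && words.iter().all(|w| *w == words[0])
}
```

With the default threshold of 0.002, a recording of pure silence never reaches whisper, and a transcript such as "Thank you." is discarded before insertion.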

Clipboard-paste insertion (matching OpenWhispr's approach):

Save clipboard → write text → 120ms settle delay → simulate Cmd+V → 450ms restore delay → restore clipboard
Uses arboard crate for clipboard access, enigo only for the single Cmd+V keystroke
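
The save/write/paste/restore sequence above could look roughly like this. To keep the sketch self-contained it does not call arboard or enigo directly: the `Clipboard` trait, `MemoryClipboard`, `PasteTiming`, and `paste_text` are hypothetical names, and the `send_paste` closure stands in for the single Cmd+V keystroke. Only the ordering and the 120ms/450ms delays come from the description.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical abstraction over the system clipboard.
pub trait Clipboard {
    fn get_text(&mut self) -> Option<String>;
    fn set_text(&mut self, text: &str);
}

/// In-memory stand-in that records every write, useful for tests.
pub struct MemoryClipboard {
    pub contents: Option<String>,
    pub writes: Vec<String>,
}

impl Clipboard for MemoryClipboard {
    fn get_text(&mut self) -> Option<String> {
        self.contents.clone()
    }
    fn set_text(&mut self, text: &str) {
        self.contents = Some(text.to_string());
        self.writes.push(text.to_string());
    }
}

pub struct PasteTiming {
    pub settle: Duration,
    pub restore: Duration,
}

impl Default for PasteTiming {
    fn default() -> Self {
        // Delays quoted in the PR description.
        PasteTiming {
            settle: Duration::from_millis(120),
            restore: Duration::from_millis(450),
        }
    }
}

/// Save clipboard -> write text -> settle -> paste keystroke -> wait -> restore.
pub fn paste_text<C: Clipboard>(
    clipboard: &mut C,
    text: &str,
    timing: &PasteTiming,
    mut send_paste: impl FnMut(),
) {
    let saved = clipboard.get_text();
    clipboard.set_text(text);
    sleep(timing.settle); // let the OS publish the new clipboard contents
    send_paste(); // e.g. a simulated Cmd+V
    sleep(timing.restore); // give the target app time to read the clipboard
    // Only text is restored in this sketch; non-text contents are not handled.
    if let Some(previous) = saved {
        clipboard.set_text(&previous);
    }
}
```

The restore delay matters because the paste is asynchronous from the sender's point of view: restoring too early can make the target application paste the old clipboard contents instead of the transcript.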

Fn key fix:

Simplified hotkey callback: is_active tracks press/release normally
When release IS delivered (your logs showed it does sometimes), it works
When release is missed, next press sees is_active=true and sends fallback Released
Transcription runs in background (tokio::spawn) so the hotkey loop is never blocked
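
One way to read the press/release handling above is as a two-state machine. The sketch below is an interpretation, not the PR's hotkey callback: `RawKey`, `HotkeyEvent`, and `FnKeyState` are hypothetical names, and it assumes that a press arriving while `is_active` is still true only emits the fallback `Released` (it does not also start a new recording).

```rust
/// Raw key transitions as delivered by the OS hook.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RawKey {
    Press,
    Release,
}

/// Events forwarded to the voice server.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HotkeyEvent {
    Pressed,
    Released,
}

#[derive(Default)]
pub struct FnKeyState {
    is_active: bool,
}

impl FnKeyState {
    pub fn on_raw(&mut self, raw: RawKey) -> Option<HotkeyEvent> {
        match (raw, self.is_active) {
            // Normal press: start recording.
            (RawKey::Press, false) => {
                self.is_active = true;
                Some(HotkeyEvent::Pressed)
            }
            // The release was never delivered: treat this press as the release.
            (RawKey::Press, true) => {
                self.is_active = false;
                Some(HotkeyEvent::Released)
            }
            // Normal release: stop recording.
            (RawKey::Release, true) => {
                self.is_active = false;
                Some(HotkeyEvent::Released)
            }
            // Stray release while idle: ignore.
            (RawKey::Release, false) => None,
        }
    }
}
```

Because the state always returns to idle after a fallback `Released`, one missed release costs at most one extra key press instead of desynchronizing every subsequent press.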

LLM cleanup:

Changed skip_cleanup default to false
Replaced 2-line cleanup prompt with OpenWhispr's full CLEANUP_PROMPT (handles self-corrections, spoken punctuation, prompt injection protection)
Added info!-level logs showing LLM state so issues are diagnosable

Config & UI:

Added silence_threshold and custom_dictionary to VoiceServerConfig (TOML schema, RPC, controller schema inputs)
VoicePanel: custom dictionary tag editor, silence threshold control
Fixed 2s polling timer overwriting in-progress form edits
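
For orientation, the three settings discussed in this PR and their stated defaults can be summarized as a plain struct. This is a simplified sketch: the real `VoiceServerConfig` has more fields and is deserialized from TOML, and the field types here (`f32`, `Vec<String>`) are assumptions.

```rust
/// Simplified stand-in for the voice server config; only the fields
/// touched by this PR are shown, with assumed types.
#[derive(Debug, Clone, PartialEq)]
pub struct VoiceServerConfig {
    /// When false, the LLM cleanup pass runs on the raw transcript.
    pub skip_cleanup: bool,
    /// Recordings whose peak RMS is below this value are not transcribed.
    pub silence_threshold: f32,
    /// Words used to bias whisper via the initial prompt.
    pub custom_dictionary: Vec<String>,
}

impl Default for VoiceServerConfig {
    fn default() -> Self {
        VoiceServerConfig {
            skip_cleanup: false,
            silence_threshold: 0.002,
            custom_dictionary: Vec::new(),
        }
    }
}
```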

Submission Checklist

Unit tests — 57+ voice tests pass (cargo test -p openhuman -- voice::)
E2E / integration — Manual testing of Fn key press/release, silence detection, hallucination filtering, clipboard paste
Doc comments — All new public functions and config fields documented
Inline comments — Whisper params, hallucination patterns, clipboard timing documented with rationale

Impact

Desktop only — all changes in Rust core (src/openhuman/voice/) and React settings UI
New dependency: arboard = "3" for cross-platform clipboard access
Config migration: existing configs with skip_cleanup: true will keep that value (serde default only applies to missing fields)
Breaking behavior: LLM cleanup now runs by default if local LLM is ready. Users who prefer raw output can set skip_cleanup: true in settings.

- Changed the default global hotkey for dictation from "CmdOrCtrl+Shift+D" to "Fn" in both the configuration schema and the associated documentation.
- Updated the hotkey parsing function to recognize "Fn" as a valid key, enhancing the flexibility of hotkey configurations.
- Added a test case to ensure the "Fn" key can be parsed correctly, improving the robustness of the hotkey handling functionality.

- Changed the default activation mode for voice and dictation from "tap" to "push" in the respective configuration schemas.
- Updated the default hotkey for voice commands from "ctrl+shift+space" to "Fn" across various modules and documentation.
- Adjusted related tests to reflect the new defaults, ensuring consistency in behavior and expectations.

- Added a new asynchronous function `start_if_enabled` to the voice server module, which initializes the embedded voice server based on configuration settings.
- Updated the server run logic to check if the voice server should auto-start, enhancing the startup process for the core application.
- Integrated the new server startup function into the main server run logic, ensuring the voice server is launched if enabled in the configuration.

- Introduced a new `VoicePanel` component to handle voice server configurations, including startup options, hotkeys, and runtime controls.
- Updated routing in the settings page to include the new voice settings section.
- Enhanced the `useSettingsNavigation` hook to support navigation to the voice settings.
- Added tests for the `VoicePanel` to ensure functionality and reliability of the voice server management features.

…alization
- Revised comments in `DictationHotkeyManager` to clarify the component's mounting process within the app tree.
- Removed unused imports and unnecessary state management from `ServiceBlockingGate`, streamlining the component's logic.
- Updated tests for `ServiceBlockingGate` to reflect changes in behavior, ensuring accurate rendering of child components based on service status.
- Enhanced the `Cargo.lock` file by updating dependencies to their latest versions for improved stability and security.

…l options
- Changed the default value of `skip_cleanup` in the voice server configuration from `false` to `true` to improve transcription handling.
- Reordered options in the `VoicePanel` component to ensure "Natural cleanup" is displayed alongside "Verbatim transcription" for better user clarity.
- Updated tests to reflect the new default settings and ensure proper functionality of the VoicePanel component.

- Introduced a new module `window.ts` containing functions for managing window visibility and state in a Tauri application.
- Implemented commands to show, hide, toggle visibility, minimize, maximize, close, and set the title of the main window.
- Added checks to ensure commands are only executed in a Tauri environment, enhancing compatibility with web contexts.

- Introduced multiple new modules for Tauri commands, including `accessibility`, `autocomplete`, `config`, `conscious`, `core`, `cron`, `hardware`, `localAi`, and `window`.
- Each module contains functions for managing specific functionalities such as accessibility permissions, autocomplete suggestions, configuration settings, and hardware interactions.
- Implemented checks to ensure commands are executed only in a Tauri environment, enhancing compatibility and reliability.
- This addition significantly expands the command capabilities of the Tauri application, providing a robust framework for future development.

- Added functionality to transcribe audio with an optional initial prompt, allowing for vocabulary bias and improved conversational continuity.
- Updated the `transcribe_pcm_f32` and `transcribe_wav_file` functions to accept an `initial_prompt` parameter, enhancing recognition of specific vocabulary.
- Implemented peak RMS energy tracking during audio recording for silence detection, ensuring recordings below a defined threshold are skipped.
- Enhanced the voice server configuration to include a silence threshold and custom dictionary for better transcription context.
- Introduced methods to build and manage recent transcripts for improved continuity across consecutive recordings.
- Updated tests to validate new features and ensure proper functionality of the transcription process.
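
As a rough illustration of how a custom dictionary and a rolling window of recent transcripts might be combined into an `initial_prompt`, consider the sketch below. `PromptContext`, its methods, and the exact prompt layout (dictionary words joined by commas, followed by recent transcripts) are assumptions made for the example, not the format used in the PR.

```rust
use std::collections::VecDeque;

/// Hypothetical holder for the vocabulary and recent transcripts that
/// are fed back to whisper as an initial prompt.
pub struct PromptContext {
    dictionary: Vec<String>,
    recent: VecDeque<String>,
    max_recent: usize,
}

impl PromptContext {
    pub fn new(dictionary: Vec<String>, max_recent: usize) -> Self {
        PromptContext {
            dictionary,
            recent: VecDeque::new(),
            max_recent,
        }
    }

    /// Remember a finished transcript, evicting the oldest one when full.
    pub fn push_transcript(&mut self, text: &str) {
        let text = text.trim();
        if text.is_empty() || self.max_recent == 0 {
            return;
        }
        if self.recent.len() == self.max_recent {
            self.recent.pop_front();
        }
        self.recent.push_back(text.to_string());
    }

    /// Build the prompt, or None when there is nothing to bias with.
    pub fn build(&self) -> Option<String> {
        let mut parts: Vec<String> = Vec::new();
        if !self.dictionary.is_empty() {
            parts.push(self.dictionary.join(", "));
        }
        parts.extend(self.recent.iter().cloned());
        if parts.is_empty() {
            None
        } else {
            Some(parts.join(" "))
        }
    }
}
```

Bounding the window keeps the prompt short, so older recordings stop influencing new ones after a few dictations.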

…tom dictionary
- Added `silence_threshold` to the `VoiceServerConfig` for improved silence detection, allowing recordings with low RMS energy to be skipped.
- Introduced `custom_dictionary` to bias transcription towards specific vocabulary, enhancing recognition of names and technical terms.
- Updated the `voice_transcribe` and `voice_transcribe_bytes` functions to utilize the new `initial_prompt` parameter for better context during transcription.
- Adjusted the `transcribe_pcm_i16` function to accept additional parameters for improved flexibility in handling audio input.

…VoicePanel
- Implemented a new input for setting the silence threshold, allowing recordings with low RMS energy to be skipped.
- Added functionality for a custom dictionary, enabling users to add specific vocabulary words to improve transcription accuracy.
- Updated the VoiceServerSettings interface and related functions to support the new features, ensuring seamless integration with existing settings.
- Enhanced the UI in the VoicePanel to facilitate user interaction with the new settings.

…ce server command
- Added `silence_threshold` and `custom_dictionary` parameters to the `run_voice_server_command` function, ensuring these settings are utilized during voice server operations.
- Enhanced integration with the existing voice server configuration to support improved transcription accuracy and silence detection.

- Adjusted import paths for `callCoreRpc` in multiple Tauri command modules to ensure correct referencing from the updated directory structure.
- This change enhances module organization and maintains consistency across the codebase.

… and versions
- Added new dependencies including `arboard`, `fax`, `fax_derive`, `gethostname`, `half`, `quick-error`, `tiff`, and `x11rb` to enhance functionality and support for clipboard operations, transcription improvements, and system interactions.
- Updated existing dependencies to their latest versions for better performance and compatibility.
- Modified `VoicePanel` to utilize the updated settings and ensure proper handling of voice server configurations.
- Enhanced the text input mechanism to use clipboard-paste for improved reliability in text insertion.

- Changed the default value of `skip_cleanup` in `VoiceServerConfig` from `true` to `false` to align with expected behavior and improve transcription handling.
- Added detailed logging in the transcription cleanup process to provide better insights into the LLM state and cleanup decisions.
- Removed unused functions related to unreliable key releases in hotkey handling to simplify the codebase.
- Updated tests to reflect the new default settings for `skip_cleanup` and ensure proper functionality across components.

…l tests
- Updated tests for the VoicePanel component to include new parameters: `silence_threshold` and `custom_dictionary`.
- Ensured that the tests reflect the latest configuration settings for improved transcription accuracy and functionality.

senamakel added 18 commits April 6, 2026 16:18

style: apply linter formatting fixes

b32b0f9

fix(voice): remove unused warn import in hotkey module

7ee5ae9

senamakel merged commit 3851d1e into tinyhumansai:main Apr 7, 2026
4 checks passed

senamakel deleted the fix/monday-prod branch April 7, 2026 01:41
