Transform your voice into optimized prompts with AI-powered speech-to-text
A professional VSCode/Cursor extension that captures audio from your microphone, transcribes it using OpenAI Whisper, and intelligently transforms natural speech into structured, optimized prompts ready for LLM agents.
- Install the extension (VSIX or Marketplace when available)
- Run Setup Wizard: Command Palette → `Promptimize: Setup Wizard`
- Configure OpenAI API key: required for Whisper voice-to-text
- Optionally choose optimization provider — OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, OpenRouter, or Cursor
- Press `Cmd+Alt+V` (Transcribe) or `Cmd+Alt+P` (Promptimize) and speak
See the full Quick Start Guide and Recording Modes.
| Service | Provider | Required | Credentials |
|---|---|---|---|
| Transcription | OpenAI Whisper | Yes | OpenAI API key |
| Prompt optimization | Your choice | No | Provider-specific API key |
graph LR
Voice[Your Voice] --> Whisper[OpenAI Whisper<br/>Transcription]
Whisper --> RawText[Raw Text]
RawText --> Choice{Optimization<br/>Enabled?}
Choice -->|No| Editor[Insert to Editor]
Choice -->|Yes| Provider[Your Chosen Provider]
Provider --> OptimizedText[Optimized Prompt]
OptimizedText --> Editor
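The flow in the diagram can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the extension's actual code: `runPipeline`, `PipelineDeps`, and the three injected functions are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch of the voice-to-prompt flow; all names are illustrative.
type PipelineDeps = {
  transcribe: (audio: Uint8Array) => Promise<string>; // e.g. a Whisper call
  optimize: (rawText: string) => Promise<string>; // e.g. the chosen provider
  insert: (text: string) => Promise<void>; // e.g. write into the editor
};

async function runPipeline(
  audio: Uint8Array,
  optimizationEnabled: boolean,
  deps: PipelineDeps,
): Promise<string> {
  const rawText = await deps.transcribe(audio);
  // Skip the provider entirely when optimization is disabled.
  const finalText = optimizationEnabled ? await deps.optimize(rawText) : rawText;
  await deps.insert(finalText);
  return finalText;
}
```

Passing the three steps in as functions keeps the branch on `optimizationEnabled` easy to exercise with stubs.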
Eliminate the friction between thinking and coding.
Developers often have complex architectural ideas, detailed requirements, or intricate technical explanations that are tedious to type but natural to speak. Promptimize bridges this gap by:
- Capturing your spoken thoughts in real-time
- Transcribing them with high accuracy using OpenAI Whisper
- Transforming natural speech into structured, technical prompts
- Inserting them automatically into your editor or Cursor chat
1. Think about complex architecture requirements
2. Struggle to type everything out
3. Lose train of thought while typing
4. End up with unstructured, verbose prompts
5. LLM misunderstands due to poor formatting
1. Press Cmd+Alt+P (Promptimize mode)
2. Speak naturally about your requirements
3. Extension transcribes and optimizes automatically
4. Structured prompt appears in your editor/chat
5. LLM understands perfectly
- ✅ Two Recording Modes — Transcribe (raw text) and Promptimize (optimized prompts)
- ✅ One-Click Recording — Dual status bar buttons or keyboard shortcuts
- ✅ High-Quality Transcription — OpenAI Whisper API integration
- ✅ Prompt Transformation — AI-powered optimization via 8 providers
- ✅ Multiple AI Providers — OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, OpenRouter, and Cursor
- ✅ Configuration Webview — Interactive setup panel with provider comparison and system prompt editor
- ✅ Smart Insertion — Chat → editor → clipboard fallback chain
- ✅ Visual Feedback — Status bar states and progress notifications
- ✅ Secure Configuration — API keys stored in VSCode SecretStorage
- ✅ Cross-Platform — Works on macOS, Windows, and Linux
- 🔄 Real-time Streaming — See transcription as you speak
- 🔄 Custom Vocabulary UI — Project-specific terms in configuration webview
- 🔄 Recording History — Review and re-use past transcriptions
- 🔄 Planned settings: `audioQuality`, `maxRecordingDuration`, `showNotifications` (defined but not yet applied)
Promptimize follows Clean/Hexagonal Architecture for maximum maintainability, testability, and scalability.
┌─────────────────────────────────────────────────────┐
│ Presentation Layer │
│ (Commands, Status Bar) │
└────────────┬────────────────────────────────────────┘
│
┌────────────▼────────────────────────────────────────┐
│ Application Layer │
│ (Use Cases, Ports/Interfaces, DTOs) │
└────────────┬────────────────────────────────────────┘
│
┌────────────▼────────────────────────────────────────┐
│ Domain Layer │
│ (Entities, Value Objects, Business Logic) │
└─────────────────────────────────────────────────────┘
│
┌────────────▼────────────────────────────────────────┐
│ Infrastructure Layer │
│ (OpenAI Whisper, Native Audio Capture, Config, Storage) │
└─────────────────────────────────────────────────────┘
See docs/architecture/ for detailed architecture documentation.
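As a rough illustration of the layering, the sketch below shows a port defined for the application layer, a use case that depends only on that port, and an infrastructure adapter that implements it. Every name here (`TranscriptionPort`, `TranscribeRecordingUseCase`, `StubTranscriptionAdapter`) is hypothetical and not taken from the codebase.

```typescript
// Application layer: a port (interface) the use case depends on.
interface TranscriptionPort {
  transcribe(audio: Uint8Array, language: string): Promise<string>;
}

// Application layer: the use case knows the port, not the concrete provider.
class TranscribeRecordingUseCase {
  private readonly transcription: TranscriptionPort;

  constructor(transcription: TranscriptionPort) {
    this.transcription = transcription;
  }

  async execute(audio: Uint8Array, language = "auto"): Promise<string> {
    if (audio.length === 0) {
      throw new Error("Recording is empty");
    }
    const text = await this.transcription.transcribe(audio, language);
    return text.trim();
  }
}

// Infrastructure layer: an adapter implementing the port. A real adapter
// would call a speech-to-text API; this stub returns a fixed string.
class StubTranscriptionAdapter implements TranscriptionPort {
  async transcribe(_audio: Uint8Array, _language: string): Promise<string> {
    return "  refactor the auth service  ";
  }
}
```

Because the use case only sees the interface, a different speech-to-text provider or a test double can be swapped in without touching application code.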
- TypeScript 5.4+ - Type-safe development
- VSCode Extension API 1.120+ - Extension foundation
- Node.js 22 LTS - Runtime environment
- Webpack 5 - Bundling and optimization
- OpenAI API - Whisper for transcription, GPT-4 for prompt transformation
- @kstonekuan/audio-capture - Native cross-platform microphone capture
- VSCode SecretStorage - Secure credential management
- Jest - Unit testing
- ESLint + Prettier - Code quality and formatting
- Husky - Git hooks for pre-commit checks
- Open VSCode/Cursor
- Go to Extensions (`Cmd+Shift+X` / `Ctrl+Shift+X`)
- Search for "Promptimize"
- Click Install
- Download the latest `.vsix` file from Releases
- Open VSCode/Cursor
- Go to Extensions
- Click "..." menu → "Install from VSIX..."
- Select the downloaded file
The extension was renamed to Promptimize (vypdev publisher). If you previously installed cursor-whisper:
- Uninstall the old Cursor Whisper extension
- Install `promptimize-*.vsix` (or the new Marketplace listing when available)
- Re-enter API keys (SecretStorage keys changed to `promptimize.apiKey.*`)
- Update `settings.json`: replace `cursorWhisper.*` with `promptimize.*`
- Update custom keybindings that reference `cursor-whisper.*` commands
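The settings rename can be scripted. The sketch below is a hypothetical helper, not part of the extension: it assumes the setting suffixes are unchanged and that `settings.json` has already been parsed into a plain object (the file may contain comments, so a plain `JSON.parse` may not be enough).

```typescript
// Hypothetical helper: rename legacy `cursorWhisper.*` keys to `promptimize.*`.
// It works on an already-parsed settings object and does not touch files.
function migrateSettings(
  settings: Record<string, unknown>,
): Record<string, unknown> {
  const oldPrefix = "cursorWhisper.";
  const newPrefix = "promptimize.";
  const migrated: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(settings)) {
    const isLegacy = key.startsWith(oldPrefix);
    const newKey = isLegacy ? newPrefix + key.slice(oldPrefix.length) : key;
    // A value already stored under the new key wins over the legacy one.
    if (!isLegacy || !(newKey in migrated)) {
      migrated[newKey] = value;
    }
  }
  return migrated;
}
```

Unrelated keys such as `editor.fontSize` pass through unchanged.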
- After installation, run Promptimize: Setup Wizard (opens automatically on first launch)
- Enter your OpenAI API key — required for Whisper transcription
- Choose whether to enable prompt optimization and select a provider
- Provide provider credentials when prompted (Anthropic, Google, Azure, etc.)
- Test your configuration with Promptimize: Test Configuration
Note: Whisper transcription always uses OpenAI. Prompt optimization is optional and can use a different provider with its own API key.
Open Settings (Cmd+, / Ctrl+,) and search for "Promptimize":
{
"promptimize.transcriptionLanguage": "en",
"promptimize.enablePromptTransformation": true,
"promptimize.transformationProvider": "openai",
"promptimize.transformationModel": "gpt-4o",
"promptimize.audioQuality": "high",
"promptimize.maxRecordingDuration": 120,
"promptimize.showNotifications": true
}
| Setting | Description |
|---|---|
| OpenAI API key | Required for voice-to-text. Configure via Setup Wizard or Configure OpenAI API Key (Whisper) |
| `transcriptionLanguage` | Language for transcription (en, es, auto, etc.) |
Cost: ~$0.006/minute of audio
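For a rough sense of scale, the estimate below multiplies recording length by the per-minute figure quoted above. The function name is hypothetical, the rate is approximate and may change, and actual billing granularity is not covered here.

```typescript
// Rate taken from the figure above; approximate and subject to change.
const WHISPER_USD_PER_MINUTE = 0.006;

// Estimate the transcription cost of a recording from its length in seconds.
function estimateTranscriptionCostUsd(durationSeconds: number): number {
  if (durationSeconds < 0) {
    throw new RangeError("durationSeconds must be non-negative");
  }
  return (durationSeconds / 60) * WHISPER_USD_PER_MINUTE;
}
```

For example, `estimateTranscriptionCostUsd(120)` gives roughly 0.012, so a two-minute recording costs a little over one cent.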
Prompt optimization converts transcribed speech into structured prompts. Choose a provider and supply credentials when required.
| Setting | Description |
|---|---|
| `enablePromptTransformation` | Enable/disable optimization |
| `transformationProvider` | openai, anthropic, google, azure, ollama, opencode, openrouter, cursor |
| `transformationModel` | OpenAI model (when provider is openai) |
| `anthropicModel` | Claude model (when provider is anthropic) |
| `googleModel` | Gemini model (when provider is google) |
| `azureEndpoint` / `azureDeployment` | Azure OpenAI resource settings |
| `ollamaBaseUrl` / `ollamaModel` | Local Ollama server settings |
| `openCodeBaseUrl` / `openCodeModel` | Local OpenCode proxy settings |
| `openRouterModel` | OpenRouter model (when provider is openrouter) |
| `cursorModel` | Cursor model (when provider is cursor) |
Use Promptimize: Configure Prompt Optimization Provider to set up interactively. See docs/configuration/ for provider setup.
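Since each provider reads its model from a different setting, a small lookup can make the relationship explicit. The mapping below mirrors the table; the helper `modelSettingKey` and the assumption that every key carries the `promptimize.` prefix are illustrative, not confirmed by the source.

```typescript
// Providers listed in the table above.
type Provider =
  | "openai" | "anthropic" | "google" | "azure"
  | "ollama" | "opencode" | "openrouter" | "cursor";

// Which setting names the model (or deployment) for each provider.
const MODEL_SETTING: Record<Provider, string> = {
  openai: "transformationModel",
  anthropic: "anthropicModel",
  google: "googleModel",
  azure: "azureDeployment",
  ollama: "ollamaModel",
  opencode: "openCodeModel",
  openrouter: "openRouterModel",
  cursor: "cursorModel",
};

// Hypothetical helper: full settings key, assuming the `promptimize.` prefix.
function modelSettingKey(provider: Provider): string {
  return `promptimize.${MODEL_SETTING[provider]}`;
}
```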
| Setting | Type | Default | Description |
|---|---|---|---|
| `transcriptionLanguage` | string | `"auto"` | Language for transcription (en, es, fr, de, auto) |
| `enablePromptTransformation` | boolean | `true` | Transform transcription into optimized prompts |
| `transformationProvider` | string | `"openai"` | LLM provider for transformation (openai, anthropic, google, azure, ollama, opencode, openrouter, cursor) |
| `transformationModel` | string | `"gpt-4o"` | OpenAI model for transformation |
| `transcriptionHint` | string | `""` | Optional Whisper vocabulary hint (Settings only) |
| `audioQuality` | string | `"high"` | Planned, not yet applied (always 16 kHz mono) |
| `maxRecordingDuration` | number | `120` | Planned, not yet applied |
| `showNotifications` | boolean | `true` | Planned, not yet applied |
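The defaults in the table can be expressed as a typed object with user values layered on top. This is only a sketch: `PromptimizeSettings`, `DEFAULT_SETTINGS`, and `resolveSettings` are hypothetical names, and the planned settings are left out because they are not applied yet.

```typescript
// Shape and defaults copied from the settings table above.
interface PromptimizeSettings {
  transcriptionLanguage: string;
  enablePromptTransformation: boolean;
  transformationProvider: string;
  transformationModel: string;
  transcriptionHint: string;
}

const DEFAULT_SETTINGS: PromptimizeSettings = {
  transcriptionLanguage: "auto",
  enablePromptTransformation: true,
  transformationProvider: "openai",
  transformationModel: "gpt-4o",
  transcriptionHint: "",
};

// Overlay user-provided values on the defaults.
function resolveSettings(
  user: Partial<PromptimizeSettings>,
): PromptimizeSettings {
  return { ...DEFAULT_SETTINGS, ...user };
}
```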
- Node.js 22+ installed (via nvm; see `.nvmrc`)
- VSCode or Cursor IDE
- OpenAI API key
# Clone the repository
git clone https://github.com/vypdev/promptimize
cd promptimize
# Install dependencies (requires Node 22 — run `nvm use` first)
pnpm install
# Compile TypeScript
pnpm run compile
- Open the project in VSCode/Cursor
- Press `F5` to start debugging
- A new "Extension Development Host" window will open
- The extension will be loaded in this window
- In the Extension Development Host window:
  - Open Command Palette (`Cmd/Ctrl+Shift+P`)
  - Type: "Promptimize: Configure OpenAI API Key (Whisper)"
  - Paste your OpenAI API key (starts with `sk-...`)
  - The key is securely stored in your system's Keychain/Credential Manager
- Start Recording:
  - Press `Cmd/Ctrl+Alt+V` (or click "Transcribe" in the status bar)
  - Recording starts immediately in the background
- Record Audio:
  - Speak clearly into your microphone
  - Ensure VSCode/Cursor has microphone access in System Settings (macOS) or Privacy settings (Windows)
- Stop Recording:
  - Run the stop command or click the status bar item when done
- Wait for Processing:
  - Audio is transcribed (~5-10 seconds)
  - Text is optimized by the configured provider (optional)
  - Text is automatically inserted into the active editor
- Check Status:
  - Status bar shows current state
  - Notifications show progress and errors
# Compile TypeScript
pnpm run compile
# Run linter
pnpm run lint
# Run tests (when available)
pnpm test
# Package extension (includes all platform native binaries)
pnpm run package
# Verify VSIX contains all platform binaries
pnpm run package:verify
To create a VSIX that works across all platforms (macOS, Linux, Windows):
pnpm run package
This will:
- Install all platform-specific native binaries (`darwin-arm64`, `darwin-x64`, `linux-x64-gnu`, `win32-x64-msvc`)
- Bundle them into the VSIX (~2.5MB total)
- Create `promptimize-X.X.X.vsix`
To verify all binaries are included:
pnpm run package:verify
Expected output:
- audio-capture-darwin-arm64
- audio-capture-darwin-x64
- audio-capture-linux-x64-gnu
- audio-capture-win32-x64-msvc
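The idea behind the verification step can be approximated as a check that every expected package name shows up somewhere in the VSIX file list. This is not the actual `package:verify` script: `missingBinaries` and its `entryPaths` input (the paths inside an unpacked VSIX) are assumptions made for illustration.

```typescript
// Package names taken from the expected output above.
const REQUIRED_BINARIES = [
  "audio-capture-darwin-arm64",
  "audio-capture-darwin-x64",
  "audio-capture-linux-x64-gnu",
  "audio-capture-win32-x64-msvc",
];

// Return the required binaries that no entry path mentions.
// An empty result means all platforms are covered.
function missingBinaries(entryPaths: string[]): string[] {
  return REQUIRED_BINARIES.filter(
    (name) => !entryPaths.some((path) => path.includes(name)),
  );
}
```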
Current Build: ✅ SUCCESS (577 KB bundle)
Promptimize has two modes — see Recording Modes for full details.
| Mode | Shortcut | Output |
|---|---|---|
| Transcribe | `Cmd/Ctrl+Alt+V` | Raw Whisper transcription |
| Promptimize | `Cmd/Ctrl+Alt+P` | Optimized structured prompt |
- Open your editor or Cursor chat
- Press `Cmd+Alt+V` (Transcribe) or `Cmd+Alt+P` (Promptimize)
- Speak naturally about your requirements
- Click the status bar (Recording...) to stop
- Transcribed or optimized text appears automatically
Three items appear in the status bar (right side):
| Item | Idle | Recording |
|---|---|---|
| Transcribe | $(mic) Transcribe | $(record) Recording... (click to stop) |
| Promptimize | $(sparkle) Promptimize | $(record) Recording... (click to stop) |
| Settings | $(gear) Settings | Available during recording |
During processing, progress appears in notifications (Transcribing..., Optimizing..., Inserting...).
Spoken Input:
"I need to refactor the authentication service to support JWT tokens instead of sessions. We should maintain backward compatibility with existing session-based auth for 6 months. Also need unit tests for the new JWT validation logic and integration tests for the auth flow."
Optimized Output:
## Refactor Authentication Service to JWT
### Context
- Current implementation: session-based authentication
- Target implementation: JWT tokens
### Objectives
1. Implement JWT token generation and validation
2. Maintain backward compatibility with session-based auth
3. Provide 6-month deprecation period for sessions
### Technical Requirements
- JWT library integration
- Token validation middleware
- Session-to-JWT migration path
### Testing Requirements
- Unit tests for JWT validation logic
- Integration tests for complete auth flow
- Backward compatibility tests for sessions
### Timeline
- 6-month deprecation period for session-based auth
The status bar reflects recorder states; fine-grained progress (Transcribing, Optimizing) appears in notifications.
| State | Status Bar | Description |
|---|---|---|
| Idle | $(mic) Transcribe / $(sparkle) Promptimize | Ready to record |
| Recording | $(record) Recording... | Actively recording (click to stop) |
| Processing | $(sync~spin) Processing... | Preparing audio after stop |
| Error | Error styling | Something went wrong |
See UX States for the full state reference.
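One way to reason about these states is as a small state machine. The state names come from the table above, but the allowed transitions below are an assumption for illustration, as are the names `TRANSITIONS`, `canTransition`, and `transition`.

```typescript
// States from the table above; the allowed transitions are an assumption.
type RecorderState = "idle" | "recording" | "processing" | "error";

const TRANSITIONS: Record<RecorderState, RecorderState[]> = {
  idle: ["recording"],
  recording: ["processing", "idle", "error"], // stop, cancel, or failure
  processing: ["idle", "error"],
  error: ["idle"],
};

function canTransition(from: RecorderState, to: RecorderState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Move to the next state, or throw if the transition is not allowed.
function transition(from: RecorderState, to: RecorderState): RecorderState {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the transitions in one table makes it straightforward to reject impossible jumps, such as going from idle straight to processing.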
| Shortcut | Action |
|---|---|
| `Cmd+Alt+V` / `Ctrl+Alt+V` | Start Transcribe recording |
| `Cmd+Alt+P` / `Ctrl+Alt+P` | Start Promptimize recording |
| `Escape` | Cancel recording (while recording) |
Shortcuts start recording only — stop by clicking the status bar. See Keyboard Shortcuts.
| Command | Purpose |
|---|---|
| `Promptimize: Start Transcribe Recording` | Start raw transcription |
| `Promptimize: Stop Transcribe Recording` | Stop and process Transcribe |
| `Promptimize: Start Promptimize Recording` | Start optimized prompt |
| `Promptimize: Stop Promptimize Recording` | Stop and process Promptimize |
| `Promptimize: Cancel Recording` | Discard recording |
| `Promptimize: Open Configuration` | Configuration webview |
| `Promptimize: Configure OpenAI API Key (Whisper)` | Set Whisper API key |
| `Promptimize: Configure Prompt Optimization Provider` | Provider setup wizard |
| `Promptimize: Configure OpenAI Optimization Model` | Pick GPT model (OpenAI only) |
| `Promptimize: Test Configuration` | Test setup; opens results webview |
| `Promptimize: Setup Wizard` | Opens configuration panel |
Deprecated: (Deprecated) Start Recording and (Deprecated) Stop Recording — use mode-specific commands instead.
- Audio files are temporary - Deleted immediately after transcription
- No local storage - Audio is never written to disk
- API keys are encrypted - Stored in VSCode SecretStorage
- No telemetry - Zero analytics or usage tracking
- HTTPS only - All API calls are encrypted
Your OpenAI API key is:
- Stored in VSCode's secure credential storage (SecretStorage)
- Never exposed in logs or error messages
- Never sent anywhere except OpenAI's official API
- Accessible only by this extension
The extension requests microphone access:
- macOS: System Settings → Privacy & Security → Microphone
- Windows: Settings → Privacy → Microphone
- Linux: System-dependent, usually automatic
- Node.js 22+ (via nvm; see `.nvmrc`)
- pnpm
- VSCode 1.120+ for testing
# Clone the repository
git clone https://github.com/vypdev/promptimize.git
cd promptimize
# Install dependencies (requires Node 22 — run `nvm use` first)
pnpm install
# Build the extension
pnpm run compile
# Run tests
pnpm test
# Watch mode for development
pnpm run watch
promptimize/
├── src/
│ ├── application/ # Use cases and ports
│ ├── domain/ # Business entities
│ ├── infrastructure/ # External integrations
│ ├── presentation/ # UI and commands
│ ├── shared/ # Utilities and constants
│ └── extension.ts # Entry point
├── docs/ # Comprehensive documentation
├── test/ # Unit and integration tests
└── package.json
See docs/architecture/ for detailed structure documentation.
- Open the project in VSCode
- Press `F5` to launch Extension Development Host
- The extension will be active in the new window
- Test recording with `Cmd+Alt+V`
Automated tests cover use cases, transformers, and UI components — see docs/testing/strategy.md.
source scripts/ensure-node.sh && pnpm test
- Unit tests: Use cases and adapters with mocked ports (priority)
- Manual smoke tests: Real recording → transcription → insertion before release
See docs/testing/strategy.md for critical test priorities and manual checklist.
- ✅ Dual recording modes (Transcribe + Promptimize)
- ✅ Whisper transcription
- ✅ Prompt transformation (8 providers)
- ✅ Configuration webview
- ✅ Chat / editor / clipboard insertion
- ✅ API key configuration
- 🔄 Apply planned settings (`audioQuality`, `maxRecordingDuration`, `showNotifications`)
- 🔄 Transformation preview before insert
- 🔄 Transcription language in configuration webview
- 🔄 Context-aware insertion improvements
- 🔄 Push-to-talk mode
- 🔄 Real-time streaming transcription
- 🔄 Recording history
- 🔄 Edit before insert
- 🔄 Custom vocabulary UI
- 🔄 Technical term correction
- 🔄 Full production release
- 🔄 Performance optimization
- 🔄 Extensive testing
See PROGRESS.md for current project status.
We welcome contributions! See docs/standards/coding-conventions.md for coding standards and development workflow.
- Clean Architecture - Maintain clear layer separation
- Type Safety - Strong TypeScript typing everywhere
- Testability - Write testable, pure functions
- Documentation - Document decisions and complex logic
- User Experience - Prioritize UX over technical complexity
- Compatibility First - Real-world compatibility over theoretical solutions
- User Experience - Minimal friction, maximum productivity
- Maintainability - Clean code over clever hacks
- Scalability - Built to grow and evolve
- Privacy - User data never leaves their control
- Testability: Business logic independent of frameworks
- Flexibility: Easy to swap implementations (e.g., different STT providers)
- Maintainability: Clear responsibilities and boundaries
- Scalability: Add features without breaking existing code
- Testability: Easy to mock dependencies
- Flexibility: Configure different implementations
- Maintainability: Clear dependency graph
See the full Troubleshooting Guide with decision trees.
macOS:
- Go to System Settings → Privacy & Security → Microphone
- Ensure VSCode/Cursor is enabled
Windows:
- Go to Settings → Privacy → Microphone
- Ensure VSCode/Cursor has permission
Linux:
- Permissions are usually automatic
- Check `pavucontrol` if using PulseAudio
- Verify your OpenAI API key is valid
- Check you have credits in your OpenAI account
- Ensure audio duration is between 0.1s and 5 minutes
- Check file size doesn't exceed 25MB
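The duration and size limits in this checklist can be checked before uploading. The sketch below is a hypothetical pre-flight check, not the extension's validation code; it reads 25MB as 25 * 1024 * 1024 bytes, which is an assumption.

```typescript
// Limits taken from the checklist above.
const MIN_DURATION_SECONDS = 0.1;
const MAX_DURATION_SECONDS = 5 * 60;
const MAX_FILE_BYTES = 25 * 1024 * 1024; // assumed binary megabytes

// Return a list of problems; an empty list means the recording looks acceptable.
function validateRecording(
  durationSeconds: number,
  sizeBytes: number,
): string[] {
  const problems: string[] = [];
  if (durationSeconds < MIN_DURATION_SECONDS) {
    problems.push("Recording is shorter than 0.1 seconds");
  }
  if (durationSeconds > MAX_DURATION_SECONDS) {
    problems.push("Recording is longer than 5 minutes");
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    problems.push("Audio file exceeds 25 MB");
  }
  return problems;
}
```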
- Ensure you have an active editor or chat input focused
- Check the status bar for error messages
- Try manually pasting from clipboard (fallback behavior)
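The chat, editor, clipboard fallback behaves like an ordered list of targets that are tried until one accepts the text. The sketch below illustrates that pattern only; `InsertTarget` and `insertWithFallback` are hypothetical names, and real targets would wrap the editor's APIs.

```typescript
// A target tries to insert text and reports whether it succeeded.
type InsertTarget = {
  name: string;
  tryInsert: (text: string) => Promise<boolean>;
};

// Try each target in order and return the name of the first that succeeds.
async function insertWithFallback(
  text: string,
  targets: InsertTarget[],
): Promise<string | undefined> {
  for (const target of targets) {
    try {
      if (await target.tryInsert(text)) {
        return target.name;
      }
    } catch {
      // A failing target should not block the next one in the chain.
    }
  }
  return undefined; // nothing accepted the text
}
```

If the returned name is `"clipboard"` in a chain like this, the text was not inserted directly and needs to be pasted manually.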
Promptimize works best in:
- Classic Mode (`cursor --classic`)
- Editor Window
Transcriptions and optimized prompts are never written to logs. For troubleshooting, use the status bar, progress notifications, and error dialogs. Enable the Promptimize output channel only for operational messages (timestamps, durations, error types)—not user speech content.
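A logging helper can enforce this rule by design if its input type has no field for speech content. The sketch below is illustrative only: `OperationLog` and `formatLogLine` are hypothetical names, and the line format is an assumption.

```typescript
// Metadata-only log entry: there is deliberately no field for transcript text.
type OperationLog = {
  operation: "transcribe" | "optimize" | "insert";
  durationMs: number;
  errorType?: string;
};

// Build a single operational log line from timestamp, operation, and duration.
function formatLogLine(entry: OperationLog, now: Date = new Date()): string {
  const parts = [
    now.toISOString(),
    entry.operation,
    `${Math.round(entry.durationMs)}ms`,
  ];
  if (entry.errorType) {
    parts.push(`error=${entry.errorType}`);
  }
  return parts.join(" ");
}
```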
MIT License - see LICENSE file for details.
- OpenAI - Whisper and GPT-4 APIs
- VSCode Team - Excellent extension API and documentation
- Cursor Team - Innovation in AI-powered development
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@promptimize.dev
- Documentation
- Recording Modes
- Configuration Webview Guide
- Architecture Docs
- Configuration Guide
- Troubleshooting
- Project Progress
Made with ❤️ for developers who think faster than they type