Muesli

Local-first dictation & meeting transcription for macOS
100% on-device speech-to-text · Zero cloud costs · Privacy by default

What is Muesli?

Muesli is a lightweight native macOS app that combines WisprFlow-style dictation and Granola-style meeting transcription in one tool. All transcription runs locally on Apple Silicon. Nothing leaves your device unless you opt in to cloud-generated meeting summaries.

Dictation

Hold your hotkey (or double-tap for hands-free mode) → speak → release → transcribed text is pasted at your cursor. ~0.13 second latency via Parakeet TDT on the Apple Neural Engine.

Meeting Transcription

Start a meeting recording → Muesli captures your mic (You) and system audio (Others) simultaneously → VAD-driven chunked transcription happens during the meeting at natural speech boundaries → speaker diarization identifies individual remote speakers (Speaker 1, Speaker 2, etc.) → when you stop, the transcript is ready in seconds, not minutes. Generate structured meeting notes via OpenAI, free OpenRouter models, or your ChatGPT Plus/Pro subscription.
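To illustrate what cutting at speech boundaries (rather than at fixed intervals) means, here is a minimal Python sketch. It assumes a VAD has already produced one speech probability per audio frame. The function name, the threshold, and the minimum pause length are hypothetical and are not taken from Muesli or Silero VAD; Python is used only for brevity, since Muesli itself is written in Swift.

```python
from typing import List, Tuple

def split_at_pauses(speech_probs: List[float],
                    threshold: float = 0.5,
                    min_silence_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges of speech, cut only at sustained pauses.

    `end` is exclusive. A short dip below the threshold does not end a chunk;
    only `min_silence_frames` consecutive quiet frames do.
    """
    chunks = []
    start = None        # first frame of the chunk being built
    last_speech = None  # most recent frame classified as speech
    silence_run = 0     # consecutive quiet frames since last_speech
    for i, prob in enumerate(speech_probs):
        if prob >= threshold:
            if start is None:
                start = i
            last_speech = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                chunks.append((start, last_speech + 1))
                start = None
                silence_run = 0
    if start is not None:
        # Recording stopped while a chunk was still open.
        chunks.append((start, last_speech + 1))
    return chunks
```

Each returned range could then be handed to the transcription model as soon as it closes, which is why most of the transcript already exists when the recording stops.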

Features

Native Swift, zero Python — Pure Swift app with CoreML and Metal backends. No bundled runtimes, no subprocess IPC.
Multiple ASR models — Parakeet TDT (Neural Engine), Cohere Transcribe 2B (mixed precision CoreML), Whisper Small/Medium/Large Turbo (Metal via whisper.cpp), and Qwen3 ASR (52 languages, CoreML).
Hold-to-talk & hands-free — Hold hotkey for quick dictation, or double-tap for sustained recording.
Meeting recording — Captures mic + system audio (including Bluetooth/AirPods) via ScreenCaptureKit.
VAD-driven chunk rotation — Silero VAD detects natural speech boundaries in real-time, splitting mic audio at pauses instead of fixed intervals. No mid-sentence cuts.
Speaker diarization — Identifies individual speakers in system audio (Speaker 1, Speaker 2, etc.) using FluidAudio's pyannote-based CoreML diarization model.
Camera-based meeting detection — Instantly detects when your webcam turns on (CoreMediaIO event listener). Camera active = meeting detected, no matter which app.
Filler word removal — Automatically strips "uh", "um", "er", "hmm" and verbal disfluencies.
AI meeting notes — BYOK with OpenAI or OpenRouter, or sign in with your ChatGPT Plus/Pro subscription (no API key needed). Auto-generated meeting titles. Re-summarize any meeting.
ChatGPT OAuth — Sign in with your existing ChatGPT subscription via browser-based OAuth (PKCE). Tokens stored in the app support directory with owner-only file permissions.
Personal dictionary — Add custom words and replacement pairs. Jaro-Winkler fuzzy matching auto-corrects transcription output.
Model management — Download, delete, and switch between models from the Models tab. Background downloads that don't block the app.
Meeting auto-detection — Detects when Zoom, Chrome, Teams, FaceTime, or Slack activates the mic or camera. Shows a notification to start recording.
Configurable hotkeys — Choose any modifier key (Cmd, Option, Ctrl, Fn, Shift) for dictation.
Onboarding — First-launch wizard with model selection, permissions setup, hotkey configuration, and optional API key entry.
Dark & light mode — Adaptive theme with toggle in Settings.
SwiftUI dashboard — Dictation history, meeting notes (Notes-style split view), dictionary, models, shortcuts, settings, about page.
Floating indicator — Draggable pill showing recording state, waveform animation, click-to-stop for meetings.
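As a rough illustration of the filler word removal feature, the sketch below strips the four fillers named above with a word-boundary regular expression. It is an assumption about how such a filter could work, not Muesli's actual FillerWordFilter; the function name and the handling of trailing punctuation are hypothetical.

```python
import re

# The fillers listed in the feature description.
FILLERS = ("uh", "um", "er", "hmm")

# Match a whole filler word, an optional comma or period after it,
# and any whitespace that follows.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*",
    re.IGNORECASE,
)

def strip_fillers(text: str) -> str:
    """Remove standalone filler words without touching words that contain them."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)  # collapse leftover double spaces
    return cleaned.strip()
```

The word boundaries matter: without them, "um" would also be removed from "umbrella" and "er" from "her".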

Install

Download (recommended)

Download the latest .dmg from Releases, open it, and drag Muesli to your Applications folder.

Homebrew

brew tap pHequals7/muesli
brew install --cask muesli

Build from source

Requirements: macOS 14.2+, Xcode 16+

# Clone
git clone https://github.com/pHequals7/muesli.git
cd muesli

# Build and install to /Applications
./scripts/build_native_app.sh

The transcription model (~450 MB for Parakeet v3) downloads automatically on first use.

Agent CLI

Muesli ships an agent-friendly local CLI inside the app bundle:

Installed path: /Applications/Muesli.app/Contents/MacOS/muesli-cli
Dev path: native/MuesliNative/.build/arm64-apple-macosx/debug/muesli-cli

The CLI is designed for coding agents such as Codex and Claude Code. It exposes meetings, dictations, raw transcripts, and stored notes as stable JSON so an agent can analyze them with its own model and write notes back without requiring a user-supplied OpenAI or OpenRouter key.

What agents should do

Discover the CLI:

command -v muesli-cli || echo "/Applications/Muesli.app/Contents/MacOS/muesli-cli"

Inspect the command contract:

/Applications/Muesli.app/Contents/MacOS/muesli-cli spec

List recent meetings or dictations:

/Applications/Muesli.app/Contents/MacOS/muesli-cli meetings list --limit 10
/Applications/Muesli.app/Contents/MacOS/muesli-cli dictations list --limit 10

Fetch a full record:

/Applications/Muesli.app/Contents/MacOS/muesli-cli meetings get 125
/Applications/Muesli.app/Contents/MacOS/muesli-cli dictations get 42

Summarize or analyze locally in the agent.

Write improved meeting notes back:

cat notes.md | /Applications/Muesli.app/Contents/MacOS/muesli-cli meetings update-notes 125 --stdin
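The steps above can be sketched as a small Python wrapper. This is an illustration under stated assumptions, not part of Muesli: the helper names (`resolve_cli`, `run_cli`, `rewrite_notes`) are hypothetical, and the sketch assumes the meeting fields such as rawTranscript sit directly under `data` in the JSON response. Only commands and flags documented in this README are used.

```python
import json
import shutil
import subprocess

INSTALLED_CLI = "/Applications/Muesli.app/Contents/MacOS/muesli-cli"

def resolve_cli(which=shutil.which) -> str:
    """Prefer muesli-cli on PATH, else fall back to the installed app bundle."""
    return which("muesli-cli") or INSTALLED_CLI

def run_cli(*args, stdin_text=None, runner=subprocess.run) -> dict:
    """Run one CLI command and decode the JSON it prints on stdout."""
    proc = runner([resolve_cli(), *args], input=stdin_text,
                  capture_output=True, text=True)
    return json.loads(proc.stdout)

def rewrite_notes(meeting_id: int, summarize, run=run_cli) -> dict:
    """Fetch a meeting, summarize it with the agent's own model, write notes back.

    `summarize` is any callable that maps transcript text to notes text.
    Assumes `data` holds the meeting fields directly (an assumption).
    """
    meeting = run("meetings", "get", str(meeting_id))
    if not meeting["ok"]:
        return meeting  # pass the failure response through unchanged
    notes = summarize(meeting["data"]["rawTranscript"])
    return run("meetings", "update-notes", str(meeting_id), "--stdin",
               stdin_text=notes)
```

The `runner` and `run` parameters exist only so the wrapper can be exercised without the app installed; in normal use the defaults call the real binary.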

Commands

muesli-cli spec
muesli-cli info
muesli-cli meetings list [--limit N] [--folder-id ID]
muesli-cli meetings get <id>
muesli-cli meetings update-notes <id> (--stdin | --file <path>)
muesli-cli dictations list [--limit N]
muesli-cli dictations get <id>

JSON contract

All CLI commands return JSON on stdout.

Success shape:

{
  "ok": true,
  "command": "muesli-cli meetings get",
  "data": {},
  "meta": {
    "schemaVersion": 1,
    "generatedAt": "2026-03-17T00:00:00Z",
    "dbPath": "/Users/example/Library/Application Support/Muesli/muesli.db",
    "warnings": []
  }
}

Failure shape:

{
  "ok": false,
  "command": "muesli-cli meetings get 999",
  "error": {
    "code": "not_found",
    "message": "No meeting exists with id 999.",
    "fix": "Run `muesli-cli meetings list` to find a valid ID."
  },
  "meta": {
    "schemaVersion": 1,
    "generatedAt": "2026-03-17T00:00:00Z",
    "dbPath": "",
    "warnings": []
  }
}
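A consumer can branch on the `ok` flag and surface the `error` fields when a command fails. The following Python sketch shows one way to do that; the exception class and the `unwrap` helper are hypothetical names introduced for illustration, and only the fields shown in the two shapes above are read.

```python
import json

class MuesliCLIError(Exception):
    """Raised when the CLI reports ok: false."""
    def __init__(self, code: str, message: str, fix: str):
        super().__init__(f"{code}: {message} {fix}".strip())
        self.code = code
        self.message = message
        self.fix = fix

def unwrap(stdout_text: str) -> dict:
    """Return `data` on success, or raise with the CLI's own code, message, and fix."""
    envelope = json.loads(stdout_text)
    if envelope.get("ok"):
        return envelope.get("data", {})
    err = envelope.get("error", {})
    raise MuesliCLIError(
        err.get("code", "unknown"),
        err.get("message", ""),
        err.get("fix", ""),
    )
```

Keeping `fix` on the exception lets an agent show or follow the suggested recovery step instead of guessing.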

Important meeting fields:

rawTranscript
formattedNotes
notesState
calendarEventID
micAudioPath
systemAudioPath

notesState values:

missing
raw_transcript_fallback
structured_notes

Notes for agent authors

The CLI is JSON-first and intended to be machine-consumed.
formattedNotes is the only write-back surface in v1.
rawTranscript is read-only and should be treated as source material.
If notesState is missing or raw_transcript_fallback, agents should prefer summarizing from rawTranscript.
Use --db-path or --support-dir only when the default Muesli data location is wrong.
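The notesState guidance above can be expressed as a small decision helper. This Python sketch is illustrative only: `pick_source` is a hypothetical name, and it assumes the meeting record is available as a dictionary keyed by the field names listed earlier.

```python
# States in which an agent should work from the raw transcript.
NEEDS_SUMMARY = {"missing", "raw_transcript_fallback"}

def pick_source(meeting: dict) -> tuple:
    """Return (field_name, text) that an agent should work from for this meeting."""
    no_notes = not meeting.get("formattedNotes")
    if meeting.get("notesState") in NEEDS_SUMMARY or no_notes:
        # rawTranscript is read-only source material.
        return "rawTranscript", meeting.get("rawTranscript") or ""
    return "formattedNotes", meeting["formattedNotes"]
```

Falling back to rawTranscript whenever formattedNotes is empty is a defensive choice made in this sketch, not documented CLI behavior.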

Models

Model	Backend	Runtime	Size	Languages	Latency
Parakeet v3 (recommended)	FluidAudio	CoreML / Neural Engine	~450 MB	25 languages	~0.13s
Parakeet v2	FluidAudio	CoreML / Neural Engine	~450 MB	English only	~0.13s
Cohere Transcribe 2B	CoreML	FP16 encoder + INT8 decoder	~3.8 GB	English	~1s
Qwen3 ASR	FluidAudio	CoreML / Neural Engine	~1.3 GB	52 languages	~2-3s
Whisper Small	whisper.cpp	Metal / CPU	~190 MB	English only	~1-2s
Whisper Medium	whisper.cpp	Metal / CPU	~1.5 GB	English only	~2-3s
Whisper Large Turbo	whisper.cpp	Metal / CPU	~600 MB	Multilingual	~2-4s

Cohere Transcribe is a 2B parameter model (#1 on Open ASR Leaderboard) running in mixed precision — FP16 FastConformer encoder on the Neural Engine with INT8 quantized decoders. Includes VAD-gated silence detection to prevent hallucination. Best for high-accuracy English dictation.

Models download on demand from HuggingFace. Manage them from the Models tab in the dashboard.

Permissions

Muesli needs these macOS permissions (guided during onboarding):

Permission	Why
Microphone	Record audio for dictation and meetings
System Audio Recording	Capture call audio from Zoom/Meet/Teams
Accessibility	Simulate Cmd+V to paste transcribed text
Input Monitoring	Detect hotkey presses globally
Calendar (optional)	Auto-detect upcoming meetings

Architecture

┌────────────────────────────────────────────────────────────┐
│  Native Swift / SwiftUI App                                │
│  ├── FluidAudio (Parakeet TDT + Qwen3 ASR on ANE)          │
│  ├── Cohere Transcribe (FP16+INT8 CoreML on ANE)           │
│  ├── SwiftWhisper (whisper.cpp on Metal/CPU)               │
│  ├── Silero VAD (streaming voice activity detection)       │
│  ├── Speaker Diarization (pyannote CoreML on ANE)          │
│  ├── ChatGPTAuthManager (OAuth PKCE + WHAM API)            │
│  ├── CameraActivityMonitor (CoreMediaIO listeners)         │
│  ├── StreamingMicRecorder (AVAudioEngine real-time)        │
│  ├── FillerWordFilter (uh/um removal)                      │
│  ├── CustomWordMatcher (Jaro-Winkler fuzzy)                │
│  ├── HotkeyMonitor (configurable modifier keys)            │
│  ├── SystemAudioRecorder (ScreenCaptureKit)                │
│  ├── MeetingSession (VAD-driven chunked transcription)     │
│  ├── MeetingSummaryClient (OpenAI / OpenRouter / ChatGPT)  │
│  ├── FloatingIndicatorController (UI pill)                 │
│  └── SwiftUI Dashboard (dictations, meetings,              │
│       dictionary, models, shortcuts, settings)             │
└────────────────────────────────────────────────────────────┘

Everything runs in-process. No subprocesses, no IPC, no Python runtime.

Tech Stack

Component	Technology
App	Swift, AppKit, SwiftUI
Primary ASR	FluidAudio (Parakeet TDT + Qwen3 ASR on CoreML/ANE)
Cohere ASR	Cohere Transcribe (FP16 encoder + INT8 decoder on CoreML)
Whisper ASR	SwiftWhisper (whisper.cpp on Metal)
Voice activity	Silero VAD via FluidAudio (streaming, event-driven)
Speaker diarization	pyannote via FluidAudio (CoreML on ANE)
Camera detection	CoreMediaIO property listeners (event-driven)
System audio	ScreenCaptureKit (`SCStream`)
Meeting notes	OpenAI / OpenRouter (BYOK) or ChatGPT subscription (OAuth)
Word correction	Jaro-Winkler similarity (native Swift)
Storage	SQLite (WAL mode)
Signing	Developer ID + hardened runtime (notarization ready)
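For readers unfamiliar with the Jaro-Winkler similarity used for word correction, here is a compact Python sketch of the common textbook form of the metric, followed by a hypothetical `correct` helper that snaps a transcribed word to the closest dictionary entry. The 0.9 threshold and the helper itself are assumptions for illustration; Muesli's CustomWordMatcher is implemented in Swift and may differ.

```python
def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        # Look for an unused equal character within the match window.
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [ch for ch, hit in zip(a, a_hit) if hit]
    b_seq = [ch for ch, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Boost the Jaro score for a shared prefix of up to four characters."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)

def correct(word: str, dictionary: list, threshold: float = 0.9) -> str:
    """Return the best dictionary match if it clears the threshold, else the word."""
    best, best_score = word, 0.0
    for entry in dictionary:
        score = jaro_winkler(word.lower(), entry.lower())
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else word
```

The prefix boost is what makes the metric suit names and product terms: a transcription that gets the start of a word right but drops a letter later still scores close to 1.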

Contributing

Contributions welcome! To get started:

git clone https://github.com/pHequals7/muesli.git
cd muesli
swift build --package-path native/MuesliNative -c release
swift test --package-path native/MuesliNative
./scripts/test_packaged_cli.sh

168 tests covering model configuration, custom word matching, filler removal, transcription routing, data persistence, CLI contract/path-resolution logic, speaker diarization alignment, token consolidation, camera-based meeting detection, and ChatGPT OAuth logic.

Current test scope:

Covered by tests: CLI command contract generation, CLI path-resolution logic, SQLite read/write behavior, note-state classification, and meeting/dictation retrieval/update flows.
Not covered by Swift unit tests: app-bundle packaging and copying muesli-cli into /Applications/Muesli.app/Contents/MacOS.
Packaging is verified by scripts/test_packaged_cli.sh, which builds an isolated app bundle, checks that Contents/MacOS/muesli-cli exists and is executable, and runs muesli-cli spec from the packaged path.

Please open an issue before submitting large PRs.

Support

If Muesli saves you time, consider supporting development.

Acknowledgements

FluidAudio — CoreML speech models for Apple devices (Parakeet TDT, Qwen3 ASR, Silero VAD, speaker diarization)
SwiftWhisper — Swift wrapper for whisper.cpp
whisper.cpp — C/C++ Whisper inference
ScreenCaptureKit by Apple — system audio capture
NVIDIA Parakeet — FastConformer TDT speech recognition model
Cohere Transcribe — 2B parameter autoregressive ASR (#1 Open ASR Leaderboard)
Qwen3-ASR — Multilingual speech recognition (52 languages)
pyannote — Speaker diarization (via FluidAudio CoreML conversion)

License

MIT — free and open source.
