| title | Desktop App (Electrobun) |
|---|---|
| sidebarTitle | Desktop App |
| description | Install and use the Milady desktop application on macOS, Windows, and Linux with native features and configurable local or remote runtime connectivity. |
The Milady desktop app wraps the companion UI in a native Electrobun shell, adding system-level features like tray icons, global keyboard shortcuts, native notifications, and native OS capability bridges. Electrobun can either launch the canonical Milady runtime locally or connect the UI to an already-running local or remote runtime.
Download the .dmg file from the GitHub releases page. Open the DMG and drag Milady to your Applications folder.
- Which file: On Apple Silicon (M1/M2/M3/M4 and later), use Milady-arm64.dmg. On Intel Macs, use Milady-x64.dmg. If you pick the wrong architecture, the app may not run correctly; see Build & release — why two DMGs.
- Build targets: DMG and ZIP.
- Category: Productivity (`public.app-category.productivity`).
- Code signed and notarized -- hardened runtime with Apple notarization enabled.
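The architecture rule above can be expressed as a small lookup. This is only an illustrative sketch: `dmgFileNameFor` is a hypothetical helper, not part of Milady, and it assumes the architecture strings `arm64` and `x64` as Node/Bun report them in `process.arch`.

```typescript
// Hypothetical helper: map a CPU architecture string to the DMG named in the docs.
function dmgFileNameFor(arch: string): string {
  switch (arch) {
    case "arm64":
      return "Milady-arm64.dmg"; // Apple Silicon
    case "x64":
      return "Milady-x64.dmg"; // Intel
    default:
      throw new Error(`No macOS DMG is listed for architecture: ${arch}`);
  }
}

// Example: in a Node or Bun script you might call dmgFileNameFor(process.arch).
```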
Download the .exe installer (NSIS) from the releases page.
- Build target: NSIS installer.
- Options: Choose installation directory, run elevated if needed.
- Code signed via Azure Code Signing (`milady-code-sign` certificate profile).
Download the .AppImage or .deb package from the releases page.
- Build targets: AppImage and deb.
- Category: Utility.
    git clone https://github.com/milady-ai/milady.git && cd milady
    bun install && bun run build
    bun run dev:desktop

For why the desktop dev commands spawn multiple processes, how Ctrl-C and Quit behave, environment variables (MILADY_DESKTOP_VITE_WATCH, MILADY_RENDERER_URL, etc.), and IDE/agent observability (GET /api/dev/stack, aggregated console log, and the screenshot proxy, including why loopback, defaults, and opt-out), see Desktop local development.
In development mode, the Electrobun app resolves the Milady distribution from the repository root's dist/ directory. In packaged builds, assets are copied into the app bundle under Resources/app/milady-dist/.
macOS frameless window chrome (hiddenInset)
On macOS, the main window uses hiddenInset (no classic title bar; traffic lights inset). The WKWebView fills the client area, so window move and inner-edge resize are implemented with native NSView overlays above the web view — not with CSS resize cursors alone. Why: WebKit owns the pointer over page pixels; tracking areas on the contentView underneath led to unreliable cursors and flicker when AppKit and WebKit both tried to set NSCursor.
Strip thickness can track the current NSScreen when the host passes height: 0 into native layout (see main-process applyMacOSWindowEffects and FFI setNativeDragRegion). Full architecture, z-order, and file map: Electrobun macOS window chrome.
Electrobun may log [WebGPU Browser] macOS … using os.release() (Darwin). Why document: on macOS 26, Darwin is still 25.x; a naive Darwin − 9 mapping shows 16 and mis-gates WKWebView WebGPU. Milady maps Darwin to the marketing major in code; rationale and table: Darwin vs macOS version (Electrobun WebGPU).
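A minimal sketch of a Darwin-to-macOS mapping that avoids the naive offset follows. The function name is hypothetical and this is not Milady's actual code. The Darwin 20 to 25 pairs are known releases. Treating Darwin 26 and later as "Darwin + 1" is an extrapolation and should be read as an assumption.

```typescript
// Hypothetical helper: derive the macOS marketing major from os.release() output.
function macosMajorFromDarwin(darwinRelease: string): number | null {
  const darwinMajor = Number.parseInt(darwinRelease.split(".")[0] ?? "", 10);
  if (!Number.isFinite(darwinMajor)) return null;
  // macOS 26 ships with Darwin 25; later releases are ASSUMED to keep the +1 offset.
  if (darwinMajor >= 25) return darwinMajor + 1;
  // macOS 11 through 15 correspond to Darwin 20 through 24 (Darwin - 9).
  if (darwinMajor >= 20) return darwinMajor - 9;
  // Darwin 19 and earlier are macOS 10.x / OS X.
  return 10;
}
```

With this shape, a Darwin `25.x` release string yields 26 rather than the misleading 16 that a plain `Darwin - 9` rule would give.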
Product framing: Milady targets strong visuals when you are engaged and quiet hardware when you are not—especially on battery—without pretending every workload beats a full IDE shell. See Roadmap — Principles: energy and experience (desktop).
What drives usage
- Continuous GPU work: the companion VRM scene (WebGL or WebGPU) runs an animation/render loop while the scene is active. Why it matters: macOS attributes GPU time to the app even when you are not interacting; idle VRM + lighting + (optional) Spark/world effects are not free.
- Multiple processes in dev: `dev:desktop` / `dev:desktop:watch` run API + Vite + Electrobun (+ optional screenshot helper). Why: each process has its own baseline CPU wakeups; this is a dev convenience, not the same as a minimal shipped shell.
- Dev screenshot proxy: default-on `GET /api/dev/cursor-screenshot` path uses full-screen capture when agents poll it. Why: `screencapture` and compositor work are noticeable if something hits that endpoint often, so turn it off when you do not need it (`MILADY_DESKTOP_SCREENSHOT_SERVER=0`); see Desktop local development.
What Milady already does
- Pauses the avatar engine when the companion scene is not active (`VrmEngine.setPaused` / `VrmViewer`), e.g. when you leave companion mode for native tabs (settings, chat shell) so the 3D loop is not running in the background for those routes. Why: `requestAnimationFrame` / `setAnimationLoop` at display refresh is the main avoidable steady-state cost.
- Page Visibility: `VrmViewer` also pauses when `document.visibilityState !== "visible"` (background tab / hidden document). Why: WKWebView can keep scheduling frames for a visible canvas; aligning with visibility avoids burning GPU when the user is not looking at Milady.
- Background tab polling: dashboard/stream/game/fine-tuning views use `useIntervalWhenDocumentVisible` (or equivalent) so 5s / 3s refresh timers do not hit the API while the document is hidden. Eliza Cloud credits polling (60s) skips work when hidden. Why: same battery/thermal story as the VRM loop, for network + React wakeups.
- Vector memory 3D graph: the Three.js `requestAnimationFrame` loop stops while the tab is hidden and resumes on visible. Why: second WebGL context should not animate off-screen.
- Battery → lower pixel ratio (Electrobun): the UI calls `desktop:getPowerState` on a 60s timer, when the renderer becomes ready, and when `document.visibilityState` returns to visible (so plugging in is noticed without waiting for the next interval). When `onBattery` is true, `VrmEngine.setLowPowerRenderMode` caps effective DPR at 1× on top of the usual `MAX_RENDERER_PIXEL_RATIO` clamp. Why: fewer shaded pixels when unplugged (e.g. HiDPI laptops). The main process resolves AC vs battery using `pmset` on macOS, `/sys/class/power_supply` (Battery + `Discharging`) on Linux, and `SystemInformation.PowerStatus.PowerLineStatus` on Windows (`Offline` = on battery). Opt-out: set localStorage `milady.vrmBatteryPixelCap` to `"0"` to keep full resolution on battery (user Companion efficiency in Settings → Media can still request low-power on AC).
- Companion rendering (Settings → Media): persisted `eliza:companion-vrm-power` is `quality` (never battery low-power), `balanced` (low-power on battery when the cap is on), or `efficiency` (always low-power). Legacy boolean keys migrate once.
- Animate in background (opt-in, `eliza:companion-animate-when-hidden`): when the window or tab is hidden but companion is still the active scene, the engine stays unpaused and only the world + Spark are hidden so the VRM can idle with lower cost than drawing the full splat scene.
- Battery → Spark + shadows: on battery, `setLowPowerRenderMode` also disables directional shadow maps on the avatar key light and applies tighter Spark splat limits (`maxPixelRadius`, `minAlpha`, sort distance, etc.). Why: shadows and splat sorting are a large share of GPU time in companion/world mode.
- Half framerate: `VrmEngine.setHalfFramerateMode` skips every other main-loop tick (skipped ticks do not advance `Clock`, so the next tick’s delta is doubled). `setLowPowerRenderMode` is separate (DPR / shadows / Spark). Default policy ties half-FPS to “saving power” moments; Settings → Media can set full speed, when saving power, or always half.
- Lazy-mounts the 3D stack the first time the companion scene is needed, and defers it while the agent is still `starting` / onboarding is loading. Why: avoids paying WebGL/WebGPU init during the boot path when the UI only needs status and loaders.
- Caps renderer pixel ratio (see `MAX_RENDERER_PIXEL_RATIO` in `VrmEngine`) so Retina does not always mean 2× shader cost at 3× physical pixels.
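The power policy described in this list can be summarized in a few lines. The sketch below is illustrative only: the function names are hypothetical, and `ASSUMED_MAX_RENDERER_PIXEL_RATIO = 2` is a placeholder value, since the real constant lives in `VrmEngine` and its value is not stated here.

```typescript
type CompanionPowerMode = "quality" | "balanced" | "efficiency";

// ASSUMPTION: placeholder value for illustration; see MAX_RENDERER_PIXEL_RATIO in VrmEngine.
const ASSUMED_MAX_RENDERER_PIXEL_RATIO = 2;

// Decide whether low-power rendering applies, following the three persisted modes.
function shouldUseLowPower(
  mode: CompanionPowerMode,
  onBattery: boolean,
  batteryCapEnabled: boolean,
): boolean {
  if (mode === "efficiency") return true; // always low-power
  if (mode === "quality") return false; // never battery low-power
  return onBattery && batteryCapEnabled; // balanced: only on battery with the cap on
}

// Clamp the device pixel ratio, then cap at 1x when low-power is active.
function effectivePixelRatio(devicePixelRatio: number, lowPower: boolean): number {
  const clamped = Math.min(devicePixelRatio, ASSUMED_MAX_RENDERER_PIXEL_RATIO);
  return lowPower ? Math.min(clamped, 1) : clamped;
}
```

Keeping the mode decision separate from the pixel-ratio math mirrors the text, where the usual clamp always applies and the 1× cap is layered on top.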
What you can do today
- Use native shell (non-companion) when you mostly want chat/settings without the full-screen avatar. Why: `companionSceneActive` stays tied to shell/tab state, so the heavy scene is off when you are not in companion or character flows.
- If WebGPU is hotter on your Mac than WebGL for this workload, set the renderer override in localStorage key `eliza.avatarRenderer` to `webgl` (or `webgpu` to experiment the other way). Why: path differs by machine and OS version; the desktop webview defaults to WebGPU in the Electrobun runtime, and sometimes the fallback is kinder to thermals.
- In dev, disable the screenshot and aggregated console hooks if you do not use them (`MILADY_DESKTOP_SCREENSHOT_SERVER`, `MILADY_DESKTOP_DEV_LOG`).
Code: packages/app-core/src/hooks/useDocumentVisibility.ts, VectorBrowserView.tsx (3D graph), ElizaCloudDashboard.tsx, StreamView.tsx, stream/StreamVoiceConfig.tsx, GameView.tsx, ChatView.tsx (game-modal carryover timer), FineTuningView.tsx, state/AppContext.tsx (cloud credits interval), VrmViewer.tsx, VrmEngine.ts, vrm-desktop-energy.ts.
Electrobun is a native shell, not a separate runtime architecture. Desktop, VPS, sandboxed, and CLI/server deployments all use the same Milady runtime entrypoint. The shell chooses one of three runtime modes at startup:
| Mode | Behavior |
|---|---|
| `local` | Spawn the canonical Milady runtime locally as a child Bun process |
| `external` | Do not spawn a local runtime; point the renderer at an explicit API base |
| `disabled` | Do not auto-start a local runtime; still point the renderer at the expected local API base for a manually managed server |
On startup, the Electrobun shell and AgentManager coordinate these steps:
- Resolve the runtime bundle -- In dev mode, Electrobun finds the repository root `dist/` bundle. In packaged builds, the runtime is copied into `Resources/app/milady-dist/`.
- Resolve desktop runtime mode -- Environment variables decide whether the shell should use `local`, `external`, or `disabled` runtime mode.
- Bootstrap the renderer with an API base -- The static renderer server injects `window.__MILADY_API_BASE__` into `index.html` before React mounts so the UI never falls back to the static server for `/api/*` requests.
- If mode is `local`, spawn the canonical runtime -- Electrobun launches `bun run entry.js start` as a child process, waits for `/api/health`, and then pushes the actual bound port to the renderer.
- If mode is `external`, connect only -- Electrobun does not start a child runtime. The renderer uses the normalized external API base and optional API token.
- If mode is `disabled`, wait for a manually managed local runtime -- Electrobun does not auto-start the child runtime, but the renderer still targets the expected local API base so a separately managed server can satisfy requests.
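The "waits for `/api/health`" step in local mode is a polling loop at heart. The following is a minimal sketch under stated assumptions: `waitForHealthySketch`, the injected `probe` callback, and the `attempts` / `delayMs` options are all hypothetical and are not taken from Milady's `AgentManager`.

```typescript
// Resolves true when the health endpoint answers successfully (assumed contract).
type HealthProbe = (url: string) => Promise<boolean>;

async function waitForHealthySketch(
  apiBase: string,
  probe: HealthProbe,
  options: { attempts: number; delayMs: number },
): Promise<boolean> {
  // Normalize trailing slashes so the health URL is well-formed.
  const healthUrl = `${apiBase.replace(/\/+$/, "")}/api/health`;
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    if (await probe(healthUrl)) return true;
    if (attempt < options.attempts) {
      // Back off briefly before the next probe.
      await new Promise<void>((resolve) => setTimeout(resolve, options.delayMs));
    }
  }
  return false;
}
```

In a real shell the probe would likely wrap an HTTP request to the child runtime. Injecting it keeps the loop easy to test without a running server.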
Embedded local mode (packaged or dev without external API): the Electrobun main process chooses a listen port for the child milady start process as follows:
- Preferred port -- `MILADY_PORT` (default 2138). The shell probes 127.0.0.1 and, if that port is busy, uses the next free port (same idea as `dev-platform`, implemented in `loopback-port.ts`). Why: two Milady instances or another service may legitimately hold 2138; we should not SIGKILL unrelated processes by default (see `MILADY_AGENT_RECLAIM_STALE_PORT` in Desktop local development to opt back into reclaim).
- Child env -- the spawned process receives the chosen port via `MILADY_PORT` so `entry.js start` binds there when possible.
- Stdout + health -- if the runtime still reports a different bind (legacy / upstream behavior), stdout parsing and `waitForHealthy` follow the actual port before marking the agent running.
- Renderer + surfaces -- `pushApiBaseToRenderer` / `injectApiBase` use `AgentManager`’s resolved port; status listeners refresh main and detached windows. Why: the dashboard must not keep using a stale loopback URL after a dynamic bind.
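The preferred-port fallback can be pictured as a short upward scan. This sketch is not the contents of `loopback-port.ts`: `chooseLoopbackPort`, the injected `isFree` probe, and the `maxTries` limit are hypothetical, and scanning sequentially upward is an assumption about what "next free port" means.

```typescript
// Resolves true when nothing is listening on 127.0.0.1:<port> (assumed contract).
type PortProbe = (port: number) => Promise<boolean>;

async function chooseLoopbackPort(
  preferred: number,
  isFree: PortProbe,
  maxTries = 20,
): Promise<number> {
  for (let offset = 0; offset < maxTries; offset++) {
    const candidate = preferred + offset;
    if (candidate > 65535) break; // stay inside the valid TCP port range
    if (await isFree(candidate)) return candidate;
  }
  throw new Error(`No free loopback port found starting at ${preferred}`);
}
```

The scan never touches the process that holds the busy port, which matches the stated goal of not killing unrelated services by default.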
external mode: no embedded child; the UI uses MILADY_DESKTOP_API_BASE / related env (e.g. dev-platform sets this to http://127.0.0.1:<resolved API port>). Why: the API may already be running under bun run dev with its own port policy.
disabled mode: no auto-start; the renderer still targets the expected local API base for a process you manage yourself—set MILADY_PORT / MILADY_API_PORT to match that server.
CLI milady start (non-Electrobun): after startApiServer returns, Milady syncs MILADY_PORT, MILADY_API_PORT, and ELIZA_PORT to the actual bound port. Why: if the HTTP stack falls forward to another port, shells and scripts reading env see the same port as /api/health.
The OS menu bar template is built in apps/app/electrobun/src/application-menu.ts and wired in index.ts (application-menu-clicked). Why a data file: the same structure is validated by tests and stays free of platform branches scattered through the main process.
| Item (example) | Action id | Behavior |
|---|---|---|
| Reset Milady… | `reset-milady` | Main process: shows the window, native confirm, then POST /api/agent/reset, embedded restart or POST /api/agent/restart, poll /api/status, and pushes desktopTrayMenuClick with itemId: "menu-reset-milady-applied" + agentStatus. Renderer: handleResetAppliedFromMain runs the same local UI wipe as the end of Settings handleReset (completeResetLocalStateAfterServerWipe). Why main owns HTTP: after native dialogs, WKWebView can defer renderer fetch/bridge work, so reset looked hung; why renderer still wipes UI: one place for onboarding, MiladyClient base URL, cloud flags, and conversation lists so the menu cannot drift from Settings. |
Settings still uses handleReset (webview confirm + full flow). Legacy: tray may still emit menu-reset-milady for older paths; see Desktop main-process reset for sequence, probes, and tests.
The embedded agent reports its state to the UI via IPC:
| State | Meaning |
|---|---|
| `not_started` | Agent has not been started yet |
| `starting` | Agent is initializing (API server may already be available) |
| `running` | Agent is active and accepting requests |
| `stopped` | Agent has been shut down |
| `error` | Agent encountered a fatal error |
For testing, remote connectivity, or locally managed runtime workflows:
| Environment Variable | Effect |
|---|---|
| `MILADY_DESKTOP_TEST_API_BASE` | Use this API base and switch to external mode |
| `MILADY_DESKTOP_API_BASE` | Use this API base and switch to external mode |
| `MILADY_API_BASE_URL` / `MILADY_API_BASE` | Generic API-base fallback vars; also switch to external mode |
| `MILADY_DESKTOP_SKIP_EMBEDDED_AGENT=1` | Switch to disabled mode; do not auto-start the child runtime |
| `MILADY_API_TOKEN` | Inject an API authentication token into the renderer |
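The table above implies a small decision function. The sketch below is one possible reading, not the shell's actual resolver: `resolveRuntimeMode` is a hypothetical name, and the precedence (the variables in table order, with any API base winning over `MILADY_DESKTOP_SKIP_EMBEDDED_AGENT`) is an assumption the source does not confirm.

```typescript
type DesktopRuntimeMode = "local" | "external" | "disabled";

interface RuntimeResolution {
  mode: DesktopRuntimeMode;
  apiBase?: string;
}

// ASSUMED precedence: checked in the same order as the table lists them.
const API_BASE_VARS = [
  "MILADY_DESKTOP_TEST_API_BASE",
  "MILADY_DESKTOP_API_BASE",
  "MILADY_API_BASE_URL",
  "MILADY_API_BASE",
] as const;

function resolveRuntimeMode(env: Record<string, string | undefined>): RuntimeResolution {
  for (const name of API_BASE_VARS) {
    const value = env[name]?.trim();
    if (value) {
      // Any explicit API base switches the shell to external mode.
      return { mode: "external", apiBase: value.replace(/\/+$/, "") };
    }
  }
  if (env["MILADY_DESKTOP_SKIP_EMBEDDED_AGENT"] === "1") {
    return { mode: "disabled" };
  }
  return { mode: "local" };
}
```

For example, an environment containing only `MILADY_DESKTOP_SKIP_EMBEDDED_AGENT=1` resolves to `disabled`, while an empty environment falls through to `local`.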
The desktop app registers 10 native modules via IPC, each providing platform-specific capabilities. All modules are initialized in initializeNativeModules() and their IPC handlers are registered in registerAllIPC(). Every module follows a singleton pattern with a dedicated manager class.
Local embedded runtime management via the AgentManager class.
| IPC Channel | Description |
|---|---|
| `agent:start` | Start the local child runtime when desktop mode is local |
| `agent:stop` | Stop the local child runtime |
| `agent:restart` | Stop and restart the runtime, picking up config changes |
| `agent:status` | Get the current AgentStatus object |
In external and disabled mode, agent:start rejects instead of spawning the embedded runtime. The agent also emits agent:status events to the renderer whenever local-runtime state changes.
Core native desktop features via the DesktopManager class. This is the largest module, covering eight subsystems:
System Tray -- Create, update, and destroy tray icons with context menus. Supports tooltip, title (macOS), icons for menu items, and submenus. Tray events (click, double-click, right-click) are forwarded to the renderer with modifier key state and cursor coordinates.
Global Keyboard Shortcuts -- Register system-wide hotkeys that work even when the app is not focused. Each shortcut has a unique ID and a desktop accelerator string. When pressed, a desktop:shortcutPressed event is sent to the renderer.
| IPC Channel | Description |
|---|---|
| `desktop:registerShortcut` | Register a global shortcut by ID and accelerator |
| `desktop:unregisterShortcut` | Unregister a shortcut by ID |
| `desktop:unregisterAllShortcuts` | Remove all registered shortcuts |
| `desktop:isShortcutRegistered` | Check if an accelerator is currently registered |
Auto-Launch -- Configure the app to start on system login, optionally hidden, via desktop:setAutoLaunch and desktop:getAutoLaunchStatus.
Window Management -- Programmatic control over the main window. Supports size, position, min/max dimensions, resizability, always-on-top, fullscreen, opacity, vibrancy (macOS), background color, and more. Window events (focus, blur, maximize, minimize, restore, close) are forwarded to the renderer.
Native Notifications -- Rich notifications with actions, reply support, urgency levels, and click handling. Each notification gets a unique auto-incremented ID. Supports click, action, reply, and close event callbacks forwarded to the renderer.
Power Monitoring -- Battery state, idle time detection, and suspend/resume events. Emits desktop:powerSuspend, desktop:powerResume, desktop:powerOnAC, and desktop:powerOnBattery events.
Clipboard Operations -- Read and write text, HTML, RTF, and images to the system clipboard.
Shell Operations -- Open external URLs in the default browser, reveal files in Finder/Explorer, and trigger system beeps.
Network discovery for finding Milady gateway servers on the local network via the GatewayDiscovery class. Uses mDNS/Bonjour for service discovery with the _milady._tcp service type.
The module dynamically loads discovery libraries in priority order:
- mdns (native, faster)
- bonjour-service (pure JS, more portable)
- bonjour or mdns-js (fallback alternatives)
Discovered gateways include metadata from TXT records: stable ID, TLS configuration, gateway port, canvas port, and Tailnet DNS name. Events (found, updated, lost) are forwarded to the renderer via gateway:discovery.
| IPC Channel | Description |
|---|---|
| `gateway:startDiscovery` | Begin scanning with optional service type and timeout |
| `gateway:stopDiscovery` | Stop active discovery |
| `gateway:getDiscoveredGateways` | List all currently known gateways |
| `gateway:isDiscovering` | Check if discovery is active |
Full conversation mode via the TalkModeManager class, integrating speech-to-text (STT) and text-to-speech (TTS).
STT Engines:
- Whisper (default) -- Offline speech recognition using `whisper-node` with configurable model sizes: `tiny`, `base`, `small`, `medium`, `large`. Supports word-level timing and streaming transcription.
- Web Speech API -- Falls back to the browser's built-in speech recognition when Whisper is unavailable.

TTS Engines:
- ElevenLabs -- High-quality streaming TTS via the ElevenLabs API. Configurable voice ID, model ID (default: `eleven_v3`), stability, similarity boost, and speed. Audio chunks are streamed to the renderer as base64-encoded data.
- System TTS -- Falls back to the renderer's browser speech synthesis.
Voice Activity Detection (VAD): Configurable silence threshold and duration for automatic speech segmentation.
| State | Meaning |
|---|---|
| `idle` | Talk mode is off |
| `listening` | Actively capturing and transcribing audio |
| `processing` | Processing captured speech |
| `speaking` | TTS is playing audio |
| `error` | An error occurred |
Audio data flows from the renderer to the main process via talkmode:audioChunk IPC messages as Float32Array samples.
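To make the silence threshold and duration settings concrete, here is a minimal energy-based sketch that works on `Float32Array` chunks. It is an assumption-laden illustration: `SilenceDetector`, `rms`, and the RMS-energy approach itself are hypothetical and may differ from what `TalkModeManager` actually does.

```typescript
// Root-mean-square energy of one chunk of samples in the range [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i] ?? 0;
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / samples.length);
}

class SilenceDetector {
  private readonly threshold: number;
  private readonly silenceDurationMs: number;
  private readonly sampleRate: number;
  private silentMs = 0;
  private heardSpeech = false;

  constructor(threshold: number, silenceDurationMs: number, sampleRate: number) {
    this.threshold = threshold;
    this.silenceDurationMs = silenceDurationMs;
    this.sampleRate = sampleRate;
  }

  // Returns true when a speech segment has just ended (enough trailing silence).
  push(chunk: Float32Array): boolean {
    if (rms(chunk) >= this.threshold) {
      this.heardSpeech = true;
      this.silentMs = 0;
      return false;
    }
    if (!this.heardSpeech) return false; // leading silence is ignored
    this.silentMs += (chunk.length / this.sampleRate) * 1000;
    if (this.silentMs >= this.silenceDurationMs) {
      this.heardSpeech = false;
      this.silentMs = 0;
      return true;
    }
    return false;
  }
}
```

A caller would feed each incoming chunk to `push` and hand the buffered audio to transcription whenever it returns `true`.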
Wake word detection for hands-free activation via the SwabbleManager class. Uses Whisper for continuous speech transcription combined with a WakeWordGate that performs timing-based wake word matching.
Configuration:
- `triggers` -- Array of wake word phrases (e.g., `["milady", "hey milady"]`)
- `minPostTriggerGap` -- Minimum pause (seconds) after the wake word before the command starts (default: 0.45s)
- `minCommandLength` -- Minimum number of words in the command after the wake word (default: 1)
- `modelSize` -- Whisper model size to use
The wake word gate includes fuzzy matching for common transcription variations (e.g., "melody" matches "milady", "okay" matches "ok").
When a wake word is detected, a swabble:wakeWord event is sent to the renderer containing the matched trigger, extracted command, full transcript, and the post-trigger gap measurement.
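Timing-based matching of this kind can be sketched over word-level segments. Everything below is hypothetical: `matchWakeWord`, the `WordSegment` shape, the alias table, and the choice to try longer triggers first are illustrative assumptions rather than the real `WakeWordGate`.

```typescript
interface WordSegment { text: string; start: number; end: number } // times in seconds
interface WakeWordConfig { triggers: string[]; minPostTriggerGap: number; minCommandLength: number }
interface WakeWordMatch { trigger: string; command: string; transcript: string; postGap: number }

// Fuzzy aliases for common transcription variations (examples taken from the text above).
const ALIASES = new Map<string, string>([["melody", "milady"], ["okay", "ok"]]);

function normalizeWord(word: string): string {
  const cleaned = word.toLowerCase().replace(/[^a-z0-9]/g, "");
  return ALIASES.get(cleaned) ?? cleaned;
}

function matchWakeWord(segments: WordSegment[], config: WakeWordConfig): WakeWordMatch | null {
  const words = segments.map((s) => normalizeWord(s.text));
  // ASSUMPTION: prefer longer triggers so "hey milady" is reported over "milady".
  const triggers = [...config.triggers].sort(
    (a, b) => b.trim().split(/\s+/).length - a.trim().split(/\s+/).length,
  );
  for (const trigger of triggers) {
    const tokens = trigger.trim().split(/\s+/).map(normalizeWord);
    // Require at least one word after the trigger so a gap can be measured.
    for (let i = 0; i + tokens.length < segments.length; i++) {
      if (!tokens.every((token, k) => words[i + k] === token)) continue;
      const lastTriggerWord = segments[i + tokens.length - 1];
      const firstCommandWord = segments[i + tokens.length];
      if (!lastTriggerWord || !firstCommandWord) continue;
      const postGap = firstCommandWord.start - lastTriggerWord.end;
      const commandWords = segments.slice(i + tokens.length).map((s) => s.text);
      if (postGap >= config.minPostTriggerGap && commandWords.length >= config.minCommandLength) {
        return {
          trigger,
          command: commandWords.join(" "),
          transcript: segments.map((s) => s.text).join(" "),
          postGap,
        };
      }
    }
  }
  return null;
}
```

The gap check is what separates a deliberate "milady, (pause) open settings" from the word merely appearing mid-sentence.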
Native screenshot and screen recording via the ScreenCaptureManager class.
Screenshots: Capture the primary screen, a specific source, or the main window. Supports PNG and JPEG formats with configurable quality. Screenshots can be saved to the user's Pictures directory.
Screen Recording: Uses a hidden BrowserWindow renderer for MediaRecorder-based recording (since MediaRecorder requires a renderer context). Supports configurable quality presets, FPS, bitrate, system audio, and max duration auto-stop. Recordings are saved as WebM (VP9 preferred) to the system temp directory.
| Quality | Bitrate |
|---|---|
| `low` | 1 Mbps |
| `medium` | 4 Mbps |
| `high` | 8 Mbps |
| `highest` | 16 Mbps |
Recording supports pause/resume and provides real-time state updates including duration and file size.
Camera capture for photo and video via the CameraManager class. Like screen recording, this uses a hidden BrowserWindow renderer for getUserMedia / MediaRecorder access.
Features:
- Device enumeration with direction detection (front/back/external)
- Live preview with configurable resolution and frame rate
- Photo capture in JPEG, PNG, or WebP with quality control
- Video recording with configurable quality, bitrate, audio, and max duration
- Permission checking and requesting
| Quality | Video Bitrate |
|---|---|
| `low` | 1 Mbps |
| `medium` | 2.5 Mbps |
| `high` | 5 Mbps |
| `highest` | 8 Mbps |
Auxiliary BrowserWindow management via the CanvasManager class. Each canvas is a separate window used for web navigation, JavaScript evaluation, page snapshots, and A2UI (Agent-to-UI) message injection.
| IPC Channel | Description |
|---|---|
| `canvas:createWindow` | Create a new canvas window (default 1280x720, hidden) |
| `canvas:destroyWindow` | Close and dispose a canvas window |
| `canvas:navigate` | Navigate a canvas to a URL |
| `canvas:eval` | Execute JavaScript in the canvas page |
| `canvas:snapshot` | Capture a screenshot (supports sub-rectangles) |
| `canvas:a2uiPush` | Inject an A2UI message payload |
| `canvas:a2uiReset` | Reset A2UI state on the page |
| `canvas:show` / `canvas:hide` | Toggle visibility |
| `canvas:resize` | Resize with optional animation |
| `canvas:listWindows` | List all active canvas windows |
Canvas windows emit canvas:didFinishLoad, canvas:didFailLoad, and canvas:windowClosed events to the main renderer.
GPS and geolocation services via the LocationManager class using IP-based geolocation.
The module queries multiple IP geolocation services as fallbacks: ip-api.com, ipapi.co, and freegeoip.app. It supports single position queries, position watching (polling at configurable intervals), and caching of the last known location.
System permission management via the PermissionManager class with platform-specific implementations for macOS, Windows, and Linux.
Managed permissions:
| Permission ID | Name | Platforms | Required For |
|---|---|---|---|
| `accessibility` | Accessibility | macOS | Computer use, browser control |
| `screen-recording` | Screen Recording | macOS | Computer use, vision |
| `microphone` | Microphone | All | Talk mode, voice |
| `camera` | Camera | All | Camera, vision |
| `shell` | Shell Access | All | Shell/terminal commands |
Permission states are cached for 30 seconds (configurable). The shell permission includes a soft toggle -- it can be disabled in the UI without affecting the OS-level permission.
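The 30-second caching behavior can be illustrated with a small time-to-live cache. This is a generic sketch, not the `PermissionManager` implementation: `TtlCache` and its injected clock are hypothetical, and the string state values in the usage note are placeholders.

```typescript
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number = 30_000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined when missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Drop an entry early, for example after the user changes an OS setting.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

A permission check would first call `get("microphone")` and only query the operating system when that returns `undefined`, then store the fresh state with `set`.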
IPC channels include permissions:getAll, permissions:check, permissions:request, permissions:openSettings, permissions:checkFeature, and permissions:setShellEnabled.
The desktop app registers these global keyboard shortcuts:
| Shortcut | Action |
|---|---|
| `Cmd/Ctrl+K` | Open the Command Palette |
| `Cmd/Ctrl+E` | Open the Emote Picker |
These shortcuts work system-wide when the app is running. Additional shortcuts can be registered dynamically via the desktop:registerShortcut IPC channel.
The desktop app supports the milady:// custom URL protocol for deep linking. The protocol is registered via the Electrobun deep linking integration.
The milady://share URL scheme allows external applications to share content with your agent:
milady://share?title=Hello&text=Check+this+out&url=https://example.com
Parameters:
- `title` -- optional title for the shared content.
- `text` -- optional text body.
- `url` -- optional URL to share.
- `file` -- one or more file paths (can be repeated).
File drag-and-drop from the OS is also supported via the desktop runtime open-file event. Share payloads are queued if the main window is not yet ready and flushed once the renderer finishes loading. Events are dispatched as milady:share-target custom DOM events.
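Parsing a `milady://share` link into its parameters might look like the sketch below. `parseShareUrl` and the `SharePayload` shape are hypothetical names for illustration. The code relies only on the standard `URL` and `URLSearchParams` APIs, and the pathname fallback exists because URL parsers have differed in how they treat custom schemes.

```typescript
interface SharePayload {
  title?: string;
  text?: string;
  url?: string;
  files: string[];
}

function parseShareUrl(raw: string): SharePayload | null {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return null; // not a URL at all
  }
  if (parsed.protocol !== "milady:") return null;
  // Most parsers expose "share" as the hostname; some older ones put it in the pathname.
  const target = parsed.hostname || parsed.pathname.replace(/^\/+/, "");
  if (target !== "share") return null;

  const params = parsed.searchParams;
  const payload: SharePayload = { files: params.getAll("file") }; // "file" may repeat
  const title = params.get("title");
  const text = params.get("text");
  const url = params.get("url");
  if (title !== null) payload.title = title;
  if (text !== null) payload.text = text; // "+" is decoded to a space
  if (url !== null) payload.url = url;
  return payload;
}
```

Applied to the example link above, this yields the title `Hello`, the text `Check this out`, the URL `https://example.com`, and an empty file list.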
The desktop app checks for updates on launch via the Electrobun updater; releases are published to GitHub under the milady-ai/milady repository.
In development mode:
- A file watcher (chokidar) monitors the web asset directory and auto-reloads the app when files change (1.5-second debounce).
- Content Security Policy is adjusted for development -- `localhost` and `devtools://*` origins are allowed for scripts.
- DevTools open automatically on DOM ready.
- Content Security Policy -- Applied to all windows. The policy is intentionally permissive to support third-party embedded apps that may require WebAssembly and external scripts.
- Window navigation -- External URLs are blocked from the main window and opened in the default browser. Only the custom scheme and localhost origins are allowed.
- Context isolation -- All `BrowserWindow` instances use `contextIsolation: true` and `nodeIntegration: false`.
- SSRF protection -- Custom action HTTP handlers block requests to private/internal network addresses. See Custom Actions.
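The window navigation rule in this list amounts to an allowlist check. The sketch below is an assumption, not the shell's real handler: `isAllowedInAppNavigation` is a hypothetical name, the custom scheme is taken to be `milady:`, and "localhost origins" is interpreted as `localhost`, `127.0.0.1`, and `[::1]` over HTTP or HTTPS.

```typescript
// Returns true when the main window may navigate to the URL itself;
// anything else would be handed to the default browser instead.
function isAllowedInAppNavigation(raw: string): boolean {
  let target: URL;
  try {
    target = new URL(raw);
  } catch {
    return false; // unparseable targets are never loaded in-app
  }
  if (target.protocol === "milady:") return true; // custom scheme (assumed)
  const isHttp = target.protocol === "http:" || target.protocol === "https:";
  const host = target.hostname;
  // Exact hostname comparison so "localhost.example.com" is not mistaken for localhost.
  const isLoopback = host === "localhost" || host === "127.0.0.1" || host === "[::1]";
  return isHttp && isLoopback;
}
```

Comparing the parsed hostname, rather than searching the raw string for "localhost", avoids look-alike hosts slipping through.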