Skip to content

Latest commit

 

History

History
179 lines (131 loc) · 4.89 KB

File metadata and controls

179 lines (131 loc) · 4.89 KB
title Text-to-Speech Plugin
sidebarTitle TTS
description Text-to-speech plugin for Milady — ElevenLabs, OpenAI TTS, and Edge TTS voice synthesis.

The Text-to-Speech (TTS) plugin enables Milady agents to synthesize speech from text, providing voice responses through ElevenLabs, OpenAI TTS, or Microsoft Edge TTS.

Package: @elizaos/plugin-tts

Overview

The TTS plugin registers a TEXT_TO_SPEECH model handler and actions that allow agents to generate audio from text. Generated audio can be played in voice channels (Discord, Telegram voice), saved to files, or streamed to the client.

Installation

milady plugins install tts

Enable via Features

{
  "features": {
    "tts": true
  }
}

Providers

ElevenLabs

High-quality voice synthesis with voice cloning and emotion control.

Package: @elizaos/plugin-elevenlabs

Environment Variable Required Description
ELEVENLABS_API_KEY Yes ElevenLabs API key from elevenlabs.io
ELEVENLABS_VOICE_ID No Voice ID (default: Rachel)
ELEVENLABS_MODEL_ID No Model ID (default: eleven_turbo_v2_5)
{
  "features": {
    "tts": {
      "enabled": true,
      "provider": "elevenlabs",
      "voiceId": "21m00Tcm4TlvDq8ikWAM",
      "modelId": "eleven_turbo_v2_5"
    }
  }
}

OpenAI TTS

{
  "features": {
    "tts": {
      "enabled": true,
      "provider": "openai",
      "voice": "alloy",
      "model": "tts-1"
    }
  }
}

Requires OPENAI_API_KEY.

Edge TTS (Free, No API Key)

Package: @elizaos/plugin-edge-tts

Microsoft Edge TTS is free and requires no API key. Synthesis is performed through Microsoft’s Edge TTS cloud (node-edge-tts talks to Microsoft’s service). Quality is lower than ElevenLabs but suitable for development.

Milady default: When @elizaos/plugin-agent-orchestrator is loaded, Milady automatically adds @elizaos/plugin-edge-tts so swarm / PTY paths that call TEXT_TO_SPEECH have a handler. That means a default install with the orchestrator can make outbound calls to Microsoft whenever those code paths run TTS—even if you never enabled “TTS” in features.

Opt out of auto-load: set MILADY_DISABLE_EDGE_TTS=1 (or ELIZA_DISABLE_EDGE_TTS=1) in the environment or ~/.milady/.env, or disable the plugin entry: plugins.entries["edge-tts"].enabled: false. See Environment variables (MILADY_DISABLE_EDGE_TTS).

{
  "features": {
    "tts": {
      "enabled": true,
      "provider": "edge-tts",
      "voice": "en-US-AriaNeural"
    }
  }
}

ElevenLabs Voice Options

Popular Voices

Voice ID Name Description
21m00Tcm4TlvDq8ikWAM Rachel Calm, professional female
AZnzlk1XvdvUeBnXmlld Domi Strong female
EXAVITQu4vr4xnSDxMaL Bella Soft female
ErXwobaYiN019PkySvjV Antoni Well-rounded male
MF3mGyEYCl7XYWbV9V6O Elli Emotional female
TxGEqnHWrfWFTfGW9XjX Josh Deep male

Browse all voices at elevenlabs.io/voice-library.

ElevenLabs Models

Model ID Description
eleven_turbo_v2_5 Fastest, lowest latency
eleven_turbo_v2 Fast, good quality
eleven_multilingual_v2 Multilingual support
eleven_monolingual_v1 English only, high quality

OpenAI TTS Options

Voices

Voice Description
alloy Neutral
echo Male
fable British male
onyx Deep male
nova Female
shimmer Soft female

Models

Model Description
tts-1 Faster, lower latency
tts-1-hd Higher quality

Actions

Action Description
SPEAK Convert text to speech and play/return audio
GENERATE_AUDIO Generate an audio file from text
SET_VOICE Change the active voice

Usage Examples

After the plugin is loaded:

"Read this article to me"

"Say the following in a cheerful voice: Welcome to Milady!"

"Generate an audio file from this text"

Voice Channel Integration

When combined with Discord or Telegram connectors, the TTS plugin enables voice channel support:

  • Discord: Agent joins voice channels and speaks responses
  • Telegram: Agent sends voice messages as .ogg files

Output Formats

Format Use Case
mp3 Streaming, Discord, general
ogg_vorbis Telegram voice messages
pcm Low-latency streaming
wav Archival, high quality

Related