vtmate

The final AI voice conversational system, all running in your terminal! vtmate is a powerful terminal-based voice AI toolkit with many realistic voices, extremely low latency, and support for 28 languages. It lets you hold voice conversations with local AI models, pipe data in, and save output to files.

The program is self-contained (1.2 GB): it bundles all TTS models, voices and the files needed to recognize and synthesize speech, with no external installations required, ensuring maximum portability.

Video demonstration

(🇬🇧 English) Conversation mode demo
en.-.demo.-.conversation.mp4
(🇬🇧 English) Debate mode demo
en.-.demo.-.debate.mp4
(🇬🇧 English) Reading mode demo
en.-.demo.-.reading.mode.mp4


vtmate screenshot

how it works

Features

  • 📌 Continuous voice chat (LIVE conversation) with voice interruption
  • 🚀 AI agent debates (2 agents talking to each other; the user can also join in)
  • 📌 Realtime agent swap
  • 📌 Mid-response interruption via keyboard
  • 📌 Mid-response interruption via voice
  • 📌 Reset session (fresh history)
  • 📌 "Undo" last response (remove last response from history)
  • 📌 Recording pause / resume via keyboard in LIVE conversation mode
  • 📌 Push to Talk mode (PTT)
  • 📌 Save conversation as audio and text
  • 📌 Read a text file with voice, phrase by phrase, with keyboard navigation and pause/resume
  • 📌 Read text with voice from STDIN, phrase by phrase, with keyboard navigation and pause/resume
  • 📌 Save audio speech of a text file or STDIN content
  • 📌 Load a separate settings file with different agents
  • 📌 Integrated whisper speech recognition system (no external installation required)
  • 📌 Integrated kokoro TTS and supersonic 2 TTS systems (no external installation required)
  • 📌 Interface with the OpenTTS system (requires an external docker service)
  • 📌 Use any gguf model from huggingface.co (via llama-server) or any ollama model

How it works

- You start the program and start talking
- Once audio is detected (based on the sound_threshold_peak option), recording starts
- As soon as there is a period of silence (based on the end_silence_ms option), the recorded audio is transcribed using the speech-to-text system (whisper). In ptt mode this option is ignored: the program waits for the SPACE key to be released before submitting the audio
- The transcribed text is sent to the AI model
- The AI model replies with text
- The text is converted to audio using the text-to-speech system
- You can interrupt the AI agent at any moment by starting to speak; this stops the response and audio so you can continue talking
- In debate mode, the agents reply to each other automatically, playing the audio on each turn
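The detection loop above can be sketched roughly as follows. This is an illustrative Python sketch, not vtmate's actual implementation; sound_threshold_peak and end_silence_ms correspond to the settings of the same name:

```python
# Sketch of the record-on-threshold / stop-on-silence loop described above.
# A frame is "voiced" when its peak amplitude exceeds sound_threshold_peak;
# recording stops after end_silence_ms of consecutive silence.

def capture_utterance(frames, sound_threshold_peak=0.12, end_silence_ms=2500, frame_ms=20):
    recording = []
    started = False
    silence_ms = 0
    for frame in frames:                        # frame: samples in [-1.0, 1.0]
        voiced = max(abs(s) for s in frame) > sound_threshold_peak
        if not started:
            if voiced:
                started = True                  # audio detected: start recording
                recording.append(frame)
            continue
        recording.append(frame)
        silence_ms = 0 if voiced else silence_ms + frame_ms
        if silence_ms >= end_silence_ms:        # enough silence: utterance complete
            break
    return recording                            # handed to whisper for transcription
```

In real use the frames would come from the microphone; in ptt mode this silence check is skipped and recording ends when SPACE is released.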

LLM integration

  • ✅ ollama (default)
  • ✅ llama-server

You can run the models locally (the default) or remotely by configuring the base URLs.
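For example, an agent in ~/.vtmate/settings could point at a remote ollama instance through its baseurl field (the hostname below is illustrative):

```
[agent]
name = remote
provider = ollama
baseurl = http://192.168.1.50:11434
model = llama3.2:3b
```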

TTS engine support

  • ✅ Kokoro (integrated)
  • ✅ Supersonic 2 (integrated)
  • ✅ OpenTTS (requires an external docker service)

Installation

📌 1. Download vtmate

  • https://github.com/DavidValin/vtmate/releases
  • Move the binary to a folder in your $PATH so you can use the vtmate command anywhere

📌 2. Install an LLM engine (needed for AI responses)

Option A: ollama (the default)

  • Install https://ollama.com/download.
  • Pull the model you want to use with vtmate, for instance: ollama pull llama3.2:3b.

Option B: llama-server.

  • Install llama.cpp: https://github.com/ggml-org/llama.cpp.
  • Download a gguf model: https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q8_0.gguf?download=true.

📌 3. (Windows only) Install a supported terminal

  • Install Windows Terminal (which supports emojis): https://apps.microsoft.com/detail/9n0dx20hk701 (use this terminal to run vtmate)

📌 4. (Optional) OpenTTS support

  • docker pull synesthesiam/opentts:all

Configure agents

The first time you run vtmate it creates a configuration file at ~/.vtmate/settings (if it doesn't exist) with 2 agents. You can define as many agents as you want.

Example of agent definition:

[agent]
name = explainer
language = en
tts = supersonic2
voice = F1
voice_speed = 1.1
provider = ollama
baseurl = http://127.0.0.1:11434
model = llama3.2:3b
system_prompt = "You are a helpful AI assistant. Your only function is to explain things as simply as possible in no more than 150 words, or 450 words if the user asks for a longer explanation."
sound_threshold_peak = 0.12
end_silence_ms = 2500
ptt = true
whisper_model_path = ~/.whisper-models/ggml-tiny.bin
  • By default all agents are set to PTT mode: you have to keep SPACE pressed to talk. If you want to use LIVE mode, make sure you adjust your microphone levels correctly and tune the sound_threshold_peak and end_silence_ms settings to your needs
  • ⚠️ Currently you cannot mix the kokoro and supersonic TTS systems (pick one).
  • Voice mixing is supported for the kokoro TTS system only: you can create a voice by mixing 2 kokoro voices by percentage. For example, to mix 50% of bm_daniel and 50% of am_puck, set the voice name to bm_daniel.5+am_puck.5
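The mixed-voice naming scheme can be illustrated with a small parser (a sketch for clarity only; vtmate handles this internally, and the assumption that the digits after the dot are a decimal fraction comes from the ".5" = 50% example above):

```python
# Parse a kokoro mixed-voice spec such as "bm_daniel.5+am_puck.5"
# into (voice_name, weight) pairs, where ".5" means 50%.

def parse_voice_mix(spec):
    parts = []
    for chunk in spec.split("+"):
        name, _, digits = chunk.rpartition(".")
        parts.append((name, float("0." + digits)))  # ".5" -> 0.5
    return parts
```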

To see an explanation of each field:

vtmate --help

How to use it

The first agent defined in ~/.vtmate/settings is always the selected agent when running vtmate, unless -a <agent_name> is used.

Before running vtmate, make sure ollama is running: ollama serve. Alternatively, if you want to use llama.cpp, make sure llama-server is running.

All cli options:

  -a <agent_name>                       set a specific initial agent
  -p <prompt>                           initialize with a text prompt
  -q                                    quiet mode: produce a single response and exit (requires `-p` or `-i`)
  -i <file.txt>                         initialize with a file prompt
  -i -                                  initialize with prompt from STDIN (runs in quiet mode)
  -s                                    save the conversation to text and audio file in ~/.vtmate/conversations or ~/.vtmate/read-files
  --debate <AGENT1> <AGENT2> [SUBJECT]  initialize a debate between 2 agents with an initial prompt
  --debate <AGENT1> <AGENT2> -i <FILE>  initialize a debate between 2 agents with an initial prompt from file
  --debate <AGENT1> <AGENT2> -i -       initialize a debate between 2 agents with an initial prompt from STDIN
  -r <file.txt>                         read a file with voice, phrase by phrase (no llm involved)
  -r -                                  read text from STDIN with voice, phrase by phrase (no llm involved). Use - for STDIN (runs in quiet mode)
  -c <settings_file>                    use a specific settings file
  --list-voices                         list all voices for all languages and tts systems
  --ptt <true/false>                    override the ptt setting for all agents for this session, regardless of their individual settings
  --verbose                             run the program in verbose mode
  --version                             print the vtmate installed version
  --help                                show help

For quick reference get the printable Quicksheet (PDF)

Conversation mode

conversation mode

Start conversation with the default agent and save it as audio and text (waits for user voice input and responds)

vtmate -s

Start conversation with a specific agent (waits for user voice input and responds)

vtmate -a "main agent"

Start conversation with an initial text prompt

vtmate -p "Are we alone in the galaxy?"

Start conversation with an initial prompt from file

vtmate -i myprompt.txt

Get a single response from STDIN text and exit

echo "How to fly without wings?" | vtmate -i -
  • When running in LIVE mode, just talk. You can also pause/resume recording by pressing SPACE once
  • When running in PTT mode: keep SPACE pushed while talking, and then release
  • Press ESCAPE once during a response to cancel it
  • Press ESCAPE twice to reset the session
  • Press u twice to undo the last response
  • You can switch agents in realtime by pressing the ARROW_LEFT / ARROW_RIGHT keys (you need at least 2 agents defined in ~/.vtmate/settings).
  • You can change the voice speed by pressing ARROW_UP / ARROW_DOWN
  • You can save the conversation as a wav and text file by adding the -s option. It will be saved in the ~/.vtmate/conversations folder
  • For quick reference get the printable Quicksheet (PDF)

Debate mode

debate mode

Initialize a debate between two agents; you can participate in the debate by speaking at any time. To create a good debate, adjust the system prompts of each agent and give a detailed initial input. In debate mode it is a good idea to set the --ptt <true/false> option so that the ptt value is not switched on each agent's turn.

Start a debate with an initial subject (with forced ptt mode)

vtmate --debate "God" "Devil" "How to succeed in life?" --ptt true

Start a debate with an initial prompt from file (with forced live mode)

vtmate --debate "God" "Devil" -i myprompt.txt  --ptt false

Start a debate with an initial file prompt (with forced ptt mode)

printf 'Lets discuss the permissions of these files:\n\n%s' "$(ls -la)" > prompt.txt
vtmate --debate "Unix administrator" "Security Expert" -i prompt.txt --ptt true
  • When running in LIVE mode, just talk. You can also pause/resume recording by pressing SPACE once
  • When running in PTT mode: keep SPACE pushed while talking, and then release
  • Press ESCAPE once during a response to cancel it and stop the debate
  • Press ESCAPE twice to reset the session
  • Press u twice to undo the last response
  • You can also start/stop a debate from conversation mode by pressing Control+D and picking the debate agents.
  • You can save the conversation as a wav and text file by adding the -s option. It will be saved in the ~/.vtmate/conversations folder
  • Here is an example of how to create automated audio debates from YouTube videos using vtmate in combination with other tools
  • For quick reference get the printable Quicksheet (PDF)

Quiet mode

This mode processes a text input, responds (text and audio) and exits.

Get a single response from prompt

vtmate -q -p "Explain the Zettelkasten Method"

Get a single response from prompt from file

vtmate -q -i myprompt.txt

Get a single response from prompt from STDIN and exit

echo "Is $(date) a national holiday in Spain?" | vtmate -q -i -

Get a single response and save it as audio file and text file

printf 'Can you find any suspicious processes in the next list? If so, why?\n\n%s' "$(ps aux | head -20)" | vtmate -q -i - -s

Read mode (file to speech)

read file mode

Read a text file or STDIN text phrase by phrase using an agent's voice. Make sure the agent you choose has the correct language and voice for your text. In this mode, only the following agent settings are used: "tts", "voice" and "language".

Read from a txt file (and save it in ~/.vtmate/read-files)

vtmate -r myfile.txt -a reader

Read text from STDIN phrase by phrase and exit

echo "First phrase. Second phrase" | vtmate -r -

In this mode you can:

  • Move to previous phrase by pressing ARROW_UP
  • Move to next phrase by pressing ARROW_DOWN
  • Stop / Resume playback by pressing SPACE
  • For quick reference get the printable Quicksheet (PDF)
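Phrase-by-phrase playback can be approximated by splitting text at sentence punctuation. This is an illustrative sketch; vtmate's actual segmentation may differ:

```python
import re

# Split text into phrases at sentence-ending punctuation, keeping the
# punctuation with each phrase. Each resulting phrase would be synthesized
# and played in turn, with ARROW_UP / ARROW_DOWN moving between phrases.

def split_phrases(text):
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]
```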

Separate agents

By default vtmate uses the ~/.vtmate/settings file. You can create different settings files for different agent groups, for example:

philosophers.txt
scientists.txt
employees.txt

And then load each as you need:

vtmate -c philosophers.txt --debate "Aristoteles" "Ptahhotep" "how to achieve harmony?"

Model files

vtmate is self-contained (no manual installation needed): it bundles espeak-ng-data, the whisper tiny & small models, the kokoro model and voices, and the supersonic2 model and voices, which are auto-extracted from the binary when running vtmate if they are not found in the following locations:

whisper models:

- `~/.whisper-models/ggml-tiny.bin`
- `~/.whisper-models/ggml-small.bin`

kokoro model files:

~/.cache/k/0.onnx
~/.cache/k/0.bin

espeak phonemes (used by kokoro):

- `~/.vtmate/espeak-ng-data.tar.gz`

supersonic2 files:

~/.vtmate/tts/supersonic2-model/onnx/duration_predictor.onnx
~/.vtmate/tts/supersonic2-model/onnx/text_encoder.onnx
~/.vtmate/tts/supersonic2-model/onnx/tts.json
~/.vtmate/tts/supersonic2-model/onnx/unicode_indexer.json
~/.vtmate/tts/supersonic2-model/onnx/vector_estimator.onnx
~/.vtmate/tts/supersonic2-model/onnx/vocoder.onnx
~/.vtmate/tts/supersonic2-model/voice_styles/M1.json
~/.vtmate/tts/supersonic2-model/voice_styles/M2.json
~/.vtmate/tts/supersonic2-model/voice_styles/M3.json
~/.vtmate/tts/supersonic2-model/voice_styles/M4.json
~/.vtmate/tts/supersonic2-model/voice_styles/M5.json
~/.vtmate/tts/supersonic2-model/voice_styles/F1.json
~/.vtmate/tts/supersonic2-model/voice_styles/F2.json
~/.vtmate/tts/supersonic2-model/voice_styles/F3.json
~/.vtmate/tts/supersonic2-model/voice_styles/F4.json
~/.vtmate/tts/supersonic2-model/voice_styles/F5.json
  • If you want to avoid sound interruptions you can use ptt mode or increase sound_threshold_peak to match your microphone levels.
  • If you want to use OpenTTS, start the docker service first: docker run --rm --platform=linux/amd64 -p 5500:5500 synesthesiam/opentts:all (it will pull the image the first time). Adjust the platform as needed for your hardware.
  • If you have problems starting vtmate, you can remove ~/.vtmate/settings so it recreates the default configuration
  • By default whisper tiny is used (from ~/.whisper-models/ggml-tiny.bin). If you need better speech recognition, download a larger whisper model and update the whisper_model_path setting.

If you need help:

vtmate --help

Language support

| ID | Language | Support | TTS supported | Number of voices |
|----|----------|---------|---------------|------------------|
| en | 🇬🇧 English | 🏆 Best support | ✅ SS2 ✅ Kokoro ✅ OpenTTS | > 38 voices |
| es | 🇪🇸 Spanish | 🏆 Best support | ✅ SS2 ✅ Kokoro ✅ OpenTTS | > 14 voices |
| fr | 🇫🇷 French | 🏆 Best support | ✅ SS2 ✅ Kokoro ✅ OpenTTS | > 12 voices |
| zh | 🇨🇳 Mandarin Chinese | 🥈 Good support | ❌ SS2 ✅ Kokoro ✅ OpenTTS | > 9 voices |
| ja | 🇯🇵 Japanese | 🥈 Good support | ❌ SS2 ✅ Kokoro ✅ OpenTTS | > 6 voices |
| pt | 🇵🇹 Portuguese | 🥈 Good support | ✅ SS2 ✅ Kokoro ❌ OpenTTS | > 13 voices |
| ko | 🇰🇷 Korean | 🥈 Good support | ✅ SS2 ❌ Kokoro ✅ OpenTTS | 11 voices |
| it | 🇮🇹 Italian | 🥈 Good support | ❌ SS2 ✅ Kokoro ✅ OpenTTS | > 3 voices |
| hi | 🇮🇳 Hindi | 🥈 Good support | ❌ SS2 ✅ Kokoro ✅ OpenTTS | > 4 voices |
| ar | 🇸🇦 Arabic | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| bn | 🇧🇩 Bengali | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| ca | 🇪🇸 Catalan | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| cs | 🇨🇿 Czech | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| de | 🇩🇪 German | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| el | 🇬🇷 Greek | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| fi | 🇫🇮 Finnish | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| gu | 🇮🇳 Gujarati | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| hu | 🇭🇺 Hungarian | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| kn | 🇮🇳 Kannada | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| mr | 🇮🇳 Marathi | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| nl | 🇳🇱 Dutch | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| pa | 🇮🇳 Punjabi | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| ru | 🇷🇺 Russian | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| sv | 🇸🇪 Swedish | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| sw | 🇰🇪 Swahili | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| ta | 🇮🇳 Tamil | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| te | 🇮🇳 Telugu | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |
| tr | 🇹🇷 Turkish | Supported | ❌ SS2 ❌ Kokoro ✅ OpenTTS | 1 voice |

Acceleration support

Do you have a GPU? (nvidia? an Apple computer?) Great! Then vtmate runs at lightning speed =)

  • To use acceleration, pick the build for your hardware from the Releases list
  • For CUDA, install the CUDA Toolkit. For Vulkan, install the Vulkan SDK
macOS:            ✅ CPU    ✅ Metal
Linux (amd64):    ✅ CPU    ✅ CUDA     ⚠️ Vulkan
Linux (arm64):    ✅ CPU    ⚠️ CUDA     ❌ Vulkan
Windows (x86_64): ✅ CPU    ⚠️ CUDA     ⚠️ Vulkan
Windows (arm64):  ❌ CPU    ❌ CUDA     ❌ Vulkan

โš ๏ธ Currently working on full static builds for all OS with Openblas + CUDA + Vulkan support. In the meantime, pick a release available from Releases list or build one yourself.

Build vtmate from source code

Simplest way:

cargo install vtmate

From git repository:

git clone https://github.com/DavidValin/vtmate
cd vtmate
cargo build --release

Full configurable builds (OS, arch and gpu acceleration)

see:

build_linux.sh
build_macos.sh
build_windows.sh

Have fun o:)

License

See LICENSE, LICENSE.commercial and LICENSE.noncommercial in the repository.