Babylon.cpp is a C and C++ library for grapheme-to-phoneme (G2P) conversion and neural text-to-speech (TTS) synthesis. All inference runs locally using ONNX Runtime — no internet connection is required and no data leaves the host machine.
It supports two TTS engines:
- Kokoro — High-quality multi-voice neural TTS at 24 kHz with 54+ voices across multiple languages.
- VITS — End-to-end neural TTS; compatible with Piper models.
Phonemization is handled by Open Phonemizer, backed by a ~130,000-entry pronunciation dictionary.
| Platform | Architecture | Library |
|---|---|---|
| Linux | x86_64 | libbabylon.so |
| macOS | Universal (x86_64 + arm64) | libbabylon.dylib |
| Windows | x86_64 | babylon.dll |
| Android | arm64-v8a, x86_64 | libbabylon.so |
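
When the library is loaded dynamically (for example with `dlopen` or `LoadLibrary`), the file name to open depends on the platform, as the table shows. The following is a minimal sketch of selecting that name at compile time. The helper `babylon_library_name` is hypothetical and is not part of Babylon.cpp.

```cpp
#include <string>

// Hypothetical helper (not part of the Babylon.cpp API): returns the shared
// library file name from the platform table for the current build target.
inline std::string babylon_library_name() {
#if defined(_WIN32)
    return "babylon.dll";
#elif defined(__APPLE__)
    return "libbabylon.dylib";
#else
    // Linux and Android share the same ELF naming convention.
    return "libbabylon.so";
#endif
}
```

The returned name can be joined with the directory that holds the build output before it is passed to the platform's loader.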
Requires CMake 3.18+, a C++17 compiler, and Git.

    git clone --recursive https://github.com/Mobile-Artificial-Intelligence/babylon.cpp.git
    cd babylon.cpp
    make cli

This builds the library, CLI binary, and ONNX Runtime from source. All output goes to bin/.
| Target | Description |
|---|---|
| make lib | Library only |
| make cli | Library + CLI binary + runtime files |
| make debug | CLI build in Debug mode |
| make android | Cross-compile for Android (requires NDK) |
The babylon binary provides three subcommands: phonemize, tts, and serve. On startup it auto-loads config.json from the directory that contains the executable.

    # Phonemize text to IPA
    babylon phonemize "Hello world"
    # → hɛloʊ wɜːld

    # Synthesise speech (Kokoro, default)
    babylon tts "Hello world" -o hello.wav
    babylon tts --voice en-US-nova --speed 1.2 "Hello world"

    # Synthesise speech (VITS)
    babylon tts --vits "Hello world" -o hello.wav

    # Start the REST API server and web frontend
    babylon serve
    babylon serve --host 0.0.0.0 --port 9000

Global flags (apply to all subcommands):

    --config <path>            Load a JSON config file
    --phonemizer-model <path>  Phonemizer ONNX model
    --dictionary <path>        Pronunciation dictionary JSON
    --kokoro-model <path>      Kokoro ONNX model
    --kokoro-voice <name>      Default Kokoro voice
    --kokoro-voices <dir>      Directory of voice .bin files
    --vits-model <path>        VITS ONNX model
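
The config lookup described earlier (config.json next to the executable, plus an optional `--config` flag) can be sketched as below. This is an illustration and not the CLI's actual code: the function name `resolve_config_path` is hypothetical, and the rule that an explicit `--config` path takes precedence over the auto-loaded file is an assumption.

```cpp
#include <filesystem>
#include <optional>

namespace fs = std::filesystem;

// Hypothetical helper: picks the config file to load.
// Assumption: a path given via --config wins over the config.json
// that sits in the same directory as the executable.
inline fs::path resolve_config_path(const fs::path& executable,
                                    const std::optional<fs::path>& cli_config) {
    if (cli_config) {
        return *cli_config;
    }
    // Default: config.json in the directory that contains the binary.
    return executable.parent_path() / "config.json";
}
```

For an executable at `/opt/babylon/bin/babylon` and no `--config` argument, this yields `/opt/babylon/bin/config.json`.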
When running babylon serve, the following endpoints are available:
| Method | Path | Description |
|---|---|---|
| GET | / | Web frontend (HTML) |
| GET | /status | Engine availability and voice count |
| GET | /voices | List of available Kokoro voice names |
| POST | /phonemize | Convert text to IPA or token IDs |
| POST | /tts | Synthesise speech, returns audio/wav |
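
Because POST /tts returns audio/wav, a client may want to sanity-check the response before saving or playing it. The sketch below reads the channel count, sample rate, and bit depth from the start of a WAV byte buffer. It assumes the "fmt " chunk comes first after the RIFF/WAVE header; whether the server's output follows that layout is an assumption, and `WavInfo` and `parse_wav_header` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Fields read from the start of a WAV file.
struct WavInfo {
    std::uint16_t channels;
    std::uint32_t sample_rate;
    std::uint16_t bits_per_sample;
};

// Read an n-byte little-endian unsigned integer starting at p.
static std::uint32_t read_le(const std::uint8_t* p, std::size_t n) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v |= static_cast<std::uint32_t>(p[i]) << (8 * i);
    }
    return v;
}

// Hypothetical client-side check. Assumed layout:
// "RIFF" <size> "WAVE" "fmt " <size> <format> <channels> <rate> ...
inline std::optional<WavInfo> parse_wav_header(
        const std::vector<std::uint8_t>& bytes) {
    if (bytes.size() < 36) {
        return std::nullopt;  // too short to hold the fmt fields
    }
    const std::uint8_t* d = bytes.data();
    if (std::memcmp(d, "RIFF", 4) != 0 ||
        std::memcmp(d + 8, "WAVE", 4) != 0 ||
        std::memcmp(d + 12, "fmt ", 4) != 0) {
        return std::nullopt;  // not the layout this sketch expects
    }
    WavInfo info{};
    info.channels = static_cast<std::uint16_t>(read_le(d + 22, 2));
    info.sample_rate = read_le(d + 24, 4);
    info.bits_per_sample = static_cast<std::uint16_t>(read_le(d + 34, 2));
    return info;
}
```

For Kokoro output, the sample rate read this way would be expected to be 24000, matching the 24 kHz figure given for that engine.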
POST /tts body:

    {
      "text": "Hello world",
      "engine": "kokoro",
      "voice": "en-US-heart",
      "speed": 1.0
    }

POST /phonemize body:

    { "text": "Hello world", "tokens": false }

C API:

    #include "babylon.h"

    int main(void) {
        /* Phonemizer options: dictionary location and punctuation handling. */
        babylon_g2p_options_t opts = {
            .dictionary_path = "models/dictionary.json",
            .use_punctuation = 1,
        };

        /* Load the phonemizer and Kokoro models. */
        babylon_g2p_init("models/open-phonemizer.onnx", opts);
        babylon_kokoro_init("models/kokoro-quantized.onnx");

        /* Synthesise with the en-US-heart voice at speed 1.0 into output.wav. */
        babylon_kokoro_tts(
            "Hello world",
            "models/voices/en-US-heart.bin",
            1.0f,
            "output.wav"
        );

        /* Release both engines. */
        babylon_kokoro_free();
        babylon_g2p_free();
        return 0;
    }

C++ API:

    #include <string>
    #include "babylon.h"

    int main() {
        // Load the phonemizer model together with its dictionary.
        OpenPhonemizer::Session phonemizer(
            "models/open-phonemizer.onnx",
            "models/dictionary.json",
            /* use_punctuation = */ true
        );
        Kokoro::Session kokoro("models/kokoro-quantized.onnx");

        // Convert text to phonemes, then synthesise them into output.wav.
        std::string phonemes = phonemizer.phonemize("Hello world");
        kokoro.tts(phonemes, "models/voices/en-US-heart.bin", 1.0f, "output.wav");
        return 0;
    }

A full manual is available in docs/manual.tex, covering the complete C and C++ API reference, CLI options, REST API, build instructions, and model configuration.