Studio Voice Helper

Python wrapper around the NVIDIA Maxine Studio Voice NIM that accepts audio or video, enhances the speech track, and writes the result back out.

Features

Accepts audio or video input
Extracts audio from video, resamples to 48 kHz PCM WAV (NIM requirement)
Silence-aware chunking keeps each piece under the NIM file-size limit
Sends chunks to the NIM via gRPC and stitches results in order
Remuxes enhanced audio back into the original video container
Supports NGC_API_KEY via env vars / .env
Cleans up temp files (unless --debug)
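
The silence-aware chunking listed above can be sketched as follows. This is a minimal illustration, not the code in app/chunking.py: the function name `plan_chunks`, its parameters, and the choice to cut at the midpoint of a silence are all assumptions made for the example. It takes silent ranges that were detected elsewhere and picks cut points so that no chunk exceeds a maximum duration.

```python
from typing import List, Tuple


def plan_chunks(
    total_ms: int,
    silences: List[Tuple[int, int]],
    max_chunk_ms: int,
) -> List[Tuple[int, int]]:
    """Split [0, total_ms) into chunks no longer than max_chunk_ms.

    silences holds (start_ms, end_ms) ranges of detected silence. Each cut is
    placed at the midpoint of the latest silence inside the current window.
    If no silence falls inside the window, the chunk is cut at the size limit.
    """
    if max_chunk_ms <= 0:
        raise ValueError("max_chunk_ms must be positive")

    midpoints = sorted((start + end) // 2 for start, end in silences)
    chunks: List[Tuple[int, int]] = []
    chunk_start = 0

    # Keep cutting while the remaining audio is too long for a single chunk.
    while total_ms - chunk_start > max_chunk_ms:
        limit = chunk_start + max_chunk_ms
        candidates = [m for m in midpoints if chunk_start < m <= limit]
        cut = candidates[-1] if candidates else limit
        chunks.append((chunk_start, cut))
        chunk_start = cut

    chunks.append((chunk_start, total_ms))
    return chunks
```

Preferring the latest silence in each window keeps chunks as long as the limit allows, which reduces the number of requests sent to the NIM.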

Easy Setup

Requirements: Docker and an NVIDIA GPU. On Linux, install nvidia-container-toolkit; on Windows, use Docker Desktop with WSL2 GPU support enabled.

0. Create API Key

The model is free to use, but downloading it requires an NGC API key. Create a free key here: https://org.ngc.nvidia.com/setup/api-keys

1. Configure your API key

Copy the example file to .env and set NGC_API_KEY in it.

cp .env.example .env
# Open .env and set NGC_API_KEY to your key from https://org.ngc.nvidia.com/setup/api-keys

2. Put your media file in `./data/`

cp /path/to/your/video.mp4 data/

3. Start the NIM

The first run will download the model weights — this may take several minutes.

docker compose up -d nim

Check that it is ready (status should show healthy):

docker compose ps nim
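
If you would rather wait from a script than re-run the status command, the sketch below polls the gRPC port until it accepts TCP connections. The function name `wait_for_port` and its defaults are assumptions for this example. An open port is only a rough signal, since the NIM may still be loading models, so the compose health status remains the authoritative check.

```python
import socket
import time


def wait_for_port(
    host: str,
    port: int,
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For the default setup this would be called as `wait_for_port("127.0.0.1", 8001)`.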

4. Improve your audio

To enhance the audio simply run:

docker compose run --rm helper --input /data/video.mp4 --output /data/output.mp4

Paths inside the container must use /data/ — the mounted volume.
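
As an illustration of the path rule, the sketch below translates a host path under ./data into the path the container sees. The helper `to_container_path` is hypothetical and not part of this project; it assumes ./data is mounted at /data as described above.

```python
from pathlib import Path, PurePosixPath


def to_container_path(host_path, data_dir="data", mount_point="/data"):
    """Map a host path inside data_dir to its location under mount_point."""
    host = Path(host_path).resolve()
    root = Path(data_dir).resolve()
    try:
        relative = host.relative_to(root)
    except ValueError:
        # The file is outside the mounted directory, so the container cannot see it.
        raise ValueError(f"{host} is not inside {root}") from None
    # Container paths are always POSIX style, even when the host is Windows.
    return str(PurePosixPath(mount_point, *relative.parts))
```

A file outside ./data raises ValueError, which mirrors what happens in practice: the container cannot read anything that is not on the mounted volume.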

5. Stop when done

Your enhanced audio or video file is written to the path given by --output, which in the example above is ./data/output.mp4 on the host. You can now stop the containers:

docker compose down

Advanced Setup

Prerequisites

Tool	Purpose
Python 3.10+	Runtime
ffmpeg	Audio extraction, resampling, remuxing
NVIDIA Maxine Studio Voice NIM	gRPC inference server (GPU required)
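
To make the ffmpeg role concrete, here is a sketch of the two commands such a wrapper might build: one to extract a 48 kHz PCM WAV, one to remux the enhanced audio. The function names are hypothetical, and the mono downmix and 16-bit sample format are assumptions; the README only states 48 kHz PCM WAV. The actual commands in app/media.py may differ.

```python
from typing import List


def extract_audio_cmd(src: str, wav_out: str, sample_rate: int = 48000) -> List[str]:
    """Build an ffmpeg command that writes the audio track as PCM WAV."""
    return [
        "ffmpeg", "-y",           # overwrite the output if it exists
        "-i", src,
        "-vn",                    # drop any video stream
        "-ac", "1",               # downmix to mono (assumption)
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM (assumption)
        wav_out,
    ]


def remux_cmd(video_src: str, enhanced_audio: str, dst: str) -> List[str]:
    """Build an ffmpeg command that swaps the audio track of a video."""
    return [
        "ffmpeg", "-y",
        "-i", video_src,
        "-i", enhanced_audio,
        "-map", "0:v:0",          # video from the original file
        "-map", "1:a:0",          # audio from the enhanced file
        "-c:v", "copy",           # keep the video stream untouched
        "-shortest",              # stop at the shorter of the two streams
        dst,
    ]
```

Either list can be passed to `subprocess.run(cmd, check=True)`. No audio codec is set in the remux command, so ffmpeg picks the default for the output container.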

Quick Start

1. Clone & install

Linux / macOS

git clone <repo-url> && cd StudioVoiceHelper

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

Windows (PowerShell)

git clone <repo-url>; cd StudioVoiceHelper

python -m venv .venv; .\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

2. Compile gRPC protos

python compile_protos.py
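
For orientation, compiling a proto file with grpcio-tools usually comes down to one `grpc_tools.protoc` invocation. The sketch below builds that command using the paths shown under Project Structure. What compile_protos.py actually does is not shown here, so treat the function `protoc_cmd` and its defaults as assumptions.

```python
import sys
from typing import List


def protoc_cmd(
    proto_dir: str = "protos",
    proto_file: str = "studiovoice.proto",
    out_dir: str = "app/_generated",
) -> List[str]:
    """Build the grpc_tools.protoc command that generates Python gRPC stubs."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{proto_dir}",                 # where to look for .proto imports
        f"--python_out={out_dir}",        # message classes (*_pb2.py)
        f"--grpc_python_out={out_dir}",   # service stubs (*_pb2_grpc.py)
        f"{proto_dir}/{proto_file}",
    ]
```

Running the result with `subprocess.run(protoc_cmd(), check=True)` requires the grpcio-tools package to be installed in the active environment.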

3. Configure

cp .env.example .env
# Edit .env and set NGC_API_KEY

4. Start the NIM container

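See NVIDIA docs. The command below expands $NGC_API_KEY from the current shell, so export the variable first; docker run does not read .env on its own.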

docker run -it --rm --name=studio-voice \
    --runtime=nvidia --gpus all --shm-size=8GB \
    -e NGC_API_KEY=$NGC_API_KEY \
    -e STREAMING=false \
    -p 8001:8001 \
    nvcr.io/nim/nvidia/maxine-studio-voice:latest

5. Run

# Enhance audio inside a video
python -m app.cli --input input.mp4 --output output.mp4

# Enhance a standalone audio file
python -m app.cli --input noisy.wav --output clean.wav

# Verbose / debug mode (keeps temp files)
python -m app.cli --input in.mp4 --output out.mp4 --debug

CLI Options

Flag	Description
`-i, --input`	Input audio or video file (required)
`-o, --output`	Output file path (required)
`--target`	NIM gRPC endpoint (default `127.0.0.1:8001`)
`--model-type`	`48k-hq` (default), `48k-ll`, or `16k-hq`
`--debug`	Keep temp files, verbose logging
`-v, --verbose`	Enable DEBUG-level logging
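
The flags in the table map naturally onto an argparse parser. The sketch below mirrors the documented flags and defaults; it is an illustration rather than the contents of app/cli.py, and the help strings and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented CLI flags."""
    parser = argparse.ArgumentParser(
        prog="app.cli",
        description="Enhance the speech track of an audio or video file.",
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Input audio or video file")
    parser.add_argument("-o", "--output", required=True,
                        help="Output file path")
    parser.add_argument("--target", default="127.0.0.1:8001",
                        help="NIM gRPC endpoint")
    parser.add_argument("--model-type", default="48k-hq",
                        choices=["48k-hq", "48k-ll", "16k-hq"],
                        help="Model variant")
    parser.add_argument("--debug", action="store_true",
                        help="Keep temp files, verbose logging")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable DEBUG-level logging")
    return parser
```

Using `choices` makes argparse reject an unknown model variant before any audio is processed.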

Environment Variables

All are optional. Set them in .env or in the shell to override the defaults.

Variable	Default	Purpose
`NGC_API_KEY`	(none)	NVIDIA NGC API key
`NIM_TARGET`	`127.0.0.1:8001`	NIM gRPC address
`NIM_MODEL_TYPE`	`48k-hq`	Model variant
`NIM_MAX_CHUNK_BYTES`	`34000000`	Max WAV chunk size (bytes)
`MIN_SILENCE_MS`	`400`	Min silence for split detection
`SILENCE_THRESH_DBFS`	`-40`	Silence dBFS threshold
`DEBUG`	`false`	Keep temp files
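
A loader for these variables might look like the sketch below. It reads from a mapping so it can be tested without touching the real environment. The names `Settings`, `load_settings`, and `max_chunk_seconds` are hypothetical, as is the set of strings accepted as true for DEBUG. The duration helper assumes 16-bit mono audio and a 44-byte WAV header, neither of which is stated in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ngc_api_key: Optional[str]
    nim_target: str
    nim_model_type: str
    nim_max_chunk_bytes: int
    min_silence_ms: int
    silence_thresh_dbfs: float
    debug: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        ngc_api_key=env.get("NGC_API_KEY") or None,
        nim_target=env.get("NIM_TARGET", "127.0.0.1:8001"),
        nim_model_type=env.get("NIM_MODEL_TYPE", "48k-hq"),
        nim_max_chunk_bytes=int(env.get("NIM_MAX_CHUNK_BYTES", "34000000")),
        min_silence_ms=int(env.get("MIN_SILENCE_MS", "400")),
        silence_thresh_dbfs=float(env.get("SILENCE_THRESH_DBFS", "-40")),
        # Accepted truthy spellings are an assumption of this sketch.
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes", "on"},
    )


def max_chunk_seconds(
    max_bytes: int,
    sample_rate: int = 48000,
    channels: int = 1,
    sample_width: int = 2,
    header_bytes: int = 44,
) -> int:
    """Whole seconds of PCM audio that fit in max_bytes, header included."""
    return (max_bytes - header_bytes) // (sample_rate * channels * sample_width)
```

Under those assumptions, the default NIM_MAX_CHUNK_BYTES of 34000000 corresponds to chunks of just under six minutes of mono audio, and half that for stereo.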

Docker

docker-compose.yml defines two services:

Service	What it does
`nim`	Runs the NVIDIA Maxine Studio Voice NIM container (requires NVIDIA GPU)
`helper`	Runs this wrapper; waits for the NIM to be healthy before starting

Prerequisites

NVIDIA GPU with nvidia-container-toolkit installed
On Windows: Docker Desktop with WSL2 GPU support enabled
NGC_API_KEY set in .env

Build

docker compose build helper

Run

# 1. Start (and keep running) the NIM in the background.
#    On first launch it downloads models — this can take several minutes.
docker compose up -d nim

# 2. Wait for it to become healthy (optional — helper depends_on handles this too)
docker compose ps nim

# 3. Run the helper against a file in ./data/
docker compose run --rm helper --input /data/input.mp4 --output /data/output.mp4

# 4. Stop the NIM when done
docker compose down

Note: Paths inside the container must start with /data/ (the mounted volume), not host-relative paths like .\data\file.mp4.

Manual `docker run` (without compose)

Linux / macOS

docker run --rm \
    -e NGC_API_KEY=$NGC_API_KEY \
    -e NIM_TARGET=host.docker.internal:8001 \
    -v "$(pwd)/data:/data" \
    studio-voice-helper \
    --input /data/input.mp4 --output /data/output.mp4

Note: On Linux with Docker Engine, host.docker.internal is not defined by default. Add --add-host=host.docker.internal:host-gateway to the command (Docker 20.10 or later).

Windows (PowerShell)

docker run --rm `
    -e NGC_API_KEY=$env:NGC_API_KEY `
    -e NIM_TARGET=host.docker.internal:8001 `
    -v "${PWD}\data:/data" `
    studio-voice-helper `
    --input /data/input.mp4 --output /data/output.mp4

Tests

pip install pytest
pytest tests/ -v

Project Structure

app/
├── __init__.py
├── __main__.py        # python -m app entrypoint
├── cli.py             # Argument parsing, logging setup
├── config.py          # Env / .env config loader
├── media.py           # ffmpeg: extract, resample, remux
├── chunking.py        # Silence-aware splitting & stitching
├── nim_client.py      # gRPC client for Studio Voice NIM
├── pipeline.py        # Orchestrates the full flow
└── _generated/        # Proto-compiled gRPC stubs
protos/
└── studiovoice.proto  # NIM proto definition
tests/
├── test_config.py
├── test_media.py
├── test_chunking.py
└── test_pipeline.py
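
The stitching step attributed to chunking.py above can be illustrated with the standard library alone. The sketch below concatenates enhanced WAV chunks in index order, so chunks may arrive in any order. It is not the project's implementation; the function name `stitch_wav_chunks` and the dict-of-bytes interface are assumptions for the example.

```python
import io
import wave
from typing import Dict


def stitch_wav_chunks(chunks: Dict[int, bytes]) -> bytes:
    """Concatenate WAV chunks keyed by index, in ascending index order."""
    if not chunks:
        raise ValueError("no chunks to stitch")

    out_buf = io.BytesIO()
    params = None
    with wave.open(out_buf, "wb") as out:
        for index in sorted(chunks):
            with wave.open(io.BytesIO(chunks[index]), "rb") as part:
                fmt = (part.getnchannels(), part.getsampwidth(), part.getframerate())
                if params is None:
                    # The first chunk defines the output format.
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError(f"chunk {index} has a different audio format")
                out.writeframes(part.readframes(part.getnframes()))
    return out_buf.getvalue()
```

Keying chunks by index rather than appending them as responses arrive keeps the output in the right order even if requests complete out of sequence.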

License

MIT
