your_interview Audio Triage Tool

Trims silence from an interview audio file with FFmpeg, then does a fast triage pass — transcript, sentiment, and energy per chunk — and writes a JSON report.

Quick Start (< 2 minutes)

Prerequisites: Zig 0.13+ and FFmpeg. Python 3.8+ is optional and only needed for sentiment analysis.

git clone <repo>
cd readback

zig build
mkdir -p input output
cp your_interview.mp3 input/your_interview.mp3
zig-out/bin/readback input/your_interview.mp3

Output:

output/trimmed.mp3 — silence removed
output/report.json — analysis report
output/your_interview.mp3_sentiment.json — per-chunk transcript + sentiment (with sentiment setup; see below)

Prerequisites

Zig 0.13+: Download from ziglang.org
FFmpeg: Install via brew install ffmpeg (macOS), apt install ffmpeg (Ubuntu), or ffmpeg.org
Python 3.8+ (optional, only if using sentiment analysis)

Build

zig build test        # Run full test suite
zig build             # Build binary

Binary location: zig-out/bin/readback

Usage

# Trim silence and generate report (no sentiment analysis)
zig-out/bin/readback input/your_interview.mp3

# With custom output paths
zig-out/bin/readback input/your_interview.mp3 output/trimmed.mp3 output/report.json

Sentiment Analysis (Optional)

Emotion is read from the audio (prosody), not the words — so unusual tone on ordinary speech is caught. "How are you?" said hostilely is flagged; the same words said calmly are not.

For each 8-second chunk:

emotion — neutral / angry / fearful / happy / sad / calm, derived from the three dimensions below with a neutral deadzone (only clearly off-neutral chunks get a non-neutral label, which guards against misclassification)
valence / arousal / dominance — the raw audio-model outputs (each ~0–1), kept for auditing. Hostility = high arousal + high dominance + low valence; dominance is what separates "angry" from "fearful"
text — speech transcript (OpenAI Whisper, tiny model)
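
As an illustration of how a label could be derived from the three dimensions, here is a minimal sketch. The rule below is an assumption made for this example, not the tool's actual code: it treats 0.5 as the neutral center, applies the deadzone to every axis, and splits the remaining cases by the sign of each offset. The names `Dims`, `NEUTRAL_CENTER`, and `label_emotion` are hypothetical.

```python
from dataclasses import dataclass

NEUTRAL_CENTER = 0.5  # assumption: outputs in ~0-1 are centred near 0.5


@dataclass
class Dims:
    valence: float
    arousal: float
    dominance: float


def label_emotion(d: Dims, deadzone: float = 0.10) -> str:
    """Map valence/arousal/dominance to a coarse label (hypothetical rule)."""
    offsets = (
        d.valence - NEUTRAL_CENTER,
        d.arousal - NEUTRAL_CENTER,
        d.dominance - NEUTRAL_CENTER,
    )
    # Every axis inside the deadzone: the chunk is not clearly off-neutral.
    if all(abs(o) <= deadzone for o in offsets):
        return "neutral"
    low_valence = d.valence < NEUTRAL_CENTER
    high_arousal = d.arousal >= NEUTRAL_CENTER
    if low_valence and high_arousal:
        # Dominance is what separates "angry" from "fearful".
        return "angry" if d.dominance >= NEUTRAL_CENTER else "fearful"
    if low_valence:
        return "sad"
    return "happy" if high_arousal else "calm"


print(label_emotion(Dims(valence=0.2, arousal=0.8, dominance=0.8)))  # angry
print(label_emotion(Dims(valence=0.5, arousal=0.55, dominance=0.45)))  # neutral
```

Under this rule a wider deadzone moves borderline chunks into "neutral", which matches the intent of guarding against misclassification.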

The emotion model is audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim, fine-tuned on MSP-Podcast (real conversational speech), so it is better suited to natural audio than models trained on acted emotion. Valence is the model's weakest dimension (a known limitation of speech emotion recognition), so treat it as the coarsest of the three. Because Whisper tiny transcribes fixed 8-second cuts, some utterances are split across chunks: treat transcripts as triage signposts, not clip- or show-note-grade text.

Setup (one-time)

./scripts/setup_sentiment.sh

This creates .venv/ with all Python dependencies (torch, transformers, openai-whisper, soundfile, soxr). The emotion model (~661 MB) and Whisper tiny weights (~75 MB) download on first run.

Run with sentiment analysis

zig build
zig-out/bin/readback input/your_interview.mp3

Outputs:

output/your_interview.mp3_sentiment.json — normalized per-chunk transcript + sentiment
output/your_interview.mp3.raw_sentiment.json — raw Python output (intermediate, also a cache)
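
For a quick scan of the normalized output, something like the following sketch may help. The JSON schema here is an assumption: a top-level list of per-chunk objects carrying the `emotion` and `text` fields described above, in chunk order. Check the actual file before relying on it. `flag_chunks` is a hypothetical helper, and the timestamps are approximate offsets into the trimmed audio, computed as chunk index times 8 seconds.

```python
import json
from typing import Dict, List

CHUNK_SECONDS = 8


def flag_chunks(chunks: List[Dict]) -> List[str]:
    """Return one line per non-neutral chunk, with an approximate start time."""
    lines = []
    for index, chunk in enumerate(chunks):
        if chunk.get("emotion", "neutral") == "neutral":
            continue
        start = index * CHUNK_SECONDS
        stamp = f"{start // 3600:02d}:{start % 3600 // 60:02d}:{start % 60:02d}"
        lines.append(f"{stamp} {chunk['emotion']}: {chunk.get('text', '').strip()}")
    return lines


# Inline sample standing in for the contents of the sentiment JSON file.
sample = json.loads(
    '[{"emotion": "neutral", "text": "Hello."},'
    ' {"emotion": "angry", "text": " How are you? "}]'
)
for line in flag_chunks(sample):
    print(line)  # 00:00:08 angry: How are you?
```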

Audio is decoded in a single sequential pass, one 8-second chunk in memory at a time, so a four-hour file never loads into memory: decoding a 4.4-hour MP3 is ~5 s of I/O at a flat ~0.2 GB, and total process memory is set by the models (~1 GB), independent of file length. If the raw sentiment JSON already exists for an input, inference is skipped — delete it to force a fresh run.
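
The single-pass, one-chunk-at-a-time idea can be sketched as a generator. This is a simplified illustration, not the project's decoder: it reads an uncompressed WAV with Python's standard `wave` module rather than decoding MP3, and `iter_chunks` and `make_silent_wav` are hypothetical names.

```python
import io
import wave
from typing import BinaryIO, Iterator, Union

CHUNK_SECONDS = 8


def iter_chunks(source: Union[str, BinaryIO], seconds: int = CHUNK_SECONDS) -> Iterator[bytes]:
    """Yield consecutive fixed-length PCM chunks; the last one may be shorter."""
    with wave.open(source, "rb") as wav:
        frames_per_chunk = wav.getframerate() * seconds
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            # Only this chunk is held in memory before the caller consumes it.
            yield data


def make_silent_wav(duration_s: int, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * duration_s)
    buf.seek(0)
    return buf


# A 20-second file yields chunks of 8 s, 8 s, and 4 s.
sizes = [len(chunk) for chunk in iter_chunks(make_silent_wav(20))]
print(sizes)  # [256000, 256000, 128000]
```

Because the generator never accumulates chunks, memory use stays flat regardless of file length, which is the property the paragraph above describes.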

Steady-state runtime: both the emotion model and Whisper run on every chunk (~0.3 s of inference per 8 s chunk on an M-series Mac — emotion on MPS, Whisper on CPU), so a four-hour file (~2,000 chunks) completes in roughly 12 minutes (measured: a 4.39 h MP3 → 1,978 segments in 12.0 min, 0.96 GB peak RSS). That cost is paid once: the raw-JSON cache is what stands between a re-run and re-inferring every chunk again, so keep output/<name>.raw_sentiment.json unless you intend to recompute. Use --no-transcribe (Python CLI) to skip Whisper when you only need emotion.
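
To budget time for other file lengths, the measured figures above can be turned into a rough estimate. This is a back-of-the-envelope sketch: it assumes cost scales linearly with chunk count and that the measured machine is representative. `chunk_count` and `estimate_minutes` are hypothetical helpers.

```python
import math

CHUNK_SECONDS = 8


def chunk_count(duration_s: float, chunk_s: int = CHUNK_SECONDS) -> int:
    """Number of fixed-length chunks needed to cover the audio."""
    return math.ceil(duration_s / chunk_s)


def estimate_minutes(duration_s: float, seconds_per_chunk: float) -> float:
    """Estimated wall-clock minutes for a file of the given duration."""
    return chunk_count(duration_s) * seconds_per_chunk / 60


# Per-chunk cost implied by the measured run: 12.0 min over 1,978 segments.
measured_per_chunk = 12.0 * 60 / 1978

print(f"{measured_per_chunk:.2f} s per chunk")
print(f"{estimate_minutes(4 * 3600, measured_per_chunk):.1f} min for a 4 h file")
```

The measured run works out to about 0.36 s per chunk end to end, slightly above the ~0.3 s inference figure, which suggests the remainder is overhead outside inference.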

The neutral/emotion split is governed by a tunable deadzone (--deadzone, default 0.10); widen it to label more chunks neutral, narrow it to surface more emotion.

Cache files (not committed)

These are auto-generated and excluded from git:

.venv/ — Python virtual environment
~/.cache/whisper/ — Whisper model weights cache
~/.cache/huggingface/ — audeering emotion-model weights cache

Troubleshooting

zig: command not found

Download and install Zig from ziglang.org. Ensure zig is on your PATH.

ffmpeg: command not found

Install FFmpeg: brew install ffmpeg (macOS), apt install ffmpeg (Ubuntu), or see ffmpeg.org.

Sentiment analysis is slow on first run

This is normal: the emotion model (~661 MB) and Whisper tiny weights (~75 MB) download on first use, then cache.
Both models run per chunk on CPU/MPS, so long files take a while. The raw-JSON cache means you only pay that cost once per input.
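
To check ahead of time whether a run would hit the cache, a sketch along these lines may help. It assumes the cache file is named after the input file with `.raw_sentiment.json` appended and lives in `output/`, as in the example paths in this README. `raw_cache_path` and `needs_inference` are hypothetical helpers, not part of the tool.

```python
from pathlib import Path


def raw_cache_path(input_path: str, output_dir: str = "output") -> Path:
    """Assumed cache location: <output_dir>/<input file name>.raw_sentiment.json."""
    return Path(output_dir) / f"{Path(input_path).name}.raw_sentiment.json"


def needs_inference(input_path: str, output_dir: str = "output") -> bool:
    """True when no raw sentiment JSON exists yet for this input."""
    return not raw_cache_path(input_path, output_dir).exists()


print(raw_cache_path("input/your_interview.mp3").as_posix())
# output/your_interview.mp3.raw_sentiment.json
```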

Tests fail

Run zig build test for the Zig suite and .venv/bin/python -m pytest sentiment/tests/test_cli.py for the Python unit suite.

Development

Code layout: Zig in src/, Python sentiment in sentiment/
Run tests: zig build test or zig test src/<module>/<module>_test.zig
Run sentiment CLI directly: .venv/bin/python -m interview_sentiment.cli --input <file> --output <file>
