torchaudio

Here are 60 public repositories matching this topic...

2noise / ChatTTS

A generative speech model for daily dialogue.

python chat agent text-to-speech torch tts english chinese gpt natural-language-inference english-language chinese-language torchaudio llm chatgpt llm-agent chattts

Updated Apr 10, 2026
Python

DrewThomasson / VoxNovel

VoxNovel: generate audiobooks giving each character a different voice actor.

windows linux mac torch tts epub audiobooks multi-speaker m4b torchaudio voice-cloning audiobook-creator booknlp generative-ai styletts2

Updated Jun 8, 2025
Python

KentoNishi / torch-pitch-shift

Pitch-shift audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.

torch pytorch sound-processing augmentation pitch-shift gpu-support torchaudio audio-augmentation

Updated Sep 25, 2024
Python

Cascade is a production-ready, high-performance, and low-latency audio stream processing library designed for Voice Activity Detection (VAD). Built upon the excellent Silero VAD model, Cascade significantly reduces VAD processing latency while maintaining high accuracy through its 1:1:1 binding architecture and asynchronous streaming technology.

audio python streaming high-performance numpy vad async-await torchaudio onnxruntime

Updated Dec 22, 2025
Python

igorshmukler / kokoro-ruslan

Kokoro Language Model Training Script for Russian (Ruslan Corpus)

ml pytorch tts torchaudio

Updated Apr 23, 2026
Python

KentoNishi / torch-time-stretch

Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.

torch pytorch sound-processing augmentation gpu-support torchaudio time-stretch audio-augmentation

Updated Sep 5, 2022
Python

overcrash66 / OpenTranslator

Open Translator: a speech-to-speech and speech-to-text translator with voice cloning and other features.

multilingual multi-platform translator transformers speech-to-text speaker-recognition autosub multimodal torchaudio gtts-api s2st coqui-tts whisper-ai llama2 audio-translation xttsv2 s2tt

Updated Mar 26, 2026
Python

eonu / torch-fsdd

A utility for wrapping the Free Spoken Digit Dataset into PyTorch-ready data set splits.

audio torch data-loader torchaudio pytorch-dataset pytorch-dataset-split audio-dataset pytorch-dataloader fsdd free-spoken-digit-dataset

Updated Dec 27, 2022
Python

aminul-huq / Speech-Command-Classification

Speech command classification on the Speech-Command v0.02 dataset using PyTorch and torchaudio. In this example, three models are trained: one on raw signal waveforms, one on MFCC features, and one on MelSpectrogram features.

speech dnn speech-recognition classification pytorch-tutorial torchaudio

Updated Dec 5, 2022
Python

thekartikeyamishra / VoiceCloner

The Voice Cloner is a Python-based project that leverages Tacotron 2 and WaveGlow models for text-to-speech (TTS) synthesis and basic voice cloning. This project supports 22 official Indian languages, including Sanskrit, making it versatile for multilingual text input.

python machine-learning ai numpy torch librosa torchaudio indic-transliteration nvidia-waveglow nvidia-tacotron2 nvidia-pyindex

Updated Dec 17, 2024
Python

LukeSutor / programmatic-pitch

High fidelity music synthesis using diffusion and UnivNet.

pytorch gan generative-model diffusion torchaudio

Updated Jan 10, 2024
Python

nipponjo / tts-german-pytorch

🎙️ German TTS (FastPitch) with Thorsten voice / emotional

python text-to-speech deep-learning german speech pytorch tts speech-synthesis german-language torchaudio emotional-speech hifi-gan fastpitch

Updated Sep 16, 2024
Python

PRITHIVSAKTHIUR / Qwen3-TTS-Daggr-UI

Demonstration for the Qwen/Qwen3-TTS-12Hz models using Daggr for modular UI nodes. Supports voice design (prompt-to-speech), voice cloning (zero-shot), and custom voice synthesis with multiple speakers and languages. Features lazy model loading to optimize memory, multi-model sizes (0.6B and 1.7B), ASR and support for various audio inputs.

python text-to-speech numpy torch pytorch speech-synthesis voice-control gradio librosa audio-processing asr torchaudio voice-cloning huggingface-transformers speech-to-speech daggr soundfile qwen-tts

Updated Feb 12, 2026
Python

glefundes / misophonia-bot

🤖 Telegram bot powered by deep learning. Automatically assesses whether audio clips and voice messages are safe for people suffering from misophonia.

audio telegram deep-learning telegram-bot pytorch telegram-bot-api audio-classification torchaudio

Updated Sep 27, 2020
Python

natgluons / ChronoSense

Sleep Optimizer App to detect disturbances, optimize sleep quality, and give personalized recommendations (sleep audio analysis using librosa, PyTorch, sklearn).

audio-analysis audio-classification librosa sleep-tracker audio-processing chronobiology sleep-research sleep-analysis torchaudio