marcoyang1998
  • University of Cambridge
  • Cambridge

Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

68 1 Updated Feb 7, 2026

A PyTorch-based Speech Toolkit

Python 11,291 1,663 Updated Mar 1, 2026

Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation

Python 135 8 Updated Jan 21, 2026

XARES-LLM

Python 54 3 Updated Feb 11, 2026

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,711 242 Updated Dec 30, 2025

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 871 117 Updated Dec 2, 2025

[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python 197 4 Updated Feb 25, 2026

🤗 R1-AQA Model: mispeech/r1-aqa

Python 314 29 Updated Mar 28, 2025

This repository is the official implementation of the ECAI 2024 conference paper SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM

Python 68 4 Updated Aug 13, 2024

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Python 185 12 Updated Sep 1, 2025

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 4,053 349 Updated Jan 8, 2025

Libriheavy: a 50,000-hour ASR corpus with punctuation, casing, and context

Python 216 12 Updated Sep 10, 2024

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 11,176 1,096 Updated Nov 18, 2024

SALMONN family: A suite of advanced multi-modal LLMs

1,391 111 Updated Feb 3, 2026

Inference code for Llama models

Python 59,199 9,818 Updated Jan 26, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 157,533 32,321 Updated Mar 7, 2026
Python 4 3 Updated Apr 25, 2023

Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup

Python 79 15 Updated Jun 30, 2025

Tools for handling multimodal data in machine learning projects.

Python 1,118 262 Updated Mar 7, 2026

Speech-to-text server framework with next-gen Kaldi

C++ 886 142 Updated Mar 7, 2026
Python 3 Updated Feb 12, 2026

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…

C++ 10,661 1,205 Updated Mar 6, 2026

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, L…

C++ 1,644 206 Updated Oct 20, 2025
Python 46 10 Updated Nov 2, 2023

kaldi-asr/kaldi is the official location of the Kaldi project.

Shell 15,339 5,356 Updated Sep 22, 2025
Python 1,370 400 Updated Mar 3, 2026

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Cuda 1,306 233 Updated Mar 6, 2026

End-to-End Speech Processing Toolkit

Python 9,755 2,385 Updated Mar 5, 2026

:octocat: personal website + blog for every github user

JavaScript 6,751 681 Updated Feb 19, 2022

Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

Python 1,014 233 Updated Jul 8, 2019