- Shanghai Jiao Tong University
- https://yfyeung.github.io
- https://scholar.google.com/citations?user=slhAlQ0AAAAJ
- LinkedIn: in/yifan-yang-290ba624b
Starred repositories
A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD
Qwen3.5 is the large language model series developed by the Qwen team at Alibaba Cloud.
A Framework for Speech, Language, Audio, and Music Processing with Large Language Models
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…
MOSS-TTS Family is an open-source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high-fidelity, high-expressiveness, and complex real-world scenario…
📚 "Building Agents from Scratch" (《从零开始构建智能体》): a from-scratch tutorial on agent principles and practice
Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
UniAudio 2.0: An audio foundation model for text, speech, sound, and music
Elevate your AI research writing, no more tedious polishing ✨
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations
Official repository for the WenetSpeech-Chuan dataset.
An All-in-One Speech, Sound, Music Codec with Single Nested Codebook
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
Fun-ASR is an end-to-end large speech recognition model launched by Tongyi Lab.
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech
An intuitive and low-overhead instrumentation tool for Python
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931)
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis