yfyeung

Organizations

@X-LANCE @CS-BAOYAN

Starred repositories

A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD

Python 230 12 Updated Mar 4, 2026

Qwen3.5 is the large language model series developed by Qwen team, Alibaba Cloud.

1,862 94 Updated Mar 2, 2026

GLM-5: From Vibe Coding to Agentic Engineering

1,660 136 Updated Feb 14, 2026

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Python 1,004 108 Updated Jan 15, 2026

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…

Python 332 19 Updated Mar 5, 2026

MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenario…

Python 823 73 Updated Mar 6, 2026

📚 "Building Agents from Scratch": a from-scratch tutorial on the principles and practice of agents

Python 25,815 2,897 Updated Mar 6, 2026

Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

68 1 Updated Feb 7, 2026

UniAudio 2.0: An audio foundation model for text, speech, sound, and music

Python 356 7 Updated Feb 14, 2026

Elevate your AI research writing, no more tedious polishing ✨

10,169 778 Updated Mar 5, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 1,884 174 Updated Jan 30, 2026

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Python 213 3 Updated Feb 13, 2026

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 9,172 1,144 Updated Feb 6, 2026

A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations

Python 110 1 Updated Feb 6, 2026

Official repository for the WenetSpeech-Chuan dataset.

Python 160 4 Updated Feb 5, 2026

An All-in-One Speech, Sound, Music Codec with Single Nested Codebook

Python 29 1 Updated Oct 11, 2025

Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation

Python 135 8 Updated Jan 21, 2026

Contrastive Language-Audio Pretraining

Python 2,045 205 Updated May 15, 2025

DACVAE

Python 197 16 Updated Dec 22, 2025

Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.

Python 913 80 Updated Feb 25, 2026

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,358 290 Updated Jan 5, 2026

GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning

Python 940 114 Updated Dec 17, 2025

Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'

Python 155 9 Updated Mar 24, 2025

PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 2,175 145 Updated Dec 8, 2025

A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

Python 875 58 Updated Feb 13, 2026

An intuitive and low-overhead instrumentation tool for Python

Python 1,203 42 Updated Jul 8, 2025

SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931)

Python 66 4 Updated Dec 23, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,711 242 Updated Dec 30, 2025

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,102 246 Updated Feb 23, 2026