Stars
The most powerful local music generation model that outperforms most commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
A Python 3 module to control DMX using OpenDMX or uDMX, featuring fixture profiles, built-in effects, and a web control panel.
Fun-ASR is an end-to-end large speech recognition model from Tongyi Lab.
UNOFFICIAL - A tool converting sound input to OSC trigger signals.
This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
Intelligent, real-time, audio-responsive DMX light control.
Official implementation of YingMusic-SVC.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
CogView4, CogView3-Plus, and CogView3 (ECCV 2024)
Lightning-Fast, On-Device, Multilingual TTS, running natively via ONNX.
This is the official implementation of our paper: "MiniMax-Remover: Taming Bad Noise Helps Video Object Removal"
AI-based tool for removing hard-coded subtitles and text-like watermarks from images and videos, producing cleaned image/video files with no loss of resolution. No third-party API is required; everything runs locally.
✨ AsrTools: Smart Voice-to-Text Tool | Efficient Batch Processing | User-Friendly Interface | No GPU Required | Supports SRT/TXT Output | Turn your audio into accurate text in an instant!
Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
An AI translation tool that translates complex long-form text with one click: RPG and SLG games, EPUB and TXT novels, PDF, Word, and Markdown documents, SRT, VTT, and LRC subtitles, and more.
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
A powerful 3B-parameter, LLM-based audio editing model trained with reinforcement learning. It excels at editing emotion, speaking style, and paralinguistics, and also provides robust zero-shot text-to-speech.
Repository for training models for music source separation.
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
Learn to build and deploy local Visual Language Models for Edge AI
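🧑🚀 A summary of the world's best LLM resources (multimodal generation, agents, AI-assisted programming, AI paper reviewing, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models).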
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Official Implementation of "Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion"
Open, royalty-free data collection and cleaning pipeline for lyrics2song / song generation.