Starred repositories
Downloads videos and playlists from YouTube
A feature-rich command-line audio/video downloader
A set of ready-to-use Agent Skills for research, science, engineering, analysis, finance, and writing.
Reinforcement Learning via Self-Distillation (SDPO)
Implementing complete SigLIP2 loss components: SILC/TIPS self-distillation, LocCa captioning loss, and Sigmoid loss, following HuggingFace Transformers and SigLIP2 research papers. Open source rese…
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
[arXiv 2025] SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
A version of verl to support diverse tool use
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
[NeurIPS 2025] Efficient Reasoning Vision Language Models
Official Repository of Native Parallel Reasoner
A self-learning tutorial for CUDA high-performance programming.
N-dimensional Rotary Position Embeddings for PyTorch
[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…
Fully Open Framework for Democratized Multimodal Training
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant refe…
AIInfra (AI infrastructure) covers the AI system stack, from underlying hardware such as chips up to the software layers that support training and inference of large AI models.
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Simple & Scalable Pretraining for Neural Architecture Research
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Awesome Unified Multimodal Models