Reinforcement learning has proven to be a powerful way to enhance the reasoning capabilities of LLMs. In this repository, we collect papers, slides, and other interesting materials on enhancing LLM reasoning with reinforcement learning, to help everyone get up to speed quickly!
- [2501] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- [2501] Kimi k1.5: Scaling Reinforcement Learning with LLMs
- [2312] Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
- [2305] Let's Verify Step by Step
- [2211] Solving Math Word Problems with Process- and Outcome-Based Feedback
- [] Solving Olympiad Geometry Without Human Demonstrations
- [2408] DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
- LLM Reasoning: Key Ideas and Limitations, Denny Zhou (DeepMind) (Video)
- Towards Reasoning in Large Language Models, Jie Huang (UIUC)
- Can LLMs Reason & Plan?, Subbarao Kambhampati (ASU)
- Inference-Time Techniques for LLM Reasoning, Xinyun Chen (DeepMind)
- Chain-of-Thought Reasoning in Language Models, Zhuosheng Zhang (SJTU)
- Learning to Self-Improve & Reason with LLMs, Jason Weston (Meta & NYU)
- EZ撸paper: DeepSeek-R1 Paper Explained, Part 1: How It Rivals OpenAI-o1
- [GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
- DeepSeek R1 Explained to your grandma
- 🔥
TinyZero (4*4090 is enough for 0.5B LLM, but can't observe aha moment)
- 🔥
Open-r1
- 🔥
Logic-RL
- 🔥
Unsloth-GRPO (simplest r1 implementation)
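The GRPO algorithm referenced above (from the DeepSeekMath paper and implemented by several of these repos) replaces a learned value critic with a group-relative baseline: sample several completions per prompt, score each with a reward, and normalize rewards within the group. A minimal sketch of that advantage computation, with illustrative rewards (not taken from any of the linked repos):

```python
# Minimal sketch of GRPO's group-relative advantage, assuming the
# normalization described in the DeepSeekMath paper: each sampled
# completion's reward is centered and scaled by its group's statistics,
# so no separate value model is needed.
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against zero-variance groups
    return [(r - mu) / sigma for r in rewards]

# Hypothetical example: 4 sampled answers to one math prompt,
# reward 1.0 if the final answer is correct, else 0.0.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advs)  # correct answers get positive advantage, wrong ones negative
```

Each advantage then weights the policy-gradient update for its completion's tokens; the details (clipping, KL penalty) vary across the implementations listed here.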
- Feel free to contribute more papers or any other resources!