Reinforcement learning has proven to be a powerful way to enhance the reasoning capabilities of LLMs. In this repository, we collect papers, slides, and other interesting materials on enhancing LLM reasoning with reinforcement learning, to help everyone get up to speed quickly!
- [2501] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- [2501] Kimi k1.5: Scaling Reinforcement Learning with LLMs
- [2312] Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
- [2305] Let's Verify Step by Step
- [2211] Solving Math Word Problems with Process- and Outcome-Based Feedback
- [] Solving Olympiad Geometry Without Human Demonstrations
- [2408] DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
- LLM Reasoning: Key Ideas and Limitations, Denny Zhou (DeepMind) (Video)
- Towards Reasoning in Large Language Models, Jie Huang (UIUC)
- Can LLMs Reason & Plan?, Subbarao Kambhampati (ASU)
- Inference-Time Techniques for LLM Reasoning, Xinyun Chen (DeepMind)
- Chain-of-Thought Reasoning in Language Models, Zhuosheng Zhang (SJTU)
- Learning to Self-Improve & Reason with LLMs, Jason Weston (Meta & NYU)
- EZ撸paper: DeepSeek-R1 Paper Explained, Part 1: How It Rivals OpenAI-o1
- [GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
- DeepSeek R1 Explained to your grandma
- 🔥
TinyZero (4*4090 is enough for 0.5B LLM, but can't observe aha moment)
- 🔥
Open-r1
- 🔥
Logic-RL
- 🔥
Unsloth-GRPO (simplest r1 implementation)
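The GRPO algorithm referenced above (from the DeepSeekMath paper and implemented by several of these repos) replaces a learned value critic with a group-relative baseline: sample several completions per prompt, score each with a reward, and normalize rewards within the group. A minimal sketch of that advantage computation, with illustrative rewards (not taken from any of the linked repos):

```python
# Minimal sketch of GRPO's group-relative advantage, assuming the
# normalization described in the DeepSeekMath paper: each sampled
# completion's reward is centered and scaled by its group's statistics,
# so no separate value model is needed.
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against zero-variance groups
    return [(r - mu) / sigma for r in rewards]

# Hypothetical example: 4 sampled answers to one math prompt,
# reward 1.0 if the final answer is correct, else 0.0.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advs)  # correct answers get positive advantage, wrong ones negative
```

Each advantage then weights the policy-gradient update for its completion's tokens; the details (clipping, KL penalty) vary across the implementations listed here.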
- Feel free to contribute more papers or any other resources!