nejohnson2/rl-workshop

Reinforcement Learning for LLM Alignment Workshop

A 5-part self-study workshop covering reinforcement learning concepts as applied to LLM alignment, from foundational RL through modern methods like DPO and GRPO.

Workshop Structure

| Part | Notebook | Topic | Key Concepts |
|------|----------|-------|--------------|
| 1 | 01_rl_foundations.ipynb | RL Foundations | Bandits, policies, REINFORCE, baselines, variance reduction |
| 2 | 02_policy_gradients_text.ipynb | Policy Gradients for Text | LLMs as policies, sentiment steering, KL divergence, mode collapse |
| 3 | 03_rlhf_pipeline.ipynb | RLHF Pipeline | Reward models, Bradley-Terry, PPO with trl |
| 4 | 04_dpo.ipynb | Direct Preference Optimization | DPO derivation, from-scratch implementation, DPO vs PPO |
| 5 | 05_grpo_frontier.ipynb | GRPO & the Frontier | GRPO, verifiable rewards, RLAIF, online DPO, DeepSeek-R1 |
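
To give a flavor of what Part 4 builds toward, here is a minimal sketch of the per-example DPO loss in plain Python. It is not taken from the notebooks: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions. The inputs are assumed to be summed token log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    All names and the default beta are hypothetical, chosen for illustration.
    """
    # How much more likely the policy makes each response relative to the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Shifting probability toward the chosen response lowers the loss.
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

The loss depends only on log-probability ratios against the reference model, which is why DPO needs no separate reward model or sampling loop during training.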

Prerequisites

  • Solid ML/deep learning background
  • Familiarity with PyTorch and HuggingFace transformers
  • No prior RL knowledge required

Setup

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Launch Jupyter
jupyter notebook notebooks/

Hardware

Designed to run locally on macOS with Apple Silicon (MPS). All notebooks use GPT-2 124M, which fits comfortably in memory. Code auto-detects MPS/CUDA/CPU.
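
Device auto-detection along these lines might look like the sketch below. The helper names `choose_device` and `detect_device` are hypothetical, and the priority order (CUDA, then MPS, then CPU) is an assumption; the notebooks may order the checks differently.

```python
import importlib.util


def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick a device name from availability flags (assumed priority: CUDA > MPS > CPU)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        # Assumption for this sketch only: stay importable without torch installed.
        return "cpu"
    import torch

    # torch.backends.mps is absent on older PyTorch builds, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return choose_device(torch.cuda.is_available(), mps_ok)


print(detect_device())
```

The returned string can be passed to `torch.device(...)` or to `model.to(...)`.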

Progression

Each notebook builds on the previous. The arc is:

Bandits → Policy Gradients → Text as RL → RLHF/PPO → DPO → GRPO
  (toy)      (theory)         (bridge)    (classic)   (modern) (frontier)

Datasets Used

  • Parts 3-4: Anthropic HH-RLHF (human preference data, loaded from HuggingFace)
  • Part 5: GSM8K (grade school math, loaded from HuggingFace)
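
GSM8K reference solutions end with a line of the form `#### <number>`, which is what makes the reward in Part 5 verifiable. The sketch below shows one way a correctness reward and GRPO-style group-relative advantages could be computed. It assumes the model is prompted to end its completion in the same `####` format; all function names are hypothetical, and the notebook's actual reward may differ.

```python
import re
from statistics import mean, pstdev


def extract_final_answer(text):
    """Return the number after the last '####' marker as a string, or None if absent."""
    matches = re.findall(r"####\s*(-?[\d,]*\.?\d+)", text)
    if not matches:
        return None
    return matches[-1].replace(",", "")  # drop thousands separators such as 1,234


def correctness_reward(completion, reference):
    """1.0 if the completion's final answer matches the reference numerically, else 0.0."""
    predicted = extract_final_answer(completion)
    target = extract_final_answer(reference)
    if predicted is None or target is None:
        return 0.0
    return 1.0 if float(predicted) == float(target) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of completions sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero-variance group
    return [(r - mu) / (sigma + eps) for r in rewards]


reference = "Natalia sold 48/2 = 24 clips in May.\n#### 72"
completions = [
    "48 + 24 = 72\n#### 72",
    "48 + 22 = 70\n#### 70",
    "I am not sure.",
    "She sold 72 clips.\n#### 72",
]
rewards = [correctness_reward(c, reference) for c in completions]
print(rewards)
print(group_relative_advantages(rewards))
```

Because advantages are computed relative to the group mean, a group in which every completion is right (or every one is wrong) contributes no learning signal.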

Key Libraries

  • torch — all implementations
  • transformers — GPT-2 and sentiment classifier
  • trl — PPOTrainer, DPOTrainer
  • datasets — HH-RLHF, GSM8K
