Official repository for DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
Updated Apr 6, 2026 - Python
Evaluation of AI outputs against trusted educational rubrics. Measure and improve content quality with research-backed rubrics that ensure rigor, reliability, and alignment to classroom needs.
Libella — Android learning path for product professionals learning LLM systems through trade-offs, rubrics, and calibrated AI grading.
Rubric-as-judge runner using the Anthropic Python SDK with tool-use API for schema-validated structured output. 17 mocked tests included.
Lightweight examples of LLM evaluation artifacts: rubrics, prompts, and evaluator guidelines.
Open tools for the human-judgment layer of AI evaluation: EvalKit (Python package + CLI), Robotics ReviewKit, and the Buying Toolkit.
🚀 Explore DR Tulu, an innovative reinforcement learning model designed for long-form deep research tasks, achieving top benchmarks with cutting-edge techniques.