Toolkit for measuring Claude Code and Codex performance over time against a baseline, using the SWE-bench Lite dataset. **No API key required for Max or Pro subscribers.**
Updated Nov 22, 2025 - Python
Wrapper around common LLM evaluation frameworks
🚀 Generate front-end code from design mockups using a powerful integration of Gemini and Claude within a user-friendly command system.