The Source of Truth for LLM Benchmarks. Compare top models like DeepSeek V3, Claude 3.5 Sonnet, and GPT-4o across trusted evaluation sets.
- Global Leaderboard: Sortable, filterable index with tier filtering (Verified vs Discovered)
- Interactive Comparison: "Versus Mode" with Radar Charts and Delta tables
- Deep Specs: Context window, pricing (input/output/cache/reasoning), max output tokens
- Verified Scores: Distinguishes between third-party, provider, community, and estimated results
- Data Freshness: Training cutoff dates and score age indicators
- Tier System: Verified (curated) vs Discovered (auto-imported) model classification
- Capability Filtering: Filter by reasoning, vision, tools, audio, code specialization
- Family System: Model family grouping (Llama, GPT, Claude, Gemini, etc.)
- API Access: REST API with rate limiting and OpenAPI 3.0 spec
- Data Validation: Built-in scripts to prevent broken IDs and out-of-range scores
- Scalable Architecture: Supports 10,000+ models with on-demand data loading
- Framework: Next.js 16 (App Router)
- Styling: Tailwind CSS v4 + Shadcn UI
- Data: Hybrid architecture (manifest + full data)
- State Management: SWR for on-demand data fetching
- Charts: Recharts
- Deployment: Cloudflare Workers via OpenNext (static + edge delivery)
Hybrid Data Loading:
- Registry Manifest (~50KB): Lightweight model list for discovery
- Full Model Data (~870KB): Complete specs and benchmarks (used where needed)
- Score Files (<1KB each): On-demand score loading
Key Hooks:
- `useRegistry()` - Fetch model lists with tier filtering
- `useModelScores()` - Load scores on demand
- `useRegistryFamilies()` - Get unique model families
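A minimal sketch of how the hooks fit the hybrid layout (the file paths and manifest shape below are assumptions for illustration; the real contracts are in docs/SCALABLE_ARCHITECTURE.md):

```typescript
// Assumed manifest entry shape (illustrative only, not the real type).
interface ManifestEntry {
  id: string;
  name: string;
  tier: "verified" | "discovered";
  family: string;
}

// URL builders of the kind the hooks would hand to SWR as fetch keys.
// The /data/... paths are assumptions, not the registry's documented layout.
export const manifestUrl = (): string => "/data/registry-manifest.json";
export const scoresUrl = (modelId: string): string =>
  `/data/scores/${encodeURIComponent(modelId)}.json`;

// e.g. useModelScores("gpt-4o") would lazily fetch scoresUrl("gpt-4o"),
// keeping the initial page load at the ~50KB manifest instead of ~870KB.
```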
See docs/SCALABLE_ARCHITECTURE.md for complete documentation.
- Install dependencies: `bun install`
- Import models.dev metadata, then generate the registry manifest: `bun run import:models-dev`, then `bun run generate:manifest`
- Run the development server: `bun dev`
- Import models.dev metadata: `bun run import:models-dev`
- Generate registry manifest: `bun run generate:manifest`
- Validate registry integrity: `bun run validate:data`
- Run strict validation (CI parity): `bun run validate:data:strict`
- Generate a category and benchmark coverage report: `bun run report:coverage`
- Run tests: `bun run test`
This project deploys with OpenNext to Cloudflare Workers (not Cloudflare Pages).
- One-time auth: `bunx wrangler login`
- Build and preview locally in the Workers runtime: `bun run preview`
- Deploy to production: `bun run deploy`
Use Workers Builds (not Pages) for fully automated deploys on every push.
- Project type: Workers
- Worker name: `llm-registry` (must match `wrangler.jsonc`)
- Root directory: `/`
- Build command: leave empty (or `true`)
- Deploy command: `bun run deploy`
This gives a single automated pipeline step per commit: build + deploy.
- Global and category views use normalized benchmark scores (0-100).
- Lower-is-better metrics are inverted so higher normalized score always means better performance.
- Category averages are computed over available scores for that category.
- Compare view defaults to strict shared-benchmark analysis for fair model-vs-model deltas.
- Exploratory compare mode allows partial overlap; missing values stay explicit as `N/A`.
- Capability profile (radar) shows all available domains in scope and never treats missing data as zero.
- Leaderboard supports Coverage-Assisted mode by default; use `coverageMode=strict` (Observed Only) to rank using measured scores only.
- Full methodology page: `/about`
- Ongoing SEO operations checklist: `SEO_CHECKLIST.md`
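The scoring rules above can be sketched as follows (function names and the per-benchmark min/max bounds are illustrative assumptions, not the registry's actual implementation):

```typescript
// Normalize a raw benchmark value to 0-100; lower-is-better metrics are
// inverted so a higher normalized score always means better performance.
export function normalizeScore(
  value: number,
  min: number,
  max: number,
  lowerIsBetter = false,
): number {
  const clamped = Math.min(Math.max(value, min), max);
  const scaled = ((clamped - min) / (max - min)) * 100;
  return lowerIsBetter ? 100 - scaled : scaled;
}

// Category average over available scores only: missing entries are skipped,
// never treated as zero.
export function categoryAverage(scores: Array<number | null>): number | null {
  const present = scores.filter((s): s is number => s !== null);
  if (present.length === 0) return null;
  return present.reduce((sum, s) => sum + s, 0) / present.length;
}
```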
src/
├── app/ # Next.js App Router pages
├── components/
│ ├── dashboard/ # Leaderboard, compare, and data viz components
│ └── ui/ # Shadcn UI components
├── data/
│ ├── models.ts # Model definitions and scores
│ ├── benchmarks.ts # Benchmark taxonomy and metadata
│ ├── aa-overrides.ts # Artificial Analysis data imports
│ ├── sources.ts # Data source registry
│ └── changelog.ts # Version history
├── lib/
│ ├── registry-data.ts # Data processing and queries
│ └── leaderboard-query.ts # Leaderboard filtering logic
├── types/ # TypeScript type definitions
└── scripts/ # Data validation and import scripts
The registry provides a REST API for programmatic access:
- `GET /api/v1/models` - List all models
- `GET /api/v1/models/[id]` - Get specific model details
- `GET /api/v1/benchmarks` - List all benchmarks
- `GET /api/v1/leaderboard` - Get leaderboard data with filtering
- `GET /api/v1/export?format=json|csv` - Export data for research workflows
Full API documentation available at /api-docs
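A small helper for building leaderboard queries might look like this (the query-parameter names are assumptions inferred from the filters described above; check /api-docs for the real ones):

```typescript
// Hypothetical filter set for /api/v1/leaderboard (parameter names assumed).
interface LeaderboardQuery {
  category?: string;
  tier?: "verified" | "discovered";
  family?: string;
  coverageMode?: "strict";
}

// Build a leaderboard URL from optional filters, omitting unset ones.
export function leaderboardUrl(base: string, query: LeaderboardQuery = {}): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(query)) {
    if (value !== undefined) params.set(key, String(value));
  }
  const qs = params.toString();
  return `${base}/api/v1/leaderboard${qs ? `?${qs}` : ""}`;
}
```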
- Open `src/data/models.ts`
- Add a new object to the `models` array following the `Model` interface
- Add scores for existing benchmarks
- Include provenance metadata (source, verification level, as-of date)
- Run validation: `bun run validate:data:strict`
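A hypothetical entry of the kind the steps above describe (every field name here is illustrative; the authoritative shape is the `Model` interface in `src/types/`):

```typescript
// Illustrative only — not the project's actual Model interface.
const exampleModel = {
  id: "example-lab/example-model-1", // stable, unique id
  name: "Example Model 1",
  family: "Example",
  tier: "verified" as const,
  scores: [
    {
      benchmarkId: "mmlu",        // must reference an existing benchmark id
      value: 82.1,
      // Provenance metadata, as required above (values are hypothetical):
      source: "provider-report",  // a source id registered in sources.ts
      verification: "provider",   // third-party | provider | community | estimated
      asOf: "2026-03-01",
    },
  ],
};
```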
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see LICENSE for details.
Benchmark data includes contributions from:
- Artificial Analysis (https://artificialanalysis.ai/) - Imported under current policy with explicit attribution
- Provider-reported scores from model publishers
- Third-party evaluation results
All imported data includes provenance tracking with source IDs, verification levels, and as-of dates.
`bun run build:cf`

This will:
- Generate registry manifest (1,581 models)
- Copy score files to public directory
- Build Next.js application
- Output static files to `dist/`
# Manual deployment
bun run build:cf
npx wrangler pages deploy dist/
# Or connect GitHub repo for auto-deploy

GitHub Actions automatically:
- Imports latest models.dev data (every Monday 2 AM UTC)
- Detects changes
- Creates pull request for review
See .github/workflows/update-models-dev.yml
| Component | Size | Notes |
|---|---|---|
| Client Bundle | ~150KB | React + app code |
| Registry Manifest | ~50KB | 1,581 models |
| Score Files | <1KB | Per model |
| Page | Initial Load | Data Fetch | Total |
|---|---|---|---|
| Leaderboard | <1s | <200ms | <1.2s |
| Model Detail | <1s | <50ms | <1.05s |
| Explore | <1s | N/A | <1s |
- Current Models: 1,581
- Max Supported: 50,000+
- Build Time: ~3 seconds
- API Response: <20ms (edge cached)
REST API available at /api/v1/:
- `GET /api/v1/models` - List all models
- `GET /api/v1/models/[id]` - Model details
- `GET /api/v1/benchmarks` - List benchmarks
- `GET /api/v1/scores` - Query scores
- `GET /api/v1/leaderboards/[category]` - Category rankings
- `GET /api/v1/export` - Export data (JSON/CSV)
Rate Limiting: 100 requests/minute per IP (via Cloudflare WAF)
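Clients that exceed the 100 req/min ceiling will be rejected by the WAF (typically with HTTP 429); a common client-side pattern is capped exponential backoff. The schedule below is a sketch, not documented API behavior:

```typescript
// Delay before retry `attempt` (0-based): 1s, 2s, 4s, ... capped at 60s.
// Base and cap are arbitrary illustrative choices.
export function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```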
Full documentation: /api-docs
- Scalable Architecture - Data loading patterns
- Cloudflare WAF Setup - Rate limiting
- Migration Guide - Upgrading to v0.7.0
- API Documentation - REST API reference
Current: v0.7.0 (2026-03-01)
Recent Changes:
- Tier system (Verified/Discovered models)
- On-demand data loading with hooks
- Automated models.dev import
- 1,581 models with rich metadata
- Advanced filtering (family, capability, provider)
- Score files for on-demand loading
See Changelog for complete history.
MIT License - see LICENSE file for details.
Data Sources:
- models.dev - Model metadata (MIT License)
- Artificial Analysis - Score overrides
- Manual curation - Benchmark scores
Technologies:
- Next.js 16
- TypeScript 5
- React 19
- Tailwind CSS v4
- Shadcn UI
- SWR
- Cloudflare Workers
Built with ❤️ for the AI community.