Commit b11d6f2

initial commit

10 files changed: +2848 -0 lines changed

.gitignore

Lines changed: 20 additions & 0 deletions
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv
worktrees/
results/
queue/

# Agent prompt files (generated per-session by launchers)
CLAUDE.md
AGENTS.md

# Experimental code/artifacts
dev/

.python-version

Lines changed: 1 addition & 0 deletions
3.10

README.md

Lines changed: 61 additions & 0 deletions
# autoresearch

Autonomous LLM pretraining research, driven by AI agents.

The idea: give an AI agent a small but real LLM training setup and let it run experiments overnight. It modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats. You wake up in the morning to a log of experiments and (hopefully) a better model.
This particular implementation tries to be the least fancy baseline, but it's clear how one could adjust the `program.md` file to run more sophisticated research programs with more elaborate instructions. For example, the agent could actively run small side experiments while a training job is in progress.

## How it works

The repo is deliberately small and only has a few files:

- **`constants.py`** — fixed rules: sequence length, time budget, eval tokens. Not modified.
- **`prepare.py`** — one-time data prep (downloads training data, trains a BPE tokenizer) and runtime utilities (dataloader, evaluation). Not modified.
- **`train.py`** — the single file the agent edits. Contains the full GPT model, optimizer (Muon + AdamW), and training loop. Everything is fair game: architecture, hyperparameters, optimizer, batch size, etc.
- **`program.md`** — instructions for the agent. Point your agent here and let it go.

Training runs for a **fixed 5-minute time budget** (wall clock, excluding startup/compilation). The metric is **val_bpb** (validation bits per byte) — lower is better, and it is vocab-size-independent, so architectural changes are compared fairly.
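
The fixed budget can be enforced with a simple wall-clock check. A minimal sketch, with the timer started after setup so compilation does not eat into the budget — `train_with_budget` and `step_fn` are illustrative names, not the repo's actual API:

```python
import time

TIME_BUDGET = 300  # mirrors constants.py: 5-minute budget in seconds

def train_with_budget(step_fn, budget_s=TIME_BUDGET):
    """Run optimizer steps until the wall-clock budget is exhausted.

    The timer starts here, i.e. after any startup/compilation work,
    so every experiment gets the same amount of actual training time.
    """
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < budget_s:
        step_fn()  # one forward/backward/update step (hypothetical)
        steps += 1
    return steps
```

Because the budget is wall-clock rather than step-count, a change that makes each step faster automatically buys more steps.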

## Quick start

**Requirements:** A single NVIDIA GPU (tested on H100), Python 3.10+, [uv](https://docs.astral.sh/uv/).

```bash
# 1. Install dependencies
uv sync

# 2. Download data and train tokenizer (one-time, ~5 min)
uv run prepare.py

# 3. Run a single training experiment (5 min + startup)
uv run train.py
```

## Running the agent

Simply spin up Claude/Codex or whatever agent you prefer in this repo, then say something like:

```
Hi have a look at program.md and let's kick off a new experiment! let's do the setup first.
```

The `program.md` file is essentially a super lightweight "skill".

## Project structure

```
constants.py — fixed constants (do not modify)
prepare.py — data prep + runtime utilities (do not modify)
train.py — model, optimizer, training loop (agent modifies this)
program.md — agent instructions
spawn.sh — multi-agent launcher
pyproject.toml — dependencies
```

## Design choices

- **Single file to modify.** The agent only touches `train.py`. This keeps the scope manageable and diffs reviewable.
- **Fixed time budget.** Training always runs for exactly 5 minutes. This makes experiments directly comparable regardless of what the agent changes (model size, batch size, architecture, etc).
- **BPB metric.** Bits per byte is independent of tokenizer vocabulary size, so the agent could in principle change the vocab size and still get a fair comparison.
- **Self-contained.** No external dependencies beyond PyTorch and a few small packages. No distributed training, no complex configs. One GPU, one file, one metric.
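
For reference, bits per byte is just the summed cross-entropy over the eval set, converted from nats to bits and normalized by the raw byte count rather than the token count. A minimal sketch of that conversion (the function name is illustrative, not from the repo):

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert summed validation cross-entropy (in nats, over all
    eval tokens) into bits per byte of the underlying raw text."""
    return total_nll_nats / math.log(2) / total_bytes
```

Because the denominator counts bytes of raw text, changing the tokenizer (and hence tokens-per-byte) does not distort the comparison.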

constants.py

Lines changed: 7 additions & 0 deletions
"""
Fixed constants for autoresearch. Do not modify.
"""

MAX_SEQ_LEN = 2048  # context length
TIME_BUDGET = 300  # training time budget in seconds (5 minutes)
EVAL_TOKENS = 40 * 524288  # number of tokens for val eval
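
For scale, the eval budget above works out to roughly 21M tokens per validation pass:

```python
EVAL_TOKENS = 40 * 524288  # 40 * 2**19, as in constants.py
assert EVAL_TOKENS == 20_971_520  # ≈ 21M tokens evaluated per val run
```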
