A Modal fork of karpathy/autoresearch — run autonomous LLM pretraining research on cloud GPUs without needing a local NVIDIA GPU.
The original autoresearch requires a local NVIDIA GPU (ideally an H100). This fork adds a thin Modal wrapper so you can run everything from a laptop (macOS, Linux, Windows) while training executes on a remote H100 via Modal's serverless GPU infrastructure.
What changed from upstream:
- Added `run_modal.py` — a Modal wrapper that sends `train.py` to a remote H100 for each experiment
- Updated `program.md` — agent instructions use `modal run` instead of `uv run`
- Everything else (model, optimizer, data, evaluation) is identical to upstream
Requirements: Python 3.10+, a Modal account, and the `modal` CLI.
```bash
# 1. Install the Modal CLI (if you don't already have it)
pip install modal
modal setup

# 2. One-time data prep (~2 min, runs on a cheap T4)
modal run run_modal.py --prepare-data

# 3. Run a single training experiment (~5 min on H100)
modal run run_modal.py

# 4. Check status of a running/completed experiment
modal run run_modal.py --status
```

Each `modal run run_modal.py` invocation:
- Uploads your current `train.py` and `prepare.py` to Modal (picking up the agent's latest edits)
- Spins up an H100 container with all dependencies pre-installed (the image is cached after the first build)
- Runs training for the fixed 5-minute time budget
- Streams output back and saves results to a persistent Modal Volume for polling
Data shards and the tokenizer are stored on a Modal Volume (`autoresearch-cache`) so they persist across runs.
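For orientation, the core of such a wrapper can be sketched with Modal's standard API. This is an illustrative deployment sketch, not the repo's actual `run_modal.py`: the function name `train_remote`, the image contents, and the paths are assumptions.

```python
import modal

app = modal.App("autoresearch")

# Illustrative image; the real wrapper pins its own dependencies.
image = modal.Image.debian_slim(python_version="3.10").pip_install("torch", "numpy")

# Persistent volume holding the data shards and tokenizer across runs.
cache = modal.Volume.from_name("autoresearch-cache", create_if_missing=True)

@app.function(image=image, gpu="H100", volumes={"/cache": cache}, timeout=600)
def train_remote(train_py: str):
    # Write the locally edited train.py into the container, then execute it.
    with open("/root/train.py", "w") as f:
        f.write(train_py)
    import subprocess
    subprocess.run(["python", "/root/train.py"], check=True)
    cache.commit()  # flush results to the persistent volume

@app.local_entrypoint()
def main():
    # Re-reads the local train.py on every invocation, so the agent's
    # latest edits are always what runs on the remote GPU.
    train_remote.remote(open("train.py").read())
```

Because the script is uploaded as plain text each time, no image rebuild is needed between experiments; only the first `modal run` pays the image-build cost.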
Same as upstream — spin up Claude Code or any coding agent in this repo, then prompt:
Hi have a look at program.md and let's kick off a new experiment! let's do the setup first.
The agent edits `train.py` locally and runs `modal run run_modal.py > run.log 2>&1` for each experiment. The `--status` flag lets you poll results while training is in progress.
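Since output is redirected to `run.log`, progress can also be checked by scanning the log locally. The helper below is hypothetical (not part of the repo) and assumes the training script prints lines like `step 100 val loss 3.21`:

```python
import re
from pathlib import Path

def latest_val_loss(log_path="run.log"):
    """Return the most recent validation loss found in the log, or None.

    Assumes (hypothetically) that train.py prints lines containing
    'val loss <number>'; adjust the regex to the actual log format.
    """
    p = Path(log_path)
    if not p.exists():
        return None
    matches = re.findall(r"val loss ([0-9.]+)", p.read_text())
    return float(matches[-1]) if matches else None
```

This is just a convenience for eyeballing progress; the authoritative results still live on the Modal Volume and are read via `--status`.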
- `prepare.py` — constants, data prep + runtime utilities (do not modify)
- `train.py` — model, optimizer, training loop (the agent modifies this)
- `run_modal.py` — Modal wrapper (sends `train.py` to the remote H100)
- `program.md` — agent instructions (uses `modal run` instead of `uv run`)
- `pyproject.toml` — dependencies
To change the GPU type, edit the `gpu=` parameter in `run_modal.py`:
```python
@app.function(
    image=image,
    gpu="H100",  # Change to "A100-80GB", "A100", "A10G", "L4", "T4", etc.
    ...
)
```

See upstream for the full original README, design choices, and context.
MIT