🌍 Nano World Model

A minimalist repository for training video world models based on diffusion-forcing.

Key Features

🚀 Instant Start — Minimal dependencies, easy data loading. From clone to first rollout in minutes.
🛠️ Unified Pipeline — Training, Validation, Evaluation; All managed with clean hydra-based configuration systems.
🔬 Scientific Transparency — Clean codebase with head-to-head ablations across prediction target, action injection, and model scale; Fully open-source, including model checkpoints.
🤖 Diverse Applications — Long-horizon rollouts, rollout to 3d point clouds, planning (MPC) out of the box.

🚀 Quick Start

git clone https://github.com/simchowitzlabpublic/nano-world-model.git
cd nano-world-model
conda env create -f environment.yml && conda activate nanowm

Set data + results paths (or use the gitignored src/configs/local/paths.yaml template — see docs/config_system.md):

export DATASET_DIR=/path/to/dino_wm_data       # DINO-WM envs (point_maze, pusht, ...)
export CSGO_DATA_DIR=/path/to/csgo             # CSGO HDF5 files
export RT1_DATA_ROOT=/path/to/rt1_fractal      # RT-1 LeRobot mirror (optional)
export RESULTS_DIR=/path/to/results            # checkpoints + logs land here

Download the i3d torchscript used by FID/FVD evaluation:

mkdir -p pretrained_models/i3d && curl -L \
    "https://www.dropbox.com/scl/fi/c5nfs6c422nlpj880jbmh/i3d_torchscript.pt?rlkey=x5xcjsrz0818i4qxyoglp5bb8&dl=1" \
    -o pretrained_models/i3d/i3d_torchscript.pt

For dataset downloads (DINO-WM, RT-1, CSGO), see docs/datasets/README.md.

🥷 Train your first model

DINO-WM PushT, NanoWM-B/2, default settings (pred-v · additive injection · cosine + ZTSNR):

python src/main.py experiment=dino_wm_pusht dataset=dino_wm/pusht model=nanowm_b2

CSGO with the L/2 model:

python src/main.py experiment=csgo dataset=game/csgo model=nanowm_l2_csgo

RT-1 (fractal) main run:

python src/main.py experiment=rt1 dataset=rt1/rt1 model=nanowm_b2

For reproducibility, we provide example scripts in src/scripts/. See docs/training.md for the full training guide, design choices, and ablation tables.

📦 Pretrained Checkpoints

Best-config runs (pred-v · additive · cosine + ZTSNR · NanoWM-B/2 unless noted):

Domain	Checkpoint	Steps
DINO-WM Point Maze	🤗 nanowm-b2-dino-wm-point-maze-30k	30k
DINO-WM Wall	🤗 nanowm-b2-dino-wm-wall-15k	15k
DINO-WM Rope	🤗 nanowm-b2-dino-wm-rope-15k	15k
DINO-WM Granular	🤗 nanowm-b2-dino-wm-granular-15k	15k
DINO-WM PushT	🤗 nanowm-b2-dino-wm-pusht-100k	100k
RT-1 (fractal)	🤗 nanowm-b2-rt1-300k	300k
CSGO	🤗 nanowm-l2-csgo-100k (NanoWM-L/2)	100k

We also provide RT-1 ablation tables with HF checkpoint paths. See docs/training.md#design-choices for the full table and ablation numbers.

🎬 Sample Predictions

CSGO 50-frame auto-regressive long-rollouts (NanoWM-L/2, 100k):

Quantitative Metrics Evaluated on 256 fixed samples (seed=42), 250 DDIM steps, sequential scheduling (frame-by-frame autoregressive denoising).

Dataset	Steps	PSNR ↑	SSIM ↑	LPIPS ↓	FID ↓
Point Maze	30k	36.74	0.984	0.019	9.66
Wall	15k	34.05	0.994	0.010	2.64
PushT	100k	33.19	0.982	0.016	13.63
Rope	15k	31.63	0.953	0.056	35.20
Granular	15k	26.08	0.917	0.073	40.05
RT-1	300k	24.36	0.787	0.180	35.08

Full per-domain numbers and methodology in docs/evaluation.md.

🧭 Applications

NanoWM rollouts can be used directly for downstream applications, including long-horizon generation, video-to-3D reconstruction, and MPC-style planning.

Long-horizon rollout — autoregressive rollout from trained checkpoints
Video → 3D map — Depth Anything 3 point cloud reconstruction from rollout videos
MPC-style planning — CEM planning over world model rollouts

📚 Documentation

docs/config_system.md — Hydra config layout, overrides, environment variables
docs/training.md — training workflow, design choices, ablation tables, all checkpoints
docs/evaluation.md — evaluation workflow, metric definitions, full result tables
docs/datasets/README.md — DINO-WM / RT-1 / CSGO formats, downloads, splits
docs/applications/planning.md — MPC + CEM model-predictive control
docs/applications/long_rollout.md — long-horizon autoregressive rollout
docs/applications/video_to_3d.md — Depth Anything 3 point cloud pipeline

🙏 Acknowledgements

We build upon a number of existing codebases: Latte, Vid2World, DFoT, and DINO-WM. More broadly, this repository draws inspirations and design principles from NanoGPT, NanoChat, and Boyuan Chen's Research Template. We sincerely thank the codebases above for open-sourcing their works.

📝 Citation

If you find this repository useful in your research, please consider citing:

@misc{nanoworldmodels,
  title={Nano World Model: A Minimalist, Batteries-Included Repository for Advancing World Model Science},
  author={Siqiao Huang and Partha Kaushik and Michael Chen and Hengkai Pan and Kaiwen Geng and Omar Chehab and Fernando Moreno-Pino and Max Simchowitz},
  year={2026},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/simchowitzlabpublic/nano-world-model}},
}

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
assets		assets
docs		docs
sample		sample
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🌍 Nano World Model

Key Features

🚀 Quick Start

🥷 Train your first model

📦 Pretrained Checkpoints

🎬 Sample Predictions

🧭 Applications

📚 Documentation

🙏 Acknowledgements

📝 Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🌍 Nano World Model

Key Features

🚀 Quick Start

🥷 Train your first model

📦 Pretrained Checkpoints

🎬 Sample Predictions

🧭 Applications

📚 Documentation

🙏 Acknowledgements

📝 Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages