simchowitzlabpublic/nano-world-model

🌍 Nano World Model

License: MIT

A minimalist repository for training video world models based on diffusion forcing.

3×4 rollout grid

Key Features

  • 🚀 Instant Start: Minimal dependencies and easy data loading. From clone to first rollout in minutes.
  • 🛠️ Unified Pipeline: Training, validation, and evaluation, all managed through a clean Hydra-based configuration system.
  • 🔬 Scientific Transparency: Clean codebase with head-to-head ablations across prediction target, action injection, and model scale. Fully open-source, including model checkpoints.
  • 🤖 Diverse Applications: Long-horizon rollouts, rollouts to 3D point clouds, and MPC planning out of the box.

🚀 Quick Start

git clone https://github.com/simchowitzlabpublic/nano-world-model.git
cd nano-world-model
conda env create -f environment.yml && conda activate nanowm

Set data + results paths (or use the gitignored src/configs/local/paths.yaml template — see docs/config_system.md):

export DATASET_DIR=/path/to/dino_wm_data       # DINO-WM envs (point_maze, pusht, ...)
export CSGO_DATA_DIR=/path/to/csgo             # CSGO HDF5 files
export RT1_DATA_ROOT=/path/to/rt1_fractal      # RT-1 LeRobot mirror (optional)
export RESULTS_DIR=/path/to/results            # checkpoints + logs land here
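Before launching a run, it can help to confirm that these variables point at real directories. The following is a minimal sketch, not part of the repository: the `check_paths` helper is hypothetical, and treating every variable except `RT1_DATA_ROOT` as required (and requiring the directories to exist already) is an assumption.

```python
import os
from pathlib import Path

# Assumption: only RT1_DATA_ROOT is optional, as the comments above suggest.
REQUIRED = ("DATASET_DIR", "CSGO_DATA_DIR", "RESULTS_DIR")
OPTIONAL = ("RT1_DATA_ROOT",)


def check_paths(env=None):
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED + OPTIONAL:
        value = env.get(name)
        if not value:
            # Missing optional variables are silently accepted.
            if name in REQUIRED:
                problems.append(f"{name} is not set")
            continue
        if not Path(value).is_dir():
            problems.append(f"{name}={value} is not an existing directory")
    return problems


if __name__ == "__main__":
    for problem in check_paths():
        print("WARNING:", problem)
```

If you only train on one domain, trim `REQUIRED` to the variables that domain actually needs.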

Download the I3D TorchScript model used for FID/FVD evaluation:

mkdir -p pretrained_models/i3d && curl -L \
    "https://www.dropbox.com/scl/fi/c5nfs6c422nlpj880jbmh/i3d_torchscript.pt?rlkey=x5xcjsrz0818i4qxyoglp5bb8&dl=1" \
    -o pretrained_models/i3d/i3d_torchscript.pt
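A failed or redirected download can leave an HTML page where the model file should be. Since TorchScript archives written by `torch.jit.save` are ZIP files, a quick sanity check is possible without importing PyTorch. This is a hedged sketch: `looks_like_torchscript` is a hypothetical helper, and it only checks the container format, not that the archive holds a valid I3D model.

```python
import zipfile
from pathlib import Path


def looks_like_torchscript(path):
    """Cheap check: the file exists, is non-empty, and is a ZIP archive."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    # An HTML error page or truncated download will fail this test.
    return zipfile.is_zipfile(str(p))


if __name__ == "__main__":
    target = "pretrained_models/i3d/i3d_torchscript.pt"
    status = "looks OK" if looks_like_torchscript(target) else "missing or corrupt"
    print(f"{target}: {status}")
```

Loading the file with `torch.jit.load` remains the definitive test, since it also validates the archive contents.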

For dataset downloads (DINO-WM, RT-1, CSGO), see docs/datasets/README.md.

🥷 Train your first model

DINO-WM PushT, NanoWM-B/2, default settings (pred-v · additive injection · cosine + ZTSNR):

python src/main.py experiment=dino_wm_pusht dataset=dino_wm/pusht model=nanowm_b2

CSGO with the L/2 model:

python src/main.py experiment=csgo dataset=game/csgo model=nanowm_l2_csgo

RT-1 (fractal) main run:

python src/main.py experiment=rt1 dataset=rt1/rt1 model=nanowm_b2
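The three commands above differ only in their `key=value` overrides, so they are easy to drive from a small launcher. The sketch below is illustrative and not part of the repository: `RUNS`, `build_command`, and `launch` are hypothetical names, and any extra overrides you pass must match keys that actually exist in the repository's configs.

```python
import subprocess
import sys

# Override sets copied from the three commands above.
RUNS = {
    "pusht": {"experiment": "dino_wm_pusht", "dataset": "dino_wm/pusht", "model": "nanowm_b2"},
    "csgo": {"experiment": "csgo", "dataset": "game/csgo", "model": "nanowm_l2_csgo"},
    "rt1": {"experiment": "rt1", "dataset": "rt1/rt1", "model": "nanowm_b2"},
}


def build_command(run_name, extra_overrides=()):
    """Build the argv list for one run; extra_overrides are appended verbatim."""
    overrides = [f"{key}={value}" for key, value in RUNS[run_name].items()]
    return [sys.executable, "src/main.py", *overrides, *extra_overrides]


def launch(run_name, extra_overrides=(), dry_run=True):
    """Print the command (dry run) or execute it from the repository root."""
    cmd = build_command(run_name, extra_overrides)
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    for name in RUNS:
        launch(name)  # dry run: only prints the commands
```

Keeping `dry_run=True` as the default makes it harder to start a long training job by accident.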

For reproducibility, we provide example scripts in src/scripts/. See docs/training.md for the full training guide, design choices, and ablation tables.
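For readers new to the default settings named above (pred-v with a cosine schedule and ZTSNR), here is a small numeric sketch of what a v-prediction target looks like. It is written under assumptions: a continuous cosine schedule with `alpha = cos(pi*t/2)` and `sigma = sin(pi*t/2)`, ZTSNR read as zero terminal signal-to-noise ratio, and scalar inputs for brevity. The repository's actual parameterization may differ, and all function names here are hypothetical.

```python
import math


def cosine_alpha_sigma(t):
    """Signal and noise scales for t in [0, 1]; alpha**2 + sigma**2 == 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def noisy_sample(x0, eps, t):
    """Forward process: mix the clean value x0 with noise eps."""
    alpha, sigma = cosine_alpha_sigma(t)
    return alpha * x0 + sigma * eps


def v_target(x0, eps, t):
    """v-prediction target: v = alpha * eps - sigma * x0."""
    alpha, sigma = cosine_alpha_sigma(t)
    return alpha * eps - sigma * x0


def x0_from_v(xt, v, t):
    """Recover the clean value from the noisy sample and a predicted v."""
    alpha, sigma = cosine_alpha_sigma(t)
    return alpha * xt - sigma * v


if __name__ == "__main__":
    x0, eps, t = 0.8, -0.3, 0.4
    xt = noisy_sample(x0, eps, t)
    print(x0_from_v(xt, v_target(x0, eps, t), t))  # recovers x0 up to rounding
    print(cosine_alpha_sigma(1.0)[0])              # signal scale vanishes at t = 1
```

At `t = 1` the signal scale is zero up to floating-point error, which is the zero-terminal-SNR property: the final noisy sample carries no information about `x0`, while the v target stays well defined.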

📦 Pretrained Checkpoints

Best-config runs (pred-v · additive · cosine + ZTSNR · NanoWM-B/2 unless noted):

| Domain | Checkpoint | Steps |
|---|---|---|
| DINO-WM Point Maze | 🤗 nanowm-b2-dino-wm-point-maze-30k | 30k |
| DINO-WM Wall | 🤗 nanowm-b2-dino-wm-wall-15k | 15k |
| DINO-WM Rope | 🤗 nanowm-b2-dino-wm-rope-15k | 15k |
| DINO-WM Granular | 🤗 nanowm-b2-dino-wm-granular-15k | 15k |
| DINO-WM PushT | 🤗 nanowm-b2-dino-wm-pusht-100k | 100k |
| RT-1 (fractal) | 🤗 nanowm-b2-rt1-300k | 300k |
| CSGO | 🤗 nanowm-l2-csgo-100k (NanoWM-L/2) | 100k |

We also provide RT-1 ablation tables with HF checkpoint paths. See docs/training.md#design-choices for the full table and ablation numbers.

🎬 Sample Predictions

CSGO 50-frame autoregressive long rollouts (NanoWM-L/2, 100k):

CSGO 50-frame autoregressive long rollout

Quantitative metrics, evaluated on 256 fixed samples (seed=42) with 250 DDIM steps and sequential scheduling (frame-by-frame autoregressive denoising):

| Dataset | Steps | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FID ↓ |
|---|---|---|---|---|---|
| Point Maze | 30k | 36.74 | 0.984 | 0.019 | 9.66 |
| Wall | 15k | 34.05 | 0.994 | 0.010 | 2.64 |
| PushT | 100k | 33.19 | 0.982 | 0.016 | 13.63 |
| Rope | 15k | 31.63 | 0.953 | 0.056 | 35.20 |
| Granular | 15k | 26.08 | 0.917 | 0.073 | 40.05 |
| RT-1 | 300k | 24.36 | 0.787 | 0.180 | 35.08 |
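To give the PSNR column some intuition, the sketch below implements the standard definition, PSNR = 10 · log10(MAX² / MSE), and its inverse. It assumes pixel values scaled to [0, 1] (so MAX = 1), which may not match the repository's evaluation code; the function names are hypothetical and inputs are flat lists for simplicity.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not pred or len(pred) != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_from_psnr(psnr_db, max_val=1.0):
    """Invert the definition: the MSE that a single image pair with this PSNR has."""
    return max_val ** 2 / 10.0 ** (psnr_db / 10.0)


if __name__ == "__main__":
    print(psnr([0.5, 0.5], [0.6, 0.4]))  # per-pixel error of 0.1 gives about 20 dB
    print(mse_from_psnr(36.74))
    print(mse_from_psnr(24.36))
```

Under that [0, 1] assumption, a single frame at the Point Maze score corresponds to a mean squared error of roughly 2e-4, and one at the RT-1 score to roughly 4e-3. Averaged PSNR over many samples does not invert exactly this way, so treat these as order-of-magnitude intuition only.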

Full per-domain numbers and methodology in docs/evaluation.md.
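One way to picture the sequential scheduling (frame-by-frame autoregressive denoising) used for these metrics is as a matrix of per-frame noise levels: each row is one denoising step, and each frame is taken from full noise to clean before the next frame starts. The sketch below is an illustrative reading of that phrase, not the repository's scheduler, and `sequential_schedule` is a hypothetical name.

```python
def sequential_schedule(num_frames, num_steps):
    """Return rows of per-frame noise levels (num_steps = pure noise, 0 = clean).

    Frames are denoised strictly one after another: earlier frames are already
    clean and later frames are still pure noise while the current one is refined.
    """
    levels = [num_steps] * num_frames
    rows = [list(levels)]  # initial state: every frame is pure noise
    for frame in range(num_frames):
        for level in range(num_steps - 1, -1, -1):
            levels[frame] = level
            rows.append(list(levels))
    return rows


if __name__ == "__main__":
    for row in sequential_schedule(num_frames=3, num_steps=2):
        print(row)
```

Under this reading, the number of denoiser calls grows as `num_frames * num_steps`, so with 250 steps per frame a long rollout is dominated by sampling cost.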

🧭 Applications

NanoWM rollouts can be used directly for downstream applications, including long-horizon generation, video-to-3D reconstruction, and MPC-style planning.
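As a rough illustration of MPC-style planning on top of a rollout function, the sketch below uses random shooting: sample candidate action sequences, score each by rolling it out, execute only the first action of the best one, then replan. Everything here is a stand-in: `toy_step` is a 1-D toy dynamics function rather than a learned video model, and the function names and cost are assumptions, not the repository's planner.

```python
import random


def toy_step(state, action):
    """Stand-in for a world model: a 1-D point that moves by the action."""
    return state + action


def trajectory_cost(state, actions, goal, step_fn):
    """Sum of distances to the goal along the imagined rollout."""
    cost = 0.0
    for action in actions:
        state = step_fn(state, action)
        cost += abs(state - goal)
    return cost


def plan_random_shooting(state, goal, step_fn, horizon=5, num_candidates=256, seed=0):
    """Return the best sampled action sequence and its cost."""
    rng = random.Random(seed)
    # Start from the do-nothing plan so the result is never worse than idling.
    best_actions = [0.0] * horizon
    best_cost = trajectory_cost(state, best_actions, goal, step_fn)
    for _ in range(num_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = trajectory_cost(state, actions, goal, step_fn)
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions, best_cost


def run_mpc(state, goal, steps=10):
    """Receding-horizon loop: plan, execute the first action, replan."""
    for t in range(steps):
        actions, _ = plan_random_shooting(state, goal, toy_step, seed=t)
        state = toy_step(state, actions[0])
    return state


if __name__ == "__main__":
    print(run_mpc(state=0.0, goal=3.0))
```

With a learned world model, `step_fn` would be replaced by a model rollout and the cost by a distance in image or feature space; stronger optimizers such as the cross-entropy method are a common swap for plain random shooting.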

Video-to-3D point cloud demo

📚 Documentation

🙏 Acknowledgements

We build upon a number of existing codebases: Latte, Vid2World, DFoT, and DINO-WM. More broadly, this repository draws inspiration and design principles from NanoGPT, NanoChat, and Boyuan Chen's Research Template. We sincerely thank the authors of these codebases for open-sourcing their work.

📝 Citation

If you find this repository useful in your research, please consider citing:

@misc{nanoworldmodels,
  title={Nano World Model: A Minimalist, Batteries-Included Repository for Advancing World Model Science},
  author={Siqiao Huang and Partha Kaushik and Michael Chen and Hengkai Pan and Kaiwen Geng and Omar Chehab and Fernando Moreno-Pino and Max Simchowitz},
  year={2026},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/simchowitzlabpublic/nano-world-model}},
}