|
| 1 | +# Mathlib Cache |
| 2 | + |
| 3 | +This directory contains the implementation of Mathlib's build cache system (`lake exe cache`), which downloads pre-built `.olean` files to avoid recompiling the entire library. |
| 4 | + |
| 5 | +> **Note**: A new `lake cache` command is currently being designed and implemented in Lake itself. This will eventually replace the Mathlib-specific `lake exe cache` and work for all repositories. Until then, this cache system remains the primary way to get pre-built artifacts for Mathlib. |
| 6 | +
|
| 7 | +## Quick Start |
| 8 | + |
| 9 | +```bash |
| 10 | +# Download and unpack cache for all of Mathlib |
| 11 | +lake exe cache get |
| 12 | + |
| 13 | +# Force re-download everything |
| 14 | +lake exe cache get! |
| 15 | + |
| 16 | +# Download cache for specific files only (and their dependencies) |
| 17 | +lake exe cache get Mathlib/Algebra/Group/Basic.lean |
| 18 | +lake exe cache get Mathlib.Algebra.Group.Basic |
| 19 | +``` |
| 20 | + |
| 21 | +## Commands |
| 22 | + |
| 23 | +### No Privilege Required |
| 24 | + |
| 25 | +| Command | Description | |
| 26 | +|-----------------|---------------------------------------------------------------------| |
| 27 | +| `get [ARGS]` | Download linked files missing on the local cache and decompress | |
| 28 | +| `get! [ARGS]` | Download all linked files and decompress (force re-download) | |
| 29 | +| `get- [ARGS]` | Download linked files missing to local cache, but do not decompress | |
| 30 | +| `pack` | Compress non-compressed build files into the local cache | |
| 31 | +| `pack!` | Compress build files into the local cache (no skipping) | |
| 32 | +| `unpack` | Decompress linked already downloaded files | |
| 33 | +| `unpack!` | Decompress linked already downloaded files (no skipping) | |
| 34 | +| `clean` | Delete non-linked files | |
| 35 | +| `clean!` | Delete everything on the local cache | |
| 36 | +| `lookup [ARGS]` | Show information about cache files for the given Lean files | |
| 37 | + |
| 38 | +### Privilege Required (CI/Maintainers) |
| 39 | + |
| 40 | +| Command | Description | |
| 41 | +|----------------|-----------------------------------------------------------| |
| 42 | +| `put` | Run `pack` then upload linked files missing on the server | |
| 43 | +| `put!` | Run `pack` then upload all linked files | |
| 44 | +| `put-unpacked` | `put` only files not already packed; intended for CI use | |
| 45 | +| `commit` | Write a commit on the server | |
| 46 | +| `commit!` | Overwrite a commit on the server | |
| 47 | + |
| 48 | +### Arguments |
| 49 | + |
| 50 | +The `get`, `get!`, `get-`, and `lookup` commands accept: |
| 51 | + |
| 52 | +- Module names: `Mathlib.Algebra.Group.Basic` |
| 53 | +- File paths: `Mathlib/Algebra/Group/Basic.lean` |
| 54 | +- Folder names: `Mathlib/Data/` (finds all Lean files inside) |
| 55 | +- Glob patterns: `Mathlib/**/Order/*.lean` (via shell expansion) |
| 56 | + |
| 57 | +When arguments are provided, only the specified files and their transitive imports are downloaded. |
| 58 | + |
| 59 | +### Options |
| 60 | + |
| 61 | +| Option | Description | |
| 62 | +|---------------------|--------------------------------------------------------------------------------------------| |
| 63 | +| `--repo=OWNER/REPO` | Override the repository to fetch cache from (e.g., `--repo=leanprover-community/mathlib4`) | |
| 64 | + |
| 65 | +## Environment Variables |
| 66 | + |
| 67 | +### Cache Location |
| 68 | + |
| 69 | +| Variable | Description | Default | |
| 70 | +|---------------------|------------------------------------|-------------------------------------------------| |
| 71 | +| `MATHLIB_CACHE_DIR` | Directory for cached `.ltar` files | `$XDG_CACHE_HOME/mathlib` or `~/.cache/mathlib` | |
| 72 | + |
| 73 | +### Cache Backend Selection |
| 74 | + |
| 75 | +| Variable | Description | Default | |
| 76 | +|--------------------------------|----------------------------------------------------------|-------------| |
| 77 | +| `MATHLIB_CACHE_USE_CLOUDFLARE` | Set to `1` or `true` to use Cloudflare R2 instead of Azure | Azure cache | |
| 78 | + |
| 79 | +### Custom Cache URLs |
| 80 | + |
| 81 | +These allow overriding the cache endpoints, useful for mirrors or custom deployments: |
| 82 | + |
| 83 | +| Variable | Description | Default | |
| 84 | +|-------------------------|---------------------------------|-------------------------------------------------------------------| |
| 85 | +| `MATHLIB_CACHE_GET_URL` | URL for downloading cache files | Azure or Cloudflare URL based on `MATHLIB_CACHE_USE_CLOUDFLARE` | |
| 86 | +| `MATHLIB_CACHE_PUT_URL` | URL for uploading cache files | Azure or Cloudflare URL based on `MATHLIB_CACHE_USE_CLOUDFLARE` | |
| 87 | + |
| 88 | +### Authentication (for uploads) |
| 89 | + |
| 90 | +| Variable | Description | |
| 91 | +|--------------------------|------------------------------------------------| |
| 92 | +| `MATHLIB_CACHE_SAS` | Azure SAS token (when using Azure backend) | |
| 93 | +| `MATHLIB_CACHE_S3_TOKEN` | S3 credentials (when using Cloudflare backend) | |
| 94 | + |
| 95 | +## How It Works |
| 96 | + |
| 97 | +### File Hashing |
| 98 | + |
| 99 | +Each Lean file's cache is identified by a hash computed from: |
| 100 | + |
| 101 | +1. **Root hash**: A combination of: |
| 102 | + - `lakefile.lean` content |
| 103 | + - `lean-toolchain` content |
| 104 | + - `lake-manifest.json` content |
| 105 | + - The Lean compiler's git hash |
| 106 | + - A generation counter (bumped to invalidate all caches) |
| 107 | + |
| 108 | +2. **File hash**: Mixing: |
| 109 | + - The root hash |
| 110 | + - The file's path relative to its package |
| 111 | + - The file's content hash |
| 112 | + - Hashes of all imported files |
| 113 | + |
| 114 | +This ensures that any change to toolchain, dependencies, or source files produces a different cache key. |
| 115 | + |
| 116 | +### Cache File Format |
| 117 | + |
| 118 | +Cache files use the `.ltar` format (Lean tar), handled by [leantar](https://github.com/digama0/leangz). Each `.ltar` contains: |
| 119 | + |
| 120 | +- `.olean` files (compiled Lean) |
| 121 | +- `.ilean` files (interface info) |
| 122 | +- `.trace` files (build traces) |
| 123 | +- `.c` files (generated C code) |
| 124 | +- Associated `.hash` files |
| 125 | + |
| 126 | +### Cached Packages |
| 127 | + |
| 128 | +The cache covers these packages: |
| 129 | + |
| 130 | +- `Mathlib` |
| 131 | +- `Batteries` |
| 132 | +- `Aesop` |
| 133 | +- `Cli` |
| 134 | +- `ImportGraph` |
| 135 | +- `LeanSearchClient` |
| 136 | +- `Plausible` |
| 137 | +- `Qq` |
| 138 | +- `ProofWidgets` |
| 139 | +- `Archive` |
| 140 | +- `Counterexamples` |
| 141 | + |
| 142 | +## Default Cache Backends |
| 143 | + |
| 144 | +### Azure Blob Storage (Default) |
| 145 | + |
| 146 | +- **Download URL**: `https://lakecache.blob.core.windows.net/mathlib4` |
| 147 | +- Used by default for downloads and uploads |
| 148 | + |
| 149 | +### Cloudflare R2 |
| 150 | + |
| 151 | +- **Download URL**: `https://mathlib4.lean-cache.cloud` |
| 152 | +- **Upload URL**: `https://a09a7664adc082e00f294ac190827820.r2.cloudflarestorage.com/mathlib4` |
| 153 | +- Enable with `MATHLIB_CACHE_USE_CLOUDFLARE=1` |
| 154 | + |
| 155 | +## Setting Up Your Own Cache Endpoint |
| 156 | + |
| 157 | +You can host your own cache mirror or private cache using any S3-compatible storage or HTTP server. |
| 158 | + |
| 159 | +### Requirements |
| 160 | + |
| 161 | +Your endpoint must support: |
| 162 | + |
| 163 | +1. **GET requests** for downloading files at: |
| 164 | + - `/f/{repo}/{hash}.ltar` - for fork caches |
| 165 | + - `/f/{hash}.ltar` - for main mathlib cache (Azure only) |
| 166 | + - `/c/{commit_hash}` - for commit manifests |
| 167 | + |
| 168 | +2. **PUT requests** for uploading (if you need upload capability) |
| 169 | + |
| 170 | +### Using a Custom Endpoint |
| 171 | + |
| 172 | +```bash |
| 173 | +# Download from a custom mirror |
| 174 | +export MATHLIB_CACHE_GET_URL="https://my-mirror.example.com/mathlib4" |
| 175 | +lake exe cache get |
| 176 | + |
| 177 | +# Upload to a custom endpoint |
| 178 | +export MATHLIB_CACHE_PUT_URL="https://my-upload.example.com/mathlib4" |
| 179 | +export MATHLIB_CACHE_SAS="your-auth-token" # or MATHLIB_CACHE_S3_TOKEN for S3 |
| 180 | +lake exe cache put |
| 181 | +``` |
| 182 | + |
| 183 | +### Example: S3-Compatible Storage |
| 184 | + |
| 185 | +For S3-compatible storage (MinIO, Cloudflare R2, AWS S3, etc.): |
| 186 | + |
| 187 | +1. Create a bucket (e.g., `mathlib-cache`) |
| 188 | +2. Configure public read access for downloads (or use signed URLs) |
| 189 | +3. Set up authentication for uploads |
| 190 | +4. Set the environment variables: |
| 191 | + |
| 192 | +```bash |
| 193 | +export MATHLIB_CACHE_GET_URL="https://your-bucket.s3.region.amazonaws.com/mathlib-cache" |
| 194 | +export MATHLIB_CACHE_PUT_URL="https://your-bucket.s3.region.amazonaws.com/mathlib-cache" |
| 195 | +export MATHLIB_CACHE_USE_CLOUDFLARE=1 # Use S3-style auth |
| 196 | +export MATHLIB_CACHE_S3_TOKEN="ACCESS_KEY:SECRET_KEY" |
| 197 | +``` |
| 198 | + |
| 199 | +### Example: Simple HTTP Mirror |
| 200 | + |
| 201 | +For a read-only mirror using nginx or any static file server: |
| 202 | + |
| 203 | +1. Periodically sync files from the official cache |
| 204 | +2. Serve them at a public URL |
| 205 | +3. Point users to your mirror: |
| 206 | + |
| 207 | +```bash |
| 208 | +export MATHLIB_CACHE_GET_URL="https://mathlib-mirror.myorg.com" |
| 209 | +lake exe cache get |
| 210 | +``` |
| 211 | + |
| 212 | +### URL Structure |
| 213 | + |
| 214 | +The cache uses this URL pattern: |
| 215 | + |
| 216 | +``` |
| 217 | +{BASE_URL}/f/{repo}/{filename}.ltar # Fork/branch caches |
| 218 | +{BASE_URL}/f/{filename}.ltar # Main mathlib cache (Azure) |
| 219 | +{BASE_URL}/c/{commit_hash} # Commit manifests |
| 220 | +``` |
| 221 | + |
| 222 | +Where: |
| 223 | +- `{repo}` is like `leanprover-community/mathlib4` or `username/mathlib4` |
| 224 | +- `{filename}` is a hash like `1234567890abcdef` |
| 225 | +- `{commit_hash}` is a git commit SHA |
| 226 | + |
| 227 | +## Dependencies |
| 228 | + |
| 229 | +The cache system automatically downloads and manages: |
| 230 | + |
| 231 | +- **curl** (>=7.70, preferably >=7.81) - for HTTP transfers |
| 232 | +- **leantar** - for `.ltar` compression/decompression |
| 233 | + |
| 234 | +If your system curl is too old, a static binary is downloaded automatically on Linux. |
| 235 | + |
| 236 | +## File Locations |
| 237 | + |
| 238 | +| Path | Description | |
| 239 | +|-----------------------------|------------------------------| |
| 240 | +| `~/.cache/mathlib/` | Default cache directory | |
| 241 | +| `~/.cache/mathlib/*.ltar` | Cached build artifacts | |
| 242 | +| `~/.cache/mathlib/curl.cfg` | Temporary curl configuration | |
| 243 | +| `.lake/build/lib/lean/` | Unpacked `.olean` files | |
| 244 | +| `.lake/build/ir/` | Unpacked `.c` files | |
0 commit comments