
Commit c897694

chore(Cache): add README and rename USE_FRO_CACHE to MATHLIB_CACHE_USE_CLOUDFLARE (leanprover-community#33647)
This PR:

- Adds `Cache/README.md` documenting the cache system, environment variables, and how to set up custom endpoints
- Renames `USE_FRO_CACHE` to `MATHLIB_CACHE_USE_CLOUDFLARE` for a consistent `MATHLIB_CACHE_*` prefix across all cache environment variables
- Updates the help text in `Cache/Main.lean` to document the `--repo` option and environment variables
- Fixes help text saying "Run 'mk'" (should be "Run 'pack'")

🤖 Prepared with Claude Code
1 parent f53fc46 commit c897694

8 files changed: +288 -33 lines changed


.github/build.in.yml

Lines changed: 3 additions & 3 deletions
@@ -395,7 +395,7 @@ jobs:
 # do not try to upload files just downloaded
 
 echo "Uploading cache to Azure..."
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked
 env:
 MATHLIB_CACHE_SAS_RAW: ${{ secrets.MATHLIB_CACHE_SAS }}

@@ -451,8 +451,8 @@ jobs:
 export MATHLIB_CACHE_SAS="${MATHLIB_CACHE_SAS_RAW%"${MATHLIB_CACHE_SAS_RAW##*[![:space:]]}"}"
 
 echo "Uploading Archive and Counterexamples cache to Azure..."
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Archive.lean
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Counterexamples.lean
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Archive.lean
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Counterexamples.lean
 env:
 MATHLIB_CACHE_SAS_RAW: ${{ secrets.MATHLIB_CACHE_SAS }}
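The `export MATHLIB_CACHE_SAS=...` context line in this hunk strips trailing whitespace from the raw secret with nested parameter expansion. A minimal sketch of that idiom in isolation; `trim_trailing` is a hypothetical helper name and the token value is made up:

```bash
# Hypothetical helper (not part of the workflow): print $1 without trailing whitespace.
trim_trailing() {
  # ${1##*[![:space:]]} removes everything up to the last non-space character,
  # leaving only the trailing whitespace; the outer ${1%...} then strips
  # exactly that suffix from the original value.
  printf '%s' "${1%"${1##*[![:space:]]}"}"
}

raw="$(printf 'sv=2024&sig=abc \t')"   # made-up token ending in a space and a tab
trimmed="$(trim_trailing "$raw")"
printf '[%s]\n' "$trimmed"             # prints [sv=2024&sig=abc]
```

Whitespace inside the value is left alone; only the trailing run is removed.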

.github/workflows/bors.yml

Lines changed: 3 additions & 3 deletions
@@ -405,7 +405,7 @@ jobs:
 # do not try to upload files just downloaded
 
 echo "Uploading cache to Azure..."
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked
 env:
 MATHLIB_CACHE_SAS_RAW: ${{ secrets.MATHLIB_CACHE_SAS }}

@@ -461,8 +461,8 @@ jobs:
 export MATHLIB_CACHE_SAS="${MATHLIB_CACHE_SAS_RAW%"${MATHLIB_CACHE_SAS_RAW##*[![:space:]]}"}"
 
 echo "Uploading Archive and Counterexamples cache to Azure..."
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Archive.lean
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Counterexamples.lean
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Archive.lean
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Counterexamples.lean
 env:
 MATHLIB_CACHE_SAS_RAW: ${{ secrets.MATHLIB_CACHE_SAS }}

.github/workflows/build.yml

Lines changed: 3 additions & 3 deletions
@@ -411,7 +411,7 @@ jobs:
 # do not try to upload files just downloaded
 
 echo "Uploading cache to Azure..."
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked
 env:
 MATHLIB_CACHE_SAS_RAW: ${{ secrets.MATHLIB_CACHE_SAS }}

@@ -467,8 +467,8 @@ jobs:
 export MATHLIB_CACHE_SAS="${MATHLIB_CACHE_SAS_RAW%"${MATHLIB_CACHE_SAS_RAW##*[![:space:]]}"}"
 
 echo "Uploading Archive and Counterexamples cache to Azure..."
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Archive.lean
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Counterexamples.lean
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Archive.lean
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Counterexamples.lean
 env:
 MATHLIB_CACHE_SAS_RAW: ${{ secrets.MATHLIB_CACHE_SAS }}

.github/workflows/build_fork.yml

Lines changed: 3 additions & 3 deletions
@@ -409,7 +409,7 @@ jobs:
 # do not try to upload files just downloaded
 
 echo "Uploading cache to Azure..."
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked
 env:
 MATHLIB_CACHE_SAS_RAW: ${{ secrets.MATHLIB_CACHE_SAS }}

@@ -465,8 +465,8 @@ jobs:
 export MATHLIB_CACHE_SAS="${MATHLIB_CACHE_SAS_RAW%"${MATHLIB_CACHE_SAS_RAW##*[![:space:]]}"}"
 
 echo "Uploading Archive and Counterexamples cache to Azure..."
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Archive.lean
-USE_FRO_CACHE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Counterexamples.lean
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Archive.lean
+MATHLIB_CACHE_USE_CLOUDFLARE=0 ../master-branch/.lake/build/bin/cache --repo=${{ github.event.pull_request.head.repo.full_name || github.repository }} put-unpacked Counterexamples.lean
 env:
 MATHLIB_CACHE_SAS_RAW: ${{ secrets.MATHLIB_CACHE_SAS }}

Cache/Init.lean

Lines changed: 4 additions & 4 deletions
@@ -8,9 +8,9 @@ namespace Cache.Requests
 
 open System (FilePath)
 
--- FRO cache may be flaky: https://leanprover.zulipchat.com/#narrow/channel/113488-general/topic/The.20cache.20doesn't.20work/near/411058849
+-- Cloudflare cache may be flaky: https://leanprover.zulipchat.com/#narrow/channel/113488-general/topic/The.20cache.20doesn't.20work/near/411058849
 -- This is defined in a separate file because it is used in the definition of `URL` and `UPLOAD_URL`
 -- and Lean does not allow one `initialize` to use another `initialize` defined in the same file
-initialize useFROCache : Bool ← do
-  let froCache ← IO.getEnv "USE_FRO_CACHE"
-  return froCache == some "1" || froCache == some "true"
+initialize useCloudflareCache : Bool ← do
+  let cache ← IO.getEnv "MATHLIB_CACHE_USE_CLOUDFLARE"
+  return cache == some "1" || cache == some "true"
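
As the `initialize` above shows, only the exact strings `1` and `true` turn the flag on. Any other value leaves it off, including the `0` the CI workflows pass, different capitalization, or an unset variable. A shell sketch of the same check, where `flag_enabled` is a hypothetical helper name:

```bash
# Hypothetical helper mirroring `useCloudflareCache`: succeeds only for "1" or "true".
flag_enabled() {
  case "$1" in
    1|true) return 0 ;;
    *)      return 1 ;;
  esac
}

# ${VAR-} expands to the empty string when the variable is unset.
if flag_enabled "${MATHLIB_CACHE_USE_CLOUDFLARE-}"; then
  echo "Cloudflare backend selected"
else
  echo "default backend selected"
fi
```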

Cache/Main.lean

Lines changed: 16 additions & 5 deletions
@@ -7,7 +7,7 @@ Authors: Arthur Paulino, Jon Eugster
 import Cache.Requests
 
 def help : String := "Mathlib4 caching CLI
-Usage: cache [COMMAND]
+Usage: cache [COMMAND] [OPTIONS]
 
 Commands:
 # No privilege required
@@ -20,15 +20,17 @@ Commands:
 unpack! Decompress linked already downloaded files (no skipping)
 clean Delete non-linked files
 clean! Delete everything on the local cache
-lookup [ARGS] Show information about cache files for the given lean files
+lookup [ARGS] Show information about cache files for the given Lean files
 
 # Privilege required
-put Run 'mk' then upload linked files missing on the server
-put! Run 'mk' then upload all linked files
+put Run 'pack' then upload linked files missing on the server
+put! Run 'pack' then upload all linked files
 put-unpacked 'put' only files not already 'pack'ed; intended for CI use
 commit Write a commit on the server
 commit! Overwrite a commit on the server
-collect TODO
+
+Options:
+--repo=OWNER/REPO Override the repository to fetch/push cache from
 
 * Linked files refer to local cache files with corresponding Lean sources
 * Commands ending with '!' should be used manually, when hot-fixes are needed
@@ -46,6 +48,15 @@ Valid arguments are:
 * Folder names like 'Mathlib/Data/' (find all Lean files inside `Mathlib/Data/`)
 * With bash's automatic glob expansion one can also write things like
 'Mathlib/**/Order/*.lean'.
+
+# Environment variables
+
+* MATHLIB_CACHE_DIR Local cache directory (default: ~/.cache/mathlib)
+* MATHLIB_CACHE_USE_CLOUDFLARE Set to '1' to use Cloudflare instead of Azure
+* MATHLIB_CACHE_GET_URL Override the download URL
+* MATHLIB_CACHE_PUT_URL Override the upload URL
+
+See Cache/README.md for more details.
 "
 
 /-- Commands which (potentially) call `curl` for downloading files -/

Cache/README.md

Lines changed: 244 additions & 0 deletions
@@ -0,0 +1,244 @@
# Mathlib Cache

This directory contains the implementation of Mathlib's build cache system (`lake exe cache`), which downloads pre-built `.olean` files to avoid recompiling the entire library.

> **Note**: A new `lake cache` command is currently being designed and implemented in Lake itself. This will eventually replace the Mathlib-specific `lake exe cache` and work for all repositories. Until then, this cache system remains the primary way to get pre-built artifacts for Mathlib.

## Quick Start

```bash
# Download and unpack cache for all of Mathlib
lake exe cache get

# Force re-download everything
lake exe cache get!

# Download cache for specific files only (and their dependencies)
lake exe cache get Mathlib/Algebra/Group/Basic.lean
lake exe cache get Mathlib.Algebra.Group.Basic
```

## Commands

### No Privilege Required

| Command         | Description                                                         |
|-----------------|---------------------------------------------------------------------|
| `get [ARGS]`    | Download linked files missing on the local cache and decompress     |
| `get! [ARGS]`   | Download all linked files and decompress (force re-download)        |
| `get- [ARGS]`   | Download linked files missing to local cache, but do not decompress |
| `pack`          | Compress non-compressed build files into the local cache            |
| `pack!`         | Compress build files into the local cache (no skipping)             |
| `unpack`        | Decompress linked already downloaded files                          |
| `unpack!`       | Decompress linked already downloaded files (no skipping)            |
| `clean`         | Delete non-linked files                                             |
| `clean!`        | Delete everything on the local cache                                |
| `lookup [ARGS]` | Show information about cache files for the given Lean files         |

### Privilege Required (CI/Maintainers)

| Command        | Description                                               |
|----------------|-----------------------------------------------------------|
| `put`          | Run `pack` then upload linked files missing on the server |
| `put!`         | Run `pack` then upload all linked files                   |
| `put-unpacked` | `put` only files not already packed; intended for CI use  |
| `commit`       | Write a commit on the server                              |
| `commit!`      | Overwrite a commit on the server                          |

### Arguments

The `get`, `get!`, `get-`, and `lookup` commands accept:

- Module names: `Mathlib.Algebra.Group.Basic`
- File paths: `Mathlib/Algebra/Group/Basic.lean`
- Folder names: `Mathlib/Data/` (finds all Lean files inside)
- Glob patterns: `Mathlib/**/Order/*.lean` (via shell expansion)

When arguments are provided, only the specified files and their transitive imports are downloaded.

### Options

| Option              | Description                                                                                |
|---------------------|--------------------------------------------------------------------------------------------|
| `--repo=OWNER/REPO` | Override the repository to fetch cache from (e.g., `--repo=leanprover-community/mathlib4`) |

## Environment Variables

### Cache Location

| Variable            | Description                        | Default                                         |
|---------------------|------------------------------------|-------------------------------------------------|
| `MATHLIB_CACHE_DIR` | Directory for cached `.ltar` files | `$XDG_CACHE_HOME/mathlib` or `~/.cache/mathlib` |

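The default in the table can be read as a fallback chain. A minimal sketch, assuming `MATHLIB_CACHE_DIR` takes precedence over `XDG_CACHE_HOME` and that empty values are treated like unset ones (this README does not confirm either detail); `resolve_cache_dir` is a hypothetical name:

```bash
# Hypothetical helper: print the directory the cache would use under the
# assumptions stated above.
resolve_cache_dir() {
  printf '%s\n' "${MATHLIB_CACHE_DIR:-${XDG_CACHE_HOME:-$HOME/.cache}/mathlib}"
}

# Run in a subshell so the variable changes do not leak into the calling shell.
( unset MATHLIB_CACHE_DIR; XDG_CACHE_HOME=/tmp/xdg; resolve_cache_dir )   # prints /tmp/xdg/mathlib
```
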
### Cache Backend Selection

| Variable                       | Description                                                | Default     |
|--------------------------------|------------------------------------------------------------|-------------|
| `MATHLIB_CACHE_USE_CLOUDFLARE` | Set to `1` or `true` to use Cloudflare R2 instead of Azure | Azure cache |

### Custom Cache URLs

These allow overriding the cache endpoints, useful for mirrors or custom deployments:

| Variable                | Description                     | Default                                                         |
|-------------------------|---------------------------------|-----------------------------------------------------------------|
| `MATHLIB_CACHE_GET_URL` | URL for downloading cache files | Azure or Cloudflare URL based on `MATHLIB_CACHE_USE_CLOUDFLARE` |
| `MATHLIB_CACHE_PUT_URL` | URL for uploading cache files   | Azure or Cloudflare URL based on `MATHLIB_CACHE_USE_CLOUDFLARE` |

### Authentication (for uploads)

| Variable                 | Description                                    |
|--------------------------|------------------------------------------------|
| `MATHLIB_CACHE_SAS`      | Azure SAS token (when using Azure backend)     |
| `MATHLIB_CACHE_S3_TOKEN` | S3 credentials (when using Cloudflare backend) |

## How It Works

### File Hashing

Each Lean file's cache is identified by a hash computed from:

1. **Root hash**: A combination of:
   - `lakefile.lean` content
   - `lean-toolchain` content
   - `lake-manifest.json` content
   - The Lean compiler's git hash
   - A generation counter (bumped to invalidate all caches)

2. **File hash**: Mixing:
   - The root hash
   - The file's path relative to its package
   - The file's content hash
   - Hashes of all imported files

This ensures that any change to toolchain, dependencies, or source files produces a different cache key.

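To illustrate why a change to any input yields a new cache key, here is a toy sketch of the two-level scheme. It substitutes the POSIX `cksum` checksum for Lean's own hashing, so the numbers it prints bear no relation to real cache file names. `mix` and `file_hash` are hypothetical helpers, and all input strings are placeholders.

```bash
# Toy stand-in for hash mixing: checksum of the arguments, one per line.
mix() {
  printf '%s\n' "$@" | cksum | cut -d ' ' -f 1
}

# file_hash ROOT_HASH REL_PATH CONTENT [IMPORT_HASH...]
# The body runs in a subshell so its variables stay local.
file_hash() (
  root="$1"; path="$2"; content="$3"
  shift 3
  mix "$root" "$path" "$(mix "$content")" "$@"
)

# Root hash: lakefile, toolchain, manifest, compiler git hash, generation counter.
root="$(mix 'lakefile text' 'toolchain text' 'manifest text' 'compiler-git-hash' 1)"

init="$(file_hash "$root" 'Mathlib/Init.lean' 'content of Init')"
foo="$(file_hash "$root" 'Mathlib/Foo.lean' 'import Mathlib.Init' "$init")"
echo "Init: $init  Foo: $foo"
```

Because the hash of `Mathlib/Init.lean` is an input to the hash of `Mathlib/Foo.lean`, editing the imported file changes the key of every file that transitively imports it.
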
### Cache File Format

Cache files use the `.ltar` format (Lean tar), handled by [leantar](https://github.com/digama0/leangz). Each `.ltar` contains:

- `.olean` files (compiled Lean)
- `.ilean` files (interface info)
- `.trace` files (build traces)
- `.c` files (generated C code)
- Associated `.hash` files

### Cached Packages

The cache covers these packages:

- `Mathlib`
- `Batteries`
- `Aesop`
- `Cli`
- `ImportGraph`
- `LeanSearchClient`
- `Plausible`
- `Qq`
- `ProofWidgets`
- `Archive`
- `Counterexamples`

## Default Cache Backends
143+
144+
### Azure Blob Storage (Default)
145+
146+
- **Download URL**: `https://lakecache.blob.core.windows.net/mathlib4`
147+
- Used by default for downloads and uploads
148+
149+
### Cloudflare R2
150+
151+
- **Download URL**: `https://mathlib4.lean-cache.cloud`
152+
- **Upload URL**: `https://a09a7664adc082e00f294ac190827820.r2.cloudflarestorage.com/mathlib4`
153+
- Enable with `MATHLIB_CACHE_USE_CLOUDFLARE=1`
154+
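Combining the backend flag with the URL override variables, the effective download base can be sketched as below. This assumes an explicit `MATHLIB_CACHE_GET_URL` wins over the backend flag and that an empty value counts as unset; `download_base_url` is a hypothetical helper, not a function of the cache tool.

```bash
# Hypothetical helper: print the base URL downloads would use under the
# assumptions stated above.
download_base_url() {
  if [ -n "${MATHLIB_CACHE_GET_URL-}" ]; then
    printf '%s\n' "$MATHLIB_CACHE_GET_URL"
    return
  fi
  case "${MATHLIB_CACHE_USE_CLOUDFLARE-}" in
    1|true) printf '%s\n' 'https://mathlib4.lean-cache.cloud' ;;
    *)      printf '%s\n' 'https://lakecache.blob.core.windows.net/mathlib4' ;;
  esac
}

download_base_url
```
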
## Setting Up Your Own Cache Endpoint

You can host your own cache mirror or private cache using any S3-compatible storage or HTTP server.

### Requirements

Your endpoint must support:

1. **GET requests** for downloading files at:
   - `/f/{repo}/{hash}.ltar` - for fork caches
   - `/f/{hash}.ltar` - for main mathlib cache (Azure only)
   - `/c/{commit_hash}` - for commit manifests

2. **PUT requests** for uploading (if you need upload capability)

### Using a Custom Endpoint

```bash
# Download from a custom mirror
export MATHLIB_CACHE_GET_URL="https://my-mirror.example.com/mathlib4"
lake exe cache get

# Upload to a custom endpoint
export MATHLIB_CACHE_PUT_URL="https://my-upload.example.com/mathlib4"
export MATHLIB_CACHE_SAS="your-auth-token" # or MATHLIB_CACHE_S3_TOKEN for S3
lake exe cache put
```

### Example: S3-Compatible Storage

For S3-compatible storage (MinIO, Cloudflare R2, AWS S3, etc.):

1. Create a bucket (e.g., `mathlib-cache`)
2. Configure public read access for downloads (or use signed URLs)
3. Set up authentication for uploads
4. Set the environment variables:

```bash
export MATHLIB_CACHE_GET_URL="https://your-bucket.s3.region.amazonaws.com/mathlib-cache"
export MATHLIB_CACHE_PUT_URL="https://your-bucket.s3.region.amazonaws.com/mathlib-cache"
export MATHLIB_CACHE_USE_CLOUDFLARE=1 # Use S3-style auth
export MATHLIB_CACHE_S3_TOKEN="ACCESS_KEY:SECRET_KEY"
```

### Example: Simple HTTP Mirror

For a read-only mirror using nginx or any static file server:

1. Periodically sync files from the official cache
2. Serve them at a public URL
3. Point users to your mirror:

```bash
export MATHLIB_CACHE_GET_URL="https://mathlib-mirror.myorg.com"
lake exe cache get
```

### URL Structure

The cache uses this URL pattern:

```
{BASE_URL}/f/{repo}/{filename}.ltar # Fork/branch caches
{BASE_URL}/f/{filename}.ltar # Main mathlib cache (Azure)
{BASE_URL}/c/{commit_hash} # Commit manifests
```

Where:
- `{repo}` is like `leanprover-community/mathlib4` or `username/mathlib4`
- `{filename}` is a hash like `1234567890abcdef`
- `{commit_hash}` is a git commit SHA

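When setting up or debugging a mirror, it can help to build these URLs by hand. The sketch below only fills in the patterns above; `file_url` and `commit_url` are hypothetical helpers, and which pattern the cache tool actually requests for a given repository and backend is not something this sketch decides.

```bash
# file_url BASE_URL HASH [REPO]: with REPO, use the fork/branch pattern.
file_url() {
  if [ -n "${3-}" ]; then
    printf '%s/f/%s/%s.ltar\n' "$1" "$3" "$2"
  else
    printf '%s/f/%s.ltar\n' "$1" "$2"
  fi
}

# commit_url BASE_URL COMMIT_HASH
commit_url() {
  printf '%s/c/%s\n' "$1" "$2"
}

file_url "https://my-mirror.example.com/mathlib4" 1234567890abcdef username/mathlib4
# prints https://my-mirror.example.com/mathlib4/f/username/mathlib4/1234567890abcdef.ltar
```

A URL built this way can then be probed with a plain `curl -fsI "$url"`, assuming the server answers HEAD requests.
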
## Dependencies

The cache system automatically downloads and manages:

- **curl** (>=7.70, preferably >=7.81) - for HTTP transfers
- **leantar** - for `.ltar` compression/decompression

If your system curl is too old, a static binary is downloaded automatically on Linux.

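To see in advance whether the system curl meets the stated minimum, a check along these lines may help. It is a sketch with a hypothetical `version_ge` helper and is not how the cache tool itself performs the comparison.

```bash
# Hypothetical helper: succeed if dotted version $1 is at least $2.
version_ge() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    n = split(have, h, "."); m = split(want, w, ".")
    # Compare component by component; missing components count as 0.
    for (i = 1; i <= (n > m ? n : m); i++) {
      if (h[i] + 0 > w[i] + 0) exit 0
      if (h[i] + 0 < w[i] + 0) exit 1
    }
    exit 0
  }'
}

# The first line of `curl --version` starts with "curl <version>".
have="$(curl --version 2>/dev/null | awk 'NR == 1 { print $2 }')"
if version_ge "${have:-0}" 7.70; then
  echo "curl $have meets the 7.70 minimum"
else
  echo "curl is missing or older than 7.70"
fi
```
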
## File Locations

| Path                        | Description                  |
|-----------------------------|------------------------------|
| `~/.cache/mathlib/`         | Default cache directory      |
| `~/.cache/mathlib/*.ltar`   | Cached build artifacts       |
| `~/.cache/mathlib/curl.cfg` | Temporary curl configuration |
| `.lake/build/lib/lean/`     | Unpacked `.olean` files      |
| `.lake/build/ir/`           | Unpacked `.c` files          |
