This directory contains miscellaneous scripts that are useful for working on or with mathlib. When adding a new script, please make sure to document it here, so other readers have a chance to learn about it as well!
Note: CI automation scripts have been moved to mathlib-ci.
Workflows that use those scripts now execute them from external checkouts (typically
under ci-tools/).
Repository analysis and reporting
-
benchis mathlib's benchmark suite. View its README.md for more details. -
user_activity_report.pyGenerates a comprehensive report of all users with repository access and their last commit activity. Shows username, age of last commit, and access level, sorted by commit recency (most recent first).Features:
- Fetches repository collaborators and organization members via GitHub API
- Intelligent caching: user lists (24h TTL) and commit data (6h TTL) for performance
- Access level filtering:
--admin(admin users only),--write(write+ access) - Single user analysis:
--user USERNAMEfor debugging specific users - Result limiting:
--limit Nfor testing with smaller datasets - Inactive user cleanup:
--remove Ngenerates (but doesn't execute) gh commands to remove write access from non-admin users inactive for more than N days - Fallback to contributors API if collaborators access is restricted (
--contributors-only)
Caching: Results cached in
scripts/users_cache.jsonandscripts/commits_cache.json(automatically added to .gitignore). Cache saved after each commit lookup to prevent data loss.Requirements:
gh(GitHub CLI) installed and authenticated (gh auth login).
Tools for manual maintenance
-
fix_unused.pyBulk processing of unused variable warnings, replacing them with_. -
fix_deprecations.pyAutomatically fixes deprecation warnings by replacing deprecated identifiers with their suggested replacements. Runslake build --no-buildto collect deprecation warnings, then applies minimal changes by verifying the deprecated term appears at the expected column position before replacement. Handles both fully-qualified names (e.g.,Fin.lt_iff_val_lt_val) and unqualified names used within namespaces (e.g.,lt_iff_val_lt_val). Only processes files that exist in the current repository. Safe to run multiple times; processes all warnings in a single pass. Usage:python3 scripts/fix_deprecations.py -
add_deprecations.shis a text-based script that automatically adds deprecation statements. It assumes that the only difference between master and the current status of the PR consists of renames. More precisely, any change on a line that contains a declaration name and is not a rename, will likely confuse the script. -
create_deprecated_modules.leandefines the#create_deprecated_modulescommand that automatically generates thedeprecated_moduleentries, gathering information fromgit. The expectation is that this will be expanded to a fully automated process that happens in CI. -
migrate_to_fork.pyHelps contributors migrate from having direct write access to the main repository to using a fork-based workflow. This comprehensive script automates the entire migration process:- Validates the current branch (prevents migration of system branches like master, nightly-testing)
- Checks GitHub CLI installation/authentication with OS-specific installation instructions
- Creates or syncs a fork of mathlib4 automatically
- Sets up git remotes correctly (
upstreamfor leanprover-community/mathlib4,originfor user's fork) - Detects already-completed migration steps and skips them for efficiency
- Migrates the current branch to the fork with proper upstream tracking
- Intelligently handles existing PRs (migrates main repo PRs to fork-based PRs, detects existing fork PRs)
- Uses fast delete/re-add approach for remote operations to avoid slow branch tracking updates
- Provides comprehensive status reporting and next steps guidance
Run with
python3 scripts/migrate_to_fork.py(interactive) orpython3 scripts/migrate_to_fork.py -y(auto-accept). Requires GitHub CLI (gh) installed and authenticated. Safe to run multiple times.
-
githelper.pyThe subcommandgithelper.py fixhelps contributors fix their git repository setup by step-by-step converting it from its current state to a well-defined target state. The target state mostly matches the state after of a freshly cloned fork (gh repo clone <fork>) and looks like this:- The remote
upstreampoints toleanprover-community/mathlib4 - The remote
originpoints to the contributor's own fork - The
ghdefault repo points toleanprover-community/mathlib4 masters remote isupstreambut its pushRemote isorigin
Other subcommands to automate git-related actions may be added in the future.
- The remote
Analyzing Mathlib's import structure
topological_sort.pyPrints Mathlib modules in topological (import-DAG) order. By default leaves come last (roots first); use--reversefor leaves first. If filenames or module names are provided on stdin, outputs only those modules in topological order. Usage:python3 scripts/topological_sort.py [--reverse]
Backward-compatibility set_option migration tools
These scripts help with testing Lean PRs that change backward-compatibility option behaviour. They share a common DAG traversal library that parallelises work in import-graph order.
-
dag_traversal.pyReusable parallel DAG traversal for Lean import graphs. Parses the import DAG from.leansource files and parallelises an action over a forward or backward traversal. Each module is submitted to the thread pool the moment its last in-DAG dependency finishes.Library usage:
from dag_traversal import DAG, traverse_dag dag = DAG.from_directories(Path(".")) results = traverse_dag(dag, my_action, direction="forward", stop_on_failure=True)
CLI usage:
scripts/dag_traversal.py --forward 'lake build {}' # {} = file path scripts/dag_traversal.py --backward --module 'echo {}' # {} = module name scripts/dag_traversal.py --forward -j4 'my_script {}'
-
set_option_utils.pyShared configuration foradd_set_option.pyandrm_set_option.py. Contains the list of managed options (backward.isDefEq.respectTransparency,backward.whnf.reducibleClassField) and helpers for generatingset_optionlines and regex patterns. -
add_set_option.pyAddsset_option ... false inbefore declarations that fail to build. Traverses the DAG forward (roots first) so each module is only built after all its imports are clean. For each error, finds the enclosing declaration and tries candidate option combinations (most-recently-successful first, then each single option, then all together). Usage:python3 scripts/add_set_option.py [--option NAME] [--max-workers N] [--timeout N] [--files FILE ...] -
rm_set_option.pyRemoves unnecessaryset_option ... false inlines. Only targets lines without a trailing comment; lines likeset_option ... false in -- reasonare left untouched. Traverses the DAG backward (leaves first) so removing an option from an upstream file doesn't invalidate cached builds of downstream files. Tries removing all occurrences at once; if that fails, falls back to one-at-a-time removal. Usage:python3 scripts/rm_set_option.py [--option NAME] [--dry-run] [--max-workers N] [--files FILE ...]
CI workflow
lake-build-with-retry.shRunslake buildon a target untillake build --no-buildsucceeds. Used in the main build workflows.lake-build-wrapper.pyA wrapper script forlake buildwhich collapses normal build into log groups and saves a build summary JSON file. See file for usage.mk_all.leanrun vialake exe mk_all, regenerates the import-only filesMathlib.lean,Mathlib/Tactic.lean,Archive.leanandCounterexamples.leanlint-style.lean,lint-style.py,print-style-errors.shstyle linters, written in Python and Lean. Run vialake exe lint-style. Medium-term, the latter two scripts should be rewritten and incorporated inlint-style.lean.check_title_labels.leanverifies that a (non-WIP, non-draft) PR has a well-formed title. In the future, it may also check that a feature PR has a topic label.lint-bib.shnormalize the BibTeX filedocs/references.bibusingbibtool.yaml_check.py,check-yaml.leanSanity checks forundergrad.yaml,overview.yaml,100.yamland1000.yaml.autolabel.leanis the Lean script in charge of automatically adding at-label on eligible PRs. Autolabelling is inferred by which directories the current PR modifies.auto_commit.shruns a command and creates a commit with the result. The commit message formatx scripts/auto_commit.sh <command>enables a rebase workflow: during interactive rebase, you can convertpick abc # x scripts/auto_commit.sh cmdtox scripts/auto_commit.sh cmd(by deleting the "pick abc # " prefix), and git will re-run the command via exec. Example:scripts/auto_commit.sh lake exe mk_allManaging downstream reposdownstream_repos.ymlcontains basic information about significant downstream repositories.downstream-tags.pyis a script to check whether a given tag exists on the downstream repositories listed indownstream_repos.yml.downstream_dashboard.pyinspects the CI infrastructure of each repository indownstream_repos.ymland makes actionable suggestions for improvement or automation.
Version tag verification
verify_version_tags.pyverifies that Mathlib version tags are correctly published across git, GitHub, elan toolchains, and build cache.
Data files with linter exceptions
nolints.jsoncontains exceptions for allenv_linters in mathlib. For permanent and deliberate exceptions, add a@[nolint lintername]in the .lean file instead.nolints_prime_decls.txtcontains temporary exceptions for thedocPrimelinter
Both of these files should tend to zero over time; please do not add new entries to these files. PRs removing (the need for) entries are welcome.
API surrounding CI
check_title_labels.leanis used to check whether a PR title follows our commit style conventions.
Docker images
docker_build.shbuilds thelean4,gitpod4, andgitpod4-blueprintDocker images. These are used by some CI workflows, as well as places such as Gitpod.docker_push.shfirst runsdocker_build.sh, and then pushes the images to Docker Hub, appropriately tagged with the date on which the images were built. This should be re-run after breaking changes toelan, so that CI and Gitpod have access to updated versions ofelan.