From-scratch, zero-dependency C implementations of topics in high-performance computing, scientific computation, parallel programming, and numerical methods. The modules range from MPI/OpenMP parallel programming to GPU computing, linear algebra libraries, and physics simulation, connecting HPC theory to runnable C code.

| Module | Topics | Key References |
|---|---|---|
| mini-mpi-lab | MPI: point-to-point (send/recv), collectives (bcast/reduce/allreduce/scatter), communicators, topology | MPI 4.0 Standard |
| mini-openmp-lab | OpenMP: parallel for, sections, reduction, critical/atomic, task, scheduling, thread affinity | OpenMP 5.2 Spec |
| mini-cuda-hpc | CUDA kernel, grid/block/thread, shared memory, cooperative groups, streams/events, NVLink, GPU-aware MPI | CUDA C++ Programming Guide |
| mini-parallel-compute | Data vs task parallelism, loop parallelism, fork-join, work-stealing scheduler, Amdahl/Gustafson's law | MIT 6.172, Cilk |
| mini-linear-algebra-lib | BLAS (GEMM, GEMV), LAPACK (LU/QR/SVD), sparse (CSR/SpMV), iterative solvers (CG, GMRES), eigen | Reference BLAS, LAPACK |
| mini-numerical-compute | ODE (RK4, Verlet), PDE (FDM: Jacobi/Gauss-Seidel/SOR), FFT, Monte Carlo, root finding, optimization | Numerical Recipes, MIT 18.330 |
| mini-dist-training-infra | Data parallel (DDP), model/pipeline parallel, FSDP/ZeRO, gradient compression, all-reduce optimization | Megatron-LM, DeepSpeed ZeRO |
| mini-supercomputing | Top500 metrics (Linpack), HPC cluster arch (login/compute/storage), Slurm job scheduler, InfiniBand | Top500, Slurm Docs |
| mini-performance-eng | Roofline model, flops/byte analysis, cache blocking/tiling, loop optimization, vectorization, prefetch | Williams "Roofline", Agner Fog |
| mini-physics-simulation | N-body (Barnes-Hut, PM), CFD (Lattice Boltzmann), MD (Lennard-Jones, thermostat), SPH | Frenkel "Understanding MD" |
| mini-scientific-workflow | DAG workflow (Dask-like), container (Singularity), HPC I/O (HDF5 sim), checkpoint, provenance | Dask, Singularity, HDF5 |
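
As an illustration of the Amdahl/Gustafson topic listed for `mini-parallel-compute`, here is a minimal sketch of both speedup formulas in plain C. The function names `amdahl_speedup` and `gustafson_speedup` are hypothetical and are not taken from the module; the sketch only assumes the standard textbook forms of the two laws.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not the repository's API. */

/* Amdahl's law: speedup for a FIXED problem size.
 * parallel_fraction: share of the serial runtime that can run in parallel (0..1).
 * workers: number of processing elements (>= 1). */
double amdahl_speedup(double parallel_fraction, int workers)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(workers >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers);
}

/* Gustafson's law: scaled speedup when the problem size GROWS with the
 * number of workers. Here parallel_fraction is the share of the parallel
 * run's time spent in parallel code (0..1). */
double gustafson_speedup(double parallel_fraction, int workers)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(workers >= 1);
    return (1.0 - parallel_fraction) + parallel_fraction * workers;
}
```

With a parallel fraction of 0.9 and 10 workers, the Amdahl form gives roughly 5.26 while the Gustafson form gives 9.1, which shows how much the fixed-size assumption limits the predicted speedup.
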

- Zero external dependencies: pure C (C99/C11), only `libc` and `libm`
- Self-contained modules: each directory has its own `Makefile`, `include/`, `src/`, `examples/`, `demos/`, `tests/`
- HPC simulation in user-space: educational models of MPI communication, GPU kernels, and physics simulations
- Theory-to-code mapping: every module includes `docs/` with paper/standard-alignment notes
- Practical demos: MPI collective simulator, CUDA kernel simulator, GEMM optimizer, N-body simulator, and more

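
To make the idea of a user-space model of MPI communication concrete, the following sketch simulates a sum all-reduce across several "ranks" inside a single process. It is an assumption-laden illustration, not the repository's code: the name `sim_allreduce_sum` is hypothetical, ranks are modeled as plain arrays, and the naive reduce-then-broadcast scheme stands in for whatever algorithm the actual module uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration; not the repository's API.
 * bufs[r] points to the local vector of simulated rank r, each holding
 * `count` doubles. On return every rank's vector holds the element-wise
 * sum over all ranks, as a sum all-reduce would produce. */
void sim_allreduce_sum(double **bufs, int nranks, size_t count)
{
    assert(bufs != NULL);
    assert(nranks >= 1);

    /* Phase 1 (reduce): accumulate every rank's vector into rank 0. */
    for (int r = 1; r < nranks; ++r)
        for (size_t i = 0; i < count; ++i)
            bufs[0][i] += bufs[r][i];

    /* Phase 2 (broadcast): copy the reduced vector to all other ranks. */
    for (int r = 1; r < nranks; ++r)
        for (size_t i = 0; i < count; ++i)
            bufs[r][i] = bufs[0][i];
}
```

Calling it on three simulated ranks holding `{1, 2}`, `{3, 4}`, and `{5, 6}` leaves `{9, 12}` in every buffer. A real implementation would typically use a ring or tree pattern to spread the communication cost, which this sketch deliberately ignores.
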
Each module is standalone. Navigate to a module directory and run:

    cd mini-mpi-lab
    make all     # build everything
    make test    # run tests

Requires GCC and GNU Make.


    mini-hpc-sci-compute/
    ├── mini-mpi-lab/               # MPI Parallel Programming
    ├── mini-openmp-lab/            # OpenMP Shared-Memory Programming
    ├── mini-cuda-hpc/              # CUDA GPU Computing
    ├── mini-parallel-compute/      # Parallel Computing Fundamentals
    ├── mini-linear-algebra-lib/    # Linear Algebra Library
    ├── mini-numerical-compute/     # Numerical Computing
    ├── mini-dist-training-infra/   # Distributed Training Infrastructure
    ├── mini-supercomputing/        # Supercomputing
    ├── mini-performance-eng/       # Performance Engineering
    ├── mini-physics-simulation/    # Physics Simulation
    └── mini-scientific-workflow/   # Scientific Workflows

License: MIT