| title | sdk |
|---|---|
K-means Python vs C++ (pybind11) |
docker |
This repository revisits and modernizes a k-means clustering project originally implemented during my EPITECH mathematics coursework, adapted here to demonstrate a hybrid Python/C++ (pybind11) approach for performance-critical computation.
This repository presents a hybrid Python/C++ architecture for performance-critical scientific workloads:
- Python is used for orchestration, configuration, and experiment iteration.
- C++ is used for the computational kernel (k-means inner loop).
pybind11provides a thin binding layer so the C++ kernel can be called from Python.
The intent is to make the boundary between experimentation code and performance-sensitive kernels explicit.
┌──────────────────────────────┐
│ Python layer │
│ - dataset generation │
│ - experiment orchestration │
│ - web API + UI integration │
│ Files: web_app.py, kmeans.py │
└───────────────┬──────────────┘
│ (NumPy arrays)
┌───────────────▼──────────────┐
│ Binding layer (pybind11) │
│ - marshaling + validation │
│ Files: setup.py, kmeans_cpp │
└───────────────┬──────────────┘
│ (C-contiguous float64)
┌───────────────▼──────────────┐
│ C++ layer (kernel) │
│ - assignment + update loops │
│ - minimal allocations │
│ File: kmeans_cpp.cpp │
└──────────────────────────────┘
Key boundary decisions:
- The Python API passes a single dense
float64arrayXinto C++. - The C++ implementation releases the Python GIL during the hot loop.
- Both implementations expose comparable outputs (labels, centers, inertia, iteration count).
The benchmark preset used by the web app is:
k=25n_features=10points_per_cluster=650(soN = 25 * 650 = 16250)max_iter=35,n_init=1,init=kmeans++,random_state=7
Example wall-time results (single run):
| Implementation | Dataset | Runtime |
|---|---|---|
| Python | N=16250, d=10, k=25 |
~19.87 s |
| C++ (pybind11) | N=16250, d=10, k=25 |
~0.064 s |
Numbers vary by machine and compiler flags.
- The benchmark dataset generation uses a fixed
random_state. - Each implementation is deterministic given a fixed seed and fixed inputs.
- Due to floating-point and tie-breaking differences, Python and C++ may converge to slightly different local minima; this project focuses on timing and comparable convergence behavior.
This repository includes lightweight profiling scripts:
profile_python.py: runs the Python implementation undercProfileand prints a summary.profile_cpp.py: runs the C++ implementation and prints the timing breakdown reported by the extension.
Run them locally:
python profile_python.py
python profile_cpp.pyTypical outcome:
- On the benchmark preset on one run (
python profile_python.py, total ~13.42s),cProfileshowed the assignment step dominating runtime:kmeans.py:_assign_naive~11.98s self time (~89% of total), ~13.03s cumulative (~97% of total)numpy.dotinside the assignment loop ~1.04s self time (~8% of total)kmeans.py:_update_naive~0.38s self time (~3% of total)
This motivates moving the assignment/update kernel into C++ while keeping Python for orchestration.
- Live app (Hugging Face Spaces): https://huggingface.co/spaces/Zivana/kmeans-pybind11-demo
- Canonical repository + documentation (GitHub): https://github.com/adriendomoison/kmeans-pybind11-demo
pybind11extension: binding a C++ implementation as an importable Python module.- Performance: the C++ implementation releases the Python GIL during the tight loop.
- API: a minimal FastAPI backend + static frontend for side-by-side comparison.
python -m pip install -r requirements.txt
python -m pip install -e .python -m uvicorn web_app:app --host 127.0.0.1 --port 8000Open:
web_app.py: FastAPI service used for the deployed app and benchmark preset generation.static/: minimal UI that triggers Python and C++ runs in parallel and visualizes results.kmeans.py: reference Python implementation of k-means (baseline + profiling target).kmeans_cpp.cpp: C++ k-means kernel (assignment + update loops) exposed viapybind11.setup.py,pyproject.toml: packaging/build configuration for the extension.profile_python.py,profile_cpp.py: reproducible profiling/timing entrypoints.run_kmeans.py: optional CLI runner for local timing.
Reproduce profiling evidence:
python profile_python.py
python profile_cpp.pyRun the interactive UI locally:
python -m uvicorn web_app:app --host 127.0.0.1 --port 8000- The benchmark preset uses
algorithm="naive"for the Python implementation to make interpreter overhead in the assignment loop visible in profiling. - The C++ implementation mirrors the same algorithmic steps but executes the hot loops in compiled code and releases the GIL.
This repository also contains DBSCAN code (dbscan.py, run_dbscan.py, and optional dbscan_cpp.cpp) kept as a secondary reference for clustering algorithms and extension-module structure. It is not used by the k-means implementation.