mio - Memory I/O Microbenchmark

A tool for measuring memory bandwidth across various access patterns on Linux.

Supports both NUMA system memory (--membind) and CXL/DevDAX devices (--devdax).

Requirements

- Linux (Ubuntu recommended)
- x86-64 CPU with AVX2 support
- Python 3.10+
- Root/sudo privileges
- One of the following memory targets:
  - NUMA node (any multi-socket or CXL-attached memory)
  - CXL/DevDAX device (e.g., /dev/dax0.0)

Setup

Build only (no root required)

./build.sh

Full environment setup

sudo ./scripts/setup.sh                  # install deps + build
sudo ./scripts/setup.sh --build-only     # build without environment changes
sudo ./scripts/setup.sh --with-cxl       # also configure CXL device to DevDAX mode

The setup script can:

- Install build tools (cmake, ninja-build, g++, numactl, ndctl, etc.)
- Load MSR kernel module
- (Optional, --with-cxl) Convert CXL device to DevDAX mode
- Disable NUMA auto balancing
- Disable swap
- Build the benchmark binary

To verify the environment without making changes:

./scripts/setup.sh --check
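
As a rough illustration of what such a read-only check could look at, the sketch below inspects a few standard Linux interfaces (`/proc/cpuinfo`, `/proc/swaps`, `/proc/sys/kernel/numa_balancing`, `/dev/cpu/0/msr`). This is not the logic of `setup.sh --check`, which is not shown here; the function names and the choice of checks are assumptions made for the example.

```python
from pathlib import Path
from typing import Dict, Optional, Set


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def swap_is_active(swaps_text: str) -> bool:
    """/proc/swaps starts with a header line; any further line is an active swap area."""
    lines = [line for line in swaps_text.splitlines() if line.strip()]
    return len(lines) > 1


def read_text(path: str) -> Optional[str]:
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def check_environment() -> Dict[str, Optional[bool]]:
    """Hypothetical read-only checks; None means the value could not be determined."""
    cpuinfo = read_text("/proc/cpuinfo")
    swaps = read_text("/proc/swaps")
    balancing = read_text("/proc/sys/kernel/numa_balancing")
    return {
        "avx2": None if cpuinfo is None else "avx2" in parse_cpu_flags(cpuinfo),
        "swap_off": None if swaps is None else not swap_is_active(swaps),
        "numa_balancing_off": None if balancing is None else balancing.strip() == "0",
        "msr_loaded": Path("/dev/cpu/0/msr").exists(),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'unknown' if ok is None else ok}")
```

None of these checks modify the system, so the sketch can run without root.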

If your DevDAX device is not dax0.0:

sudo ./scripts/setup.sh --with-cxl --devdax dax1.0

Recommended: DevDAX Mode

For best results, we recommend running benchmarks in DevDAX mode (--devdax).

Quick Start

Using CXL/DevDAX device (recommended)

# Sequential read on DevDAX device
sudo ./build/microbench --mode seq_read --threads 16 --memory-per-thread 65536 --devdax /dev/dax0.0

# Run full automated benchmark
sudo python3 scripts/run_benchmark.py --devdax /dev/dax0.0

Benchmark Modes

Automated benchmark (`run_benchmark.py`)

# Full benchmark (both large and small memory sizes)
sudo python3 scripts/run_benchmark.py --devdax /dev/dax0.0
sudo python3 scripts/run_benchmark.py --membind 4

# Large memory only (cache miss scenario)
sudo python3 scripts/run_benchmark.py --devdax /dev/dax0.0 --large-mem

# Small memory only (cache hit scenario)
sudo python3 scripts/run_benchmark.py --membind 4 --small-mem

# Quick mode (minimal test set)
sudo python3 scripts/run_benchmark.py --devdax /dev/dax0.0 4

Full mode tests:

- seq_read, seq_write
- random_read x3 block sizes (512 KiB, 1 MiB, 2 MiB)
- zipfian_read x3 block sizes (512 KiB, 1 MiB, 2 MiB)
- random_write (1 MiB)

Quick mode tests (append 4):

seq_read, seq_write, random_read (1 MiB), random_write (1 MiB)
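
The two test sets above can be written out as explicit (mode, block size) pairs, which makes the test counts easy to see. The sketch below is only an enumeration of what the lists state; the function names and the ordering are assumptions and may not match how `run_benchmark.py` schedules its runs.

```python
from typing import List, Optional, Tuple

KIB = 1024
MIB = 1024 * KIB

# (mode, block size in bytes); None means the mode takes no block size.
Test = Tuple[str, Optional[int]]


def full_mode_tests() -> List[Test]:
    """Enumerate the full-mode test set listed above (order is an assumption)."""
    tests: List[Test] = [("seq_read", None), ("seq_write", None)]
    for mode in ("random_read", "zipfian_read"):
        for block in (512 * KIB, 1 * MIB, 2 * MIB):
            tests.append((mode, block))
    tests.append(("random_write", 1 * MIB))
    return tests


def quick_mode_tests() -> List[Test]:
    """Enumerate the quick-mode test set listed above."""
    return [
        ("seq_read", None),
        ("seq_write", None),
        ("random_read", 1 * MIB),
        ("random_write", 1 * MIB),
    ]


if __name__ == "__main__":
    print(len(full_mode_tests()), "tests in full mode")
    print(len(quick_mode_tests()), "tests in quick mode")
```

Full mode comes to 9 tests per memory size and quick mode to 4.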

Memory size presets (with 16 threads):

Option	Per Thread	Total
`--large-mem`	65536 MiB	1 TiB
`--small-mem`	8192 MiB	128 GiB
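
The totals in the table are simply the per-thread size multiplied by the thread count. A small sketch of that arithmetic, useful for checking a different thread count before launching a run (the helper names are hypothetical):

```python
def total_memory_mib(per_thread_mib: int, threads: int = 16) -> int:
    """Total footprint in MiB for a given per-thread size and thread count."""
    return per_thread_mib * threads


def format_mib(mib: int) -> str:
    """Render a MiB count in the largest binary unit that divides it evenly."""
    if mib % (1024 * 1024) == 0:
        return f"{mib // (1024 * 1024)} TiB"
    if mib % 1024 == 0:
        return f"{mib // 1024} GiB"
    return f"{mib} MiB"


if __name__ == "__main__":
    print(format_mib(total_memory_mib(65536)))  # --large-mem with 16 threads: 1 TiB
    print(format_mib(total_memory_mib(8192)))   # --small-mem with 16 threads: 128 GiB
```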

Single test (`run.sh`)

./scripts/run.sh seq_read                        # default: 16 threads, 1MiB block
./scripts/run.sh random_read 32                  # 32 threads
./scripts/run.sh random_read 16 2097152          # block-size=2MiB
./scripts/run.sh --init seq_read                 # initialize memory with seq_write first
./scripts/run.sh --help                          # show all options

Note: When using DevDAX with certain devices, you may need the --init flag on the first run to initialize the device with a sequential write.

Supported modes: seq_read, seq_write, random_read, random_write, stride_read, stride_write, zipfian_read

Units

Parameter	Unit	Example
`block-size`	Bytes	`1048576` = 1 MiB, `2097152` = 2 MiB, `524288` = 512 KiB
`memory-per-thread`	MiB	`65536` = 64 GiB, `8192` = 8 GiB
`total-mem` (run.sh)	MiB	`1048576` = 1 TiB
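
Because `block-size` is given in bytes while the memory sizes are given in MiB, it is easy to pass a value in the wrong unit. The following sketch converts human-readable sizes into the numbers each parameter expects; the helper names and the accepted input syntax (`"2MiB"`, `"64 GiB"`) are assumptions made for this example and are not part of the tool.

```python
import re

_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
_SIZE = re.compile(r"\s*(\d+)\s*(B|KiB|MiB|GiB|TiB)\s*")


def parse_size(text: str) -> int:
    """Parse a size such as '512KiB' or '64 GiB' into bytes."""
    match = _SIZE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def block_size_arg(text: str) -> str:
    """Value for block-size, which is expressed in bytes."""
    return str(parse_size(text))


def memory_mib_arg(text: str) -> str:
    """Value for memory-per-thread or total-mem, which are expressed in MiB."""
    size = parse_size(text)
    if size % _UNITS["MiB"] != 0:
        raise ValueError(f"{text!r} is not a whole number of MiB")
    return str(size // _UNITS["MiB"])


if __name__ == "__main__":
    print(block_size_arg("2MiB"))    # 2097152
    print(memory_mib_arg("64 GiB"))  # 65536
```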

See scripts/commands.md for a full list of copy-pasteable commands.

Output

Results are organized by timestamp:

result/
  2025-01-15_14-30-00/
    00_init_seq_write/
    mem_65536/
      01_cache_flush_1/
      02_seq_read/
      ...
    mem_8192/
      ...
summary/
  2025-01-15_14-30-00/
    benchmark_summary.txt
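
Since each run is stored under a timestamped directory, the most recent summary can be located by parsing the directory names. A minimal sketch, assuming the names always follow the `YYYY-MM-DD_HH-MM-SS` pattern shown above (the helper names are hypothetical):

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional

STAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def parse_stamp(name: str) -> Optional[datetime]:
    """Parse a run directory name, returning None for names that do not match."""
    try:
        return datetime.strptime(name, STAMP_FORMAT)
    except ValueError:
        return None


def latest_run(names: Iterable[str]) -> Optional[str]:
    """Return the directory name with the most recent timestamp, if any."""
    stamped = [(parse_stamp(name), name) for name in names]
    valid = [(stamp, name) for stamp, name in stamped if stamp is not None]
    return max(valid)[1] if valid else None


def latest_summary(root: str = "summary") -> Optional[Path]:
    """Path to benchmark_summary.txt of the newest run under the summary directory."""
    base = Path(root)
    if not base.is_dir():
        return None
    name = latest_run(p.name for p in base.iterdir() if p.is_dir())
    return base / name / "benchmark_summary.txt" if name else None


if __name__ == "__main__":
    print(latest_summary())
```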

The summary file contains bandwidth results in a table format:

MemPerThread Mode                 Block Size       Bandwidth
------------ -------------------- ------------ ---------------
       65536 seq_read                        -      12.34 GB/s
       65536 random_read           1048576 B        8.56 GB/s
       ...
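
For post-processing, the table can be read back into structured records. The sketch below assumes every data row looks like the two sample rows above (memory per thread, mode, block size or `-`, bandwidth in GB/s); the real file may contain columns or rows not shown here, so lines that do not match are skipped rather than treated as errors.

```python
import re
from typing import Dict, List, Optional, Union

ROW = re.compile(
    r"^\s*(?P<mem>\d+)\s+(?P<mode>\w+)\s+"
    r"(?:-|(?P<block>\d+)\s*B)\s+"
    r"(?P<bw>\d+(?:\.\d+)?)\s*GB/s\s*$"
)

Record = Dict[str, Union[int, float, str, None]]


def parse_summary(text: str) -> List[Record]:
    """Extract data rows from summary text; header and separator lines are ignored."""
    rows: List[Record] = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue
        block: Optional[str] = match.group("block")
        rows.append({
            "mem_per_thread_mib": int(match.group("mem")),
            "mode": match.group("mode"),
            "block_size_bytes": int(block) if block else None,
            "bandwidth_gb_per_s": float(match.group("bw")),
        })
    return rows


SAMPLE = """\
MemPerThread Mode                 Block Size       Bandwidth
------------ -------------------- ------------ ---------------
       65536 seq_read                        -      12.34 GB/s
       65536 random_read           1048576 B        8.56 GB/s
"""

if __name__ == "__main__":
    for row in parse_summary(SAMPLE):
        print(row)
```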

Advanced Options

For direct use of the benchmark binary:

sudo ./build/microbench [OPTIONS]

Key options:

Option	Description
`--mode <mode>`	Access pattern (e.g., `seq_read`, `random_write`)
`--threads <n>`	Number of threads (default: 1)
`--memory-per-thread <MiB>`	Memory per thread in MiB
`--block-size <bytes>`	Block size for random/stride access
`--membind <node>`	NUMA memory node (e.g., `0`, `4`)
`--devdax <device>`	DevDAX device path (e.g., `/dev/dax0.0`)
`--offset <hex>`	Memory offset for DevDAX
`--cpu-affinity <nodes>`	Pin threads to CPUs of NUMA node(s)
`--prefetch <ON|OFF>`	Control HW prefetcher (requires Secure Boot disabled)
`--bypass-cache`	Use non-temporal stores
`--hugepage`	Use 2 MiB huge pages
`--result-dir <dir>`	Output directory
`--help`	Show all options
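
When scripting many runs, it can help to assemble the command line programmatically. The sketch below builds an argument list using only flags from the table above and the mode names listed under "Supported modes". The function name is hypothetical, and treating `--devdax` and `--membind` as mutually exclusive is an assumption of this example, not something the table states.

```python
import shlex
from typing import List, Optional

SUPPORTED_MODES = (
    "seq_read", "seq_write", "random_read", "random_write",
    "stride_read", "stride_write", "zipfian_read",
)


def microbench_argv(
    mode: str,
    threads: int,
    memory_per_thread_mib: int,
    *,
    devdax: Optional[str] = None,
    membind: Optional[int] = None,
    block_size: Optional[int] = None,
    binary: str = "./build/microbench",
) -> List[str]:
    """Build a sudo command line for the benchmark binary."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    # Assumption: exactly one memory target is chosen per run.
    if (devdax is None) == (membind is None):
        raise ValueError("pass exactly one of devdax or membind")
    argv = [
        "sudo", binary,
        "--mode", mode,
        "--threads", str(threads),
        "--memory-per-thread", str(memory_per_thread_mib),
    ]
    if block_size is not None:
        argv += ["--block-size", str(block_size)]  # bytes
    if devdax is not None:
        argv += ["--devdax", devdax]
    else:
        argv += ["--membind", str(membind)]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to subprocess.run to execute.
    print(shlex.join(microbench_argv("seq_read", 16, 65536, devdax="/dev/dax0.0")))
```

With the arguments shown, the printed command matches the sequential-read example in Quick Start.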

License

This project is licensed under the MIT License - see the LICENSE file for details.
