GPU SpMV Deliverable

Implementations

- CPU CSR reference implementation
- GPU simple CSR kernel: one CUDA thread per row
- GPU adaptive CSR kernel:
  - normal rows: one thread per row
  - long rows: one CUDA block per row with shared-memory reduction
- cuSPARSE CSR SpMV comparison
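
As a rough illustration of what the CPU CSR reference computes and how the adaptive kernel might separate its two row classes, here is a minimal host-side sketch. It is not the repository's code: the `CsrMatrix` struct, its field names, the `double` value type, the functions `spmv_csr` and `find_long_rows`, and the `threshold` parameter are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container; the field names are illustrative only.
struct CsrMatrix {
    std::size_t num_rows = 0;
    std::vector<int> row_ptr;    // num_rows + 1 offsets into col_idx/values
    std::vector<int> col_idx;    // column index of each stored nonzero
    std::vector<double> values;  // value of each stored nonzero (type assumed)
};

// Sequential reference: y = A * x, accumulating one row at a time.
std::vector<double> spmv_csr(const CsrMatrix& a, const std::vector<double>& x) {
    assert(a.row_ptr.size() == a.num_rows + 1);
    std::vector<double> y(a.num_rows, 0.0);
    for (std::size_t row = 0; row < a.num_rows; ++row) {
        double sum = 0.0;
        for (int k = a.row_ptr[row]; k < a.row_ptr[row + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[row] = sum;
    }
    return y;
}

// Collects the rows holding more than `threshold` nonzeros. An adaptive
// scheme could hand these rows to a block-per-row kernel and leave the
// rest to a thread-per-row kernel. The threshold is a made-up tuning knob.
std::vector<std::size_t> find_long_rows(const CsrMatrix& a, int threshold) {
    std::vector<std::size_t> long_rows;
    for (std::size_t row = 0; row < a.num_rows; ++row) {
        const int row_length = a.row_ptr[row + 1] - a.row_ptr[row];
        if (row_length > threshold) {
            long_rows.push_back(row);
        }
    }
    return long_rows;
}
```

In the thread-per-row GPU variant, the body of the outer loop in `spmv_csr` would be the work done by a single CUDA thread. The block-per-row variant instead spreads the inner loop across the threads of a block and combines their partial sums in shared memory.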

Build

make

Data

Download the SuiteSparse Matrix Collection inputs used for the benchmarks:

./scripts/download_williams_matrices.sh

This creates data/*.mtx for the following matrices:

pdb1HYS
consph
cant
pwtk
mac_econ_fwd500
mc2depi
cop20k_A
scircuit
webbase-1M
rail4284

Validation

make run

Benchmark

./build/benchmark data/*.mtx

Output

Benchmark CSV files are written to results/.

Reproducibility Environment

The included benchmark results were produced with the following environment:

Node: edu01
Loaded module: CUDA/12.3.2
GPU: NVIDIA A30, 24576 MiB
NVIDIA driver: 550.144.03
NVIDIA-SMI reported CUDA version: 12.4
CUDA compiler: nvcc release 12.3, V12.3.107
GCC: 11.4.1 20230605 (Red Hat 11.4.1-2)
G++: 11.4.1 20230605 (Red Hat 11.4.1-2)
CPU: Intel Xeon Silver 4309Y CPU @ 2.80GHz
