theRTLmaker/CUDA_in_100_days

100 Days of CUDA Kernels 🚀

A personal journey to explore, implement, and deeply understand GPU programming through 100 small, focused CUDA projects.

The aim is to learn GPU programming by building, testing, profiling and documenting small kernels that grow in complexity over time.

What you'll find here

  • Daily challenge folders (challanges/1_vectorAdd, challanges/2_matrixMult, ...) each containing notes.md and the kernel implementation (*.cu).

💡 The Idea

Every day, I build one CUDA kernel, from the basics (vector addition) all the way to advanced patterns (shared memory tiling, warp-level primitives, cooperative groups, streams, graph execution, etc.).

This repository documents my progress with:

  • 📘 Daily Notes: notes.md inside each folder
  • 🧠 Explanations: kernels and CUDA concepts
  • 🧪 Code Implementations: clean, runnable examples
  • 📊 A Progress Table: tracking each challenge

The goal is not just to write kernels; it is to understand how they interact with the architecture and how to write correct, fast, and maintainable GPU code.

Project goals

  • Understand the CUDA execution model (threads, warps, blocks, grids).
  • Learn the memory hierarchy and how to optimize for it: shared memory, registers, caches, and HBM/GDDR characteristics.
  • Explore advanced features: cooperative groups, streams, CUDA Graphs, Tensor Cores (WMMA), and optimizations for memory-bound kernels.
  • Improve profiling and benchmarking skills (nvprof / Nsight / NVTX markers).
  • Produce short, self-contained notes for each day.
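
To make the first goal concrete, here is a minimal sketch of how threads, blocks, and the grid map onto data. It is an illustration written for this README, not necessarily the code in challanges/1_vectorAdd; the helper name `runVectorAdd`, the `CUDA_OK` macro, and the block size of 256 are assumptions.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumed convenience macro: bail out of the helper on any CUDA error.
// (A sketch only: device buffers are not freed on the error path.)
#define CUDA_OK(call) do { if ((call) != cudaSuccess) return false; } while (0)

// One thread per element. The global index combines the block index,
// the block size, and the thread's index within its block.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {            // guard: the last block may be only partly used
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: launches the kernel and verifies the result.
bool runVectorAdd(int n) {
    std::vector<float> hA(n), hB(n), hC(n);
    for (int i = 0; i < n; ++i) { hA[i] = float(i); hB[i] = 2.0f * float(i); }

    size_t bytes = size_t(n) * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CUDA_OK(cudaMalloc(&dA, bytes));
    CUDA_OK(cudaMalloc(&dB, bytes));
    CUDA_OK(cudaMalloc(&dC, bytes));
    CUDA_OK(cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_OK(cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice));

    int threads = 256;                           // threads per block (assumed)
    int blocks = (n + threads - 1) / threads;    // round up to cover all n
    vectorAdd<<<blocks, threads>>>(dA, dB, dC, n);
    CUDA_OK(cudaGetLastError());
    CUDA_OK(cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost));

    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    for (int i = 0; i < n; ++i) {
        if (hC[i] != 3.0f * float(i)) return false;  // exact for small i
    }
    return true;
}

int main() {
    // 1000 is deliberately not a multiple of 256, so the guard matters.
    std::printf("vectorAdd %s\n", runVectorAdd(1000) ? "passed" : "failed");
    return 0;
}
```

Compiling with something like `nvcc vector_add.cu -o vector_add` should work on a machine with the CUDA toolkit and a GPU; the file name is a placeholder.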

🧭 Repository Structure

CUDA_in_100_days/
├── challanges/
│   ├── ...
│   └── N_<kernel_name>/
│       ├── notes.md
│       └── <kernel_name>.cu
├── scripts/
│   └── update_readme.py
├── notes_template.md
├── badge.svg
├── README.md
└── .gitignore

📅 Progress Table

| Day | Folder | Topic | Short description |
| --- | --- | --- | --- |
| 1 | 1_vectorAdd | Vector Addition | Basic CUDA kernel computing element-wise addition of two float vectors. |
| 2 | 2_matrixMult | Matrix Multiplication | Naive dense matrix multiplication kernel, revisiting 2D thread indexing and memory coalescing. |
| 3 | 3_sharedMem_MatrixMult | Shared Memory Matrix Multiplication | Uses shared memory to reduce global memory accesses by letting threads in the same block reuse loaded data. |
| 4 | 4_sharedMem_blockTiling_MatrixMult | Shared Memory 1-D Block Tiling Matrix Multiplication | Uses 1-D tiling to increase the number of FLOPs performed per load. |
| 5 | 5_sharedMem_2DblockTiling_MatrixMult copy | Shared Memory 2-D Block Tiling Matrix Multiplication | Uses 2-D tiling to further increase the number of FLOPs performed per load. |
| ... | ... | ... | ... |

Progress: Day 5 / 100 (5%)
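
As an illustration of the shared-memory idea from Day 3, the sketch below stages square tiles of both inputs in shared memory so that each thread issues roughly 2n/TILE global loads instead of 2n. It is not the code from the challenge folders; the tile size of 16, the square n x n shape, and the names `matMulTiled`, `runMatMulTiled`, and `CUDA_OK` are all assumptions.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16  // assumed tile width; blocks are TILE x TILE threads

// Assumed convenience macro: bail out of the helper on any CUDA error.
// (A sketch only: device buffers are not freed on the error path.)
#define CUDA_OK(call) do { if ((call) != cudaSuccess) return false; } while (0)

// C = A * B for row-major n x n matrices, one output element per thread.
__global__ void matMulTiled(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Each thread loads one element of each tile; pad with zeros
        // when the tile hangs over the matrix edge.
        As[threadIdx.y][threadIdx.x] =
            (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE; ++k) {
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        }
        __syncthreads();  // everyone done reading before the next load
    }
    if (row < n && col < n) C[row * n + col] = acc;
}

// Hypothetical host helper: A is all ones and B is all twos,
// so every element of C should equal 2 * n.
bool runMatMulTiled(int n) {
    size_t count = size_t(n) * n, bytes = count * sizeof(float);
    std::vector<float> hA(count, 1.0f), hB(count, 2.0f), hC(count, 0.0f);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CUDA_OK(cudaMalloc(&dA, bytes));
    CUDA_OK(cudaMalloc(&dB, bytes));
    CUDA_OK(cudaMalloc(&dC, bytes));
    CUDA_OK(cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_OK(cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice));

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matMulTiled<<<grid, block>>>(dA, dB, dC, n);
    CUDA_OK(cudaGetLastError());
    CUDA_OK(cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost));

    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    for (size_t i = 0; i < count; ++i) {
        if (hC[i] != 2.0f * float(n)) return false;
    }
    return true;
}

int main() {
    // 50 is not a multiple of TILE, which exercises the edge padding.
    std::printf("matMulTiled %s\n", runMatMulTiled(50) ? "passed" : "failed");
    return 0;
}
```

The two `__syncthreads()` calls are the part most easily gotten wrong: the first prevents reading a tile before it is complete, the second prevents overwriting a tile that other threads are still using.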
