rootkiller6788/KernelLab

KernelLab

A tiny, hard CUDA kernel laboratory for AI hot-path operators.

KernelLab provides hand-written CUDA C++ kernels for the most critical operators in LLM inference and training. All kernels expose a pure C ABI, making them callable from Rust, Python, C, or any language with FFI support.

Power

Operator execution authority: control over kernel selection, memory access patterns, and numerical precision.

First-Edition Kernels

Kernel          Description
rmsnorm         Root-mean-square normalization (LLM default)
rope            Rotary position embedding
softmax         Online safe softmax with optional mask
silu            SiLU / SwiGLU activation
vec_add         Element-wise vector addition (residual)
kv_copy         KV cache write
quant_dequant   INT8/FP8 quantization utilities

C ABI

int ak_rmsnorm_f16(void* out, const void* x, const void* weight,
                   int B, int T, int D, float eps, void* stream);
int ak_rope_f16(void* out, const void* x, const void* cos, const void* sin,
                int B, int T, int D, void* stream);
int ak_softmax_f16(void* out, const void* x, const void* mask,
                   int B, int H, int T, int D, void* stream);
int ak_silu_f16(void* out, const void* x, int N, void* stream);
int ak_kv_copy_f16(void* out, const void* x,
                   int B, int H, int T, int D, void* stream);

Build

mkdir build && cd build
cmake .. -G Ninja
ninja

Test

cd build && ctest

Tech Stack

Layer       Choice
Language    C11 + CUDA C++
Build       CMake + Ninja
Test        CTest + cuda-memcheck
Benchmark   CUDA events

Project Structure

apeinx-kernels/
├── include/akernel.h        # Public C ABI
├── src/                     # CPU reference implementations
├── cuda/                    # CUDA GPU kernel implementations
├── bench/                   # Micro-benchmarks
├── tests/                   # Correctness tests (CPU vs CUDA)
└── CMakeLists.txt
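
The tests directory compares CPU and CUDA results. A comparison of that kind usually needs a tolerance rather than exact equality, since FP16 kernels round differently from a CPU reference. The helper below is a minimal sketch of such a check; the name `allclose_f32` and the combined absolute/relative tolerance rule are assumptions and may not match what the repository's tests actually do.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical test helper, not taken from this repository.
 * Returns 1 when every got[i] lies within atol + rtol * |want[i]| of
 * want[i], and 0 otherwise. Any NaN on either side counts as a mismatch. */
static int allclose_f32(const float* got, const float* want, size_t n,
                        float atol, float rtol)
{
    assert(atol >= 0.0f && rtol >= 0.0f);
    for (size_t i = 0; i < n; ++i) {
        if (isnan(got[i]) || isnan(want[i])) {
            return 0;
        }
        float tol = atol + rtol * fabsf(want[i]);
        if (fabsf(got[i] - want[i]) > tol) {
            return 0;
        }
    }
    return 1;
}
```

In a test, `got` would be the GPU output copied back to the host and converted to float, and `want` the CPU reference output for the same input.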

Relationship to Other Apeinx Projects

KernelLab (C ABI / libakernel.so)
    ↑
ApeinxRT-Core (Rust, calls via FFI)
    ↑
Apeinx-IR (compiles .apxir → plan.json consumed by RT-Core)
    ↑
ApexTrain-Core (consumes trace.jsonl from RT-Core)

License

TBD
