KernelLab

A tiny, hard CUDA kernel laboratory for AI hot-path operators.

KernelLab provides hand-written CUDA C++ kernels for the most critical operators in LLM inference and training. All kernels expose a pure C ABI, making them callable from Rust, Python, C, or any language with FFI support.

Power

Operator execution authority: this layer decides which kernel runs, how it accesses memory, and at what numerical precision.

First-Edition Kernels

| Kernel | Description |
|---|---|
| `rmsnorm` | Root-mean-square normalization (LLM default) |
| `rope` | Rotary position embedding |
| `softmax` | Online safe softmax with optional mask |
| `silu` | SiLU / SwiGLU activation |
| `vec_add` | Element-wise vector addition (residual) |
| `kv_copy` | KV cache write |
| `quant_dequant` | INT8/FP8 quantization utilities |

C ABI

/* Root-mean-square normalization. */
int ak_rmsnorm_f16(void* out, const void* x, const void* weight,
                   int B, int T, int D, float eps, void* stream);
/* Rotary position embedding. */
int ak_rope_f16(void* out, const void* x, const void* cos, const void* sin,
                int B, int T, int D, void* stream);
/* Softmax; the mask is optional. */
int ak_softmax_f16(void* out, const void* x, const void* mask,
                   int B, int H, int T, int D, void* stream);
/* SiLU activation over N elements. */
int ak_silu_f16(void* out, const void* x, int N, void* stream);
/* KV cache write. */
int ak_kv_copy_f16(void* out, const void* x,
                   int B, int H, int T, int D, void* stream);

Build

mkdir build && cd build
cmake .. -G Ninja
ninja

Test

cd build && ctest

Tech Stack

| Layer | Choice |
|---|---|
| Language | C11 + CUDA C++ |
| Build | CMake + Ninja |
| Test | CTest + cuda-memcheck (removed in CUDA 12; compute-sanitizer is its replacement) |
| Benchmark | CUDA events |

Project Structure

apeinx-kernels/
├── include/akernel.h        # Public C ABI
├── src/                     # CPU reference implementations
├── cuda/                    # CUDA GPU kernel implementations
├── bench/                   # Micro-benchmarks
├── tests/                   # Correctness tests (CPU vs CUDA)
└── CMakeLists.txt
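
The `tests/` directory compares CPU and CUDA results. Because FP16 GPU output rarely matches a CPU reference bit for bit, such a comparison usually needs a tolerance. The helper below is a minimal sketch of one way to do that. The name `allclose_f32` and the `atol + rtol * |want|` rule are assumptions for illustration, not taken from the repository, and it presumes the GPU output has already been copied to the host and converted to float32.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when every got[i] is within
 * atol + rtol * |want[i]| of want[i], otherwise 0.
 * A NaN difference (including inf minus inf) counts as a mismatch. */
int allclose_f32(const float *got, const float *want, size_t n,
                 float atol, float rtol)
{
    for (size_t i = 0; i < n; ++i) {
        float diff = fabsf(got[i] - want[i]);
        float limit = atol + rtol * fabsf(want[i]);
        if (!(diff <= limit)) /* written so that NaN fails the check */
            return 0;
    }
    return 1;
}
```

Suitable tolerances depend on the operator and on the precision in use; FP16 comparisons generally need looser bounds than FP32 ones.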

Relationship to Other Apeinx Projects

KernelLab (C ABI / libakernel.so)
    ↑
ApeinxRT-Core (Rust, calls via FFI)
    ↑
Apeinx-IR (compiles .apxir → plan.json consumed by RT-Core)
    ↑
ApexTrain-Core (consumes trace.jsonl from RT-Core)

License

TBD
