GitHub - deepsoftworks/rais: Concurrent LLM inference scheduling

A C++ task scheduler for AI inference on Apple Silicon. Prioritizes real-time LLM requests over batch work, overlaps SSD reads with GPU compute, and hot-swaps models without downtime.

Results

Concurrent request scheduling (Llama-3.2-1B-Instruct-4bit, 6 clients; TTFT = time to first token, E2E = end-to-end latency, lower is better):

Metric	Naive FIFO	Rais	Speedup
Interactive TTFT	4,829 ms	1,438 ms	3.4x
Interactive E2E	5,653 ms	2,254 ms	2.5x

Layer-streaming throughput with IO and compute overlapped (higher is better):

Model	Naive	Rais	Speedup
SmolLM2-135M (257 MB)	157 tok/s	188 tok/s	1.20x
TinyLlama-1.1B (2.1 GB)	15.5 tok/s	17.8 tok/s	1.15x

Quick start

git clone https://github.com/deepsoftworks/rais.git && cd rais
./install.sh
cmake --build build --target priority_example
./build/priority_example

Minimal usage

rais::Scheduler sched;

sched.submit([&] {
    generate(prompt);
}, rais::Lane::Interactive);

Python bindings

WITH_PYTHON=1 ./install.sh
PYTHONPATH=build python3 -c "import rais; print(rais.Scheduler)"

Architecture

Five priority lanes:

Lane	Purpose
`Interactive`	Real-time user requests (< 5 ms submit-to-start)
`Background`	Model hot-swap, logging, embeddings
`Bulk`	Batch jobs, eval runs
`GPU`	Metal compute dispatch
`IO`	Dedicated threads for SSD weight reads

Key internals: a lock-free MPMC ring buffer plus Chase-Lev work-stealing deques, earliest-deadline-first (EDF) scheduling, starvation promotion, triple-buffered layer streaming, and a slab allocator (~83 ns per allocation).

Integration

Works with MLX/mlx-lm, llama.cpp, and PyTorch. See examples/ for integration patterns:

examples/minimal_submit.cpp -- basic scheduler usage
examples/llama_cpp_integration.cpp -- llama.cpp integration
examples/rais_server.cpp -- server mode

Building

Requires macOS on Apple Silicon (M1+), CMake 3.20+, the Xcode Command Line Tools, and Catch2 v3.

brew install catch2
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
ctest --test-dir build --output-on-failure

License

MIT
