Ammar Ammar-Alnagar

Hey, I'm Ammar

AI Systems Engineer

I build systems that run LLMs in the real world—whether that's on a Raspberry Pi, a B200 cluster, or something in between. My focus: making models actually work within real constraints (latency, cost, hardware, privacy).

Projects I'm Working On

Zllm – CUDA Inference Engine (closed source for now)

A custom vLLM fork where I experiment with kernel-level optimizations.

Exploring FlashAttention memory patterns and adaptive kernel selection
Practical focus: better throughput without sacrificing flexibility
Currently used in a few production deployments

Helios-Engine – Rust Agent Framework

A lightweight framework for building reliable LLM agents.

Async I/O with Tokio, zero-copy patterns where it matters
Built for projects that need control without the Python overhead

MILI – Mojo Inference System

An experimental inference engine written in Mojo.

Implementing core kernels (RoPE, RMSNorm, Attention) to learn the language and its performance model
Goal: readable code that doesn't sacrifice efficiency
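
For readers unfamiliar with the kernels named above, here is a minimal pure-Python reference sketch of RMSNorm and RoPE. It is not MILI's code: the function names, the `eps` and `base` defaults, and the interleaved-pair RoPE layout are all assumptions chosen for illustration, and a real kernel would operate on tensors rather than Python lists.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root-mean-square, then by a per-element weight.

    `eps` is an assumed stabiliser that avoids division by zero.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rope(x, position, base=10000.0):
    """Rotate each consecutive pair (x[i], x[i+1]) by an angle that depends on position.

    Assumes the interleaved-pair layout; some implementations pair
    element i with element i + d/2 instead.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):
        # i steps by 2, so this is position * base**(-2k/d) for pair index k.
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out
```

Both functions are easy to sanity-check: the output of `rms_norm` with unit weights has a mean square close to 1, and `rope` leaves the vector's length unchanged because it only applies rotations.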

AI-Kernel-Learning

Notes, experiments, and small demos as I dig deeper into GPU programming.

From raw CUDA → CuTe → Mojo: documenting the journey
Happy if it helps someone else avoid the same dead ends

Let's Connect

If you're working on inference, kernels, or just trying to ship LLMs without burning a cloud budget—say hi. I'm always up for swapping notes.

⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣤⡶⠿⠿⠷⣶⣄⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣰⡿⠁⠀⠀⢀⣀⡀⠙⣷⡀⠀⠀⠀
⠀⠀⠀⡀⠀⠀⠀⠀⠀⢠⣿⠁⠀⠀⠀⠘⠿⠃⠀⢸⣿⣿⣿⣿
⠀⣠⡿⠛⢷⣦⡀⠀⠀⠈⣿⡄⠀⠀⠀⠀⠀⠀⠀⣸⣿⣿⣿⠟
⢰⡿⠁⠀⠀⠙⢿⣦⣤⣤⣼⣿⣄⠀⠀⠀⠀⠀⢴⡟⠛⠋⠁⠀
⣿⠇⠀⠀⠀⠀⠀⠉⠉⠉⠉⠉⠁⠀⠀⠀⠀⠀⠈⣿⡀⠀⠀⠀
⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢹⡇⠀⠀⠀
⣿⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣼⡇⠀⠀⠀
⠸⣷⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⡿⠀⠀⠀⠀
⠀⠹⣷⣤⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⣰⡿⠁⠀⠀⠀⠀
⠀⠀⠀⠉⠙⠛⠿⠶⣶⣶⣶⣶⣶⠶⠿⠟⠛⠉⠀⠀⠀⠀⠀⠀
