
Hey, I'm Ammar

AI Systems Engineer

I build systems that run LLMs in the real world—whether that's on a Raspberry Pi, a B200 cluster, or something in between. My focus: making models actually work within real constraints (latency, cost, hardware, privacy).


Projects I'm Working On

Zllm – CUDA Inference Engine (closed source for now)

A custom vLLM fork where I experiment with kernel-level optimizations.

  • Exploring FlashAttention memory patterns and adaptive kernel selection
  • Practical focus: better throughput without sacrificing flexibility
  • Currently used in a few production deployments
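The core FlashAttention memory pattern can be sketched outside CUDA. Below is a minimal NumPy illustration (not Zllm's actual kernel) of the online-softmax trick: attention is computed block-by-block over K/V with a running max and running denominator, so the full score matrix never has to exist in memory:

```python
import numpy as np

def naive_attention(q, k, v):
    """Standard attention: materializes the full (n, n) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def tiled_attention(q, k, v, block=4):
    """FlashAttention-style pass: process K/V in blocks, keeping a
    running row max (m) and running softmax denominator (l), rescaling
    earlier partial results whenever the max grows."""
    n, d = q.shape
    out = np.zeros_like(q)
    m = np.full(n, -np.inf)          # running row max
    l = np.zeros(n)                  # running softmax denominator
    for start in range(0, n, block):
        kb = k[start:start + block]
        vb = v[start:start + block]
        s = q @ kb.T / np.sqrt(d)    # scores for this block only
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)    # rescale previously accumulated partials
        p = np.exp(s - m_new[:, None])
        out = out * scale[:, None] + p @ vb
        l = l * scale + p.sum(axis=-1)
        m = m_new
    return out / l[:, None]
```

The same idea is what lets a real kernel keep each tile in shared memory / registers instead of writing an n×n matrix to HBM.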

Helios-Engine – Rust Agent Framework

A lightweight framework for building reliable LLM agents.

  • Async I/O with Tokio, zero-copy patterns where it matters
  • Built for projects that need control without the Python overhead
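Helios-Engine itself is Rust on Tokio, but the basic agent pattern it serves (fan tool calls out on an async runtime, gather the results for the model) can be sketched in a few lines of Python asyncio; the tool registry and hard-coded plan below are hypothetical, not Helios-Engine's API:

```python
import asyncio

# Hypothetical tool registry: maps a tool name to a handler.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

async def call_tool(name, *args):
    """Run a (possibly blocking) tool without stalling the event loop."""
    return await asyncio.to_thread(TOOLS[name], *args)

async def agent_step(plan):
    """Execute one agent step: run the planned tool calls concurrently,
    then return the gathered results keyed by tool name."""
    results = await asyncio.gather(*(call_tool(n, *a) for n, a in plan))
    return dict(zip((n for n, _ in plan), results))

# A model would emit a plan like this; here it is hard-coded.
plan = [("add", (2, 3)), ("upper", ("hi",))]
out = asyncio.run(agent_step(plan))
```

In Rust the same loop maps naturally onto `tokio::spawn` plus a join, with the borrow checker enforcing the zero-copy boundaries.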

MILI – Mojo Inference System

An experimental inference engine written in Mojo.

  • Implementing core kernels (RoPE, RMSNorm, Attention) to learn the language and its performance model
  • Goal: readable code that doesn't sacrifice efficiency
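For reference, two of the kernels mentioned (RMSNorm and RoPE) have compact NumPy definitions that a Mojo port can be checked against numerically. This is an illustrative sketch, not MILI's code:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale by the inverse root-mean-square; unlike LayerNorm,
    no mean subtraction and no bias."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def rope(x, base=10000.0):
    """Rotary position embedding: rotate channel pairs (split as two
    halves) by a position- and frequency-dependent angle."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair frequencies
    angles = np.outer(np.arange(seq), freqs)    # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

A useful property for testing: at position 0 all rotation angles are zero, so `rope` leaves the first row unchanged.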

Notes, experiments, and small demos as I dig deeper into GPU programming.

  • From raw CUDA → CuTe → Mojo: documenting the journey
  • Happy if it helps someone else avoid the same dead ends

Let's Connect

If you're working on inference, kernels, or just trying to ship LLMs without burning a cloud budget—say hi. I'm always up for swapping notes.

⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣤⡶⠿⠿⠷⣶⣄⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣰⡿⠁⠀⠀⢀⣀⡀⠙⣷⡀⠀⠀⠀
⠀⠀⠀⡀⠀⠀⠀⠀⠀⢠⣿⠁⠀⠀⠀⠘⠿⠃⠀⢸⣿⣿⣿⣿
⠀⣠⡿⠛⢷⣦⡀⠀⠀⠈⣿⡄⠀⠀⠀⠀⠀⠀⠀⣸⣿⣿⣿⠟
⢰⡿⠁⠀⠀⠙⢿⣦⣤⣤⣼⣿⣄⠀⠀⠀⠀⠀⢴⡟⠛⠋⠁⠀
⣿⠇⠀⠀⠀⠀⠀⠉⠉⠉⠉⠉⠁⠀⠀⠀⠀⠀⠈⣿⡀⠀⠀⠀
⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢹⡇⠀⠀⠀
⣿⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣼⡇⠀⠀⠀
⠸⣷⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⡿⠀⠀⠀⠀
⠀⠹⣷⣤⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⣰⡿⠁⠀⠀⠀⠀
⠀⠀⠀⠉⠙⠛⠿⠶⣶⣶⣶⣶⣶⠶⠿⠟⠛⠉⠀⠀⠀⠀⠀⠀

Pinned Loading

  1. Helios-Engine (Public) — Rust · 42 stars · 5 forks

     Helios Engine is a powerful and flexible Rust framework for building LLM-powered agents with tool support, chat capabilities, and easy configuration management.

  2. YAIE (Public) — Python

     YAIE (Yet Another Inference Engine) is an educational project designed to help students and developers understand how modern LLM inference engines work.

  3. VisoLearn (Public) — Python

     VisoLearn-2 is an AI-powered educational platform designed specifically for children with Autism Spectrum Disorder (ASD).

  4. MILI (Public) — Mojo

     A comprehensive, hands-on guide to building a high-performance LLM inference system in Mojo and Python.

  5. Axion (Public) — Rust · 1 star

     Axion is a high-performance LLM serving platform built with Rust that provides OpenAI-compatible APIs for chat completions, embeddings, and reranking.

  6. cronnx (Public) — Rust

     Cronnx is a high-performance, asynchronous Machine Learning inference server built in Rust, demonstrating how to take a raw ONNX model and serve it via a robust HTTP API.