
# DecodeAI

Decode AI from first principles. No black boxes. No hand-waving.

This repository is built on a simple belief: you cannot truly master AI by calling `model.fit()`. To understand how modern AI systems actually work, you need to build them from scratch — derive the math, implement the algorithms, and watch the gradients flow.

Every notebook in this repository dissects a core AI concept by implementing it from the ground up using raw PyTorch and NumPy. We go from the bias-variance tradeoff all the way to building GPT, LLaMA, DeepSeek, and GRPO — the same algorithm behind DeepSeek-R1. If a concept matters, we don't just explain it. We build it, break it, and rebuild it until the intuition is earned.

"What I cannot create, I do not understand." — Richard Feynman


## Table of Contents

## 01 - Machine Learning

| # | Notebook | Description |
|---|----------|-------------|
| 01 | Data Processing | Bias-variance tradeoff, feature scaling, data splitting, and preprocessing pipelines |
| 02 | Regression | Linear and polynomial regression — cost functions, gradient descent, and regularization |
| 03 | Classification | Logistic regression, SVMs, decision trees, random forests, and ensemble methods |
| 04 | Clustering | K-Means, DBSCAN, hierarchical clustering — algorithms, objective functions, and evaluation |
| 05 | Dimension Reduction | PCA derivation, eigenvalue decomposition, and t-SNE for high-dimensional data |
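
For a flavour of the from-scratch style, here is a minimal sketch (not the notebook's code) of batch gradient descent on a least-squares cost in NumPy; the toy data, coefficients, and learning rate are invented for illustration.

```python
import numpy as np

# Toy data for y ≈ 3x + 2; the coefficients and noise level are illustrative only.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.normal(size=100)

Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column so the intercept is learned as a weight
w = np.zeros(2)

lr = 0.1
for _ in range(1000):
    residual = Xb @ w - y
    grad = Xb.T @ residual / len(y)         # gradient of the mean squared error (up to a constant factor)
    w -= lr * grad

print(w)   # approaches [3.0, 2.0]
```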

## 02 - Deep Learning Foundation

| # | Notebook | Description |
|---|----------|-------------|
| 01 | Neural Network Foundations | NumPy vectorization, broadcasting, forward/backward pass from scratch |
| 02 | Activation Functions | Sigmoid, tanh, ReLU, GELU — why activations matter, saturation, and dying neurons |
| 03 | Weight Initialization | Why zero init fails, variance explosion/vanishing, Xavier and He initialization proofs |
| 04 | Normalization | Batch norm, layer norm, group norm — internal covariate shift and loss landscape smoothing |
| 05 | Regularization | L2 weight decay, dropout, early stopping — fighting overfitting with math |
| 06 | Residual Connection | The degradation problem, skip connections, and why deeper networks can fail without them |
| 07 | Loss Function | BCE, cross-entropy derivations — why sigmoid+BCE and softmax+CE produce clean gradients |
| 08 | Optimizer | SGD, momentum, RMSProp, Adam — from vanilla gradient descent to adaptive learning rates |
| 09 | Model Classification | End-to-end image classification on CIFAR-10 applying all the foundations above |
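
To illustrate where the optimizer notebook ends up, here is a minimal sketch of a single Adam update written from the standard formulation (bias-corrected first and second moments); it is a stand-alone illustration, not the notebook's implementation.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum plus RMSProp-style scaling, with bias correction."""
    m = beta1 * m + (1 - beta1) * grad            # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2       # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # correct the bias from zero initialization
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Sanity check: minimize f(w) = w^2, whose gradient is 2w.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 3001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
print(w)   # close to 0
```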

## 03 - Large Language Model

### RNN

| # | Notebook | Description |
|---|----------|-------------|
| 01 | Vanilla RNN | RNN cell from scratch — hidden states, BPTT, vanishing/exploding gradients |
| 02 | Recurrent Classifier | Sentiment classification on IMDb using RNN/LSTM with padding and packing |
| 03 | RNN with Attention | Seq2seq bottleneck problem, Bahdanau attention for date format translation |
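
The core recurrence behind these notebooks fits in a few lines. Below is a minimal NumPy sketch of one vanilla RNN step unrolled over a toy sequence; the sizes and initialization scale are illustrative assumptions.

```python
import numpy as np

def rnn_cell(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: the new hidden state mixes the current input with the previous state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16                          # toy sizes
W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):            # unroll over a length-5 toy sequence
    h = rnn_cell(x_t, h, W_xh, W_hh, b_h)
print(h.shape)   # (16,)
```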

### Transformer Models

| # | Notebook | Description |
|---|----------|-------------|
| A01 | Pretrained Model - HuggingFace | Using HuggingFace pipelines and pretrained models for text classification |
| A02 | Attention Mechanism | Bahdanau vs Luong attention — the information bottleneck and its solution |
| A03 | Transformer | Full transformer architecture from scratch — multi-head attention, positional encoding, encoder-decoder |
| B01 | BERT | Bidirectional encoder — WordPiece tokenization, MLM, NSP, and the fine-tuning paradigm |
| B02 | ColBERT | Late interaction retrieval — MaxSim scoring, query augmentation, token-level matching |
| C01 | nanoGPT | GPT-2 from scratch — byte-level BPE tokenization, causal self-attention, autoregressive decoding |
| C02 | LLaMA | LLaMA architecture deep dive — RMSNorm, RoPE, SwiGLU, grouped-query attention |
| C03 | Mistral MoE | Mixture of Experts — sparse routing, expert parallelism, sliding window attention |
| C04 | DeepSeek | Multi-head Latent Attention (MLA) — 24x KV-cache reduction via low-rank compression |
| C05 | Qwen | Advanced RoPE scaling — Position Interpolation, NTK-Aware, Dynamic NTK, YaRN |
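
At the heart of every decoder-only model in this table sits the same primitive: scaled dot-product attention with a causal mask. Here is a minimal PyTorch sketch with toy shapes, not any notebook's exact code.

```python
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal mask; q, k, v are (batch, heads, seq, head_dim)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5                        # (batch, heads, seq, seq)
    seq = q.size(-2)
    mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))                   # forbid attending to future positions
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 2, 5, 8)                                    # toy shapes for illustration
print(causal_attention(q, k, v).shape)                                 # torch.Size([1, 2, 5, 8])
```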

Text Retrieval & NLP

| # | Notebook | Description |
|---|----------|-------------|
| 01 | Text Embedding | Cosine similarity, dot product, L2 distance — similarity metrics and embedding spaces |
| 02 | HNSW | Approximate nearest neighbors — HNSW, Product Quantization, IVF for vector search |
| 03 | Topic Modeling | Discovering latent topics from text corpora |
| 04 | NER | Named Entity Recognition — BIO tagging, CoNLL-2003, token classification |
| 05 | RAG | Retrieval-Augmented Generation — chunking strategies, vector stores, retrieval pipeline |
| 06 | Advanced RAG | Advanced retrieval techniques — re-ranking, hybrid search, query transformation |
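
The similarity metrics in the embedding notebook are one-liners; the sketch below (with made-up vectors) shows how cosine similarity, dot product, and L2 distance disagree when magnitude changes but direction does not.

```python
import numpy as np

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])            # same direction as a, twice the magnitude

print(cosine_similarity(a, b))           # 1.0 (direction only, magnitude ignored)
print(a @ b)                             # 28.0 (dot product grows with magnitude)
print(np.linalg.norm(a - b))             # ~3.74 (L2 distance)
```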

### Post-Training Alignment

| # | Notebook | Description |
|---|----------|-------------|
| 01 | Instruction Tuning | Fine-tuning Pythia-2.8B on Dolly 15k — prompt formatting, loss masking on response tokens |
| 02 | SFT | Supervised Fine-Tuning — the first step after pretraining in the LLM pipeline |
| 03 | Reward Model | ORM vs PRM — Bradley-Terry loss, step-level credit assignment for reasoning |
| 04 | DPO vs ORPO and SimPO | Direct Preference Optimization — aligning LLMs with human preferences without RL |
| 05 | GRPO with RLVR | Group Relative Policy Optimization — the algorithm behind DeepSeek-R1, with verifiable rewards |
| 06 | PEFT (LoRA / QLoRA) | LoRA and QLoRA from scratch — low-rank adaptation, 4-bit quantization, 99.6% fewer parameters |
| 07 | Abliteration | Mechanistic interpretability — finding and removing the refusal direction in activation space |
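
As a hint of what the PEFT notebook builds, here is a minimal LoRA-style linear layer: the pretrained weight is frozen and only a low-rank update is trained. The rank, scaling, and layer sizes are illustrative, not the notebook's configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank update: y = W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                    # freeze the pretrained layer
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))       # zero init, so training starts from the base model
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)   # 8192 trainable out of roughly 271k parameters
```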

### Model Compression

| # | Notebook | Description |
|---|----------|-------------|
| 01 | Distillation | Knowledge distillation — response-level SFT, logit-level KD, rejection sampling |
| 02 | Model Pruning | Unstructured and structured pruning — magnitude-based, layer-wise, and global strategies |
| 03 | Quantization | FP32 to INT4 — numeric formats, quantization schemes, memory-accuracy tradeoffs |
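
The quantization notebook's starting point can be sketched in a few lines: symmetric per-tensor INT8 quantization, shown below with made-up weights. INT4 and finer-grained (per-channel, per-group) schemes follow the same pattern.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map FP32 values onto the signed 8-bit grid."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())   # rounding error, at most about scale / 2
```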

### Agentic LLM

| # | Notebook | Description |
|---|----------|-------------|
| A01 | LLM Prompting | System prompt design patterns — persona, task-specific, guard-rails, few-shot, chain-of-thought |
| A02 | LangChain | LangChain fundamentals — data loaders, splitters, vectorstores, embeddings, retrieval chains |
| A03 | Agent Harness | The agent loop primitive — tool calling, finish reasons, state management from scratch |
| A04 | Agent Gateway | Intelligence layers — BASE, IDENTITY, SOUL, MEMORY, SKILLS, TOOLS, CONTEXT, HEARTBEAT |
| A05 | Agent Operation | Production observability — logs, metrics, traces, cost attribution, latency profiling |
| A06 | Self Learning Loop | Reflexion and verbal gradients — self-critique, reflection injection, iterative improvement |
| B01 | LangGraph Agent | Cyclic state graphs with LangGraph — conditional edges, tool routing, RAG agents, LangSmith tracing |
| B02 | Claude Code | Reverse-engineering Claude Code's agent loop — stop reasons, tool execution, harness internals |
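
The agent-loop primitive behind the Agent Harness notebook reduces to a short control loop: ask the model, run any requested tool, feed the result back, stop when the model finishes. The sketch below is schematic only; `call_llm`, the reply fields, and the message roles are hypothetical placeholders, not a real provider API.

```python
def run_agent(user_message, tools, call_llm, max_steps=5):
    """Minimal agent loop: query the model, execute requested tools, append results, repeat."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = call_llm(messages)                            # hypothetical chat call returning a dict
        if reply["finish_reason"] != "tool_call":
            return reply["content"]                           # the model answered directly, loop ends
        call = reply["tool_call"]
        result = tools[call["name"]](**call["arguments"])     # execute the requested tool locally
        messages.append({"role": "assistant", "content": reply["content"]})
        messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    return "stopped: maximum number of steps reached"
```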

Production & Inference

| # | Notebook | Description |
|---|----------|-------------|
| 01 | Generation | Text generation from scratch — KV-cache, sampling strategies, batched/continuous batching, speculative decoding |
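
A taste of the generation notebook's territory: the sketch below implements temperature scaling plus top-k filtering for picking the next token; the vocabulary size and default hyperparameters are illustrative.

```python
import torch

def sample_next_token(logits, temperature=0.8, top_k=50):
    """Pick the next token id from a logits vector using temperature scaling and top-k filtering."""
    logits = logits / temperature                        # flatten or sharpen the distribution
    topk_vals, topk_idx = torch.topk(logits, top_k)      # keep only the k most likely tokens
    probs = torch.softmax(topk_vals, dim=-1)             # renormalize over the survivors
    choice = torch.multinomial(probs, num_samples=1)
    return topk_idx[choice].item()

logits = torch.randn(32_000)                             # toy vocabulary size
print(sample_next_token(logits))
```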

### LLM Evaluation

| # | Notebook | Description |
|---|----------|-------------|
| 01 | LLM Evaluation | Perplexity, intrinsic evaluation metrics — measuring how well a model predicts text |
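
Perplexity is just the exponential of the average per-token cross-entropy. Here is a minimal sketch with random toy logits; in practice the logits come from the model under evaluation.

```python
import torch
import torch.nn.functional as F

# Toy next-token logits and targets standing in for real model outputs.
vocab = 100
logits = torch.randn(1, 10, vocab)                      # (batch, seq, vocab)
targets = torch.randint(0, vocab, (1, 10))

nll = F.cross_entropy(logits.view(-1, vocab), targets.view(-1))   # mean per-token negative log-likelihood
perplexity = torch.exp(nll)
print(perplexity.item())
```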

## 04 - Computer Vision

| # | Notebook | Description |
|---|----------|-------------|
| 01 | CNN Foundations | Convolutions from scratch in NumPy — zero-padding, forward/backward pass, pooling, and gradient derivations |
| 02 | CNN Architecture | Baseline CNN to ResNet on FashionMNIST — vanishing gradients and why skip connections work |
| 03 | Transfer Learning | Feature extraction vs fine-tuning on CIFAR-10 — ImageNet normalization, layer freezing strategies |
| 04 | Object Detection | IoU, NMS, anchor box assignment, and YOLO output decoding — built from scratch |
| 05 | Image Segmentation | U-Net encoder/decoder from scratch — skip connections, pixel-wise loss, SegFormer inference |
| 06 | Metric Learning | Siamese networks, contrastive loss, triplet loss, and FaceNet — embedding spaces for unseen classes |
| 07 | Vision Transformers | ViT, DeiT, and Swin from scratch — patch embedding, multi-head self-attention, hierarchical windows |
| 08 | Contrastive Learning | SimCLR, CLIP, and DINOv2 — self-supervised pretraining with NT-Xent loss |
| 09 | Diffusion Model | DDPM, DDIM, Stable Diffusion — forward/reverse process, noise schedules, ContextUNet |
| 10 | Model Explainability | Saliency maps, GradCAM, Integrated Gradients, and SHAP on ResNet-50 |
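
From the object detection notebook's toolbox, here is a minimal IoU function for axis-aligned boxes in (x1, y1, x2, y2) format; the example boxes are arbitrary.

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 10, 10], [5, 5, 15, 15]))   # 25 / 175, about 0.143
```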

## 05 - Multi-Modal

| # | Notebook | Description |
|---|----------|-------------|
| 01 | Bridge Architecture | Connecting frozen ViT to frozen LLM — LLaVA projectors, Flamingo Perceiver, BLIP-2 Q-Former, MoE bridges |
| 02 | Vision Language Model | Qwen-VL style VLM from scratch — TinyViT, MLP projector, mRoPE, visual token insertion, Stage 1 training |
| 03 | Instruction Tuning | Stage 2–3 VLM training — visual instructions, multi-turn dialog, RLHF-V for hallucination reduction |
| 04 | Reasoning & Inference | VLM inference pipeline — decoding strategies, streaming, chain-of-thought, visual grounding, evaluation |
| 05 | Audio & Speech | Waveforms to Mel spectrograms from scratch — audio encoders, Whisper, Phi-4 multimodal speech |
| 06 | Video | Video understanding — spatial-temporal attention, ViViT, dynamic FPS sampling, text-timestamp alignment |
| 07 | Visual Agent & Computer Use | VLMs that act on GUIs — perception, planning, action loops, computer-use agents |
| 08 | Native Multimodal | Any-to-any unified token spaces — Chameleon, Transfusion, Emu3, Janus Pro with VQ-VAE tokenizers |
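
The simplest bridge in the table above is an MLP projector that maps frozen vision features into the LLM's embedding space. Below is a minimal sketch in that spirit; the dimensions and two-layer design are illustrative assumptions, not the notebooks' exact setup.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: a ViT-Base-sized encoder feeding a 4096-dim LLM embedding space.
vit_dim, llm_dim, num_patches = 768, 4096, 196

projector = nn.Sequential(          # two-layer MLP bridge, in the spirit of LLaVA-style projectors
    nn.Linear(vit_dim, llm_dim),
    nn.GELU(),
    nn.Linear(llm_dim, llm_dim),
)

patch_features = torch.randn(1, num_patches, vit_dim)   # stand-in for frozen vision-encoder output
visual_tokens = projector(patch_features)               # (1, 196, 4096), ready to interleave with text embeddings
print(visual_tokens.shape)
```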

## Coming Soon

| Topic | Description |
|-------|-------------|
| Training Strategy | Training data curation, loss functions, distributed training, and GPU programming |
| Model Serving | vLLM, PagedAttention, autoscaling, and production deployment |
| MLOps | Experiment tracking, model versioning, CI/CD for ML, monitoring, and drift detection |
| LLM Benchmarks | MMLU, HumanEval, GSM8K — standardized evaluation and leaderboard methodology |
| AI Governance | Red teaming, toxicity benchmarks, bias evaluation, hallucination detection |

More work is coming. This repository is actively maintained and expanding as the field evolves.
