AI agents running research on single-GPU nanochat training automatically

Python 43,080 5,966 Updated Mar 16, 2026

The HELMET Benchmark

Jupyter Notebook 204 39 Updated Feb 26, 2026

A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs dir…

TypeScript 24,282 6,984 Updated Mar 19, 2026

Solutions of Reinforcement Learning, An Introduction

Jupyter Notebook 2,397 514 Updated Jul 10, 2025

🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.

Cuda 253 14 Updated Feb 13, 2026

LongAttn: Selecting Long-context Training Data via Token-level Attention

Python 15 2 Updated Jul 16, 2025

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,750 504 Updated Mar 13, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,976 126 Updated Mar 18, 2026

Flash Attention in 300-500 lines of CUDA/C++

Cuda 36 2 Updated Aug 22, 2025

Planning with unified multimodal models

Python 10 1 Updated Dec 11, 2025

The best ChatGPT that $100 can buy.

Python 49,497 6,486 Updated Mar 17, 2026

JAX bindings for Flash Attention v2

C++ 103 9 Updated Feb 28, 2026

Minimal yet performant LLM examples in pure JAX

Python 245 32 Updated Jan 14, 2026

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Python 1,698 192 Updated Oct 2, 2025

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 454 17 Updated Mar 17, 2026

A minimal evaluation framework for FLA models

7 Updated Aug 3, 2025

Code release for paper "Test-Time Training Done Right"

Python 413 24 Updated Jan 5, 2026

An efficient implementation of the NSA (Native Sparse Attention) kernel

Python 132 5 Updated Jun 24, 2025

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 332 75 Updated Mar 19, 2026

A Quirky Assortment of CuTe Kernels

Python 861 96 Updated Mar 18, 2026

Nano vLLM

Python 12,319 1,748 Updated Nov 3, 2025

[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)

Python 159 5 Updated Jul 8, 2025

The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Jupyter Notebook 899 54 Updated Dec 20, 2025

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 677 38 Updated Mar 16, 2026

Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"

Python 110 9 Updated Oct 11, 2025

Awesome LLM pre-training resources, including data, frameworks, and methods.

2 Updated Apr 25, 2025

Awesome LLM pre-training resources, including data, frameworks, and methods.

342 22 Updated Apr 29, 2025