FlexTensor is a tensor offloading and management library for PyTorch that enables running large models on limited GPU memory by intelligently offloading tensors between GPU and CPU memory.

Python 95 11 Updated Apr 19, 2026

R6410418 / Jackrong-llm-finetuning-guide

Jupyter Notebook 970 174 Updated Apr 23, 2026

WeianMao / triattention

TriAttention — Efficient long reasoning with trigonometric KV cache compression. Enables OpenClaw local deployment on memory-constrained GPUs.

Python 639 53 Updated Apr 23, 2026

NVIDIA-NeMo / DataDesigner

🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.

Python 1,679 148 Updated Apr 23, 2026

DevTechJr / turboquant_cutile

turboquant-based compression engine for LLM KV cache

Python 57 8 Updated Apr 3, 2026

math-ai-org / mathcode

MathCode: A Frontier Mathematical Coding Agent

Python 475 48 Updated Apr 12, 2026

FujitsuResearch / OneCompression

Python package for LLM compression

Python 319 10 Updated Apr 23, 2026

Rangizingo / cc-cache-fix

Python 559 190 Updated Apr 1, 2026

mit-han-lab / fouroversix

Code for the papers: “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” and “Adaptive Block-Scaled Data Types”

Python 171 17 Updated Apr 21, 2026

mixlayer / nccl-skew-analyzer

Inspects nsys dumps and measures NCCL collective launch skew

Rust 2 Updated Dec 12, 2025

dmlc / dlpack

common in-memory tensor structure

C++ 1,201 160 Updated Jan 26, 2026

yao-jz / intra-kernel-profiler

Region-level profiling for CUDA kernels with trace, NVBit, CUPTI, NSys, and an interactive Explorer.

Python 112 11 Updated Apr 17, 2026

anthropics / buffa

Rust implementation of protobuf with editions support, JSON serialization, and zero-copy views

Rust 663 32 Updated Apr 23, 2026

zhuhanqing / APOLLO

APOLLO: SGD-like Memory, AdamW-level Performance; MLSys'25 Outstanding Paper Honorable Mention

Python 345 18 Updated Nov 29, 2025

chroma-core / context-1-data-gen

Python 407 42 Updated Apr 1, 2026

TheTom / turboquant_plus

Python 6,485 872 Updated Apr 21, 2026

Percepta-Core / transformer-vm

Compile programs directly into transformer weights. Includes a 2D convex-hull KV cache with O(log n) inference.

Python 187 35 Updated Mar 25, 2026

getcompanion-ai / feynman

TypeScript 5,797 720 Updated Apr 19, 2026

chuyishang / jax-lm

Repository for the blog post JAX-LM: Language Modeling and Distributed Training in JAX

Python 8 1 Updated Mar 24, 2026

duoan / TorchCode

🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.

Jupyter Notebook 3,545 295 Updated Mar 27, 2026