NVIDIA
- New York
Stars
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
A Datacenter Scale Distributed Inference Serving Framework
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
A high-throughput and memory-efficient inference and serving engine for LLMs
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
stb single-file public domain libraries for C/C++
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
Simple, safe way to store and distribute tensors
Fast and memory-efficient exact attention
Transformer related optimization, including BERT, GPT
Portable, simple and extensible C++ logging library
PyTorch domain library for recommendation systems
precision colorscheme for the vim text editor
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
ctrlpvim / ctrlp.vim
Forked from kien/ctrlp.vim
Active fork of kien/ctrlp.vim: Fuzzy file, buffer, mru, tag, etc. finder.
Chained completion that works the way you want!
garbas / vim-snipmate
Forked from msanders/snipmate.vim
snipMate.vim aims to be a concise vim script that implements some of TextMate's snippets features in Vim.
Vundle, the plug-in manager for Vim
Check syntax in Vim/Neovim asynchronously and fix files, with Language Server Protocol (LSP) support
Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
Seamless operability between C++11 and Python
CUDA Templates and Python DSLs for High-Performance Linear Algebra
The world’s fastest framework for building websites.
nanobind: tiny and efficient C++/Python bindings
Tools for concurrent programming in Rust
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.



