quantization

Here are 45 public repositories matching this topic...

ImageOptim / libimagequant

Palette quantization library that powers pngquant and other PNG optimizers

palette quality visual-studio conversion callback minification image-optimization quantization rgba-pixels palette-generation pixel-array image-pixels pngquant

Updated Feb 11, 2026
Rust

Kaden-Schutt / hipfire

RDNA-native LLM inference engine in Rust.

rust machine-learning hip gpu-computing rdna quantization rocm amd-gpu llm-inference

Updated May 25, 2026
Rust

Epistates / pmetal

PMetal: high-performance Apple Silicon framework for local LLM inference, LoRA/QLoRA fine-tuning, serving, quantization, and MLX/Metal acceleration.

Updated May 8, 2026
Rust

AbdelStark / turboquant

Rust implementation of Google's TurboQuant algorithm for vector quantization

machine-learning-algorithms quantization llms

Updated Mar 25, 2026
Rust

Geekgineer / needle-rs

258 KB WASM runtime for Needle, a 26M-parameter tool-calling transformer. Runs in the browser, Cloudflare Workers, and Node.js. No backend required.

Updated May 20, 2026
Rust

DarthSim / quantizr

Fast library for converting RGBA images to 8-bit palette images. Written in Rust; can be used in C programs

rust palette image quantization

Updated Apr 18, 2025
Rust

AutomataNexus / AxonML

PyTorch-equivalent ML framework in pure Rust — 22 crates, CUDA GPU, biometrics, IR detection, LLMs, ONNX, distributed training

Updated May 24, 2026
Rust

thebasedcapital / ane-infer

Apple Neural Engine (ANE) LLM inference engine: reverse-engineered private APIs, Metal GPU shaders, hybrid ANE+GPU+CPU on Apple Silicon. 32 tok/s matching llama.cpp, 3.6 TFLOPS fused ANE mega-kernels.

macos rust reverse-engineering quantization ane npu edge-ai deltanet on-device-ai neural-engine apple-silicon apple-neural-engine llm-inference qwen gguf metal-gpu

Updated Mar 5, 2026
Rust

z2oh / chromatic_confinement

Rust implementation of k-d tree to efficiently perform color quantization to predefined sets

rust algorithm quantization kdtree

Updated Feb 14, 2018
Rust

kemingy / rabitq

Rust implementation of RaBitQ

quantization vector-search

Updated May 14, 2026
Rust

konjoai / vectro

⚡ Vectro: Lightning-fast embedding quantization. Hit 12M+ vec/s throughput (4.85× faster than FAISS C++) while drastically cutting memory footprint for vector databases and local AI research.

Updated May 25, 2026
Rust

joshuagamboa / turboquant-apple-silicon

High-performance Rust integration for aggressive KV cache quantization on Apple Silicon GPUs (Metal). Features a multi-turn TUI, smart context windowing, and full LLM observability.

macos rust machine-learning metal inference tui quantization kv-cache apple-silicon llm llama-cpp

Updated Apr 1, 2026
Rust

AIdevsmartdata / chimere

Rust-native MoE inference runtime with custom CUDA kernels for Blackwell GPUs. Includes DFlash speculative decoding, multi-tier Engram memory, and entropy-adaptive routing. Targets Qwen3.5-35B-A3B on a single RTX 5060 Ti 16GB.

rust ffi cuda inference moe quantization mamba state-space-models deltanet blackwell engram llm qwen speculative-decoding sm120 mamba2 nemotron-h hybrid-ssm

Updated Apr 25, 2026
Rust

dejwi / image-quantization

Image color quantization with 3D visuals

rust raylib quantization 3d

Updated Feb 9, 2025
Rust

SaschaOnTour / turboquant

Rust KV-cache compression for LLM inference. Implements TurboQuant (Zandieh et al., ICLR 2026) plus PQO — our variant that drops QJL, adds a fused CUDA kernel, and shrinks the cache to ~20% of FP16 (49% total VRAM at 32K). mistral.rs integration.

rust compression quantization memory-optimization kv-cache cuda-kernel llm llm-inference mistral-rs

Updated Apr 20, 2026
Rust

JohnClaw / chatllm.rs

Rust API wrapper for the LLM inference library chatllm.cpp

rust chatbot inference bindings api-wrapper llama quantization gemma mistral cpu-inference llm llms chatllm ggml llm-inference qwen

Updated Nov 27, 2024
Rust

snuk182 / mcq

Simple Median Cut Quantization library

rust color quantization median

Updated Nov 17, 2016
Rust

TheRadDani / VectorPrime

VectorPrime takes a model file and your hardware, then finds the fastest way to run it. It profiles your CPU, GPU, and RAM

quantization llm llm-inference

Updated Mar 16, 2026
Rust

brangi / blitzed

Train tiny neural networks, quantize to INT8, generate C code, flash to ESP32. No TFLite runtime, just compiled inference kernels. 17 µs per inference on real hardware.

python rust iot machine-learning microcontrollers esp32 embedded-systems neural-networks quantization edge-ai tinyml model-optimization

Updated Apr 7, 2026
Rust

fajarkraton / fajar-lang

Fajar Lang (fj) — Systems programming language for embedded ML & OS development. Compiler-enforced safety with @kernel/@device/@safe contexts. Rust-based compiler with Cranelift/LLVM backends. Made in Indonesia.

programming-language rust machine-learning compiler x86-64 llvm cuda wasm indonesia tensor arm64 quantization bare-metal risc-v systems-programming os-development embedded-ml

Updated May 25, 2026
Rust
