14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis, intelligent content routing. Zero LLM inference cost. MIT licensed.
Updated Apr 1, 2026 (Python)
The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
[ICLR 2026 Oral] FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"
[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
Ultra-lightweight, mathematically robust prompt compression middleware.
[ICLR 2026] MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
This repo integrates DyCoke's token compression method with VLMs such as Gemma3 and InternVL3
[ICLR 2026] Official code of PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models.
Official implementation of TCSVT 2025 paper: DiViCo: Disentangled Visual Token Compression For Efficient Large Vision-Language Model
[ACL '26 Findings] HiPrune, a training-free visual token pruning method for VLM acceleration.
Developer-first AI compression layer for reducing LLM verbosity, improving readability, and lowering token costs.
Smart token compression MCP server for LLM apps. Gemma 4 local compression, multi-model cost tracking, intelligent model routing. Save 70-90% on API costs.
Your Smithers (butler) for MCP token efficiency. Compression proxy that saves tokens on tool results, tool schemas, and server instructions — sits between any AI coding tool (Claude Code, Cursor, Codex, Windsurf, Cline) and your MCP servers. Up to 60% savings depending on content. Zero config. Works with any MCP server.
Are you using a caveman wrapper to reduce token output on your LLM? Translate your output back to normal English using a local LLM!
tok is an invisible bridge
Automate content research, card news, images, voice, and video from one prompt with an end-to-end Claude Code content pipeline