token-compression

Here are 17 public repositories matching this topic...

open-compress / claw-compactor

14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis, intelligent content routing. Zero LLM inference cost. MIT licensed.

Updated Apr 1, 2026
Python

HumanMLLM / LLaVA-Scissor

The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

video-understanding connected-components video-language-understanding mllm multimodal-large-language-models token-compression

Updated Jul 1, 2025
Python

Fanziyang-v / FlashVID

[ICLR 2026 Oral] FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging

efficiency multimodal video-llms token-compression flashvid

Updated Apr 30, 2026
Python

HVision-NKU / GlimpsePrune

Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"

inference-efficiency lvlms mllms visual-token-pruning token-compression

Updated Feb 13, 2026
Python

YiwengXie / FluxMem

[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding

streaming-video video-understanding large-multimodal-models token-compression

Updated Mar 16, 2026
Python

overseek944 / twotrim

Ultra-lightweight, mathematically robust prompt compression middleware.

ai compression-algorithm token-compression ai-cost-optimization

Updated Apr 13, 2026
Python

JinXins / MergeMix

[ICLR 2026] MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding

image-classification data-augmentation preference-learning mixup multimodal ranking-loss mmcv llava token-merging token-compression iclr2026

Updated Feb 27, 2026
Python

mvish7 / dycoke_token_compression

This repo integrates DyCoke's token compression method with VLMs such as Gemma3 and InternVL3

inference-optimization vlms video-large-language-models token-compression

Updated Nov 11, 2025
Python

MouxiaoHuang / PPE

[ICLR 2026] Official code of PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models.

multimodal positional-encoding large-language-models vision-language-model token-merging token-compression iclr2026 token-clustering

Updated Mar 16, 2026
Python

pzrain / DiViCo

Official implementation of TCSVT 2025 paper: DiViCo: Disentangled Visual Token Compression For Efficient Large Vision-Language Model

multimodal large-vision-language-model token-compression

Updated May 13, 2025
Python

lijun2005 / ACL26Findings-HiPrune

[ACL '26 Findings] HiPrune, a training-free visual token pruning method for VLM acceleration.

vision-language-model training-free-acceleration token-compression

Updated Apr 7, 2026
Python

rvtechclub-alt / TokenCut

Developer-first AI compression layer for reducing LLM verbosity, improving readability, and lowering token costs.

ai developer-tools codex text-compression ai-agents claude llm prompt-engineering chatgpt ai-productivity token-compression meme-ai

Updated Apr 25, 2026
Python

woling-dev / promptthrift-mcp

Smart token compression MCP server for LLM apps. Gemma 4 local compression, multi-model cost tracking, intelligent model routing. Save 70-90% on API costs.

mcp claude cost-optimization ai-tools llm chatgpt ollama mcp-server token-compression gemma4

Updated Apr 11, 2026
Python

Your Smithers (butler) for MCP token efficiency. Compression proxy that saves tokens on tool results, tool schemas, and server instructions — sits between any AI coding tool (Claude Code, Cursor, Codex, Windsurf, Cline) and your MCP servers. Up to 60% savings depending on content. Zero config. Works with any MCP server.

mcp cursor context-window ai-coding-tools token-compression claude-code mcp-proxy