🎬 Streaming Video Understanding Papers

📚 A Curated Collection of Research Papers on Streaming, Online, and Real-time Video Understanding with LMMs

Focus Areas: Streaming Perception • Proactive QA • Real-time Memory • KV Compression • Token-efficient Long Video Modeling

📖 Table of Contents

🚀 Streaming Video LLMs
⚡ Token & KV Compression
🗣️ Proactive QA Systems
📊 Benchmarks & Datasets

🚀 Streaming Video LLMs

🏆 Model	📅 Year	📄 Paper	💻 Code	🌟 Highlights
Streamo	2025	📄 arXiv	-	Latest streaming model
StreamingVLM	2025	📄 arXiv	-	Real-time understanding of infinite video streams
Flash-VStream	2024	📄 arXiv	-	Memory-based real-time understanding of long video streams
StreamForest	2025	📄 arXiv	-	Hierarchical streaming
LiveVLM	2025	📄 arXiv	-	Real-time vision-language
VideoChat-Online	2024	📄 arXiv	-	Conversational streaming
EyeWO	2024	-	💻 GitHub	Eyes Wide Open framework
StreamChat	2025	📄 arXiv	-	Interactive streaming chat
StreamBridge	2025	📄 arXiv	-	Turns offline Video-LLMs into proactive streaming assistants
VITA 1.5	2025	-	💻 GitHub	Multimodal streaming
CogStream	2025	📄 arXiv	-	Context-guided streaming video QA

⚡ Token & KV Compression

🔥 KV Cache Compression

🏆 Method	📅 Year	📄 Paper	💻 Code	🎯 Core Technique
ReKV	2025	📄 arXiv	-	In-context video KV-cache retrieval
StreamKV	2025	📄 arXiv	-	Streaming KV management
InfiniPot-V	2025	📄 arXiv	-	Memory-constrained KV cache compression
StreamMem	2025	📄 arXiv	-	Streaming memory system
InfiniteVL	2025	-	💻 GitHub	Infinite vision-language

🎨 Token / Visual Compression

🏆 Method	📅 Year	📄 Paper	💻 Code	🎯 Core Technique
TimeChat-Online (DTD)	2025	📄 arXiv	-	Differential Token Drop
VideoLLM-MoD	2024	📄 arXiv	-	Mixture of Depths
StreamingTOM	2025	📄 arXiv	-	Token-level optimization
STC	2025	📄 arXiv	-	Spatial-temporal compression

🗣️ Proactive QA Systems

🤖 Online / Real-Time / Proactive Output

🏆 System	📅 Year	📄 Paper	💻 Code	🎯 Innovation
VideoLLM-online	2024	📄 arXiv	-	First online VideoLLM
MMDuet	2024	📄 arXiv	-	Video-text duet interaction format
MMDuet 2	2025	📄 OpenReview	-	Enhanced dual-mode
Dispider	2025	📄 arXiv	-	Disentangled perception, decision, and reaction
StreamMind	2025	📄 arXiv	-	Event-gated cognition
TimeChat-Online	2025	📄 arXiv	-	Temporal understanding
LiveStar	2025	📄 arXiv	-	Live streaming assistant
StreamVLN	2025	📄 arXiv	-	Vision-language navigation
LION-FS	2025	📄 arXiv	-	Fast and slow video-language thinking
ROMA	2025	📄 arXiv	-	Omni-Multimodal Assistant

📊 Benchmarks & Datasets

🎯 Streaming / Online Video Understanding Benchmarks

🏆 Benchmark	📅 Year	📄 Paper	💻 Code	🎯 Focus Area
OVO-Bench	2025	-	💻 GitHub	Online video understanding
StreamingBench	2025	📄 arXiv	-	Comprehensive streaming eval
OmniMMI	2025	📄 arXiv	-	Omni-modal interaction
RTV-Bench	2025	📄 arXiv	-	Real-time video benchmark
VStream-QA (RVS)	2024	📄 arXiv	-	Video stream QA
StreamBench	2025	-	💻 GitHub	Streaming benchmark
StreamingCoT	2025	📄 arXiv	-	Chain-of-thought streaming
TV-Online	2025	📄 OpenReview	-	TV video understanding
ProactiveVideoQA	2025	📄 arXiv	-	Proactive QA
SVBench	2025	📄 arXiv	-	Streaming video benchmark
OSTBench	2025	-	💻 GitHub	Online streaming tasks
StreamEQA	2025	-	💻 GitHub	Embodied QA streaming

📈 Statistics

Category	Count	Latest Year
🚀 Streaming Video LLMs	11	2025
⚡ KV Cache Compression	5	2025
🎨 Token Compression	4	2025
🗣️ Proactive QA	10	2025
📊 Benchmarks	12	2025
Total Entries	42	2025

🤝 Contributing

We welcome contributions! Please feel free to:

📝 Submit new papers via Pull Request
🐛 Report issues or suggest improvements via Issues
⭐ Star this repository if you find it helpful!

Contribution Guidelines

Ensure the paper is related to streaming/online video understanding
Provide paper link (arXiv, OpenReview, etc.) or code repository
Include a brief description of the core contribution
Follow the existing table format
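
As a rough aid for checking a new row against the guidelines above, here is a minimal Python sketch. It assumes each entry is one tab-separated line with the five columns used in the tables of this list; the `COLUMNS` names and the `parse_row` helper are hypothetical and are not part of this repository.

```python
import re

# Hypothetical column names, inferred from the table headers in this list.
COLUMNS = ("name", "year", "paper", "code", "description")


def parse_row(line: str) -> dict:
    """Split one tab-separated table row into named fields and validate it."""
    fields = [field.strip() for field in line.split("\t")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")

    row = dict(zip(COLUMNS, fields))

    # The year column holds a four-digit year such as 2024 or 2025.
    if not re.fullmatch(r"\d{4}", row["year"]):
        raise ValueError(f"year must be four digits, got {row['year']!r}")

    # The guidelines ask for a paper link or a code repository; "-" marks a gap.
    if row["paper"] == "-" and row["code"] == "-":
        raise ValueError("provide a paper link or a code repository")

    # Every entry needs a brief description of the core contribution.
    if not row["description"] or row["description"] == "-":
        raise ValueError("description of the core contribution is missing")

    return row


# Example using a row that already appears in this list.
entry = parse_row("VideoLLM-MoD\t2024\t📄 arXiv\t-\tMixture of Depths")
print(entry["name"], entry["year"], entry["description"])
```

The check is deliberately loose: it only confirms the shape of a row, not that the links resolve or that the paper is on topic, which still needs a reviewer.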

📧 Contact & Collaboration

Feel free to reach out for collaborations or discussions!

Last Updated: December 2025

Made with ❤️ by the Streaming Video Understanding Community
