🎬 Streaming Video Understanding Papers
A Curated Collection of Research Papers on Streaming, Online, and Real-time Video Understanding with LMMs
Focus Areas: Streaming Perception • Proactive QA • Real-time Memory • KV Compression • Token-efficient Long Video Modeling
Streaming Video LLMs

| Model | Year | Paper | 💻 Code | Highlights |
| --- | --- | --- | --- | --- |
| Streamo | 2025 | arXiv | - | Latest streaming model |
| StreamingVLM | 2025 | arXiv | - | Real-time understanding of infinite video streams |
| Flash-VStream | 2024 | arXiv | - | Memory-based real-time understanding of long video streams |
| StreamForest | 2025 | arXiv | - | Hierarchical streaming |
| LiveVLM | 2025 | arXiv | - | Real-time vision-language |
| VideoChat-Online | 2024 | arXiv | - | Conversational streaming |
| EyeWO | 2024 | - | 💻 GitHub | Eyes Wide Open framework |
| StreamChat | 2025 | arXiv | - | Interactive streaming chat |
| StreamBridge | 2025 | arXiv | - | Turns offline Video-LLMs into proactive streaming assistants |
| VITA 1.5 | 2025 | - | 💻 GitHub | Real-time vision and speech interaction |
| CogStream | 2025 | arXiv | - | Context-guided streaming video QA |
⚡ Token & KV Compression
KV Cache Compression

| Method | Year | Paper | 💻 Code | 🎯 Core Technique |
| --- | --- | --- | --- | --- |
| ReKV | 2025 | arXiv | - | In-context video KV-cache retrieval |
| StreamKV | 2025 | arXiv | - | Streaming KV management |
| InfiniPot-V | 2025 | arXiv | - | Memory-constrained KV cache compression |
| StreamMem | 2025 | arXiv | - | Streaming memory system |
| InfiniteVL | 2025 | - | 💻 GitHub | Infinite vision-language |
🎨 Token / Visual Compression

| Method | Year | Paper | 💻 Code | 🎯 Core Technique |
| --- | --- | --- | --- | --- |
| TimeChat-Online (DTD) | 2025 | arXiv | - | Differential Token Drop |
| VideoLLM-MoD | 2024 | arXiv | - | Mixture of Depths |
| StreamingTOM | 2025 | arXiv | - | Token-level optimization |
| STC | 2025 | arXiv | - | Spatial-temporal compression |
🗣️ Proactive QA Systems
🚀 Online / Real-Time / Proactive Output

| System | Year | Paper | 💻 Code | 🎯 Innovation |
| --- | --- | --- | --- | --- |
| VideoLLM-online | 2024 | arXiv | - | First online VideoLLM |
| MMDuet | 2024 | arXiv | - | Dual-mode interaction |
| MMDuet 2 | 2025 | OpenReview | - | Enhanced dual-mode |
| Dispider | 2025 | arXiv | - | Disentangled perception, decision, and reaction |
| StreamMind | 2025 | arXiv | - | Event-gated cognition |
| TimeChat-Online | 2025 | arXiv | - | Temporal understanding |
| LiveStar | 2025 | arXiv | - | Live streaming assistant |
| StreamVLN | 2025 | arXiv | - | Vision-language navigation |
| LION-FS | 2025 | arXiv | - | Fast and slow video-language thinking |
| ROMA | 2025 | arXiv | - | Omni-Multimodal Assistant |
Benchmarks & Datasets
🎯 Streaming / Online Video Understanding Benchmarks

| Benchmark | Year | Paper | 💻 Code | 🎯 Focus Area |
| --- | --- | --- | --- | --- |
| OVO-Bench | 2025 | - | 💻 GitHub | Online video understanding |
| StreamingBench | 2025 | arXiv | - | Comprehensive streaming eval |
| OmniMMI | 2025 | arXiv | - | Omni-modal interaction |
| RTV-Bench | 2025 | arXiv | - | Real-time video benchmark |
| VStream-QA (RVS) | 2024 | arXiv | - | Video stream QA |
| StreamBench | 2025 | - | 💻 GitHub | Streaming benchmark |
| StreamingCoT | 2025 | arXiv | - | Chain-of-thought streaming |
| TV-Online | 2025 | OpenReview | - | TV video understanding |
| ProactiveVideoQA | 2025 | arXiv | - | Proactive QA |
| SVBench | 2025 | arXiv | - | Streaming video benchmark |
| OSTBench | 2025 | - | 💻 GitHub | Online streaming tasks |
| StreamEQA | 2025 | - | 💻 GitHub | Embodied QA streaming |
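| Category | Count | Latest Year |
| --- | --- | --- |
| Streaming Video LLMs | 11 | 2025 |
| ⚡ KV Cache Compression | 5 | 2025 |
| 🎨 Token Compression | 4 | 2025 |
| 🗣️ Proactive QA | 10 | 2025 |
| Benchmarks | 12 | 2025 |
| Total Papers | 41 | 2025 |

TimeChat-Online is listed under both Token Compression and Proactive QA and is counted once in the total.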
We welcome contributions! Please feel free to:
Submit new papers via Pull Request
Report issues or suggest improvements via Issues
⭐ Star this repository if you find it helpful!
Ensure the paper is related to streaming/online video understanding
Provide paper link (arXiv, OpenReview, etc.) or code repository
Include a brief description of the core contribution
Follow the existing table format
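
As a rough illustration of the last guideline, a new entry might look like the row below. This is only a sketch that assumes the five-column layout used by the tables in this list (name, year, paper, code, short description); `YourModel`, the year, and the `<paper-url>` and `<code-url>` link targets are placeholders, not a real entry.

```markdown
| YourModel | 2025 | [arXiv](<paper-url>) | [GitHub](<code-url>) | One-line summary of the core contribution |
```

Use `-` in the Paper or Code column when no link is available, as the existing rows do.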
📧 Contact & Collaboration
Feel free to reach out for collaborations or discussions!
Last Updated: December 2025
Made with ❤️ by the Streaming Video Understanding Community