Commit 8eb1d4e

update readme
1 parent f247117 commit 8eb1d4e

1 file changed: +9 -6 lines changed
README.md

Lines changed: 9 additions & 6 deletions
@@ -1,8 +1,8 @@
 <div align="center">

-<h2>
-<a href="https://github.com/OpenGVLab/RIVER">[ICLR 2026] RIVER: A Real-Time Interaction Benchmark for Video LLMs</a>
-</h2>
+<h1>
+RIVER: A Real-Time Interaction Benchmark for Video LLMs
+</h1>

 <img src="assets/RIVER logo.png" width="80" alt="RIVER logo">

@@ -12,16 +12,19 @@
 [Xiangyu Zeng](https://scholar.google.com/citations?user=jS13DXkAAAAJ&hl),
 [Yi Wang](https://scholar.google.com/citations?user=Xm2M8UwAAAAJ),
 [Limin Wang<sup>†</sup>](https://scholar.google.com/citations?user=HEuN8PcAAAAJ)
-[[🤗 HF Dataset]](https://huggingface.co/datasets/nanamma/RIVER),
-[[📄 arXiv]](https://arxiv.org/abs/2603.03985)
+[[💻 GitHub]](https://huggingface.co/datasets/nanamma/RIVER),
+[[🤗 Dataset on HF]](https://huggingface.co/datasets/nanamma/RIVER),
+[[📄 ArXiv]](https://arxiv.org/abs/2603.03985)
 </div>


 ## Introduction
-This project introduces **RIVER Bench**, the first benchmark designed to evaluate the real-time interactive capabilities of Video Large Language Models through streaming video perception, featuring novel tasks for memory, live-perception, and proactive response.
+This project introduces **RIVER Bench**, designed to evaluate the real-time interactive capabilities of Video Large Language Models through streaming video perception, featuring novel tasks for memory, live-perception, and proactive response.

 ![RIVER](assets/river.jpg)

+Based on the frequency and timing of reference events, questions, and answers, we further categorize online interaction tasks into four distinct subclasses, as visually depicted in the figure. For the Retro-Memory, the clue is drawn from the past; for the live-Perception, it comes from the present—both demand an immediate response. For the Pro-Response task, Video LLMs need to wait until the corresponding clue appears and then respond as quickly as possible.
+
 ## Dataset Preparation
 |Dataset |URL|
 |--------------|---|