All notable changes to EASI are documented in this file.
- 2 new benchmarks:
- ERIQ
- OSI-Bench
- More lmms-eval benchmark verification entries.
- Fixed VLMEvalKit evaluation issues on MuirBench.
- New backend support: integrated lmms-eval alongside VLMEvalKit, so evaluations can run on either backend.
- Expanded benchmark support: Added DSR-Bench.
- 1 new benchmark:
- STI-Bench
- 2 new models:
- SenseNova-SI-1.1-BAGEL-7B-MoT, SenseNova-SI-1.3-InternVL3-8B
- Added benchmark verification info.
- 4 new benchmarks:
- SPBench, MMSI-Video-Bench, VSI-SUPER-Recall, VSI-SUPER-Count.
- 3 new image benchmarks:
- ERQA, RefSpatial-Bench, RoboSpatial-Home
- Docker support improvements:
- Added a generic EASI runtime Dockerfile
- Added model-specific Dockerfiles for Cambrian-S and VLM-3R
- 6 new models:
- SenseNova-SI 1.1: Qwen2.5-VL-3B / Qwen2.5-VL-7B / Qwen3-VL-8B
- SenseNova-SI 1.2: InternVL3-8B
- VLM-3R
- BAGEL-7B-MoT
- 4 new image benchmarks:
- STAR-Bench
- OmniSpatial
- Spatial-Visualization-Benchmark
- SPAR-Bench
- LLM-based answer extraction for selected EASI benchmarks
- Enable via: --judge gpt-4o-1120 (routes to gpt-4o-2024-11-20)
- 9 Spatial Intelligence models (total expanded from 7 → 16):
- SenseNova-SI: InternVL3-8B / InternVL3-2B
- SpaceR-7B
- VST-3B-SFT / VST-7B-SFT
- Cambrian-S: 0.5B / 1.5B / 3B / 7B
- New benchmark:
- VSI-Bench-Debiased
- 7 Spatial Intelligence models:
- SenseNova-SI (InternVL3-8B / InternVL3-2B)
- MindCube Series (3 variants)
- SpatialLadder-3B
- SpatialMLLM-4B
- 6 Spatial Intelligence benchmarks:
- Image: MindCube, ViewSpatial, EmbSpatial, MMSI
- Video: VSI-Bench, SITE-Bench
- EASI standardized evaluation protocol (as described in the paper)
- Unified VLMEvalKit-based evaluation pipeline