|
|
### Evaluation Results
|
| Model                       | BLEU-4 | BLEU-1 | ROUGE-L | CIDEr | Word count (mean) | Word count (std) |
|:----------------------------|-------:|-------:|--------:|------:|------------------:|-----------------:|
| EMU2-Chat                   |   38.7 |   78.2 |    56.9 | 109.2 |               9.6 |              1.1 |
| Qwen-VL-Chat                |   34   |   75.8 |    54.9 |  98.9 |              10   |              1.7 |
| IDEFICS-80B-Instruct        |   32.5 |   76.1 |    54.1 |  94.9 |               9.7 |              3.2 |
| IDEFICS-9B-Instruct         |   29.4 |   72.7 |    53.4 |  90.4 |              10.5 |              4.4 |
| InstructBLIP-7B             |   20.9 |   56.8 |    39.9 |  58.1 |              11.6 |              5.9 |
| InstructBLIP-13B            |   16.9 |   50   |    37   |  52.4 |              11.8 |             12.8 |
| InternLM-XComposer-VL       |   12.4 |   38.3 |    37.9 |  41   |              26.3 |             22.2 |
| GeminiProVision             |    8.4 |   33.2 |    31.2 |   9.7 |              35.2 |             15.7 |
| LLaVA-v1.5-7B (QLoRA)       |    7.2 |   25   |    36.6 |  43.2 |              48.8 |             42.9 |
| mPLUG-Owl2                  |    7.1 |   25.8 |    33.6 |  35   |              45.8 |             32.1 |
| LLaVA-v1-7B                 |    6.7 |   27.3 |    26.7 |   6.1 |              40.9 |             16.1 |
| VisualGLM                   |    5.4 |   28.6 |    23.6 |   0.2 |              41.5 |             11.5 |
| LLaVA-v1.5-13B (QLoRA)      |    5.3 |   19.6 |    25.8 |  17.8 |              72.2 |             39.4 |
| LLaVA-v1.5-13B              |    5.1 |   20.7 |    21.2 |   0.3 |              70.6 |             22.3 |
| LLaVA-v1.5-7B               |    4.6 |   19.6 |    19.9 |   0.1 |              72.5 |             21.7 |
| PandaGPT-13B                |    4.6 |   19.9 |    19.3 |   0.1 |              65.4 |             16.6 |
| MiniGPT-4-v1-13B            |    4.4 |   20   |    19.8 |   1.3 |              64.4 |             30.5 |
| MiniGPT-4-v1-7B             |    4.3 |   19.6 |    17.5 |   0.8 |              61.9 |             30.6 |
| LLaVA-InternLM-7B (QLoRA)   |    4   |   17.3 |    17.2 |   0.1 |              82.3 |             21   |
| LLaVA-InternLM2-20B (QLoRA) |    4   |   17.9 |    17.3 |   0   |              83.2 |             20.4 |
| CogVLM-17B-Chat             |    3.6 |   21.3 |    20   |   0.1 |              56.2 |             13.7 |
| Qwen-VL                     |    3.5 |   11.6 |    30   |  41.1 |              46.6 |            105.2 |
| GPT-4v (detail: low)        |    3.3 |   18   |    18.1 |   0   |              77.8 |             20.4 |
| TransCore-M                 |    2.1 |   14.2 |    13.8 |   0.2 |              92   |              6.7 |
| ShareGPT4V-7B               |    1.4 |    9.7 |    10.6 |   0.1 |             147.9 |             45.4 |
| MiniGPT-4-v2                |    1.4 |   12.6 |    13.3 |   0.1 |              83   |             27.1 |
| OpenFlamingo v2             |    1.3 |    6.4 |    15.8 |  14.9 |              60   |             81.9 |
| SharedCaptioner             |    1   |    8.8 |     9.2 |   0   |             164.2 |             31.6 |
|
We observed that VLMs generating long image descriptions tend to achieve inferior scores across the caption metrics.
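This is expected behavior for precision-based n-gram metrics like BLEU: every extra word in a verbose caption that does not appear in the short reference lowers the modified n-gram precision. A minimal sketch of the effect, using a hand-rolled modified unigram precision (the function name and example captions are ours, not part of the benchmark):

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Modified unigram precision: clipped token overlap / candidate length."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    # Each candidate token counts only up to its frequency in the reference.
    overlap = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return overlap / sum(cand_counts.values())

reference = "a dog runs on the beach"
concise = "a dog runs on the sand"
verbose = ("the image shows a small brown dog that appears to be running "
           "joyfully along a wide sandy beach under a clear blue sky")

print(round(unigram_precision(concise, reference), 2))  # → 0.83
print(round(unigram_precision(verbose, reference), 2))  # → 0.17
```

The verbose caption may describe the image correctly, yet its many tokens outside the reference drag the precision down, which is consistent with the word-count columns in the table above: models with high mean word counts cluster at the bottom of the BLEU/CIDEr rankings.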
|
|