ReXrank is a public leaderboard for chest X-ray image interpretation, including both radiology report generation (RRG) and visual question answering (VQA) tasks.
ReXrank Challenge V1.0 is a competition on chest radiograph report generation using ReXGradient, the largest private test dataset, consisting of 10,000 studies across 67 sites. The challenge attracted diverse participants from academic institutions, industry, and independent research teams, and 24 state-of-the-art models have been benchmarked.
ReXrank Challenge V2.0 is a competition on the VQA task, using a VQA dataset constructed from ReXGradient that includes 41,007 question-answer pairs over 10,000 radiological studies. We have benchmarked 8 state-of-the-art models.
ReXGradient-160K is the largest publicly available multi-site chest X-ray dataset, containing 273,004 unique chest X-ray images from 160,000 radiological studies collected from 109,487 unique patients across 3 U.S. health systems (79 medical sites). In ReXrank, we additionally use the private ReXGradient test set of 10,000 studies for benchmarking.
ReXVQA is the largest and most comprehensive benchmark for VQA in chest radiology, comprising 653,834 questions paired with 160,000 radiological studies. The dataset is constructed from ReXGradient-160K.
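The VQA leaderboards on this page report overall accuracy and per-category accuracy. As a minimal sketch (not the official evaluation code), the snippet below shows how such accuracies could be computed from a predictions file; the file layout and field names are assumptions for illustration.

```python
import json
from collections import defaultdict

def score_vqa(pred_path: str):
    """Overall and per-category accuracy from a JSON list of records.

    Assumed (hypothetical) record layout:
    {"question_id": "...", "category": "Presence Assessment",
     "answer": "B", "prediction": "B"}
    """
    with open(pred_path) as f:
        records = json.load(f)

    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        cat = r["category"]
        total[cat] += 1
        correct[cat] += int(r["prediction"].strip().lower() == r["answer"].strip().lower())

    per_category = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_category
```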
| Rank | Model | Organization |
|---|---|---|
| 1 | MedGemma-4B-it | |
| 2 | Janus-Pro-7B | DeepSeek |
| 3 | Qwen2.5VL-7B-Instruct | Qwen |
| 4 | Eagle2-9B | NVIDIA |
| 5 | Gemini-1.5-Pro | |
| 6 | Qwen2VL-7B-Instruct | Alibaba |
| 7 | Phi35-Vision-Instruct | Microsoft |
| 8 | LLaVA-1.5-7B | Meta |
ReXGradient is a large-scale private test dataset containing 10,000 studies collected from different medical centers in the US.
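The report-generation leaderboards below combine lexical, semantic, and clinically oriented metrics; RadCliQ-v1 and FineRadScore are listed as reciprocals (1/RadCliQ-v1, 1/FineRadScore), presumably so that higher is better in every column. As an illustration only, the sketch below computes two of the generic text metrics (BLEU and BertScore) for generated versus reference reports using the open-source `sacrebleu` and `bert-score` packages; the official ReXrank evaluation pipeline and its radiology-specific metrics (RadGraph, RaTEScore, GREEN, etc.) are not reproduced here and may use different settings.

```python
from sacrebleu import corpus_bleu          # pip install sacrebleu
from bert_score import score as bertscore  # pip install bert-score

def text_metrics(generated: list[str], reference: list[str]) -> dict:
    """Corpus-level BLEU and mean BERTScore F1 for paired report lists."""
    bleu = corpus_bleu(generated, [reference]).score / 100.0  # rescale to 0-1
    _, _, f1 = bertscore(generated, reference, lang="en",
                         rescale_with_baseline=True)
    return {"BLEU": bleu, "BertScore": float(f1.mean())}
```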
| Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2025 | MoERad-IU | IIT Madras | 1.018 | 0.227 | 0.434 | 0.446 | 0.247 | 0.575 | 0.494 | 0.468 |
| 2 | 2024 | MedVersa | Harvard | 1.008 | 0.21 | 0.431 | 0.498 | 0.202 | 0.527 | 0.532 | 0.475 |
| 3 | 2025 | MedGemma | | 1.008 | 0.2 | 0.427 | 0.479 | 0.223 | 0.617 | 0.566 | 0.457 |
| 4 | 2024 | MAIRA-2 | Microsoft | 0.963 | 0.205 | 0.436 | 0.462 | 0.187 | 0.559 | 0.531 | 0.475 |
| 5 | 2023 | VLCI-IU | SYSU | 0.897 | 0.214 | 0.365 | 0.467 | 0.215 | 0.573 | 0.536 | 0.452 |
| 6 | 2025 | RadPhi3.5Vision | Microsoft | 0.891 | 0.209 | 0.383 | 0.488 | 0.169 | 0.544 | 0.453 | 0.458 |
| 7 | 2023 | RGRG | TUM | 0.888 | 0.19 | 0.391 | 0.47 | 0.169 | 0.54 | 0.487 | 0.46 |
| 8 | 2025 | DD-LLaVA-X | SNUH | 0.886 | 0.166 | 0.387 | 0.469 | 0.174 | 0.542 | 0.504 | 0.459 |
| 9 | 2024 | Libra | University of Glasgow | 0.881 | 0.165 | 0.385 | 0.474 | 0.168 | 0.544 | 0.555 | 0.473 |
| 10 | 2023 | RaDialog | TUM | 0.876 | 0.188 | 0.402 | 0.45 | 0.158 | 0.522 | 0.435 | 0.456 |
| 11 | 2024 | CXRMate-ED | CSIRO | 0.872 | 0.202 | 0.398 | 0.415 | 0.187 | 0.564 | 0.518 | 0.472 |
| 12 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.866 | 0.186 | 0.374 | 0.46 | 0.176 | 0.524 | 0.514 | 0.47 |
| 13 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.842 | 0.178 | 0.395 | 0.405 | 0.167 | 0.52 | 0.47 | 0.457 |
| 14 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.83 | 0.169 | 0.372 | 0.442 | 0.154 | 0.517 | 0.489 | 0.465 |
| 15 | 2024 | CXRMate-RRG24 | CSIRO | 0.792 | 0.15 | 0.327 | 0.462 | 0.152 | 0.518 | 0.408 | 0.458 |
| 16 | 2024 | CheXpertPlus-CheX | Stanford | 0.787 | 0.143 | 0.361 | 0.431 | 0.124 | 0.476 | 0.411 | 0.414 |
| 17 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.777 | 0.154 | 0.341 | 0.442 | 0.13 | 0.501 | 0.52 | 0.473 |
| 18 | 2023 | RadFM | SJTU | 0.775 | 0.157 | 0.365 | 0.392 | 0.135 | 0.504 | 0.406 | 0.438 |
| 19 | 2024 | BiomedGPT-IU | Lehigh University | 0.771 | 0.099 | 0.317 | 0.437 | 0.157 | 0.472 | 0.388 | 0.451 |
| 20 | 2025 | MoERad-MIMIC | IIT Madras | 0.756 | 0.145 | 0.351 | 0.406 | 0.116 | 0.508 | 0.431 | 0.446 |
| 21 | 2023 | VLCI-MIMIC | SYSU | 0.721 | 0.157 | 0.31 | 0.402 | 0.122 | 0.488 | 0.477 | 0.455 |
| 22 | 2024 | CheXagent | Stanford | 0.674 | 0.093 | 0.305 | 0.366 | 0.08 | 0.428 | 0.241 | 0.456 |
| 23 | 2024 | GPT4V | OpenAI | 0.629 | 0.075 | 0.214 | 0.337 | 0.138 | 0.47 | 0.497 | 0.43 |
| 24 | 2024 | LLM-CXR | KAIST | 0.507 | 0.043 | 0.182 | 0.142 | 0.029 | 0.317 | 0.044 | 0.326 |

| Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 2024 | MedVersa | Harvard | 0.984 | 0.172 | 0.438 | 0.48 | 0.188 | 0.527 | 0.524 | 0.467 |
| 3 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.838 | 0.196 | 0.389 | 0.429 | 0.166 | 0.5 | 0.508 | 0.466 |
| 1 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.791 | 0.177 | 0.364 | 0.431 | 0.139 | 0.481 | 0.523 | 0.465 |
| 2 | 2024 | CheXpertPlus-CheX | Stanford | 0.748 | 0.165 | 0.333 | 0.395 | 0.148 | 0.502 | 0.468 | 0.425 |
| 5 | 2023 | RadFM | SJTU | 0.737 | 0.132 | 0.338 | 0.375 | 0.131 | 0.466 | 0.405 | 0.429 |
| 6 | 2024 | GPT4V | OpenAI | 0.605 | 0.072 | 0.214 | 0.364 | 0.175 | 0.456 | 0.356 | 0.423 |
MIMIC-CXR contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. We follow the official split of MIMIC-CXR in the following experiments. * denotes the model was trained on this dataset.
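A minimal sketch of selecting the official test split, assuming the `mimic-cxr-2.0.0-split.csv.gz` file distributed with MIMIC-CXR-JPG on PhysioNet (columns `dicom_id`, `study_id`, `subject_id`, `split`); the path is a placeholder.

```python
import pandas as pd

# Placeholder path; the split file ships with MIMIC-CXR-JPG on PhysioNet.
split_df = pd.read_csv("mimic-cxr-2.0.0-split.csv.gz")

# Keep only the official test studies for evaluation.
test_studies = split_df.loc[split_df["split"] == "test", "study_id"].unique()
print(f"{len(test_studies)} studies in the official MIMIC-CXR test split")
```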
| Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2024 | MedVersa | Harvard | 1.103 | 0.209 | 0.448 | 0.466 | 0.273 | 0.55 | 0.374 | 0.365 |
| 2 | 2024 | Libra | University of Glasgow | 0.898 | 0.232 | 0.402 | 0.403 | 0.218 | 0.523 | 0.356 | 0.362 |
| 3 | 2025 | RadPhi3.5Vision | Microsoft | 0.888 | 0.223 | 0.386 | 0.431 | 0.207 | 0.534 | 0.294 | 0.356 |
| 4 | 2024 | CXRMate-ED | CSIRO | 0.872 | 0.208 | 0.383 | 0.396 | 0.223 | 0.531 | 0.327 | 0.358 |
| 5 | 2024 | CXRMate-RRG24 | CSIRO | 0.87 | 0.198 | 0.367 | 0.423 | 0.22 | 0.521 | 0.338 | 0.359 |
| 6 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.805 | 0.142 | 0.367 | 0.379 | 0.181 | 0.49 | 0.305 | 0.363 |
| 7 | 2025 | DD-LLaVA-X | SNUH | 0.801 | 0.154 | 0.348 | 0.402 | 0.182 | 0.505 | 0.301 | 0.361 |
| 8 | 2023 | RaDialog | TUM | 0.799 | 0.127 | 0.363 | 0.387 | 0.172 | 0.485 | 0.273 | 0.359 |
| 9 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.788 | 0.145 | 0.361 | 0.375 | 0.17 | 0.485 | 0.311 | 0.363 |
| 10 | 2023 | RGRG | TUM | 0.755 | 0.13 | 0.348 | 0.344 | 0.168 | 0.491 | 0.273 | 0.352 |
| 11 | 2025 | MedGemma | | 0.744 | 0.165 | 0.346 | 0.339 | 0.159 | 0.549 | 0.293 | 0.349 |
| 12 | 2024 | CheXagent | Stanford | 0.741 | 0.113 | 0.346 | 0.347 | 0.148 | 0.474 | 0.257 | 0.355 |
| 13 | 2025 | MoERad-MIMIC | IIT Madras | 0.726 | 0.163 | 0.341 | 0.334 | 0.143 | 0.465 | 0.24 | 0.354 |
| 14 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.719 | 0.126 | 0.331 | 0.329 | 0.149 | 0.432 | 0.268 | 0.362 |
| 15 | 2024 | CheXpertPlus-CheX | Stanford | 0.698 | 0.077 | 0.314 | 0.325 | 0.142 | 0.469 | 0.225 | 0.351 |
| 16 | 2024 | MAIRA-2 | Microsoft | 0.694 | 0.088 | 0.308 | 0.339 | 0.131 | 0.517 | 0.224 | 0.359 |
| 17 | 2023 | VLCI-MIMIC | SYSU | 0.68 | 0.136 | 0.304 | 0.305 | 0.14 | 0.45 | 0.256 | 0.357 |
| 18 | 2023 | RadFM | SJTU | 0.65 | 0.087 | 0.313 | 0.259 | 0.109 | 0.45 | 0.185 | 0.351 |
| 19 | 2025 | MoERad-IU | IIT Madras | 0.643 | 0.064 | 0.321 | 0.213 | 0.122 | 0.455 | 0.174 | 0.347 |
| 20 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.613 | 0.055 | 0.303 | 0.191 | 0.103 | 0.448 | 0.164 | 0.347 |
| 21 | 2023 | VLCI-IU | SYSU | 0.599 | 0.075 | 0.263 | 0.212 | 0.109 | 0.449 | 0.21 | 0.347 |
| 22 | 2024 | GPT4V | OpenAI | 0.558 | 0.068 | 0.207 | 0.214 | 0.084 | 0.423 | 0.161 | 0.343 |
| 23 | 2024 | BiomedGPT-IU | Lehigh University | 0.544 | 0.02 | 0.192 | 0.224 | 0.059 | 0.36 | 0.123 | 0.341 |
| 24 | 2024 | LLM-CXR | KAIST | 0.516 | 0.037 | 0.181 | 0.156 | 0.046 | 0.341 | 0.043 | 0.307 |

| Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 2024 | MedVersa | Harvard | 0.919 | 0.193 | 0.43 | 0.315 | 0.273 | 0.554 | 0.421 | 0.361 |
| 3 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.825 | 0.166 | 0.362 | 0.391 | 0.203 | 0.52 | 0.367 | 0.365 |
| 1 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.802 | 0.165 | 0.353 | 0.382 | 0.193 | 0.511 | 0.377 | 0.365 |
| 2 | 2024 | CheXpertPlus-CheX | Stanford | 0.715 | 0.127 | 0.3 | 0.342 | 0.173 | 0.51 | 0.302 | 0.355 |
| 5 | 2023 | RadFM | SJTU | 0.625 | 0.081 | 0.281 | 0.245 | 0.111 | 0.448 | 0.214 | 0.346 |
| 6 | 2024 | GPT4V | OpenAI | 0.549 | 0.065 | 0.204 | 0.19 | 0.085 | 0.429 | 0.127 | 0.331 |
IU Xray contains 7,470 pairs of radiology reports and chest X-rays from Indiana University. We follow the split given by R2Gen. * denotes the model was trained on IU X-ray.
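A minimal sketch of loading the R2Gen split for IU X-ray, assuming R2Gen's `annotation.json` layout (top-level `train`/`val`/`test` lists whose entries carry `id`, `report`, and `image_path` fields); treat the path and field names as assumptions.

```python
import json

# Placeholder path to R2Gen's IU X-ray annotation file.
with open("iu_xray/annotation.json") as f:
    ann = json.load(f)

# Assumed layout: {"train": [...], "val": [...], "test": [...]}
test_reports = {ex["id"]: ex["report"] for ex in ann["test"]}
print(f"{len(test_reports)} test reports under the R2Gen split")
```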
| Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2025 | MoERad-IU | IIT Madras | 1.922 | 0.277 | 0.525 | 0.641 | 0.341 | 0.684 | 0.665 | 0.587 |
| 2 | 2024 | MedVersa | Harvard | 1.46 | 0.206 | 0.527 | 0.606 | 0.235 | 0.65 | 0.631 | 0.569 |
| 3 | 2024 | CXRMate-RRG24 | CSIRO | 1.458 | 0.245 | 0.456 | 0.638 | 0.302 | 0.666 | 0.68 | 0.598 |
| 4 | 2023 | VLCI-IU | SYSU | 1.381 | 0.268 | 0.455 | 0.619 | 0.288 | 0.679 | 0.698 | 0.551 |
| 5 | 2025 | MedGemma | | 1.34 | 0.217 | 0.475 | 0.6 | 0.26 | 0.678 | 0.724 | 0.57 |
| 6 | 2024 | MAIRA-2 | Microsoft | 1.298 | 0.219 | 0.477 | 0.604 | 0.233 | 0.627 | 0.194 | 0.599 |
| 7 | 2023 | Cvt2distilgpt2-IU | CSIRO | 1.283 | 0.244 | 0.482 | 0.548 | 0.265 | 0.62 | 0.686 | 0.563 |
| 8 | 2024 | CXRMate-ED | CSIRO | 1.22 | 0.225 | 0.464 | 0.557 | 0.249 | 0.655 | 0.685 | 0.597 |
| 9 | 2025 | DD-LLaVA-X | SNUH | 1.204 | 0.189 | 0.443 | 0.6 | 0.233 | 0.636 | 0.671 | 0.574 |
| 10 | 2023 | RadFM | SJTU | 1.187 | 0.2 | 0.459 | 0.566 | 0.23 | 0.627 | 0.615 | 0.572 |
| 11 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 1.179 | 0.198 | 0.453 | 0.593 | 0.211 | 0.618 | 0.648 | 0.576 |
| 12 | 2024 | Libra | University of Glasgow | 1.176 | 0.183 | 0.441 | 0.614 | 0.21 | 0.624 | 0.698 | 0.593 |
| 13 | 2023 | RGRG | TUM | 1.174 | 0.216 | 0.437 | 0.602 | 0.223 | 0.62 | 0.665 | 0.596 |
| 14 | 2025 | RadPhi3.5Vision | Microsoft | 1.166 | 0.248 | 0.433 | 0.607 | 0.22 | 0.634 | 0.597 | 0.552 |
| 15 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 1.126 | 0.199 | 0.422 | 0.609 | 0.209 | 0.606 | 0.682 | 0.608 |
| 16 | 2023 | RaDialog | TUM | 1.086 | 0.201 | 0.444 | 0.544 | 0.205 | 0.586 | 0.586 | 0.543 |
| 17 | 2025 | MoERad-MIMIC | IIT Madras | 1.02 | 0.171 | 0.42 | 0.559 | 0.178 | 0.603 | 0.584 | 0.579 |
| 18 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.988 | 0.178 | 0.386 | 0.593 | 0.169 | 0.585 | 0.661 | 0.622 |
| 19 | 2024 | BiomedGPT-IU | Lehigh University | 0.956 | 0.142 | 0.375 | 0.522 | 0.213 | 0.543 | 0.523 | 0.543 |
| 20 | 2024 | CheXpertPlus-CheX | Stanford | 0.92 | 0.157 | 0.413 | 0.495 | 0.153 | 0.534 | 0.541 | 0.548 |
| 21 | 2023 | VLCI-MIMIC | SYSU | 0.913 | 0.139 | 0.364 | 0.483 | 0.22 | 0.578 | 0.474 | 0.488 |
| 22 | 2024 | CheXagent | Stanford | 0.827 | 0.116 | 0.353 | 0.488 | 0.139 | 0.503 | 0.389 | 0.574 |
| 23 | 2024 | GPT4V | OpenAI | 0.708 | 0.076 | 0.274 | 0.405 | 0.146 | 0.517 | 0.651 | 0.55 |
| 24 | 2024 | LLM-CXR | KAIST | 0.486 | 0.033 | 0.186 | 0.057 | 0.023 | 0.28 | 0.025 | 0.302 |

| Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 2024 | MedVersa | Harvard | 1.452 | 0.195 | 0.518 | 0.601 | 0.244 | 0.628 | 0.658 | 0.583 |
| 3 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 1.249 | 0.244 | 0.476 | 0.598 | 0.232 | 0.606 | 0.694 | 0.588 |
| 5 | 2023 | RadFM | SJTU | 1.22 | 0.196 | 0.479 | 0.556 | 0.234 | 0.596 | 0.644 | 0.551 |
| 1 | 2024 | CheXpertPlus-MIMIC | Stanford | 1.111 | 0.227 | 0.449 | 0.594 | 0.187 | 0.57 | 0.681 | 0.615 |
| 2 | 2024 | CheXpertPlus-CheX | Stanford | 0.995 | 0.198 | 0.394 | 0.55 | 0.211 | 0.604 | 0.706 | 0.568 |
| 6 | 2024 | GPT4V | OpenAI | 0.683 | 0.079 | 0.235 | 0.403 | 0.16 | 0.519 | 0.399 | 0.528 |
CheXpert Plus contains 223,228 unique pairs of radiology reports and chest X-rays from 187,711 studies and 64,725 patients. We follow the official split of CheXpert Plus in the following experiments and use the valid set for evaluation. * denotes the model was trained on CheXpert Plus.
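A minimal sketch of selecting the validation studies used for evaluation, assuming a CheXpert Plus metadata CSV with a `split` column whose validation rows are labeled `valid`; the file name and column values are assumptions about the release format.

```python
import pandas as pd

# Placeholder path/column names for the CheXpert Plus metadata release.
meta = pd.read_csv("chexpert_plus_metadata.csv")

# The leaderboard evaluates on the valid split.
valid = meta[meta["split"] == "valid"]
print(f"{len(valid)} report/image pairs in the CheXpert Plus valid split")
```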
| Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2025 | RadPhi3.5Vision | Microsoft | 0.86 | 0.198 | 0.353 | 0.437 | 0.217 | 0.51 | 0.243 | 0.356 |
| 2 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.808 | 0.153 | 0.335 | 0.404 | 0.207 | 0.497 | 0.274 | 0.348 |
| 3 | 2024 | CXRMate-RRG24 | CSIRO | 0.801 | 0.157 | 0.315 | 0.411 | 0.218 | 0.521 | 0.276 | 0.35 |
| 4 | 2024 | MAIRA-2 | Microsoft | 0.788 | 0.163 | 0.359 | 0.355 | 0.189 | 0.485 | 0.273 | 0.352 |
| 5 | 2024 | CheXpertPlus-CheX | Stanford | 0.786 | 0.15 | 0.342 | 0.377 | 0.191 | 0.487 | 0.237 | 0.343 |
| 6 | 2025 | DD-LLaVA-X | SNUH | 0.753 | 0.085 | 0.318 | 0.385 | 0.172 | 0.476 | 0.206 | 0.343 |
| 7 | 2024 | CXRMate-ED | CSIRO | 0.723 | 0.157 | 0.324 | 0.316 | 0.175 | 0.498 | 0.265 | 0.367 |
| 8 | 2024 | MedVersa | Harvard | 0.719 | 0.129 | 0.323 | 0.344 | 0.147 | 0.47 | 0.243 | 0.343 |
| 9 | 2024 | Libra | University of Glasgow | 0.718 | 0.157 | 0.319 | 0.323 | 0.169 | 0.466 | 0.253 | 0.344 |
| 10 | 2023 | RaDialog | TUM | 0.709 | 0.131 | 0.312 | 0.353 | 0.138 | 0.445 | 0.211 | 0.333 |
| 11 | 2025 | MedGemma | | 0.706 | 0.147 | 0.328 | 0.325 | 0.137 | 0.511 | 0.246 | 0.337 |
| 12 | 2023 | RGRG | TUM | 0.674 | 0.154 | 0.315 | 0.274 | 0.14 | 0.453 | 0.216 | 0.337 |
| 13 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.663 | 0.14 | 0.292 | 0.294 | 0.134 | 0.43 | 0.238 | 0.344 |
| 14 | 2025 | MoERad-MIMIC | IIT Madras | 0.641 | 0.122 | 0.267 | 0.3 | 0.12 | 0.434 | 0.166 | 0.343 |
| 15 | 2024 | CheXagent | Stanford | 0.638 | 0.123 | 0.278 | 0.269 | 0.125 | 0.434 | 0.183 | 0.341 |
| 16 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.626 | 0.124 | 0.267 | 0.266 | 0.119 | 0.42 | 0.215 | 0.346 |
| 17 | 2025 | MoERad-IU | IIT Madras | 0.595 | 0.075 | 0.284 | 0.175 | 0.102 | 0.39 | 0.127 | 0.341 |
| 18 | 2023 | VLCI-MIMIC | SYSU | 0.589 | 0.12 | 0.229 | 0.251 | 0.101 | 0.384 | 0.165 | 0.33 |
| 19 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.577 | 0.084 | 0.267 | 0.155 | 0.098 | 0.382 | 0.147 | 0.332 |
| 20 | 2023 | RadFM | SJTU | 0.572 | 0.081 | 0.235 | 0.216 | 0.08 | 0.396 | 0.096 | 0.333 |
| 21 | 2024 | GPT4V | OpenAI | 0.568 | 0.081 | 0.215 | 0.234 | 0.082 | 0.415 | 0.152 | 0.339 |
| 22 | 2023 | VLCI-IU | SYSU | 0.555 | 0.106 | 0.22 | 0.17 | 0.094 | 0.418 | 0.194 | 0.339 |
| 23 | 2024 | BiomedGPT-IU | Lehigh University | 0.552 | 0.022 | 0.2 | 0.241 | 0.056 | 0.351 | 0.118 | 0.32 |
| 24 | 2024 | LLM-CXR | KAIST | 0.519 | 0.041 | 0.162 | 0.211 | 0.037 | 0.321 | 0.022 | 0.291 |

| Rank | Year | Model | Organization | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2024 | CheXpertPlus_CheX | Stanford | 0.512 | 0.142 | 0.02 | 0.38 | 0.07 | 0.492 | 0.363 | 0.353 |
| 3 | 2024 | CheXpertPlus_CheX_MIMIC | Stanford | 0.511 | 0.14 | 0.011 | 0.388 | 0.071 | 0.503 | 0.382 | 0.36 |
| 4 | 2024 | MedVersa | Harvard | 0.493 | 0.09 | 0.013 | 0.337 | 0.05 | 0.452 | 0.334 | 0.354 |
| 1 | 2024 | CheXpertPlus_MIMIC | Stanford | 0.482 | 0.103 | 0.002 | 0.318 | 0.049 | 0.429 | 0.293 | 0.347 |
| 5 | 2023 | RadFM | SJTU | 0.443 | 0.067 | -0.038 | 0.229 | 0.027 | 0.39 | 0.137 | 0.34 |
| 6 | 2024 | GPT4V | OpenAI | 0.431 | 0.055 | -0.065 | 0.208 | 0.028 | 0.393 | 0.182 | 0.329 |
Performance comparison of various vision-language models on medical VQA tasks.
| Rank | Model | Organization | Overall Accuracy | Differential Diagnosis | Geometric Information | Location Assessment | Negation Assessment | Presence Assessment |
|---|---|---|---|---|---|---|---|---|
| 1 | MedGemma-4B-it | | 0.8217 | 0.7671 | 0.8045 | 0.8347 | 0.8503 | 0.8521 |
| 2 | Janus-Pro-7B | DeepSeek | 0.6656 | 0.5634 | 0.7542 | 0.6462 | 0.7573 | 0.6070 |
| 3 | Qwen2.5VL-7B-Instruct | Qwen | 0.6555 | 0.6361 | 0.6648 | 0.6324 | 0.8327 | 0.5114 |
| 4 | Eagle2-9B | NVIDIA | 0.6443 | 0.6817 | 0.5698 | 0.5695 | 0.8632 | 0.5375 |
| 5 | Gemini-1.5-Pro | | 0.6331 | 0.6221 | 0.4689 | 0.5960 | 0.8568 | 0.6217 |
| 6 | Qwen2VL-7B-Instruct | Alibaba | 0.5470 | 0.5265 | 0.4494 | 0.5405 | 0.6269 | 0.5915 |
| 7 | Phi35-Vision-Instruct | Microsoft | 0.4749 | 0.6224 | 0.2215 | 0.3711 | 0.7950 | 0.3644 |
| 8 | LLaVA-1.5-7B | Meta | 0.2661 | 0.2161 | 0.2346 | 0.2761 | 0.2402 | 0.3633 |