ReXrank

Chest X-ray Interpretation Leaderboard

What is ReXrank?

ReXrank is a public leaderboard for chest X-ray image interpretation, including both radiology report generation (RRG) and visual question answering (VQA) tasks.


ReXrank Challenge V1.0 is a competition in chest radiograph report generation that uses ReXGradient, the largest private test dataset, consisting of 10,000 studies across 67 sites. The challenge attracted participants from academic institutions, industry, and independent research teams, with 24 state-of-the-art models benchmarked.


ReXrank Challenge V2.0 is a competition on the VQA task that uses a VQA dataset constructed from ReXGradient, comprising 41,007 VQA pairs from 10,000 radiological studies. We benchmarked 8 state-of-the-art models.


ReXGradient-160K is the largest publicly available multi-site chest X-ray dataset, containing 273,004 unique chest X-ray images from 160,000 radiological studies, collected from 109,487 unique patients across 3 U.S. health systems (79 medical sites). In ReXrank, we use an additional private test set, ReXGradient (10,000 studies), for benchmarking.


ReXVQA is the largest and most comprehensive benchmark for VQA in chest radiology, comprising 653,834 questions paired with 160,000 radiological studies. The dataset is constructed from ReXGradient-160K.

ReXrank Challenge V1.0 Leaderboard (RRG)

| Rank | ReXGradient | MIMIC-CXR | IU-Xray | CheXpert Plus |
|---|---|---|---|---|
| 1 | MedVersa (Harvard) | MedVersa (Harvard) | CheXpertPlus-MIMIC (Stanford) | CXRMate-ED (CSIRO) |
| 2 | MAIRA-2 (Microsoft) | CheXpertPlus-MIMIC (Stanford) | Cvt2distilgpt2-MIMIC (CSIRO) | RadPhi3.5Vision (Microsoft) |
| 3 | Libra (University of Glasgow) | CheXpertPlus-CheX-MIMIC (Stanford) | MAIRA-2 (Microsoft) | MAIRA-2 (Microsoft) |
| 4 | CheXpertPlus-MIMIC (Stanford) | Cvt2distilgpt2-MIMIC (CSIRO) | CXRMate-RRG24 (CSIRO) | CXRMate-RRG24 (CSIRO) |
| 5 | CXRMate-ED (CSIRO) | Libra (University of Glasgow) | CXRMate-ED (CSIRO) | CheXpertPlus-CheX-MIMIC (Stanford) |
| 6 | Cvt2distilgpt2-MIMIC (CSIRO) | DD-LLaVA-X (SNUH) | RGRG (TUM) | Cvt2distilgpt2-MIMIC (CSIRO) |
| 7 | MoERad-IU (IIT Madras) | MAIRA-2 (Microsoft) | Libra (University of Glasgow) | CheXpertPlus-MIMIC (Stanford) |
| 8 | CheXpertPlus-CheX-MIMIC (Stanford) | RaDialog (TUM) | MoERad-IU (IIT Madras) | Libra (University of Glasgow) |
| 9 | RGRG (TUM) | CXRMate-RRG24 (CSIRO) | MoERad-MIMIC (IIT Madras) | CheXpertPlus-CheX (Stanford) |
| 10 | DD-LLaVA-X (SNUH) | CXRMate-ED (CSIRO) | CheXpertPlus-CheX-MIMIC (Stanford) | DD-LLaVA-X (SNUH) |
| 11 | RadPhi3.5Vision (Microsoft) | VLCI-MIMIC (SYSU) | CheXagent (Stanford) | MoERad-MIMIC (IIT Madras) |
| 12 | CXRMate-RRG24 (CSIRO) | RadPhi3.5Vision (Microsoft) | DD-LLaVA-X (SNUH) | MedVersa (Harvard) |
| 13 | MedGemma (Google) | CheXagent (Stanford) | RadFM (SJTU) | CheXagent (Stanford) |
| 14 | Cvt2distilgpt2-IU (CSIRO) | MoERad-MIMIC (IIT Madras) | MedGemma (Google) | MoERad-IU (IIT Madras) |
| 15 | CheXagent (Stanford) | RGRG (TUM) | MedVersa (Harvard) | VLCI-IU (SYSU) |
| 16 | RaDialog (TUM) | CheXpertPlus-CheX (Stanford) | Cvt2distilgpt2-IU (CSIRO) | GPT4V (OpenAI) |
| 17 | VLCI-MIMIC (SYSU) | RadFM (SJTU) | RadPhi3.5Vision (Microsoft) | RGRG (TUM) |
| 18 | VLCI-IU (SYSU) | MedGemma (Google) | VLCI-IU (SYSU) | MedGemma (Google) |
| 19 | BiomedGPT-IU (Lehigh University) | Cvt2distilgpt2-IU (CSIRO) | GPT4V (OpenAI) | RaDialog (TUM) |
| 20 | MoERad-MIMIC (IIT Madras) | VLCI-IU (SYSU) | CheXpertPlus-CheX (Stanford) | RadFM (SJTU) |
| 21 | RadFM (SJTU) | MoERad-IU (IIT Madras) | RaDialog (TUM) | Cvt2distilgpt2-IU (CSIRO) |
| 22 | GPT4V (OpenAI) | GPT4V (OpenAI) | BiomedGPT-IU (Lehigh University) | VLCI-MIMIC (SYSU) |
| 23 | CheXpertPlus-CheX (Stanford) | BiomedGPT-IU (Lehigh University) | VLCI-MIMIC (SYSU) | BiomedGPT-IU (Lehigh University) |
| 24 | LLM-CXR (KAIST) | LLM-CXR (KAIST) | LLM-CXR (KAIST) | LLM-CXR (KAIST) |

ReXrank Challenge V2.0 Leaderboard (VQA)

| Rank | ReXVQA |
|---|---|
| 1 | MedGemma-4B-it (Google) |
| 2 | Janus-Pro-7B (DeepSeek) |
| 3 | Qwen2.5VL-7B-Instruct (Qwen) |
| 4 | Eagle2-9B (NVIDIA) |
| 5 | Gemini-1.5-Pro (Google) |
| 6 | Qwen2VL-7B-Instruct (Alibaba) |
| 7 | Phi35-Vision-Instruct (Microsoft) |
| 8 | LLaVA-1.5-7B (Meta) |

ReXrank Challenge V1.0 - Model Performance on ReXGradient

ReXGradient is a large-scale private test dataset containing 10,000 studies collected from different medical centers in the US.

| Rank | Year | Model | Affiliation | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2025 | MoERad-IU | IIT Madras | 1.018 | 0.227 | 0.434 | 0.446 | 0.247 | 0.575 | 0.494 | 0.468 |
| 2 | 2024 | MedVersa | Harvard | 1.008 | 0.21 | 0.431 | 0.498 | 0.202 | 0.527 | 0.532 | 0.475 |
| 3 | 2025 | MedGemma | Google | 1.008 | 0.2 | 0.427 | 0.479 | 0.223 | 0.617 | 0.566 | 0.457 |
| 4 | 2024 | MAIRA-2 | Microsoft | 0.963 | 0.205 | 0.436 | 0.462 | 0.187 | 0.559 | 0.531 | 0.475 |
| 5 | 2023 | VLCI-IU | SYSU | 0.897 | 0.214 | 0.365 | 0.467 | 0.215 | 0.573 | 0.536 | 0.452 |
| 6 | 2025 | RadPhi3.5Vision | Microsoft | 0.891 | 0.209 | 0.383 | 0.488 | 0.169 | 0.544 | 0.453 | 0.458 |
| 7 | 2023 | RGRG | TUM | 0.888 | 0.19 | 0.391 | 0.47 | 0.169 | 0.54 | 0.487 | 0.46 |
| 8 | 2025 | DD-LLaVA-X | SNUH | 0.886 | 0.166 | 0.387 | 0.469 | 0.174 | 0.542 | 0.504 | 0.459 |
| 9 | 2024 | Libra | University of Glasgow | 0.881 | 0.165 | 0.385 | 0.474 | 0.168 | 0.544 | 0.555 | 0.473 |
| 10 | 2023 | RaDialog | TUM | 0.876 | 0.188 | 0.402 | 0.45 | 0.158 | 0.522 | 0.435 | 0.456 |
| 11 | 2024 | CXRMate-ED | CSIRO | 0.872 | 0.202 | 0.398 | 0.415 | 0.187 | 0.564 | 0.518 | 0.472 |
| 12 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.866 | 0.186 | 0.374 | 0.46 | 0.176 | 0.524 | 0.514 | 0.47 |
| 13 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.842 | 0.178 | 0.395 | 0.405 | 0.167 | 0.52 | 0.47 | 0.457 |
| 14 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.83 | 0.169 | 0.372 | 0.442 | 0.154 | 0.517 | 0.489 | 0.465 |
| 15 | 2024 | CXRMate-RRG24 | CSIRO | 0.792 | 0.15 | 0.327 | 0.462 | 0.152 | 0.518 | 0.408 | 0.458 |
| 16 | 2024 | CheXpertPlus-CheX | Stanford | 0.787 | 0.143 | 0.361 | 0.431 | 0.124 | 0.476 | 0.411 | 0.414 |
| 17 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.777 | 0.154 | 0.341 | 0.442 | 0.13 | 0.501 | 0.52 | 0.473 |
| 18 | 2023 | RadFM | SJTU | 0.775 | 0.157 | 0.365 | 0.392 | 0.135 | 0.504 | 0.406 | 0.438 |
| 19 | 2024 | BiomedGPT-IU | Lehigh University | 0.771 | 0.099 | 0.317 | 0.437 | 0.157 | 0.472 | 0.388 | 0.451 |
| 20 | 2025 | MoERad-MIMIC | IIT Madras | 0.756 | 0.145 | 0.351 | 0.406 | 0.116 | 0.508 | 0.431 | 0.446 |
| 21 | 2023 | VLCI-MIMIC | SYSU | 0.721 | 0.157 | 0.31 | 0.402 | 0.122 | 0.488 | 0.477 | 0.455 |
| 22 | 2024 | CheXagent | Stanford | 0.674 | 0.093 | 0.305 | 0.366 | 0.08 | 0.428 | 0.241 | 0.456 |
| 23 | 2024 | GPT4V | OpenAI | 0.629 | 0.075 | 0.214 | 0.337 | 0.138 | 0.47 | 0.497 | 0.43 |
| 24 | 2024 | LLM-CXR | KAIST | 0.507 | 0.043 | 0.182 | 0.142 | 0.029 | 0.317 | 0.044 | 0.326 |
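
The rows of the ReXGradient table appear to be ordered by the 1/RadCliQ-v1 column, from highest to lowest. RadCliQ-v1 and FineRadScore are error-style scores, so reporting their reciprocals presumably makes higher values better in every column. The sketch below is a minimal illustration of that reading, not the leaderboard's actual code: the `Entry` type and both helper functions are hypothetical, and only the three numeric values are copied from the table.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical container for one leaderboard row."""
    model: str
    inv_radcliq_v1: float  # value from the "1/RadCliQ-v1" column


def rank_entries(entries):
    """Return (rank, entry) pairs, best first; a higher reciprocal is better."""
    ordered = sorted(entries, key=lambda e: e.inv_radcliq_v1, reverse=True)
    return list(enumerate(ordered, start=1))


def raw_radcliq(inv_score):
    """Recover the underlying lower-is-better score from its reciprocal."""
    return 1.0 / inv_score


# Three rows copied from the ReXGradient table, deliberately out of order.
entries = [
    Entry("MAIRA-2", 0.963),
    Entry("MoERad-IU", 1.018),
    Entry("VLCI-IU", 0.897),
]

ranked = rank_entries(entries)
for rank, entry in ranked:
    print(rank, entry.model, round(raw_radcliq(entry.inv_radcliq_v1), 3))
```

Sorting on the reciprocal in descending order gives the same ordering as sorting the raw score in ascending order, so the choice affects only how the column reads, not the ranking.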

ReXrank Challenge V1.0 - Model Performance on MIMIC-CXR

MIMIC-CXR contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. We follow the official split of MIMIC-CXR in the following experiments. * denotes the model was trained on this dataset.

| Rank | Year | Model | Affiliation | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2024 | MedVersa | Harvard | 1.103 | 0.209 | 0.448 | 0.466 | 0.273 | 0.55 | 0.374 | 0.365 |
| 2 | 2024 | Libra | University of Glasgow | 0.898 | 0.232 | 0.402 | 0.403 | 0.218 | 0.523 | 0.356 | 0.362 |
| 3 | 2025 | RadPhi3.5Vision | Microsoft | 0.888 | 0.223 | 0.386 | 0.431 | 0.207 | 0.534 | 0.294 | 0.356 |
| 4 | 2024 | CXRMate-ED | CSIRO | 0.872 | 0.208 | 0.383 | 0.396 | 0.223 | 0.531 | 0.327 | 0.358 |
| 5 | 2024 | CXRMate-RRG24 | CSIRO | 0.87 | 0.198 | 0.367 | 0.423 | 0.22 | 0.521 | 0.338 | 0.359 |
| 6 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.805 | 0.142 | 0.367 | 0.379 | 0.181 | 0.49 | 0.305 | 0.363 |
| 7 | 2025 | DD-LLaVA-X | SNUH | 0.801 | 0.154 | 0.348 | 0.402 | 0.182 | 0.505 | 0.301 | 0.361 |
| 8 | 2023 | RaDialog | TUM | 0.799 | 0.127 | 0.363 | 0.387 | 0.172 | 0.485 | 0.273 | 0.359 |
| 9 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.788 | 0.145 | 0.361 | 0.375 | 0.17 | 0.485 | 0.311 | 0.363 |
| 10 | 2023 | RGRG | TUM | 0.755 | 0.13 | 0.348 | 0.344 | 0.168 | 0.491 | 0.273 | 0.352 |
| 11 | 2025 | MedGemma | Google | 0.744 | 0.165 | 0.346 | 0.339 | 0.159 | 0.549 | 0.293 | 0.349 |
| 12 | 2024 | CheXagent | Stanford | 0.741 | 0.113 | 0.346 | 0.347 | 0.148 | 0.474 | 0.257 | 0.355 |
| 13 | 2025 | MoERad-MIMIC | IIT Madras | 0.726 | 0.163 | 0.341 | 0.334 | 0.143 | 0.465 | 0.24 | 0.354 |
| 14 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.719 | 0.126 | 0.331 | 0.329 | 0.149 | 0.432 | 0.268 | 0.362 |
| 15 | 2024 | CheXpertPlus-CheX | Stanford | 0.698 | 0.077 | 0.314 | 0.325 | 0.142 | 0.469 | 0.225 | 0.351 |
| 16 | 2024 | MAIRA-2 | Microsoft | 0.694 | 0.088 | 0.308 | 0.339 | 0.131 | 0.517 | 0.224 | 0.359 |
| 17 | 2023 | VLCI-MIMIC | SYSU | 0.68 | 0.136 | 0.304 | 0.305 | 0.14 | 0.45 | 0.256 | 0.357 |
| 18 | 2023 | RadFM | SJTU | 0.65 | 0.087 | 0.313 | 0.259 | 0.109 | 0.45 | 0.185 | 0.351 |
| 19 | 2025 | MoERad-IU | IIT Madras | 0.643 | 0.064 | 0.321 | 0.213 | 0.122 | 0.455 | 0.174 | 0.347 |
| 20 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.613 | 0.055 | 0.303 | 0.191 | 0.103 | 0.448 | 0.164 | 0.347 |
| 21 | 2023 | VLCI-IU | SYSU | 0.599 | 0.075 | 0.263 | 0.212 | 0.109 | 0.449 | 0.21 | 0.347 |
| 22 | 2024 | GPT4V | OpenAI | 0.558 | 0.068 | 0.207 | 0.214 | 0.084 | 0.423 | 0.161 | 0.343 |
| 23 | 2024 | BiomedGPT-IU | Lehigh University | 0.544 | 0.02 | 0.192 | 0.224 | 0.059 | 0.36 | 0.123 | 0.341 |
| 24 | 2024 | LLM-CXR | KAIST | 0.516 | 0.037 | 0.181 | 0.156 | 0.046 | 0.341 | 0.043 | 0.307 |

ReXrank Challenge V1.0 - Model Performance on IU Xray

IU Xray contains 7,470 pairs of radiology reports and chest X-rays from Indiana University. We follow the split given by R2Gen. * denotes the model was trained on IU X-ray.

| Rank | Year | Model | Affiliation | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2025 | MoERad-IU | IIT Madras | 1.922 | 0.277 | 0.525 | 0.641 | 0.341 | 0.684 | 0.665 | 0.587 |
| 2 | 2024 | MedVersa | Harvard | 1.46 | 0.206 | 0.527 | 0.606 | 0.235 | 0.65 | 0.631 | 0.569 |
| 3 | 2024 | CXRMate-RRG24 | CSIRO | 1.458 | 0.245 | 0.456 | 0.638 | 0.302 | 0.666 | 0.68 | 0.598 |
| 4 | 2023 | VLCI-IU | SYSU | 1.381 | 0.268 | 0.455 | 0.619 | 0.288 | 0.679 | 0.698 | 0.551 |
| 5 | 2025 | MedGemma | Google | 1.34 | 0.217 | 0.475 | 0.6 | 0.26 | 0.678 | 0.724 | 0.57 |
| 6 | 2024 | MAIRA-2 | Microsoft | 1.298 | 0.219 | 0.477 | 0.604 | 0.233 | 0.627 | 0.194 | 0.599 |
| 7 | 2023 | Cvt2distilgpt2-IU | CSIRO | 1.283 | 0.244 | 0.482 | 0.548 | 0.265 | 0.62 | 0.686 | 0.563 |
| 8 | 2024 | CXRMate-ED | CSIRO | 1.22 | 0.225 | 0.464 | 0.557 | 0.249 | 0.655 | 0.685 | 0.597 |
| 9 | 2025 | DD-LLaVA-X | SNUH | 1.204 | 0.189 | 0.443 | 0.6 | 0.233 | 0.636 | 0.671 | 0.574 |
| 10 | 2023 | RadFM | SJTU | 1.187 | 0.2 | 0.459 | 0.566 | 0.23 | 0.627 | 0.615 | 0.572 |
| 11 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 1.179 | 0.198 | 0.453 | 0.593 | 0.211 | 0.618 | 0.648 | 0.576 |
| 12 | 2024 | Libra | University of Glasgow | 1.176 | 0.183 | 0.441 | 0.614 | 0.21 | 0.624 | 0.698 | 0.593 |
| 13 | 2023 | RGRG | TUM | 1.174 | 0.216 | 0.437 | 0.602 | 0.223 | 0.62 | 0.665 | 0.596 |
| 14 | 2025 | RadPhi3.5Vision | Microsoft | 1.166 | 0.248 | 0.433 | 0.607 | 0.22 | 0.634 | 0.597 | 0.552 |
| 15 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 1.126 | 0.199 | 0.422 | 0.609 | 0.209 | 0.606 | 0.682 | 0.608 |
| 16 | 2023 | RaDialog | TUM | 1.086 | 0.201 | 0.444 | 0.544 | 0.205 | 0.586 | 0.586 | 0.543 |
| 17 | 2025 | MoERad-MIMIC | IIT Madras | 1.02 | 0.171 | 0.42 | 0.559 | 0.178 | 0.603 | 0.584 | 0.579 |
| 18 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.988 | 0.178 | 0.386 | 0.593 | 0.169 | 0.585 | 0.661 | 0.622 |
| 19 | 2024 | BiomedGPT-IU | Lehigh University | 0.956 | 0.142 | 0.375 | 0.522 | 0.213 | 0.543 | 0.523 | 0.543 |
| 20 | 2024 | CheXpertPlus-CheX | Stanford | 0.92 | 0.157 | 0.413 | 0.495 | 0.153 | 0.534 | 0.541 | 0.548 |
| 21 | 2023 | VLCI-MIMIC | SYSU | 0.913 | 0.139 | 0.364 | 0.483 | 0.22 | 0.578 | 0.474 | 0.488 |
| 22 | 2024 | CheXagent | Stanford | 0.827 | 0.116 | 0.353 | 0.488 | 0.139 | 0.503 | 0.389 | 0.574 |
| 23 | 2024 | GPT4V | OpenAI | 0.708 | 0.076 | 0.274 | 0.405 | 0.146 | 0.517 | 0.651 | 0.55 |
| 24 | 2024 | LLM-CXR | KAIST | 0.486 | 0.033 | 0.186 | 0.057 | 0.023 | 0.28 | 0.025 | 0.302 |

ReXrank Challenge V1.0 - Model Performance on CheXpert Plus

CheXpert Plus contains 223,228 unique pairs of radiology reports and chest X-rays from 187,711 studies and 64,725 patients. We follow the official split of CheXpert Plus in the following experiments and use the valid set for evaluation. * denotes the model was trained on CheXpert Plus.

| Rank | Year | Model | Affiliation | 1/RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph | RaTEScore | GREEN | 1/FineRadScore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2025 | RadPhi3.5Vision | Microsoft | 0.86 | 0.198 | 0.353 | 0.437 | 0.217 | 0.51 | 0.243 | 0.356 |
| 2 | 2024 | CheXpertPlus-CheX-MIMIC | Stanford | 0.808 | 0.153 | 0.335 | 0.404 | 0.207 | 0.497 | 0.274 | 0.348 |
| 3 | 2024 | CXRMate-RRG24 | CSIRO | 0.801 | 0.157 | 0.315 | 0.411 | 0.218 | 0.521 | 0.276 | 0.35 |
| 4 | 2024 | MAIRA-2 | Microsoft | 0.788 | 0.163 | 0.359 | 0.355 | 0.189 | 0.485 | 0.273 | 0.352 |
| 5 | 2024 | CheXpertPlus-CheX | Stanford | 0.786 | 0.15 | 0.342 | 0.377 | 0.191 | 0.487 | 0.237 | 0.343 |
| 6 | 2025 | DD-LLaVA-X | SNUH | 0.753 | 0.085 | 0.318 | 0.385 | 0.172 | 0.476 | 0.206 | 0.343 |
| 7 | 2024 | CXRMate-ED | CSIRO | 0.723 | 0.157 | 0.324 | 0.316 | 0.175 | 0.498 | 0.265 | 0.367 |
| 8 | 2024 | MedVersa | Harvard | 0.719 | 0.129 | 0.323 | 0.344 | 0.147 | 0.47 | 0.243 | 0.343 |
| 9 | 2024 | Libra | University of Glasgow | 0.718 | 0.157 | 0.319 | 0.323 | 0.169 | 0.466 | 0.253 | 0.344 |
| 10 | 2023 | RaDialog | TUM | 0.709 | 0.131 | 0.312 | 0.353 | 0.138 | 0.445 | 0.211 | 0.333 |
| 11 | 2025 | MedGemma | Google | 0.706 | 0.147 | 0.328 | 0.325 | 0.137 | 0.511 | 0.246 | 0.337 |
| 12 | 2023 | RGRG | TUM | 0.674 | 0.154 | 0.315 | 0.274 | 0.14 | 0.453 | 0.216 | 0.337 |
| 13 | 2024 | CheXpertPlus-MIMIC | Stanford | 0.663 | 0.14 | 0.292 | 0.294 | 0.134 | 0.43 | 0.238 | 0.344 |
| 14 | 2025 | MoERad-MIMIC | IIT Madras | 0.641 | 0.122 | 0.267 | 0.3 | 0.12 | 0.434 | 0.166 | 0.343 |
| 15 | 2024 | CheXagent | Stanford | 0.638 | 0.123 | 0.278 | 0.269 | 0.125 | 0.434 | 0.183 | 0.341 |
| 16 | 2023 | Cvt2distilgpt2-MIMIC | CSIRO | 0.626 | 0.124 | 0.267 | 0.266 | 0.119 | 0.42 | 0.215 | 0.346 |
| 17 | 2025 | MoERad-IU | IIT Madras | 0.595 | 0.075 | 0.284 | 0.175 | 0.102 | 0.39 | 0.127 | 0.341 |
| 18 | 2023 | VLCI-MIMIC | SYSU | 0.589 | 0.12 | 0.229 | 0.251 | 0.101 | 0.384 | 0.165 | 0.33 |
| 19 | 2023 | Cvt2distilgpt2-IU | CSIRO | 0.577 | 0.084 | 0.267 | 0.155 | 0.098 | 0.382 | 0.147 | 0.332 |
| 20 | 2023 | RadFM | SJTU | 0.572 | 0.081 | 0.235 | 0.216 | 0.08 | 0.396 | 0.096 | 0.333 |
| 21 | 2024 | GPT4V | OpenAI | 0.568 | 0.081 | 0.215 | 0.234 | 0.082 | 0.415 | 0.152 | 0.339 |
| 22 | 2023 | VLCI-IU | SYSU | 0.555 | 0.106 | 0.22 | 0.17 | 0.094 | 0.418 | 0.194 | 0.339 |
| 23 | 2024 | BiomedGPT-IU | Lehigh University | 0.552 | 0.022 | 0.2 | 0.241 | 0.056 | 0.351 | 0.118 | 0.32 |
| 24 | 2024 | LLM-CXR | KAIST | 0.519 | 0.041 | 0.162 | 0.211 | 0.037 | 0.321 | 0.022 | 0.291 |

ReXrank Challenge V2.0 - Model Performance

Performance comparison of various vision-language models on medical VQA tasks.

| Rank | Model | Affiliation | Overall Accuracy | Differential Diagnosis | Geometric Information | Location Assessment | Negation Assessment | Presence Assessment |
|---|---|---|---|---|---|---|---|---|
| 1 | MedGemma-4B-it | Google | 0.8217 | 0.7671 | 0.8045 | 0.8347 | 0.8503 | 0.8521 |
| 2 | Janus-Pro-7B | DeepSeek | 0.6656 | 0.5634 | 0.7542 | 0.6462 | 0.7573 | 0.6070 |
| 3 | Qwen2.5VL-7B-Instruct | Qwen | 0.6555 | 0.6361 | 0.6648 | 0.6324 | 0.8327 | 0.5114 |
| 4 | Eagle2-9B | NVIDIA | 0.6443 | 0.6817 | 0.5698 | 0.5695 | 0.8632 | 0.5375 |
| 5 | Gemini-1.5-Pro | Google | 0.6331 | 0.6221 | 0.4689 | 0.5960 | 0.8568 | 0.6217 |
| 6 | Qwen2VL-7B-Instruct | Alibaba | 0.5470 | 0.5265 | 0.4494 | 0.5405 | 0.6269 | 0.5915 |
| 7 | Phi35-Vision-Instruct | Microsoft | 0.4749 | 0.6224 | 0.2215 | 0.3711 | 0.7950 | 0.3644 |
| 8 | LLaVA-1.5-7B | Meta | 0.2661 | 0.2161 | 0.2346 | 0.2761 | 0.2402 | 0.3633 |
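
In the VQA results above, each Overall Accuracy value agrees, to four decimal places, with the unweighted mean of the model's five category accuracies. The sketch below checks that observation on the reported numbers. That the leaderboard actually computes Overall Accuracy as an unweighted mean over categories (rather than over individual questions) is an assumption. The `macro_accuracy` helper and the `reported` dictionary layout are hypothetical, and only the numbers are copied from the results.

```python
from statistics import mean

# Category order: Differential Diagnosis, Geometric Information,
# Location Assessment, Negation Assessment, Presence Assessment.
# Each entry maps model -> (reported overall accuracy, per-category accuracies).
reported = {
    "MedGemma-4B-it": (0.8217, [0.7671, 0.8045, 0.8347, 0.8503, 0.8521]),
    "Janus-Pro-7B": (0.6656, [0.5634, 0.7542, 0.6462, 0.7573, 0.6070]),
    "Qwen2.5VL-7B-Instruct": (0.6555, [0.6361, 0.6648, 0.6324, 0.8327, 0.5114]),
    "Eagle2-9B": (0.6443, [0.6817, 0.5698, 0.5695, 0.8632, 0.5375]),
    "Gemini-1.5-Pro": (0.6331, [0.6221, 0.4689, 0.5960, 0.8568, 0.6217]),
    "Qwen2VL-7B-Instruct": (0.5470, [0.5265, 0.4494, 0.5405, 0.6269, 0.5915]),
    "Phi35-Vision-Instruct": (0.4749, [0.6224, 0.2215, 0.3711, 0.7950, 0.3644]),
    "LLaVA-1.5-7B": (0.2661, [0.2161, 0.2346, 0.2761, 0.2402, 0.3633]),
}


def macro_accuracy(per_category):
    """Unweighted mean of per-category accuracies (each category counts equally)."""
    return mean(per_category)


# Absolute gap between the reported overall value and the recomputed mean.
gaps = {
    model: abs(macro_accuracy(per_category) - overall)
    for model, (overall, per_category) in reported.items()
}

for model, gap in gaps.items():
    print(f"{model}: gap = {gap:.5f}")
```

Under this reading, a category with few questions influences the overall score as much as a category with many, which is worth keeping in mind when comparing models whose strengths are concentrated in one question type.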