ReXrank

Open-Source Radiology Report Generation Leaderboard

What is ReXrank?

ReXrank is an open-source leaderboard for AI-powered radiology report generation from chest X-ray images. We're setting a new standard in healthcare AI by providing a comprehensive, objective evaluation framework for cutting-edge models. Our mission is to accelerate progress in this critical field by fostering healthy competition and collaboration among researchers, clinicians, and AI enthusiasts. Using diverse datasets like MIMIC-CXR, IU-Xray, and CheXpert Plus, ReXrank offers a robust benchmarking system that evolves with clinical needs and technological advancements. Our leaderboard showcases top-performing models, driving innovation that could transform patient care and streamline medical workflows.

Join us in shaping the future of AI-assisted radiology. Develop your models, submit your results, and see how you stack up against the best in the field. Together, we can push the boundaries of what's possible in medical imaging and report generation.

Getting Started

To evaluate your models, we have made available the evaluation script we will use for official evaluation, along with a sample prediction file that the script takes as input. To run the evaluation, use `python evaluate.py <path_to_data> <path_to_predictions>`.
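For illustration, a minimal driver might look like the sketch below. The CSV layout (study_id and report columns) and the file paths are assumptions made for this example; the sample prediction file distributed with the evaluation script defines the exact expected format.

```python
# Hedged sketch: write a prediction file and call the official evaluation
# script. The study_id/report CSV columns and the paths are illustrative
# assumptions; consult the provided sample prediction file for the real schema.
import csv
import subprocess

predictions = [
    {"study_id": "example_study_001", "report": "No acute cardiopulmonary process."},
    {"study_id": "example_study_002", "report": "Mild cardiomegaly. No focal consolidation."},
]

with open("predictions.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["study_id", "report"])
    writer.writeheader()
    writer.writerows(predictions)

# Equivalent to: python evaluate.py <path_to_data> <path_to_predictions>
subprocess.run(["python", "evaluate.py", "test_data.csv", "predictions.csv"], check=True)
```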

Once you have built a model that works to your expectations on the MIMIC-CXR test set, you can submit it to get official scores on our private test set. Here's a tutorial on submission for a smooth evaluation process.

Submission Tutorial

Please cite us if you find our leaderboard helpful.

To keep up to date with major changes to the leaderboard and dataset, please subscribe here!

Leaderboard Overview

The table below lists the top models for each dataset. * denotes a model trained on the corresponding dataset.

| Rank | MIMIC-CXR | IU-Xray | CheXpert Plus |
| --- | --- | --- | --- |
| 1 | MedVersa* (Harvard) | MedVersa (Harvard) | MedVersa (Harvard) |
| 2 | RaDialog* (TUM) | RGRG (TUM) | RaDialog (TUM) |
| 3 | RGRG* (TUM) | RadFM (SJTU) | CheXpertPlus-mimic (Stanford) |
| 4 | CheXpertPlus-mimic* (Stanford) | Cvt2distilgpt2 (CSIRO) | RGRG (TUM) |
| 5 | CheXagent* (Stanford) | RaDialog (TUM) | Cvt2distilgpt2 (CSIRO) |
| 6 | Cvt2distilgpt2* (CSIRO) | CheXpertPlus-mimic (Stanford) | CheXagent (Stanford) |
| 7 | VLCI* (SYSU) | CheXagent (Stanford) | RadFM (SJTU) |
| 8 | RadFM* (SJTU) | VLCI (SYSU) | GPT4V (OpenAI) |
| 9 | GPT4V (OpenAI) | GPT4V (OpenAI) | VLCI (SYSU) |
| 10 | LLM-CXR* (KAIST) | LLM-CXR (KAIST) | LLM-CXR (KAIST) |

Leaderboard on MIMIC-CXR Dataset

MIMIC-CXR contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA. We follow the official split of MIMIC-CXR in the following experiments and report scores on the test set. * denotes the model was trained on MIMIC-CXR.
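As a rough sketch, restricting predictions to the official test split might look like the snippet below; the split-file name and its columns follow the public MIMIC-CXR-JPG metadata and are assumptions here rather than part of ReXrank itself.

```python
# Hedged sketch: select studies in the official MIMIC-CXR test split.
# "mimic-cxr-2.0.0-split.csv.gz" and its columns (dicom_id, study_id,
# subject_id, split) refer to the public MIMIC-CXR-JPG metadata; adjust
# the path and column names to match your local copy of the dataset.
import pandas as pd

split = pd.read_csv("mimic-cxr-2.0.0-split.csv.gz")
test_studies = set(split.loc[split["split"] == "test", "study_id"])
print(f"{len(test_studies)} studies in the official test split")
```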

| Rank | Model | FineRadScore | RadCliQ-v1 | BLEU | BertScore | SembScore | RadGraph |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | MedVersa* (Harvard, 2024) | 2.839 | 1.088 | 0.193 | 0.430 | 0.315 | 0.273 |
| 2 | RaDialog* (TUM, 2023) | 2.920 | 1.330 | 0.112 | 0.322 | 0.381 | 0.168 |
| 3 | RGRG* (TUM, 2023) | 3.142 | 1.363 | 0.125 | 0.323 | 0.337 | 0.176 |
| 4 | CheXpertPlus-mimic* (Stanford, 2024) | 2.846 | 1.369 | 0.128 | 0.308 | 0.365 | 0.168 |
| 5 | CheXagent* (Stanford, 2024) | 2.976 | 1.437 | 0.094 | 0.304 | 0.331 | 0.146 |
| 6 | Cvt2distilgpt2* (CSIRO, 2023) | 2.944 | 1.498 | 0.109 | 0.278 | 0.315 | 0.145 |
| 7 | VLCI* (SYSU, 2023) | 2.990 | 1.573 | 0.124 | 0.252 | 0.293 | 0.136 |
| 8 | RadFM* (SJTU, 2023) | 3.092 | 1.604 | 0.081 | 0.281 | 0.245 | 0.111 |
| 9 | GPT4V (OpenAI, 2024) | 3.345 | 1.819 | 0.065 | 0.204 | 0.190 | 0.085 |
| 10 | LLM-CXR* (KAIST, 2024) | 3.681 | 1.983 | 0.036 | 0.158 | 0.150 | 0.047 |

Leaderboard on IU Xray Dataset

IU Xray contains 7,470 pairs of radiology reports and chest X-rays from Indiana University. We follow the split given by R2Gen. * denotes the model was trained on IU X-ray.

| Rank | Model | RadCliQ-v1 | RadCliQ-v0 | BLEU | BertScore | SembScore | RadGraph |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | MedVersa (Harvard, 2024) | 0.692 | 2.581 | 0.195 | 0.518 | 0.601 | 0.244 |
| 2 | RGRG (TUM, 2023) | 0.803 | 2.818 | 0.240 | 0.447 | 0.603 | 0.248 |
| 3 | RadFM (SJTU, 2023) | 0.815 | 2.800 | 0.196 | 0.479 | 0.556 | 0.234 |
| 4 | Cvt2distilgpt2 (CSIRO, 2023) | 0.956 | 3.029 | 0.192 | 0.390 | 0.605 | 0.200 |
| 5 | RaDialog (TUM, 2023) | 0.970 | 3.044 | 0.175 | 0.419 | 0.545 | 0.198 |
| 6 | CheXpertPlus-mimic (Stanford, 2024) | 1.085 | 3.225 | 0.173 | 0.352 | 0.588 | 0.159 |
| 7 | CheXagent (Stanford, 2024) | 1.137 | 3.272 | 0.102 | 0.380 | 0.494 | 0.157 |
| 8 | VLCI (SYSU, 2023) | 1.180 | 3.393 | 0.115 | 0.332 | 0.472 | 0.203 |
| 9 | GPT4V (OpenAI, 2024) | 1.462 | 3.856 | 0.079 | 0.235 | 0.403 | 0.160 |
| 10 | LLM-CXR (KAIST, 2024) | 2.072 | 4.863 | 0.035 | 0.175 | 0.061 | 0.023 |

Leaderboard on CheXpert Plus Dataset

CheXpert Plus contains 223,228 unique pairs of radiology reports and chest X-rays from 187,711 studies and 64,725 patients. We follow the official split of CheXpert Plus in the following experiments and use the validation set for evaluation. * denotes the model was trained on CheXpert Plus.

| Rank | Model | RadCliQ-v1 | RadCliQ-v0 | BLEU | BertScore | SembScore | RadGraph |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | MedVersa (Harvard, 2024) | 2.031 | 4.831 | 0.090 | 0.013 | 0.337 | 0.050 |
| 2 | RaDialog (TUM, 2023) | 2.111 | 4.967 | 0.086 | -0.035 | 0.348 | 0.041 |
| 3 | CheXpertPlus-mimic (Stanford, 2024) | 2.160 | 5.060 | 0.088 | -0.041 | 0.306 | 0.043 |
| 4 | RGRG (TUM, 2023) | 2.194 | 5.139 | 0.103 | -0.039 | 0.263 | 0.047 |
| 5 | Cvt2distilgpt2 (CSIRO, 2023) | 2.202 | 5.132 | 0.083 | -0.052 | 0.288 | 0.038 |
| 6 | CheXagent (Stanford, 2024) | 2.213 | 5.147 | 0.077 | -0.056 | 0.284 | 0.039 |
| 7 | RadFM (SJTU, 2023) | 2.251 | 5.206 | 0.067 | -0.038 | 0.229 | 0.027 |
| 8 | GPT4V (OpenAI, 2024) | 2.316 | 5.318 | 0.055 | -0.065 | 0.208 | 0.028 |
| 9 | VLCI (SYSU, 2023) | 2.317 | 5.340 | 0.084 | -0.092 | 0.248 | 0.032 |
| 10 | LLM-CXR (KAIST, 2024) | 2.369 | 5.388 | 0.032 | -0.082 | 0.201 | 0.013 |