Assessing Medical Image and Signal Datasets for Bias
Author Information
Author(s): Maria Galanty, Dieuwertje Luitse, Sijm H. Noteboom, Philip Croon, Alexander P. Vlaar, Thomas Poell, Clara I. Sanchez, Tobias Blanke, Ivana Išgum
Primary Institution: University of Amsterdam
Hypothesis
This study investigates biases stemming from dataset-creation practices in medical imaging and signal datasets.
Conclusion
The study reveals substantial variance in how medical image and signal datasets are documented, indicating that documentation practices strongly affect whether biases can be detected and mitigated.
Supporting Evidence
- 95% of datasets provide a motivation for their creation.
- Only 5 of the 37 reviewed datasets reported on missing data.
- 34 of the 37 datasets reported the geographical location of data collection.
- All ECG datasets reported sample sizes and participant counts.
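Coverage figures like those above can be tallied from a checklist review. The sketch below is illustrative only: the dataset names and checklist items are hypothetical and not taken from the study or from the BEAMRAD tool itself.

```python
# Hypothetical sketch: computing per-item documentation coverage across
# a set of reviewed datasets. Names and items are illustrative.
from collections import Counter

def coverage(datasets: dict[str, set[str]], items: list[str]) -> dict[str, float]:
    """Return the fraction of datasets that document each checklist item."""
    counts = Counter(item for documented in datasets.values() for item in documented)
    n = len(datasets)
    return {item: counts[item] / n for item in items}

# Toy review of four fictional datasets
datasets = {
    "ds_a": {"motivation", "geography", "sample_size"},
    "ds_b": {"motivation", "sample_size"},
    "ds_c": {"motivation", "geography", "missing_data", "sample_size"},
    "ds_d": {"motivation", "geography", "sample_size"},
}
print(coverage(datasets, ["motivation", "geography", "missing_data"]))
# → {'motivation': 1.0, 'geography': 0.75, 'missing_data': 0.25}
```

A tally like this makes gaps explicit: an item documented by only a small fraction of datasets (here, missing-data reporting) is exactly the kind of omission the study flags as a bias risk.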
Takeaway
This study examines how well medical imaging and signal datasets are documented and finds that important details are often missing, which can lead to biased AI models.
Methodology
The authors developed the BEAMRAD tool to evaluate the documentation of medical datasets and conducted a qualitative review of publicly available MRI, color fundus photography (CFP), and ECG datasets.
Potential Biases
Insufficient documentation can allow biases, including sampling and selection bias, to go undetected in AI models trained on these datasets.
Limitations
The study is limited to datasets published between January 2019 and June 2023 and does not cover other data types like X-ray or EEG.