Assessing Medical Image and Signal Datasets for Bias
Author Information
Author(s): Maria Galanty, Dieuwertje Luitse, Sijm H. Noteboom, Philip Croon, Alexander P. Vlaar, Thomas Poell, Clara I. Sanchez, Tobias Blanke, Ivana Išgum
Primary Institution: University of Amsterdam
Hypothesis
This study investigates biases stemming from dataset-creation practices in medical imaging and signal datasets.
Conclusion
The study reveals substantial variance in how medical image and signal datasets are documented, indicating that documentation practices strongly affect whether biases can be detected and mitigated.
Supporting Evidence
- 95% of datasets provide a motivation for their creation.
- Only 5 of the 37 reviewed datasets reported on missing data.
- 34 of the 37 datasets reported the geographical location of data collection.
- All ECG datasets reported sample sizes and participant counts.
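Coverage figures like those above can be tallied from a checklist review. The sketch below is illustrative only: the dataset names and checklist items are hypothetical and not taken from the study or from the BEAMRAD tool itself.

```python
# Hypothetical sketch: computing per-item documentation coverage across
# a set of reviewed datasets. Names and items are illustrative.
from collections import Counter

def coverage(datasets: dict[str, set[str]], items: list[str]) -> dict[str, float]:
    """Return the fraction of datasets that document each checklist item."""
    counts = Counter(item for documented in datasets.values() for item in documented)
    n = len(datasets)
    return {item: counts[item] / n for item in items}

# Toy review of four fictional datasets
datasets = {
    "ds_a": {"motivation", "geography", "sample_size"},
    "ds_b": {"motivation", "sample_size"},
    "ds_c": {"motivation", "geography", "missing_data", "sample_size"},
    "ds_d": {"motivation", "geography", "sample_size"},
}
print(coverage(datasets, ["motivation", "geography", "missing_data"]))
# → {'motivation': 1.0, 'geography': 0.75, 'missing_data': 0.25}
```

A tally like this makes gaps explicit: an item documented by only a small fraction of datasets (here, missing-data reporting) is exactly the kind of omission the study flags as a bias risk.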
Takeaway
This study examines how well medical imaging and signal datasets are documented and finds that important details are often missing, which can lead to biased AI models.
Methodology
The authors developed the BEAMRAD tool to evaluate the documentation of medical datasets and conducted a qualitative review of publicly available MRI, color fundus photography (CFP), and ECG datasets.
Potential Biases
Insufficient documentation can allow biases, including sampling and selection bias, to go undetected in AI models trained on these datasets.
Limitations
The study is limited to datasets published between January 2019 and June 2023 and does not cover other data types like X-ray or EEG.