External validation of AI-based scoring systems in the ICU
Author Information
Author(s): Rockenschaub Patrick, Akay Ela Marie, Carlisle Benjamin Gregory, Hilbert Adam, Wendland Joshua, Meyer-Eschenbach Falk, Näher Anatol-Fiete, Frey Dietmar, Madai Vince Istvan
Primary Institution: Charité - Universitätsmedizin Berlin
Hypothesis
How frequently is external validation performed for machine learning-based risk scores in ICU settings, and how does their performance change in external data?
Conclusion
External validation of machine learning-based scoring systems in the ICU is increasing but remains uncommon, with performance generally lower in external data.
Supporting Evidence
- Overall, 14.7% of studies were externally validated; the share rose to 23.9% by 2023.
- On average, AUROC decreased by 0.037 in external data (a mean change of -0.037).
- 49.5% of externally validated studies showed an AUROC reduction of more than 0.05.
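The two performance figures above are simple summaries of per-study AUROC changes. The sketch below shows one way such summaries could be computed. The AUROC pairs and the function names `auroc_changes` and `share_with_drop` are hypothetical and purely illustrative. They are not data or code from the review, and the review's pooled estimate may come from a different statistical model than the plain mean used here.

```python
from statistics import mean

# Hypothetical (internal AUROC, external AUROC) pairs.
# Illustrative values only, not data from the review.
studies = [
    (0.85, 0.84),
    (0.90, 0.82),
    (0.78, 0.79),
    (0.88, 0.81),
]


def auroc_changes(pairs):
    """Return external minus internal AUROC for each study."""
    return [external - internal for internal, external in pairs]


def share_with_drop(changes, threshold=0.05):
    """Return the fraction of studies whose AUROC fell by more than `threshold`."""
    drops = [c for c in changes if c < -threshold]
    return len(drops) / len(changes)


changes = auroc_changes(studies)
mean_change = mean(changes)       # negative means worse performance externally
share = share_with_drop(changes)  # fraction with a drop larger than 0.05

print(f"mean AUROC change: {mean_change:+.4f}")
print(f"share with drop > 0.05: {share:.1%}")
```

A negative mean change corresponds to lower performance in external data, and the second number is the analogue of the "more than 0.05" share reported above.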
Takeaway
This study looked at how often AI tools for predicting patient problems in the ICU are checked on data from other hospitals. It found that such checks are still uncommon and that the tools often perform worse outside the hospital where they were developed.
Methodology
Systematic review and meta-analysis of studies using machine learning to predict deterioration in ICU patients, assessing external validation and performance changes.
Potential Biases
Models may be overfitted to the specific datasets on which they were developed, and differences in patient populations between datasets may affect performance.
Limitations
The study primarily relied on a few datasets for external validation, which may not represent the broader ICU population.
Participant Demographics
Studies included adult ICU patients from various hospitals, primarily using US data.
Statistical Information
P-Value
p<0.001
Confidence Interval
95% CI -0.052 to -0.027 (for the mean AUROC change of -0.037 in external data)