Using a Language Model to Administer a Cognitive Test
Author Information
Author(s): Jaman Rafeeul, Nessen Sarah, Adjei-Poku Michael, Byerley Joella, Sailors Olivia, Karlawish Jason, O’Brien Kyra, Friedman Ari
Primary Institution: University of Pennsylvania
Hypothesis
Using a large language model to administer the Short Blessed Test will mitigate errors in cognitive impairment assessment.
Conclusion
The study found that a large language model can effectively administer a cognitive test with high specificity, though human confirmation is still necessary.
Supporting Evidence
- 54.7% of the LLM scores matched the true scores.
- All but two of the mismatched scores were within 3 points of the true score.
- Specificity for mild cognitive impairment was 100.0%.
- Negative predictive value was 50%.
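The specificity and negative predictive value figures above come from the standard 2x2 confusion-matrix definitions. As a minimal sketch (the counts below are hypothetical placeholders, not the study's data), they can be computed as:

```python
# Specificity and negative predictive value (NPV) from confusion-matrix
# counts. tn/fp/fn values here are hypothetical, for illustration only.

def specificity(tn: int, fp: int) -> float:
    """True negatives / all actual negatives (tn + fp)."""
    return tn / (tn + fp)

def npv(tn: int, fn: int) -> float:
    """True negatives / all predicted negatives (tn + fn)."""
    return tn / (tn + fn)

# A specificity of 1.0 requires zero false positives, while an NPV of
# 0.5 means half of the "not impaired" calls were actually impaired.
tn, fp, fn = 10, 0, 10  # hypothetical counts
print(specificity(tn, fp))  # 1.0
print(npv(tn, fn))          # 0.5
```

This illustrates why the two figures can diverge: specificity depends only on false positives, NPV only on false negatives.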
Takeaway
Researchers used an AI language model to administer a test of people's thinking skills. It performed reasonably well, but clinicians still need to verify the results.
Methodology
The study used prompt engineering on OpenAI GPT-4o to create an interactive tool that administered the Short Blessed Test to simulated patient responses.
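The study does not publish its prompt or scoring code. As a rough sketch of what the tool must compute, the commonly cited weighted-error scoring of the Short Blessed Test (maximum score 28; higher means more impairment) could look like the following. Item names, cut-offs, and weights here follow the standard SBT convention, not the authors' implementation:

```python
# Illustrative sketch only: standard weighted scoring for the Short
# Blessed Test (SBT). Each item has a cap on countable errors and a
# weight; the total ranges 0-28.

# (item, max_countable_errors, weight)
SBT_ITEMS = [
    ("year", 1, 4),             # What year is it now?
    ("month", 1, 3),            # What month is it now?
    ("time", 1, 3),             # About what time is it?
    ("count_20_to_1", 2, 2),    # Count backwards from 20 to 1
    ("months_backward", 2, 2),  # Say the months in reverse order
    ("memory_phrase", 5, 2),    # Repeat the memory phrase
]

def sbt_score(errors: dict) -> int:
    """Weighted error total; each item's errors are capped at its max."""
    total = 0
    for item, max_err, weight in SBT_ITEMS:
        total += min(errors.get(item, 0), max_err) * weight
    return total

def interpret(score: int) -> str:
    """Commonly used SBT cut-offs."""
    if score <= 4:
        return "normal cognition"
    if score <= 9:
        return "questionable impairment"
    return "impairment consistent with dementia"
```

A scoring routine like this is also where the paper's noted weakness matters: an LLM tallying weighted errors in free text can make arithmetic mistakes, which motivates human confirmation of the final score.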
Potential Biases
The potential for confabulation and poor arithmetic accuracy in the language model.
Limitations
The language model was not trained on definitive assessments of cognitive impairment and may confabulate.