DEVELOPMENT AND ASSESSMENT OF THE PERFORMANCE OF A LARGE LANGUAGE MODEL FOR ADMINISTERING THE SHORT BLESSED TEST
2024

Using a Language Model to Administer a Cognitive Test

Sample size: 57

Evidence: moderate

Author Information

Author(s): Jaman Rafeeul, Nessen Sarah, Adjei-Poku Michael, Byerley Joella, Sailors Olivia, Karlawish Jason, O’Brien Kyra, Friedman Ari

Primary Institution: University of Pennsylvania

Hypothesis

Using a large language model to administer the Short Blessed Test will mitigate errors in cognitive impairment assessment.

Conclusion

The study found that a large language model can effectively administer a cognitive test with high specificity, though human confirmation is still necessary.

Supporting Evidence

  • 54.7% of the LLM-generated scores matched the true scores exactly.
  • All but two of the mismatched scores were within 3 points of the true score.
  • Specificity for mild cognitive impairment was 100.0%.
  • Negative predictive value was 50%.
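The specificity and negative predictive value above follow from the standard confusion-matrix definitions. A minimal sketch of those formulas; the counts are hypothetical, chosen only to reproduce the reported percentages, and are not taken from the study:

```python
def specificity(tn: int, fp: int) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

def npv(tn: int, fn: int) -> float:
    """Negative predictive value: TN / (TN + FN)."""
    return tn / (tn + fn)

# Hypothetical counts: zero false positives gives 100% specificity,
# while equal true and false negatives gives 50% NPV.
tn, fp, fn = 5, 0, 5
print(f"specificity = {specificity(tn, fp):.1%}")  # 100.0%
print(f"NPV         = {npv(tn, fn):.1%}")          # 50.0%
```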

Takeaway

Researchers used a computer program to help test people's thinking skills, and it worked pretty well, but doctors still need to check the results.

Methodology

The study applied prompt engineering to OpenAI GPT-4o to create an interactive tool that administered the Short Blessed Test to simulated patient responses.
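The Short Blessed Test itself is scored by applying fixed per-item weights to the respondent's error counts, giving a total from 0 (no errors) to 28 (maximum impairment). A minimal scoring sketch using the commonly published SBT item weights; the function name and interface are illustrative and not drawn from the study:

```python
# Short Blessed Test weighted scoring: (max errors counted, weight) per item.
SBT_ITEMS = {
    "year": (1, 4),             # What year is it now?
    "month": (1, 3),            # What month is it now?
    "time_of_day": (1, 3),      # About what time is it?
    "count_backwards": (2, 2),  # Count backwards from 20 to 1
    "months_reversed": (2, 2),  # Say the months of the year in reverse
    "memory_phrase": (5, 2),    # Repeat the memory phrase
}

def sbt_score(errors: dict) -> int:
    """Weighted SBT score: each item's errors are capped, then weighted."""
    total = 0
    for item, (max_errors, weight) in SBT_ITEMS.items():
        total += min(errors.get(item, 0), max_errors) * weight
    return total

# A respondent who misses the year (4 x 1) and makes one counting
# error (2 x 1) scores 6.
print(sbt_score({"year": 1, "count_backwards": 1}))  # 6
```

Capping the error counts before weighting is what bounds the total at 28; a naive weighted sum without the caps is one plausible source of the arithmetic errors noted under Potential Biases.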

Potential Biases

The language model's potential for confabulation and its poor arithmetic accuracy.

Limitations

The language model was not trained on definitive assessments of cognitive impairment and may confabulate.

Digital Object Identifier (DOI)

10.1093/geroni/igae098.3948
