Performance of ChatGPT-4o on the Japanese Medical Licensing Examination
Author Information
Author(s): Yuki Miyazaki, Masahiro Hata, Hisaki Omori, Atsuya Hirashima, Yuta Nakagawa, Mitsuhiro Eto, Shun Takahashi, Manabu Ikeda
Primary Institution: Osaka University Graduate School of Medicine
Hypothesis
ChatGPT-4o would demonstrate high proficiency in answering text- and image-based questions on the Japanese Medical Licensing Examination.
Conclusion
ChatGPT-4o achieved a high overall accuracy of 93.2% on the Japanese Medical Licensing Examination, with no significant difference in performance between text-only and image-based questions.
Supporting Evidence
- ChatGPT-4o achieved an overall correct response rate of 93.2% on the 2024 (118th) JMLE.
- The model demonstrated a high level of accuracy overall, with no significant difference in performance between text-only and image-based questions.
- Common errors included clinical judgment mistakes and prioritization issues.
Takeaway
This study evaluated ChatGPT-4o on the Japanese Medical Licensing Examination, where it answered the large majority of questions correctly, performing equally well on text-only and image-based items.
Methodology
ChatGPT-4o was prompted with all 400 questions of the 118th Japanese Medical Licensing Examination, and its accuracy was compared between text-only and image-based questions.
Potential Biases
The model remains prone to errors of clinical judgment, particularly in prioritization tasks, which may skew performance on clinically complex questions.
Limitations
Observed errors were concentrated in clinical judgment and prioritization, indicating specific areas where the model requires improvement.
Statistical Information
P-Value
P=.26 (comparison of accuracy on text-only vs image-based questions)
Statistical Significance
Significance threshold of P<.05; the text vs image difference (P=.26) was not statistically significant.
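The study compares two accuracy rates and reports a nonsignificant P value. The specific test used is not stated in this summary; as an illustration of how such a comparison can be made, the sketch below runs a two-proportion z-test on hypothetical counts (the 300/100 text/image split is an assumption for demonstration, not the study's actual breakdown).

```python
from math import sqrt, erf

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test.

    x1/n1 and x2/n2 are correct/total counts for the two groups.
    Returns (z statistic, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    # pooled proportion under the null hypothesis of equal accuracy
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 280/300 correct on text-only, 93/100 on image-based
z, p = two_proportion_z_test(280, 300, 93, 100)
print(f"z = {z:.3f}, p = {p:.3f}")  # p well above the .05 threshold
```

With similar accuracy in both groups, the p-value stays far above .05, mirroring the paper's finding of no significant text-vs-image difference.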
Digital Object Identifier (DOI)