Performance of ChatGPT-4o on the Japanese Medical Licensing Examination
Author Information
Author(s): Yuki Miyazaki, Masahiro Hata, Hisaki Omori, Atsuya Hirashima, Yuta Nakagawa, Mitsuhiro Eto, Shun Takahashi, Manabu Ikeda
Primary Institution: Osaka University Graduate School of Medicine
Hypothesis
ChatGPT-4o would demonstrate high proficiency in answering text- and image-based questions on the Japanese Medical Licensing Examination.
Conclusion
ChatGPT-4o achieved a high overall accuracy of 93.2% on the Japanese Medical Licensing Examination, with no significant difference in performance between text-only and image-based questions.
Supporting Evidence
- ChatGPT-4o achieved an overall correct response rate of 93.2% on the 2024 (118th) JMLE.
- The model demonstrated a high level of accuracy overall, with no significant difference in performance between text-only and image-based questions.
- Common errors included clinical judgment mistakes and prioritization issues.
Takeaway
This study evaluated ChatGPT-4o on the Japanese Medical Licensing Examination, where it answered the large majority of questions correctly, performing equally well on text-only and image-based items.
Methodology
ChatGPT-4o was prompted with all 400 questions of the 118th Japanese Medical Licensing Examination, and its accuracy was compared between text-only and image-based questions.
Potential Biases
The model remains prone to errors of clinical judgment, particularly in prioritization tasks, which may skew performance on clinically complex questions.
Limitations
Observed errors were concentrated in clinical judgment and prioritization, indicating specific areas where the model requires improvement.
Statistical Information
P-Value
P=.26 (comparison of accuracy on text-only vs image-based questions)
Statistical Significance
Significance threshold of P<.05; the text vs image difference (P=.26) was not statistically significant.
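The study compares two accuracy rates and reports a nonsignificant P value. The specific test used is not stated in this summary; as an illustration of how such a comparison can be made, the sketch below runs a two-proportion z-test on hypothetical counts (the 300/100 text/image split is an assumption for demonstration, not the study's actual breakdown).

```python
from math import sqrt, erf

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test.

    x1/n1 and x2/n2 are correct/total counts for the two groups.
    Returns (z statistic, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    # pooled proportion under the null hypothesis of equal accuracy
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 280/300 correct on text-only, 93/100 on image-based
z, p = two_proportion_z_test(280, 300, 93, 100)
print(f"z = {z:.3f}, p = {p:.3f}")  # p well above the .05 threshold
```

With similar accuracy in both groups, the p-value stays far above .05, mirroring the paper's finding of no significant text-vs-image difference.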
Digital Object Identifier (DOI)