A topic modeling approach for analyzing and categorizing electronic healthcare documents in Afaan Oromo without label information
2024

Analyzing Healthcare Documents in Afaan Oromo Using Topic Modeling

Sample size: 3000 publications
Evidence: moderate

Author Information

Author(s): Dinsa Etana Fikadu, Das Mrinal, Abebe Teklu Urgessa

Primary Institution: Wollega University, Oromia, Ethiopia

Hypothesis

Can topic modeling effectively categorize unstructured health-related documents in Afaan Oromo without label information?

Conclusion

Topic modeling with LDA achieved 79.17% accuracy and a 79.66% F1 score in categorizing healthcare documents.
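To make the two reported metrics concrete, here is a minimal sketch of how accuracy and F1 can be computed once each document's inferred topic has been mapped to a category. The summary does not say which F1 averaging the authors used, so macro-averaging is an assumption here, and the labels `cat_a` and `cat_b` and the function names are hypothetical toy values, not data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted category matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 scores (assumed averaging scheme)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: reference categories vs. categories derived from topics.
y_true = ["cat_a", "cat_a", "cat_b", "cat_b"]
y_pred = ["cat_a", "cat_b", "cat_b", "cat_b"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Accuracy counts exact matches only, while F1 balances precision and recall per category, which is why the two figures can differ slightly, as they do in the reported results.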

Supporting Evidence

  • The LDA model provided a framework for categorizing documents without requiring predefined labels.
  • The study demonstrated the capability of the model in extracting topics and categorizing documents.
  • Results showed that the model could be applied to classify medical documents effectively.
  • The research highlights the challenges of processing low-resource languages like Afaan Oromo.

Takeaway

This study shows how to group health documents written in Afaan Oromo into topics without needing labels, making it easier to find information.

Methodology

The study used Latent Dirichlet Allocation (LDA) to extract topics from unstructured health-related documents.
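As an illustration of how LDA assigns topics without labels, the following is a minimal sketch using collapsed Gibbs sampling. The summary does not state which inference method, hyperparameters, or preprocessing the authors used, so the function `lda_gibbs`, the values of `alpha` and `beta`, and the English stand-in tokens are all assumptions made for this example only.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Returns (theta, top_words): per-document topic proportions and the
    three highest-count words of each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # count of word v in topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [vocab[v] for v in sorted(range(vocab_size),
                                  key=lambda v: -topic_word[t][v])[:3]]
        for t in range(num_topics)
    ]
    return theta, top_words


# Hypothetical toy corpus; English tokens stand in for preprocessed Afaan Oromo text.
docs = [
    ["fever", "malaria", "mosquito", "fever"],
    ["malaria", "mosquito", "net", "fever"],
    ["diabetes", "insulin", "sugar", "diet"],
    ["sugar", "insulin", "diabetes", "diet"],
]
theta, top_words = lda_gibbs(docs, num_topics=2)

# Categorize each document by its most probable topic.
dominant = [max(range(2), key=lambda t: row[t]) for row in theta]
```

Each row of `theta` is a probability distribution over topics, so a document can be categorized by its highest-weight topic, and `top_words` gives a rough human-readable summary of what each topic covers.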

Limitations

The model may struggle with polysemy and high correlation between topics, affecting accuracy and interpretability.

Participant Demographics

The dataset consists of health-related documents in Afaan Oromo, with no specific demographic information provided.

Digital Object Identifier (DOI)

10.1038/s41598-024-83743-3
