A topic modeling approach for analyzing and categorizing electronic healthcare documents in Afaan Oromo without label information

2024

Analyzing Healthcare Documents in Afaan Oromo Using Topic Modeling

Sample size: 3000 publications
Evidence: moderate

Author Information

Author(s): Dinsa Etana Fikadu, Das Mrinal, Abebe Teklu Urgessa

Primary Institution: Wollega University, Oromia, Ethiopia

Can topic modeling effectively categorize unstructured health-related documents in Afaan Oromo without label information?

Topic modeling with LDA achieved 79.17% accuracy and a 79.66% F1 score in categorizing healthcare documents.
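To make the two reported metrics concrete, here is a minimal sketch of how accuracy and F1 could be computed for an unsupervised topic model. It rests on assumptions not stated in this summary: each topic is mapped to the most common reference label among its documents, and F1 is macro-averaged. The function names and the toy data are hypothetical and are not the study's data or evaluation code.

```python
from collections import Counter, defaultdict

def map_topics_to_labels(topics, labels):
    """Map each topic id to the most common reference label among its documents."""
    by_topic = defaultdict(Counter)
    for topic, label in zip(topics, labels):
        by_topic[topic][label] += 1
    return {t: counts.most_common(1)[0][0] for t, counts in by_topic.items()}

def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted category matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example: dominant topic per document and hypothetical reference labels.
topics = [0, 0, 0, 1, 1, 1]
labels = ["malaria", "malaria", "diabetes", "diabetes", "diabetes", "malaria"]

mapping = map_topics_to_labels(topics, labels)
predicted = [mapping[t] for t in topics]
acc = accuracy(labels, predicted)
f1 = macro_f1(labels, predicted)
print(mapping, round(acc, 4), round(f1, 4))
```

In this sketch the reference labels are used only for scoring; the topic assignments themselves are produced without them.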

The LDA model provided a framework for categorizing documents without requiring predefined labels.
The study demonstrated the capability of the model in extracting topics and categorizing documents.
Results showed that the model could be applied to classify medical documents effectively.
The research highlights the challenges of processing low-resource languages like Afaan Oromo.

This study shows how to group health documents written in Afaan Oromo into topics without needing labels, making it easier to find information.

The study used latent Dirichlet allocation (LDA) to extract topics from unstructured health-related documents.
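As an illustration of the technique, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling, followed by assigning each document to its dominant topic. It is not the authors' implementation: the inference method, the hyperparameters (`alpha`, `beta`, `iterations`), the function name `fit_lda`, and the English placeholder tokens are all assumptions made for the example. A real pipeline would first tokenize and normalize the Afaan Oromo text.

```python
import random

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                             # tokens per topic
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed document-topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Three highest-count words per topic.
    top_words = [
        [vocab[v] for v in sorted(range(vocab_size), key=lambda v: -row[v])[:3]]
        for row in topic_word
    ]
    return theta, top_words

# Placeholder tokens for illustration only; not data from the study.
docs = [
    ["fever", "malaria", "mosquito", "fever"],
    ["malaria", "mosquito", "net", "fever"],
    ["diet", "sugar", "diabetes", "insulin"],
    ["insulin", "diabetes", "sugar", "diet"],
]
theta, top_words = fit_lda(docs, num_topics=2)
# Categorize each document by its highest-probability topic.
dominant = [max(range(2), key=lambda k: row[k]) for row in theta]
print(dominant, top_words)
```

No labels enter the fitting step; the category of a document is simply the topic with the largest share in its topic distribution.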

The model may struggle with polysemy and high correlation between topics, affecting accuracy and interpretability.

The dataset consists of health-related documents in Afaan Oromo, with no specific demographic information provided.
