Analyzing Healthcare Documents in Afaan Oromo Using Topic Modeling
Author Information
Author(s): Dinsa Etana Fikadu, Das Mrinal, Abebe Teklu Urgessa
Primary Institution: Wollega University, Oromia, Ethiopia
Hypothesis
Can topic modeling effectively categorize unstructured health-related documents in Afaan Oromo without label information?
Conclusion
Topic modeling with LDA achieved 79.17% accuracy and a 79.66% F1 score in categorizing healthcare documents.
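The summary does not say how the unsupervised topics were compared with reference categories, nor which F1 averaging was used. One common approach, shown here as a minimal sketch and not as the authors' procedure, maps each topic to the majority reference label among its documents and then computes accuracy and macro-averaged F1. The function names, the toy topic assignments, and the two label names are all hypothetical.

```python
from collections import Counter, defaultdict

def map_topics_to_labels(pred_topics, true_labels):
    """Map each topic id to the most frequent reference label among its documents."""
    by_topic = defaultdict(Counter)
    for topic, label in zip(pred_topics, true_labels):
        by_topic[topic][label] += 1
    return {topic: counts.most_common(1)[0][0] for topic, counts in by_topic.items()}

def accuracy(pred, true):
    """Fraction of documents whose predicted label equals the reference label."""
    return sum(p == y for p, y in zip(pred, true)) / len(true)

def macro_f1(pred, true):
    """Unweighted mean of per-label F1 scores (macro averaging is an assumption)."""
    scores = []
    for label in sorted(set(true)):
        tp = sum(p == label and y == label for p, y in zip(pred, true))
        fp = sum(p == label and y != label for p, y in zip(pred, true))
        fn = sum(p != label and y == label for p, y in zip(pred, true))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: dominant topic per document and reference labels
# that are used for evaluation only, never for training.
pred_topics = [0, 0, 0, 1, 1, 1]
true_labels = ["infectious", "infectious", "chronic", "chronic", "chronic", "infectious"]

topic_to_label = map_topics_to_labels(pred_topics, true_labels)
pred_labels = [topic_to_label[t] for t in pred_topics]

acc = accuracy(pred_labels, true_labels)
f1 = macro_f1(pred_labels, true_labels)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```

Majority-vote mapping lets several topics share one label, which is convenient when the number of topics exceeds the number of categories. A one-to-one assignment would be a stricter alternative.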
Supporting Evidence
- The LDA model provided a framework for categorizing documents without requiring predefined labels.
- The study demonstrated that the model can extract topics from the corpus and assign documents to them.
- Results showed that the model could be applied to classify medical documents effectively.
- The research highlights the challenges of processing low-resource languages like Afaan Oromo.
Takeaway
This study shows how to group health documents written in Afaan Oromo into topics without needing labels, making it easier to find information.
Methodology
The study used Latent Dirichlet Allocation (LDA) to extract topics from unstructured health-related documents.
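To make the idea concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. The summary does not state which inference method, preprocessing, or hyperparameters the authors used, so the sampler, the values of `alpha`, `beta`, and `iterations`, the function name `fit_lda`, and the toy corpus are all assumptions. The tokens are English stand-ins for illustration, not Afaan Oromo data from the study.

```python
import random

def fit_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of tokenized documents.

    Returns (theta, topic_word, vocab), where theta[d][k] is the estimated
    proportion of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                              # total words assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab

# Hypothetical toy corpus with English stand-in tokens.
docs = [
    ["malaria", "fever", "mosquito", "fever", "malaria"],
    ["mosquito", "net", "malaria", "fever"],
    ["diabetes", "sugar", "insulin", "sugar"],
    ["insulin", "diabetes", "diet", "sugar"],
]
theta, topic_word, vocab = fit_lda(docs, num_topics=2)

# Categorize each document by its dominant topic; no labels are involved.
dominant = [max(range(len(row)), key=row.__getitem__) for row in theta]
print(dominant)
```

Assigning each document to its highest-probability topic is one simple way to turn topic proportions into categories. Which topic index ends up describing which theme depends on the random seed.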
Limitations
The model may struggle with polysemy and high correlation between topics, affecting accuracy and interpretability.
Participant Demographics
The dataset consists of health-related documents in Afaan Oromo, with no specific demographic information provided.
Digital Object Identifier (DOI)