Improving Gene Ontology Annotation with Text Mining
Author Information
Author(s): Ehrler Frédéric, Geissbühler Antoine, Jimeno Antonio, Ruch Patrick
Primary Institution: University of Geneva
Hypothesis
Can text mining methods effectively categorize and retrieve passages for Gene Ontology Annotation in data-poor conditions?
Conclusion
The developed system achieved competitive performance in passage retrieval and text categorization, suggesting it could benefit various information extraction tasks.
Supporting Evidence
- The system achieved the best combination of recall and precision for both passage retrieval and text categorization.
- Even so, text categorization results were far below those in other data-poor text categorization experiments.
- The top proposed term was relevant in less than 20% of cases.
Takeaway
This study shows how computers can help scientists find the right information about proteins, even when there's not a lot of data to work with.
Methodology
The study used a classifier to compute distances between sentences and Gene Ontology categories, evaluating performance based on precision and recall.
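To make the idea concrete, here is a minimal sketch of ranking categories by their distance to a sentence and then scoring the proposals with precision and recall. It is not the study's classifier: the bag-of-words cosine distance, the function names, and the toy category labels are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no shared evidence: treat as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def rank_categories(sentence, labels):
    """Rank category labels by increasing distance to the sentence."""
    vec = Counter(tokenize(sentence))
    scored = [(cosine_distance(vec, Counter(tokenize(label))), label)
              for label in labels]
    return [label for _, label in sorted(scored)]


def precision_recall(proposed, relevant):
    """Precision and recall of proposed labels against a reference set."""
    proposed, relevant = set(proposed), set(relevant)
    hits = len(proposed & relevant)
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (hypothetical, not taken from the study).
sentence = "The protein is involved in DNA repair in the nucleus"
labels = ["DNA repair", "protein binding", "nucleus", "lipid transport"]
relevant = {"DNA repair", "nucleus"}

ranking = rank_categories(sentence, labels)
precision, recall = precision_recall(ranking[:3], relevant)
top_term_relevant = ranking[0] in relevant  # precision at rank 1
```

Checking whether the first ranked label is in the reference set, as `top_term_relevant` does, is the per-sentence version of asking how often the top proposed term is relevant.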
Limitations
The text categorization results were well below those in other data-poor text categorization experiments, with the top proposed term relevant in less than 20% of cases, indicating a need for better methods in this setting.