Normalizing Gene Ontology Terms in Text
Author Information
Author(s): S. Gaudan, A. Jimeno Yepes, V. Lee, D. Rebholz-Schuhmann
Primary Institution: European Bioinformatics Institute
Hypothesis
Can a novel method improve the automatic identification of Gene Ontology (GO) terms in natural language text?
Conclusion
The study presents a new method that combines evidence, specificity, and proximity to improve the identification of GO terms in text. On the BioCreAtIvE corpus it achieves a precision of 0.34 at a recall of 0.34 for the top-ranked (rank 1) terms.
Supporting Evidence
- The method was evaluated on the BioCreAtIvE corpus.
- Precision reached 0.34 at a recall of 0.34 for the identified terms at rank 1.
- The identification of GO terms in the cellular component subbranch was more accurate than in other subbranches.
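A short calculation makes the rank-1 figures concrete. The sketch below is a hypothetical illustration, not the BioCreAtIvE evaluation script. It assumes that predictions are stored as a best-first list of GO identifiers per text passage and curator annotations as a set per passage. It keeps only the top-ranked term for each passage, computes precision over the passages that received a prediction, and computes recall over all curated annotations.

```python
from typing import Dict, List, Set, Tuple


def precision_recall_at_rank1(
    predictions: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Precision and recall when only the top-ranked GO term per passage is kept.

    predictions: passage id -> GO identifiers ranked best-first (hypothetical format)
    gold: passage id -> set of GO identifiers assigned by curators
    """
    true_positives = 0
    predicted = 0
    for passage_id, ranked_terms in predictions.items():
        if not ranked_terms:
            continue  # no prediction for this passage
        predicted += 1
        # Only the rank-1 term counts; lower-ranked terms are ignored.
        if ranked_terms[0] in gold.get(passage_id, set()):
            true_positives += 1
    total_gold = sum(len(terms) for terms in gold.values())
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / total_gold if total_gold else 0.0
    return precision, recall


# Toy data, invented for illustration only.
predictions = {
    "p1": ["GO:0005634", "GO:0005737"],
    "p2": ["GO:0006915"],
    "p3": ["GO:0003677"],
}
gold = {
    "p1": {"GO:0005634"},
    "p2": {"GO:0008283"},
    "p3": {"GO:0003677"},
}
precision, recall = precision_recall_at_rank1(predictions, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.67
```

In this toy example, two of the three top-ranked terms match the curated annotation, so precision and recall are both 2/3.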
Takeaway
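This study developed a way to automatically find Gene Ontology terms, the standardized vocabulary used to describe what genes and proteins do and where they act, in scientific text. This makes it easier for scientists to locate and reuse information about genes and proteins in the literature.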
Methodology
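The method scores candidate GO terms by combining three sources of information: the evidence for the term found in the text, the proximity of the term's words to one another in the text, and the specificity of the term.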
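A minimal sketch of how three such factors could be combined into a single ranking score is shown below. Every formula in it is an assumption made for illustration, not the authors' definition. The word weights stand in for word information content. Evidence is the weighted fraction of a term's words found in the sentence. Proximity is the number of matched words divided by the width of the token window they span. Specificity grows with the term's total information content. The three factors are simply multiplied.

```python
import math
from typing import Dict, List, Tuple


def evidence(term_words: List[str], tokens: List[str], weights: Dict[str, float]) -> float:
    """Weighted fraction of the term's words that occur in the sentence (assumed definition)."""
    total = sum(weights.get(w, 1.0) for w in term_words)
    found = sum(weights.get(w, 1.0) for w in term_words if w in tokens)
    return found / total if total else 0.0


def proximity(term_words: List[str], tokens: List[str]) -> float:
    """Matched words divided by the width of the window they span (1.0 = adjacent)."""
    positions = [tokens.index(w) for w in set(term_words) if w in tokens]
    if not positions:
        return 0.0
    span = max(positions) - min(positions) + 1
    return len(positions) / span


def specificity(term_words: List[str], weights: Dict[str, float]) -> float:
    """Grows with the term's total information content, squashed into [0, 1)."""
    info = sum(weights.get(w, 1.0) for w in set(term_words))
    return 1.0 - math.exp(-info)


def rank_terms(
    candidates: Dict[str, List[str]],
    tokens: List[str],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Score every candidate GO term and return (id, score) pairs, best first."""
    scored = []
    for go_id, term_words in candidates.items():
        score = (
            evidence(term_words, tokens, weights)
            * proximity(term_words, tokens)
            * specificity(term_words, weights)
        )
        scored.append((go_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


tokens = "the protein localizes to the nuclear membrane".split()
candidates = {
    "GO:0031965": ["nuclear", "membrane"],  # nuclear membrane
    "GO:0016020": ["membrane"],             # membrane
    "GO:0005634": ["nucleus"],              # nucleus
}
# Hypothetical information-content weights; unlisted words default to 1.0.
weights = {"nuclear": 2.0, "membrane": 1.5, "nucleus": 2.0}

for go_id, score in rank_terms(candidates, tokens, weights):
    print(go_id, round(score, 3))
```

With these assumed weights, "nuclear membrane" outranks the more general "membrane" because both of its words are present and adjacent, and the term carries more information. "Nucleus" scores 0 because none of its words occur in the sentence.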
Limitations
The method's performance varies across the branches of the Gene Ontology, with lower accuracy for the biological process branch.