Automatic Extraction of Gene Ontology Annotation and Its Correlation with Protein Networks
Author Information
Author(s): Nikolai Daraselia, Anton Yuryev, Sergei Egorov, Ilya Mazo, Iaroslav Ispolatov
Primary Institution: Ariadne Genomics, Inc.
Hypothesis
The study tests whether protein functional annotations extracted automatically from the literature correlate with the topology of protein interaction networks.
Conclusion
The study demonstrates that protein functional annotations extracted by NLP technology enhance the existing Gene Ontology annotation system and correlate with clustering in physical interaction networks.
Supporting Evidence
- The NLP technology extracted over 400,000 protein-GO associations from the literature.
- The precision of the automatic extraction method exceeded 90%.
- Proteins within biological annotation groups formed significantly denser linked network clusters than expected by chance.
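The last point, that annotation groups form denser clusters than expected by chance, can be illustrated with a simple permutation test. This is a minimal sketch and not the study's actual procedure: the function names, the toy network, and the null model (random protein sets of the same size as the group) are all assumptions made for illustration.

```python
import random
from itertools import combinations


def within_group_links(edges, group):
    """Count interaction edges whose two endpoints both belong to the group."""
    members = set(group)
    return sum(1 for a, b in edges if a in members and b in members)


def permutation_p_value(edges, proteins, group, n_trials=1000, seed=0):
    """Estimate how often a random protein set of the same size is at least
    as densely linked as the annotation group (assumed null model)."""
    rng = random.Random(seed)
    observed = within_group_links(edges, group)
    size = len(set(group))
    hits = 0
    for _ in range(n_trials):
        sample = rng.sample(proteins, size)
        if within_group_links(edges, sample) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_trials + 1)


# Hypothetical toy network: 20 proteins, with P1..P4 fully interconnected.
proteins = [f"P{i}" for i in range(1, 21)]
annotation_group = ["P1", "P2", "P3", "P4"]
edges = list(combinations(annotation_group, 2)) + [
    ("P5", "P6"), ("P7", "P8"), ("P9", "P10"), ("P4", "P11"), ("P12", "P13"),
]

observed, p_value = permutation_p_value(edges, proteins, annotation_group)
print(f"links within group: {observed}, estimated p-value: {p_value:.4f}")
```

In this toy network the annotation group contains 6 internal links, and a random set of four proteins almost never matches that, so the estimated p-value is small. A real analysis would likely need a null model that also accounts for how many interaction partners each protein has.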
Takeaway
The researchers created a computer program that reads scientific papers to find out what proteins do, and they found that proteins assigned the same function are linked to each other in physical interaction networks more often than expected by chance.
Methodology
The study used Natural Language Processing to automatically extract protein functional annotations from scientific literature and compared these annotations with existing Gene Ontology data.
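The comparison step can be sketched as set operations over (protein, term) pairs, with precision estimated from a manually reviewed sample. This is an illustrative sketch only: the function names, the protein and term labels, and the review verdicts are hypothetical placeholders, not data or code from the study.

```python
def compare_annotations(extracted, curated):
    """Split (protein, term) pairs into those confirmed by the curated set,
    those found only by extraction, and those extraction did not find."""
    extracted, curated = set(extracted), set(curated)
    return {
        "confirmed": extracted & curated,
        "novel": extracted - curated,
        "missed": curated - extracted,
    }


def precision(reviewed):
    """Fraction of reviewed pairs judged correct.

    `reviewed` maps each (protein, term) pair to True or False.
    """
    if not reviewed:
        raise ValueError("no reviewed pairs to score")
    return sum(reviewed.values()) / len(reviewed)


# Hypothetical placeholder data (not real proteins or GO terms).
extracted = {("P1", "TERM_A"), ("P1", "TERM_B"), ("P2", "TERM_A"), ("P3", "TERM_C")}
curated = {("P1", "TERM_A"), ("P4", "TERM_C")}

result = compare_annotations(extracted, curated)

# Hypothetical manual review verdicts for the extracted pairs.
reviewed = {
    ("P1", "TERM_A"): True,
    ("P1", "TERM_B"): True,
    ("P2", "TERM_A"): True,
    ("P3", "TERM_C"): False,
}
estimated_precision = precision(reviewed)
print(len(result["novel"]), "novel pairs; precision", estimated_precision)
```

Pairs in the "novel" set are the ones that would extend the existing annotation, which is why their precision matters most.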
Potential Biases
The NLP method may misinterpret ambiguous statements as true associations.
Limitations
The study's NLP method primarily analyzed abstracts, which may miss relevant information found in full texts.
Statistical Information
P-Value
p<0.001
Statistical Significance
p<0.001