Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks
2007

Automatic Extraction of Gene Ontology Annotation and Its Correlation with Protein Networks

Sample size: 407044 publications | Evidence: high

Author Information

Author(s): Daraselia Nikolai, Yuryev Anton, Egorov Sergei, Mazo Ilya, Ispolatov Iaroslav

Primary Institution: Ariadne Genomics, Inc

Hypothesis

The study aims to validate the relationship between protein functional annotations and protein network topology using automatic extraction methods.

Conclusion

The study demonstrates that protein functional annotations extracted by NLP technology enhance the existing Gene Ontology annotation system and correlate with clustering in physical interaction networks.

Supporting Evidence

  • The NLP technology extracted over 400,000 protein-GO associations from the literature.
  • The precision of the automatic extraction method was found to be over 90%.
  • Proteins within biological annotation groups formed significantly denser linked network clusters than expected by chance.
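The last point, that annotation groups are more densely linked than expected by chance, can be illustrated with a simple permutation test. This is a minimal sketch and not the authors' procedure: the toy network, the function names link_density and permutation_p_value, and the choice of a uniform random-group null model (which ignores node degree) are all assumptions made for illustration.

```python
import random
from itertools import combinations


def link_density(edges, group):
    """Fraction of all possible protein pairs in `group` that are linked."""
    members = sorted(group)
    if len(members) < 2:
        return 0.0
    pairs = list(combinations(members, 2))
    linked = sum(1 for a, b in pairs if frozenset((a, b)) in edges)
    return linked / len(pairs)


def permutation_p_value(edges, nodes, group, n_trials=2000, seed=0):
    """Compare the group's density with random groups of the same size.

    Returns (observed density, empirical p-value). The +1 correction
    keeps the estimate from ever being exactly zero.
    """
    rng = random.Random(seed)
    observed = link_density(edges, group)
    node_list = sorted(nodes)
    hits = 0
    for _ in range(n_trials):
        sample = rng.sample(node_list, len(group))
        if link_density(edges, sample) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_trials + 1)


# Hypothetical toy network: ten proteins, six physical interactions.
edges = {
    frozenset(pair)
    for pair in [("A", "B"), ("A", "C"), ("B", "C"),
                 ("C", "D"), ("E", "F"), ("G", "H")]
}
nodes = set("ABCDEFGHIJ")

# Hypothetical annotation group: three proteins sharing one GO term.
group = {"A", "B", "C"}

observed, p_value = permutation_p_value(edges, nodes, group)
print(f"observed density = {observed:.2f}, empirical p = {p_value:.4f}")
```

A real analysis would likely use a null model that preserves each protein's number of interactions, since highly connected proteins inflate density under uniform sampling.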

Takeaway

The researchers created a computer program that reads scientific papers to find out what proteins do, and they found that proteins sharing the same function tend to cluster together in networks of physical interactions.

Methodology

The study used Natural Language Processing to automatically extract protein functional annotations from scientific literature and compared these annotations with existing Gene Ontology data.
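The comparison step can be pictured as a set comparison between extracted and existing protein-GO pairs. The sketch below is only an illustration under stated assumptions: the function name compare_annotations and the example pairs are hypothetical and are not taken from the study's data or code.

```python
def compare_annotations(extracted, reference):
    """Split protein-GO pairs into confirmed, novel, and missed sets.

    extracted: pairs produced by the text-mining step
    reference: pairs already present in the existing annotation
    """
    extracted, reference = set(extracted), set(reference)
    return {
        "confirmed": extracted & reference,  # found by both
        "novel": extracted - reference,      # candidate additions
        "missed": reference - extracted,     # present only in the reference
    }


# Hypothetical (protein, GO term) pairs, for illustration only.
extracted_pairs = {
    ("P1", "GO:0006915"),
    ("P1", "GO:0006281"),
    ("P2", "GO:0007049"),
}
reference_pairs = {
    ("P1", "GO:0006915"),
    ("P3", "GO:0007049"),
}

result = compare_annotations(extracted_pairs, reference_pairs)
for label, pairs in result.items():
    print(label, sorted(pairs))
```

Pairs in the "novel" set are the ones that could extend an existing annotation, subject to a precision check such as manual review of a sample.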

Potential Biases

The NLP method may misinterpret ambiguous statements as true associations.

Limitations

The study's NLP method primarily analyzed abstracts, which may miss relevant information found in full texts.

Statistical Information

P-Value

p<0.001

Statistical Significance

p<0.001

Digital Object Identifier (DOI)

10.1186/1471-2105-8-243
