Protein annotation as term categorization in the gene ontology using word proximity networks
2005

Protein Annotation Using Word Proximity Networks

Sample size: 1076 publications
Evidence: moderate

Author Information

Author(s): Karin Verspoor, Judith Cohn, Cliff Joslyn, Sue Mniszewski, Andreas Rechtsteiner, Luis M Rocha, Tiago Simas

Primary Institution: Los Alamos National Laboratory

Hypothesis

Can word proximity networks improve protein annotation in the Gene Ontology?

Conclusion

The initial results show promise for both of the methods we explored, and we are planning to integrate the methods more closely to achieve better results overall.

Supporting Evidence

  • The method for expanding words associated with GO nodes achieved a 38% success rate in selecting appropriate evidence text.
  • The term categorization methodology achieved a precision of 16% for annotation within the correct extended family.
  • Subsequent analysis indicated that the precision could be improved with different parameter settings.

Takeaway

The study looked at how to better label proteins using nearby words in documents, and found two methods that showed early promise.

Methodology

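The study used two methods: an unsupervised word-expansion algorithm, which widens the set of words associated with each GO node so that evidence text can be selected, and a term categorization methodology, which assigns GO annotations to proteins.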
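To make the idea of a word proximity network concrete, here is a minimal sketch. It is not the authors' implementation. It assumes proximity is a Jaccard-style ratio: the number of sliding windows that contain both words, divided by the number that contain either. The function names `build_proximity_network` and `expand_terms`, the window size, and the scoring rule are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_proximity_network(tokens, window=3):
    """Map each co-occurring word pair to a proximity weight in (0, 1].

    Assumption: proximity is the Jaccard ratio of the sets of sliding
    windows in which each word appears. This is an illustrative choice.
    """
    occurrences = defaultdict(set)  # word -> indices of windows containing it
    n_windows = max(1, len(tokens) - window + 1)
    for start in range(n_windows):
        for word in tokens[start:start + window]:
            occurrences[word].add(start)

    network = {}
    for w1, w2 in combinations(sorted(occurrences), 2):
        shared = occurrences[w1] & occurrences[w2]
        if shared:
            either = occurrences[w1] | occurrences[w2]
            network[(w1, w2)] = len(shared) / len(either)
    return network


def expand_terms(seed_words, network, top_n=5):
    """Rank non-seed words by their summed proximity to the seed words."""
    scores = defaultdict(float)
    for (w1, w2), weight in network.items():
        if w1 in seed_words and w2 not in seed_words:
            scores[w2] += weight
        elif w2 in seed_words and w1 not in seed_words:
            scores[w1] += weight
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


tokens = "kinase binds atp and kinase binds substrate".split()
network = build_proximity_network(tokens, window=3)
expanded = expand_terms({"kinase"}, network, top_n=1)
```

In this toy input, the words of a GO node label would serve as `seed_words`, and the expanded set could then be matched against candidate evidence passages.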

Potential Biases

The reliance on specific protein names may have limited the effectiveness of the annotation methods.

Limitations

The methods were not very successful on the evidence text component, and there were issues with unknown proteins in the test data.

Digital Object Identifier (DOI)

10.1186/1471-2105-6-S1-S20
