Protein Annotation Using Word Proximity Networks
Author Information
Author(s): Karin Verspoor, Judith Cohn, Cliff Joslyn, Sue Mniszewski, Andreas Rechtsteiner, Luis M Rocha, Tiago Simas
Primary Institution: Los Alamos National Laboratory
Hypothesis
Can word proximity networks improve protein annotation in the Gene Ontology?
Conclusion
The initial results show promise for both of the methods we explored, and we are planning to integrate the methods more closely to achieve better results overall.
Supporting Evidence
- The method for expanding words associated with GO nodes achieved a 38% success rate in selecting appropriate evidence text.
- The term categorization methodology achieved a precision of 16% for annotation within the correct extended family.
- Subsequent analysis indicated that the precision could be improved with different parameter settings.
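For reference, the precision figure quoted above is the fraction of proposed annotations that were judged correct. The following is a minimal sketch of that arithmetic only; the function name and the toy labels are hypothetical, and the criterion for what counts as "correct" (for example, falling within the correct extended family) is assumed to be decided before the sets are passed in.

```python
def precision(predicted, correct):
    """Fraction of predicted annotations that appear in the correct set.

    Both arguments are iterables of annotation labels. The labels used
    below are placeholders, not real GO identifiers.
    """
    predicted = set(predicted)
    if not predicted:
        # No predictions were made, so report 0.0 rather than dividing by zero.
        return 0.0
    return len(predicted & set(correct)) / len(predicted)


# Hypothetical example: four proposed labels, one of which is judged correct.
score = precision(["node_a", "node_b", "node_c", "node_d"], ["node_a", "node_x"])
print(score)
```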
Takeaway
The study looked at how to better label proteins using words that appear near each other in documents, and found two methods that showed early promise but still need refinement.
Methodology
The study combined two methods: an unsupervised algorithm that expands the set of words associated with GO nodes, and a term categorization methodology for protein annotation.
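To illustrate the general idea of a word proximity network and word expansion, here is a minimal sketch. It is not the authors' implementation: it assumes proximity is measured as a Jaccard-style ratio (text units containing both words divided by text units containing either word), and the function names `build_proximity` and `expand`, the `top_k` parameter, and the toy text are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_proximity(units):
    """Build a word proximity network from tokenized text units.

    units: list of token lists (for example, one list per paragraph).
    Returns a dict mapping a sorted word pair to its proximity in (0, 1].
    Assumption: proximity is the Jaccard ratio of co-occurrence.
    """
    occurs = defaultdict(set)  # word -> indices of units containing it
    for idx, tokens in enumerate(units):
        for word in set(tokens):
            occurs[word].add(idx)

    prox = {}
    for a, b in combinations(sorted(occurs), 2):
        both = len(occurs[a] & occurs[b])
        if both:  # keep only pairs that co-occur at least once
            prox[(a, b)] = both / len(occurs[a] | occurs[b])
    return prox


def expand(seed_words, prox, top_k=3):
    """Return up to top_k words most strongly linked to the seed words."""
    scores = defaultdict(float)
    for (a, b), p in prox.items():
        if a in seed_words and b not in seed_words:
            scores[b] += p
        elif b in seed_words and a not in seed_words:
            scores[a] += p
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


# Toy, made-up text units for illustration only.
units = [
    ["kinase", "phosphorylation", "atp"],
    ["kinase", "atp"],
    ["membrane", "transport"],
]
network = build_proximity(units)
expanded = expand({"kinase"}, network, top_k=2)
print(expanded)
```

In a setting like the one described, the seed words would be those already associated with a GO node, and the expanded words would be added as further cues when looking for supporting text.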
Potential Biases
The reliance on specific protein names may have limited the effectiveness of the annotation methods.
Limitations
The methods performed poorly on the evidence text component, and unknown proteins in the test data caused additional problems.