Learning Statistical Models for Protein Annotation
Author Information
Author(s): Soumya Ray, Mark Craven
Primary Institution: University of Wisconsin, Madison
Hypothesis
Can statistical models effectively annotate proteins with Gene Ontology codes using biomedical literature?
Conclusion
The system performs well in annotating proteins with Gene Ontology codes, especially when using external data sources.
Supporting Evidence
- The system uses statistical models to predict Gene Ontology codes based on text.
- Using external data sources improves the accuracy of the predictions.
- The system was evaluated against other methods in a competitive setting.
Takeaway
The researchers created a computer program that reads scientific articles to help label proteins with their functions. It works better when it uses extra information from other databases.
Methodology
The system uses statistical analyses and machine learning models to predict protein annotations based on text from articles.
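As a rough illustration of predicting a Gene Ontology code from text, the sketch below trains a small multinomial Naive Bayes classifier with add-one smoothing. It is not the authors' implementation: the choice of Naive Bayes, the whitespace tokenizer, the function names, and the four training sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()

def train(examples):
    """Count documents and words per GO code from (text, go_code) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in examples:
        doc_counts[code] += 1
        for word in tokenize(text):
            word_counts[code][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab

def predict(model, text):
    """Return the GO code with the highest log-probability for the text."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for code in doc_counts:
        # Log prior: fraction of training documents labeled with this code.
        score = math.log(doc_counts[code] / total_docs)
        total_words = sum(word_counts[code].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed word likelihood.
            score += math.log(
                (word_counts[code][word] + 1) / (total_words + len(vocab))
            )
        scores[code] = score
    return max(scores, key=scores.get)

# Hypothetical toy training sentences, invented for this sketch.
examples = [
    ("the protein phosphorylates its substrate using atp", "GO:0016301"),
    ("kinase domain catalyzes phosphorylation of target residues", "GO:0016301"),
    ("the factor binds dna at the promoter region", "GO:0003677"),
    ("binds double stranded dna through a zinc finger motif", "GO:0003677"),
]
model = train(examples)
print(predict(model, "this enzyme catalyzes phosphorylation"))
```

A real system would train on far more text per code and would likely return ranked codes with scores rather than a single label.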
Potential Biases
The reliance on weakly labeled data may introduce bias in the predictions.
Limitations
The system's performance is limited by the quality and quantity of training data available for each Gene Ontology code.
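One simple way to see where this limitation applies is to count the training examples available for each Gene Ontology code and flag the sparse ones. The sketch below assumes a flat list of labels; the function name and the threshold of five examples are arbitrary choices for illustration, not values from the paper.

```python
from collections import Counter

def sparse_codes(labels, min_examples=5):
    """Return GO codes that have fewer than min_examples training labels."""
    counts = Counter(labels)
    return sorted(code for code, n in counts.items() if n < min_examples)

# Hypothetical label list: one entry per training document.
labels = ["GO:0016301"] * 6 + ["GO:0003677"] * 2
print(sparse_codes(labels))
```

Codes flagged this way are the ones for which a learned model has the least evidence to work with.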