Learning Statistical Models for Annotating Proteins with Function Information using Biomedical Text
2005

Learning Statistical Models for Protein Annotation

Publication Evidence: moderate

Author Information

Author(s): Soumya Ray, Mark Craven

Primary Institution: University of Wisconsin, Madison

Hypothesis

Can statistical models learned from biomedical literature effectively annotate proteins with Gene Ontology codes?

Conclusion

The system performs well in annotating proteins with Gene Ontology codes, especially when using external data sources.

Supporting Evidence

  • The system uses statistical models to predict Gene Ontology codes based on text.
  • Using external data sources improves the accuracy of the predictions.
  • The system was evaluated against other methods in a competitive setting.

Takeaway

The researchers created a computer program that reads scientific articles to help label proteins with their functions. It works better when it uses extra information from other databases.

Methodology

The system trains statistical machine learning models on text from articles and uses them to predict Gene Ontology annotations for proteins.
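
To make the idea of predicting annotations from text concrete, here is a minimal sketch that assumes a multinomial naive Bayes classifier with add-one smoothing. This summary does not specify the model family, so the classifier choice, the function names (train_naive_bayes, predict), the toy sentences, and the placeholder labels are all illustrative assumptions and not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Count words per label from (text, label) pairs (hypothetical helper)."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior for the given text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior from the fraction of training documents with this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data; the labels stand in for Gene Ontology codes.
TOY_DOCS = [
    ("the kinase phosphorylates its substrate protein", "kinase_activity"),
    ("phosphorylation by the kinase requires ATP", "kinase_activity"),
    ("the receptor binds DNA and regulates transcription", "dna_binding"),
    ("transcription factor binds promoter DNA", "dna_binding"),
]
model = train_naive_bayes(TOY_DOCS)
```

Calling predict(model, "kinase phosphorylates ATP") on this toy data selects the kinase_activity label. A real system would need far more training text per code, which is the constraint described under Limitations.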

Potential Biases

The reliance on weakly labeled data may introduce bias in the predictions.

Limitations

The system's performance is limited by the quality and quantity of training data available for each Gene Ontology code.

Digital Object Identifier (DOI)

10.1186/1471-2105-6-S1-S18
