Using Machine Learning to Find Errors in Protein Function Annotations
Author Information
Author(s): Carson Andorf, Drena Dobbs, Vasant Honavar
Primary Institution: Iowa State University
Hypothesis
Can a machine learning approach effectively identify potential errors in protein function annotations?
Conclusion
The study suggests that most of the classifier's predicted annotations are likely correct, and that the machine learning approach can be used to flag likely errors in Gene Ontology (GO) annotations.
Supporting Evidence
- 201 of the 211 GO annotations examined were inconsistent with the functions recorded in UniProt.
- 97% of the classifier's predicted annotations were consistent with UniProt annotations.
- The machine learning approach outperformed traditional methods in identifying errors.
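As a quick check on the first figure, the share of inconsistent annotations can be computed directly. This is only an arithmetic sketch: the helper name `share` is hypothetical, and the 97% figure is not recomputed because the summary does not give its underlying counts.

```python
def share(part: int, total: int) -> float:
    """Return part/total as a fraction, rejecting impossible inputs."""
    if total <= 0 or not 0 <= part <= total:
        raise ValueError("need 0 <= part <= total and total > 0")
    return part / total

# Counts taken from the evidence list above.
inconsistent = 201
examined = 211

inconsistent_share = share(inconsistent, examined)
print(f"{inconsistent_share:.1%} of the examined GO annotations disagreed with UniProt")
```

In other words, roughly 19 of every 20 annotations in this subset were flagged as inconsistent.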
Takeaway
The researchers used a computer program to check whether the function labels assigned to proteins were correct, and they found many mistakes that could be fixed.
Methodology
The study employed a machine learning classifier trained on human protein kinases to predict the functional classifications of mouse protein kinases.
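The train-on-one-species, check-another workflow described above can be sketched as follows. This is a minimal illustration, not the authors' classifier: it assumes a multinomial naive Bayes model over amino-acid k-mer counts, and the class `KmerNaiveBayes`, the helper `flag_inconsistent`, the toy sequences, and the labels `family_A` / `family_B` are all hypothetical.

```python
from collections import Counter
import math

def kmers(seq, k=2):
    """Return the overlapping k-mers of an amino-acid sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts with add-one smoothing."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, sequences, labels):
        self.class_counts = Counter(labels)
        self.kmer_counts = {c: Counter() for c in self.class_counts}
        for seq, label in zip(sequences, labels):
            self.kmer_counts[label].update(kmers(seq, self.k))
        self.vocab = set()
        for counts in self.kmer_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, seq):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            counts = self.kmer_counts[c]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[c] / total)  # class prior
            for km in kmers(seq, self.k):
                if km in self.vocab:  # skip k-mers never seen in training
                    score += math.log((counts[km] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label

def flag_inconsistent(model, annotated):
    """Return (id, existing_label, predicted_label) wherever the two disagree."""
    flagged = []
    for protein_id, seq, label in annotated:
        predicted = model.predict(seq)
        if predicted != label:
            flagged.append((protein_id, label, predicted))
    return flagged

# Made-up "human" training data; real inputs would be curated kinase sequences.
human_seqs = ["GSGSGSTT", "STSTGSGS", "YYAYYAYW", "WYAYYWYA"]
human_labels = ["family_A", "family_A", "family_B", "family_B"]
model = KmerNaiveBayes(k=2).fit(human_seqs, human_labels)

# Made-up "mouse" proteins with their existing database labels.
mouse = [("m1", "GSGSTSTS", "family_B"), ("m2", "YYAYWYAY", "family_B")]
flagged = flag_inconsistent(model, mouse)
print(flagged)
```

A flagged protein is only a candidate for review: disagreement between the classifier and the database can mean either an annotation error or a misprediction, so each case would still need to be checked against an independent source such as UniProt.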
Potential Biases
The analysis relies on automated annotations drawn from databases, and those annotations may themselves contain errors.
Limitations
The study focused on a small subset of protein kinases, which may not represent all protein annotations.
Participant Demographics
The study analyzed mouse protein kinases, specifically those whose GO annotations carry the RCA (Reviewed Computational Analysis) evidence code.