Exploring inconsistencies in genome-wide protein function annotations: a machine learning approach
2007

Using Machine Learning to Find Errors in Protein Function Annotations

Sample size: 211 GO annotations

Evidence: moderate

Author Information

Author(s): Carson Andorf, Drena Dobbs, Vasant Honavar

Primary Institution: Iowa State University

Hypothesis

Can a machine learning approach effectively identify potential errors in protein function annotations?

Conclusion

The study suggests that most predicted annotations are likely correct and that the machine learning approach can be used to detect errors in GO annotations.

Supporting Evidence

  • 201 out of 211 GO annotations were inconsistent with UniProt functions.
  • 97% of predicted annotations were consistent with UniProt annotations.
  • The machine learning approach outperformed traditional methods in identifying errors.
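The comparison behind these figures can be sketched as a simple cross-check between three label sources. The snippet below is a minimal illustration, not the authors' code: the function names, protein IDs, and class labels are all made up for the example.

```python
def flag_inconsistent(predicted, curated):
    """Return sorted protein IDs whose predicted class differs from the curated class."""
    shared = predicted.keys() & curated.keys()
    return sorted(pid for pid in shared if predicted[pid] != curated[pid])


def agreement_rate(predicted, reference):
    """Fraction of shared proteins whose predicted class matches the reference class."""
    shared = predicted.keys() & reference.keys()
    if not shared:
        raise ValueError("no proteins in common")
    matches = sum(predicted[pid] == reference[pid] for pid in shared)
    return matches / len(shared)


# Toy, hypothetical data: IDs and labels are invented for illustration only.
predicted = {"P1": "Ser/Thr kinase", "P2": "Tyr kinase", "P3": "Ser/Thr kinase"}
go_labels = {"P1": "Tyr kinase", "P2": "Tyr kinase", "P3": "Tyr kinase"}
uniprot_labels = {"P1": "Ser/Thr kinase", "P2": "Tyr kinase", "P3": "Ser/Thr kinase"}

# Proteins where the classifier disagrees with the GO annotation are candidates for review.
suspect = flag_inconsistent(predicted, go_labels)

# Agreement with an independent source (here, UniProt) indicates how far to trust the predictions.
uniprot_agreement = agreement_rate(predicted, uniprot_labels)
```

In this toy data the classifier disagrees with GO for two proteins while matching UniProt for all three, which is the pattern that would make the GO entries the more likely source of error.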

Takeaway

The researchers used a computer program to check whether the functions assigned to proteins were correct, and they found many mistakes that could be fixed.

Methodology

The study employed a machine learning classifier trained on human protein kinases to predict the functional classifications of mouse protein kinases.
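The train-on-human, predict-on-mouse setup can be sketched with a small sequence classifier. This summary does not say which classifier or features the study used, so the example assumes a naive Bayes model over overlapping amino-acid k-mers. The class name `KmerNaiveBayes`, the sequences, and the labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=2):
    """Split a sequence into its overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts with add-one smoothing (assumed model)."""

    def __init__(self, k=2):
        self.k = k
        self.class_counts = Counter()            # training sequences per class
        self.kmer_counts = defaultdict(Counter)  # k-mer counts per class
        self.vocab = set()                       # all k-mers seen in training

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            self.class_counts[label] += 1
            for km in kmers(seq, self.k):
                self.kmer_counts[label][km] += 1
                self.vocab.add(km)
        return self

    def predict(self, seq):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.kmer_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each k-mer in the query.
            score = math.log(self.class_counts[label] / total)
            for km in kmers(seq, self.k):
                score += math.log((counts[km] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy "human" training set: invented fragments, not real kinase sequences.
human_seqs = ["GKGSFGKV", "GKGAFGKV", "HRDLAARN", "HRDLAAKN"]
human_labels = ["Ser/Thr", "Ser/Thr", "Tyr", "Tyr"]

model = KmerNaiveBayes(k=2).fit(human_seqs, human_labels)

# Toy "mouse" query sequences, also invented.
prediction_a = model.predict("GKGSFGRV")
prediction_b = model.predict("HRDLSARN")
```

Each predicted class would then be compared against the existing annotation for the same mouse protein, with disagreements flagged as candidates for manual review.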

Potential Biases

The results may be biased by the study's reliance on automated database annotations, which may themselves contain errors.

Limitations

The study focused on a small subset of protein kinases, which may not represent all protein annotations.

Participant Demographics

The study analyzed mouse protein kinases, specifically those whose GO annotations carry the RCA (Reviewed Computational Analysis) evidence code.

Digital Object Identifier (DOI)

10.1186/1471-2105-8-284
