Predicting Protein Function Using Machine Learning
Author Information
Author(s): Al-Shahib Ali, Breitling Rainer, Gilbert David R
Primary Institution: The University of Birmingham
Hypothesis
Can machine learning classifiers effectively predict the function of proteins with unknown functions based on their amino acid sequences?
Conclusion
Machine learning classifiers can successfully predict the function of uncharacterized proteins by leveraging knowledge from proteins with known functions.
Supporting Evidence
- Proteins with known and unknown functions differ significantly.
- Classifiers trained on known proteins can generalize to unknown proteins.
- The median AUC (area under the ROC curve) for distinguishing proteins with known functions from those with unknown functions is 63%.
- Classifiers perform well across species boundaries with minimal accuracy loss.
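The AUC figure above can be read as the probability that a classifier scores a randomly chosen positive example higher than a randomly chosen negative one. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `roc_auc` and the example labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Return the area under the ROC curve for binary labels (1 or 0).

    Computed as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = known function, 0 = unknown function.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
example_auc = roc_auc(example_labels, example_scores)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so a median of 63% indicates a detectable but modest difference between the two groups.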
Takeaway
Scientists are trying to figure out what new proteins do just by looking at their building blocks, and they found that computers can help guess their jobs pretty well.
Methodology
The study used Support Vector Machine classifiers to analyze amino acid sequences from seven bacterial pathogens.
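A Support Vector Machine needs each sequence converted into a fixed-length numeric vector first. This summary does not state which features the study used, so the sketch below assumes plain amino acid composition (the fraction of each of the 20 standard residues) as one simple possibility. The names `AMINO_ACIDS` and `amino_acid_composition` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every
# sequence maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list of residue frequencies for a protein sequence.

    Characters outside the 20 standard residues (for example 'X' or '*')
    are ignored. This is an assumed feature choice, not necessarily the
    one used in the study.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence: two alanines, one cysteine, one aspartate.
features = amino_acid_composition("AACD")
```

Vectors of this kind, paired with a functional class label for each protein, are what any standard SVM implementation would take as training input.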
Potential Biases
Selecting the training and test sets based on sequence similarity may introduce bias.
Limitations
The classifiers may not perform well on highly specialized or wrongly predicted proteins.
Participant Demographics
Proteins from seven bacterial species causing sexually transmitted diseases in humans.
Statistical Information
P-Value
0.02
Statistical Significance
p<0.05
Digital Object Identifier (DOI)