Predicting Protein Locations Using Machine Learning
Author Information
Author(s): Habib Tanwir, Zhang Chaoyang, Yang Jack Y, Yang Mary Qu, Deng Youping
Primary Institution: University of Southern Mississippi
Hypothesis
Can a support vector machine (SVM) improve the prediction of protein subcellular localization using amino acid composition?
Conclusion
The study demonstrates that using amino acid and amino acid pair composition significantly improves the accuracy of predicting protein subcellular locations.
Supporting Evidence
- The SVM achieved a total prediction accuracy of 77.0% using amino acid pair composition.
- Using 12 classes for subcellular locations improved prediction accuracy compared to fewer classes.
- The RBF kernel outperformed polynomial and linear kernels in accuracy.
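For reference, the three kernel families compared above have standard textbook forms. The sketch below writes them out in plain Python together with a "total accuracy" metric. It assumes total accuracy means the fraction of all proteins whose predicted location is correct, pooled over every class; the summary does not define the metric. The `gamma`, `coef0` and `degree` defaults are hypothetical illustrative values, not parameters reported by the study.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """K(x, y) = x . y (plain dot product)."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Vector, y: Vector, gamma: float = 1.0,
                      coef0: float = 1.0, degree: int = 3) -> float:
    """K(x, y) = (gamma * x . y + coef0) ** degree.

    Default parameter values are hypothetical, chosen only for illustration.
    """
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """K(x, y) = exp(-gamma * ||x - y||^2) (Gaussian radial basis function)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def total_accuracy(true_labels: Sequence[str], predicted_labels: Sequence[str]) -> float:
    """Assumed definition: correctly predicted proteins / all proteins, over all classes."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

For example, `total_accuracy(["nucleus", "cytoplasm", "nucleus", "mitochondria"], ["nucleus", "nucleus", "nucleus", "mitochondria"])` returns `0.75` (the location labels are hypothetical). The RBF kernel depends only on the distance between two feature vectors, which gives it a flexible, local decision boundary. That is one common explanation for why it often beats linear and polynomial kernels on composition features.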
Takeaway
Scientists used a computer program to guess where proteins are located in a cell, and they found that using more information about the proteins helps make better guesses.
Methodology
The study used support vector machines (SVM) with amino acid and amino acid pair compositions to classify proteins into subcellular locations.
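The two feature types named above can be sketched as fixed-length frequency vectors: 20 values for amino acid composition and 400 values for amino acid pair (dipeptide) composition. This is a minimal sketch under stated assumptions, not the authors' exact encoding. It assumes that pairs are adjacent residues, that frequencies are normalized by the number of counted residues or pairs, and that non-standard letters (such as `X`) are skipped.

```python
from itertools import product
from typing import List

# The 20 standard amino acids (one-letter codes)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs: "AA", "AC", ..., "YY"
AMINO_ACID_PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> List[float]:
    """20-dim vector: fraction of (standard) residues that are each amino acid."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in seq.upper():
        if residue in counts:  # assumption: non-standard letters are ignored
            counts[residue] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def amino_acid_pair_composition(seq: str) -> List[float]:
    """400-dim vector: fraction of adjacent residue pairs of each type."""
    seq = seq.upper()
    counts = {pair: 0 for pair in AMINO_ACID_PAIRS}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]  # assumption: pairs are consecutive residues
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACID_PAIRS)
    return [counts[p] / total for p in AMINO_ACID_PAIRS]
```

For the toy sequence `"ACCA"`, the composition vector has 0.5 at `A` and `C`. The pair vector has 1/3 at each of `AC`, `CC` and `CA`. Vectors like these could then be passed to an SVM implementation such as scikit-learn's `sklearn.svm.SVC(kernel="rbf")`. That library is an illustrative choice, not necessarily the software used in the study.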
Potential Biases
Protein sequences are unevenly distributed among the location classes, which may bias predictions toward the better-represented classes.
Limitations
The dataset needs to be balanced to improve prediction accuracy, and the method may overfit with unbalanced data.
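The summary does not say how the imbalance should be addressed. One common remedy, not necessarily what the authors did, is to weight each class inversely to its frequency during training. The sketch below uses the heuristic `n_samples / (n_classes * class_count)`, which is the formula scikit-learn documents for `class_weight="balanced"`. The location labels in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and common classes below 1, so every
    class contributes the same total weight (n_samples / n_classes).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}
```

With 6 cytoplasm, 3 nucleus and 1 peroxisome label, the weights are about 0.56, 1.11 and 3.33 respectively. Resampling (oversampling rare classes or undersampling common ones) is an alternative that changes the data rather than the loss.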
Statistical Information
P-Value
0.001
Statistical Significance
p<0.05
Digital Object Identifier (DOI)