Supervised learning method for the prediction of subcellular localization of proteins using amino acid and amino acid pair composition
2008

Predicting Protein Locations Using Machine Learning

Sample size: 9943 | Publication | 10 minutes | Evidence: high

Author Information

Author(s): Habib Tanwir, Zhang Chaoyang, Yang Jack Y, Yang Mary Qu, Deng Youping

Primary Institution: University of Southern Mississippi

Hypothesis

Can a support vector machine (SVM) trained on amino acid and amino acid pair composition improve the prediction of protein subcellular localization?

Conclusion

The study demonstrates that using amino acid and amino acid pair composition significantly improves the accuracy of predicting protein subcellular locations.

Supporting Evidence

  • The SVM achieved a total prediction accuracy of 77.0% using amino acid pair composition.
  • Using 12 classes for subcellular locations improved prediction accuracy compared to fewer classes.
  • The RBF kernel outperformed polynomial and linear kernels in accuracy.
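For readers unfamiliar with the three kernels compared above, the following is a minimal sketch of their standard textbook definitions. The parameter names and defaults (`gamma`, `degree`, `coef0`) are illustrative assumptions; the summary does not state the values used in the study.

```python
import math

def linear_kernel(x, y):
    """Plain dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """(x . y + coef0) ** degree; degree and coef0 are assumed defaults."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2); gamma is an assumed default."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The RBF kernel depends only on the distance between two vectors, so identical composition vectors score 1 and the score decays toward 0 as the vectors move apart.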

Takeaway

Scientists used a computer program to guess where proteins are located in a cell, and they found that using more information about the proteins helps make better guesses.

Methodology

The study used support vector machines (SVMs) with amino acid and amino acid pair compositions as input features to classify proteins into subcellular locations.
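The two feature types named above can be sketched as follows. This is an illustration, not the authors' code: it assumes that "amino acid pair composition" means the relative frequency of each ordered pair of adjacent residues (400 values), and that residues outside the 20 standard letters are skipped. The function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def amino_acid_composition(seq):
    """Fraction of each standard residue in seq (20 values)."""
    seq = seq.upper()
    counts = {a: 0 for a in AMINO_ACIDS}
    for residue in seq:
        if residue in counts:  # skip non-standard letters (assumption)
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]

def pair_composition(seq):
    """Fraction of each ordered adjacent residue pair in seq (400 values)."""
    seq = seq.upper()
    counts = {p: 0 for p in PAIRS}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in PAIRS]

def feature_vector(seq):
    """Concatenate both compositions into one 420-value vector."""
    return amino_acid_composition(seq) + pair_composition(seq)
```

For example, `feature_vector("ACAC")` gives a 420-value vector in which A and C each have composition 0.5, and the pairs AC and CA have frequencies 2/3 and 1/3. Vectors of this kind could then be passed to any SVM implementation.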

Potential Biases

The uneven distribution of protein sequences among classes may bias the predictions.

Limitations

Prediction accuracy depends on a balanced dataset; with unbalanced data, the method may overfit.
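One common way to compensate for unbalanced classes, shown here only as an illustration and not as something the study did, is to weight each class inversely to its frequency. The helper name and the example labels below are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

# Hypothetical labels: 8 nuclear proteins and 2 cytoplasmic proteins.
weights = balanced_class_weights(["nucleus"] * 8 + ["cytoplasm"] * 2)
```

With these weights the rarer class counts for more during training, so errors on it are penalized more heavily than errors on the larger class.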

Statistical Information

P-Value

0.001

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2164-9-S1-S16
