Improving Prediction of Peptide-MHC Class I Binding
Author Information
Author(s): Ana Paula Sales, Georgia D. Tomaras, Thomas B. Kepler
Primary Institution: Duke University
Hypothesis
Can cost-sensitive methods improve the accuracy of peptide-MHC class I binding prediction when classifiers are trained on unbalanced datasets?
Conclusion
Using cost-balancing techniques significantly improves the accuracy of decision trees in predicting peptide-MHC class I binding.
Supporting Evidence
- Highly unbalanced training sets can reduce the accuracy of classifier predictions.
- Cost-sensitive classifiers are significantly more accurate than cost-insensitive classifiers.
- Resampling methods do not consistently improve classifier performance.
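The first point above can be illustrated with a small sketch. It assumes a hypothetical toy dataset (5 binders among 100 peptides) and a trivial classifier that always predicts "nonbinder"; neither comes from the study, and the metrics shown are common choices, not necessarily the ones the authors used. The aim is only to show how a classifier trained on skewed data can look accurate overall while failing entirely on the minority class.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = binder, 0 = nonbinder)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of all peptides classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (binders found) and specificity (nonbinders found)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 5 binders and 95 nonbinders.
y_true = [1] * 5 + [0] * 95
# A degenerate classifier that ignores the peptide and always says "nonbinder".
y_pred = [0] * 100

overall = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
print(f"accuracy={overall:.2f}, balanced accuracy={balanced:.2f}")
```

On this toy data the overall accuracy is high even though no binder is ever identified, which is why imbalance needs to be handled explicitly during training or evaluation.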
Takeaway
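When predicting which peptides bind MHC class I molecules from training data dominated by nonbinders, decision trees that weight misclassification costs to offset the imbalance predict more accurately than cost-insensitive trees, whereas resampling the data does not help consistently.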
Methodology
The study developed a decision-theoretic framework to construct cost-sensitive trees and compared resampling and cost-sensitive methods to address data imbalance.
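As a rough illustration of the cost-sensitive idea, the sketch below labels a single tree leaf by choosing the class with the lowest expected misclassification cost. The function names, the inverse-frequency cost weighting, and all counts are assumptions made for this example; the summary does not specify the cost structure or tree construction the authors used.

```python
def balanced_costs(n_binders, n_nonbinders):
    """Assumed weighting: misclassification cost inversely proportional
    to class frequency, so both classes carry equal total weight."""
    total = n_binders + n_nonbinders
    return {
        "binder": total / (2 * n_binders),        # cost of missing a binder
        "nonbinder": total / (2 * n_nonbinders),  # cost of missing a nonbinder
    }


def label_leaf(leaf_binders, leaf_nonbinders, costs):
    """Return the label with the smaller total misclassification cost in the leaf."""
    # Predicting "nonbinder" misclassifies every binder in the leaf, and vice versa.
    cost_if_nonbinder = leaf_binders * costs["binder"]
    cost_if_binder = leaf_nonbinders * costs["nonbinder"]
    return "binder" if cost_if_binder < cost_if_nonbinder else "nonbinder"


# Hypothetical training set: 50 binders, 950 nonbinders.
costs = balanced_costs(50, 950)
uniform = {"binder": 1.0, "nonbinder": 1.0}  # cost-insensitive baseline

# Hypothetical leaf holding 4 binders and 16 nonbinders. It is enriched for
# binders (20% versus 5% overall) yet still has a nonbinder majority.
insensitive_label = label_leaf(4, 16, uniform)
sensitive_label = label_leaf(4, 16, costs)
print(insensitive_label, sensitive_label)
```

With uniform costs the leaf takes the majority label, while the balanced costs flip it because each missed binder is weighted more heavily than each missed nonbinder. Resampling pursues a similar goal by changing the training data rather than the decision rule.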
Potential Biases
The dataset was heavily biased towards nonbinders, which could affect the generalizability of the findings.
Limitations
The study primarily focused on decision trees and may not generalize to all types of classifiers.