Predicting O-GlcNAc Modification Sites in Proteins Using Machine Learning
Author Information
Author(s): Khalid Ayesha, Kaleem Afshan, Qazi Wajahat, Abdullah Roheena, Iqtedar Mehwish, Naz Shagufta
Primary Institution: Department of Biotechnology, Lahore College for Women University, Lahore, Pakistan
Hypothesis
Can the ESM-2 model accurately predict O-GlcNAc modification sites in human proteins?
Conclusion
The ESM-2 model effectively predicts O-GlcNAc sites in human proteins with an accuracy of 78.30%.
Supporting Evidence
- The ESM-2 model achieved an accuracy of 78.30%, recall of 78.30%, precision of 61.31%, and F1-score of 68.74%.
- Compared to traditional models, the ESM-2 model showed improved performance in predicting O-GlcNAc sites.
- High false positive and false negative rates were observed, indicating areas for improvement.
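The four reported metrics all derive from the same confusion matrix. The following is a minimal sketch of those relationships, not the study's evaluation code. The counts in `example` are hypothetical, since the summary does not give the confusion matrix, and the function names are illustrative. The last line takes the harmonic mean of the reported precision and recall as a sanity check on the reported F1-score.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = harmonic_f1(precision, recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def harmonic_f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the formulas.
example = binary_metrics(tp=80, fp=40, fn=20, tn=60)

# Harmonic mean of the reported precision (61.31%) and recall (78.30%).
reported_f1 = harmonic_f1(61.31, 78.30)
```

The harmonic mean of the reported precision and recall comes to about 68.77%, close to the reported 68.74%. The small gap, together with accuracy and recall being identical, may point to class-weighted averaging of the metrics, though the summary does not say how they were averaged.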
Takeaway
This study shows that a computer program can help scientists find specific parts of proteins that are modified, which is important for understanding diseases.
Methodology
The study used the ESM-2 model to predict O-GlcNAc sites based on approximately 1100 O-linked glycoprotein sequences.
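Site-level prediction needs each protein sequence broken into per-residue examples. O-GlcNAc attaches to serine (S) and threonine (T) residues, so one possible framing, assumed here rather than taken from the study, treats every S or T as a candidate site and describes it by a fixed-length sequence window. The function name, the `flank` size, and the `X` padding character below are all hypothetical choices for illustration.

```python
def candidate_sites(sequence, flank=10, pad="X"):
    """Yield (position, residue, window) for every Ser/Thr in the sequence.

    Positions are 1-based. Windows are 2 * flank + 1 residues long and are
    padded with `pad` where they run past either end of the sequence.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue in "ST":
            # Index i in `sequence` sits at index i + flank in `padded`,
            # so the centered window starts at index i.
            window = padded[i : i + 2 * flank + 1]
            yield i + 1, residue, window


# Toy sequence, not a real protein.
sites = list(candidate_sites("MASTK", flank=2))
```

Each window could then be passed to a sequence model or classifier and labeled as modified or non-modified. How the study actually encoded its inputs for ESM-2 is not stated in this summary.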
Potential Biases
The traditional ML algorithms overfit because of the limited amount of training data.
Limitations
The model may struggle to identify non-modified sites, resulting in a high false positive rate.
Participant Demographics
Human proteins were the focus of the study.
Statistical Information
P-Value
0.21
Statistical Significance
p<0.05