Site-specific prediction of O-GlcNAc modification in proteins using evolutionary scale model
2024

Predicting O-GlcNAc Modification Sites in Proteins Using Machine Learning

Sample size: 1100. Evidence: moderate.

Author Information

Author(s): Khalid Ayesha, Kaleem Afshan, Qazi Wajahat, Abdullah Roheena, Iqtedar Mehwish, Naz Shagufta

Primary Institution: Department of Biotechnology, Lahore College for Women University, Lahore, Pakistan

Hypothesis

Can the ESM-2 model accurately predict O-GlcNAc modification sites in human proteins?

Conclusion

The ESM-2 model effectively predicts O-GlcNAc sites in human proteins with an accuracy of 78.30%.

Supporting Evidence

  • The ESM-2 model achieved an accuracy of 78.30%, recall of 78.30%, precision of 61.31%, and F1-score of 68.74%.
  • Compared to traditional models, the ESM-2 model showed improved performance in predicting O-GlcNAc sites.
  • High false positive and false negative rates were observed, indicating areas for improvement.
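As a quick consistency check on the figures above, the F1-score can be recomputed as the harmonic mean of the reported precision and recall. This is a minimal sketch, not the authors' evaluation code; the function name `f1_score` is an illustrative choice. Any small gap between the recomputed value and the reported 68.74% may come from rounding or from averaging across classes, which this summary does not specify.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the summary, converted from percentages to fractions.
reported_precision = 0.6131
reported_recall = 0.7830

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 recomputed from reported precision and recall: {f1:.2%}")
```

The recomputed value lands within a few hundredths of a percentage point of the reported F1-score, so the three figures are mutually consistent.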

Takeaway

This study shows that a computer program can help scientists find specific parts of proteins that are modified, which is important for understanding diseases.

Methodology

The study used the ESM-2 model to predict O-GlcNAc sites based on approximately 1100 O-linked glycoprotein sequences.
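Site-specific prediction needs a list of candidate residues before any model can score them. O-GlcNAc is attached to serine (S) and threonine (T) residues, so one plausible first step is to enumerate every S/T position with its flanking sequence. The sketch below illustrates that idea only. The function name `candidate_sites`, the window half-width `flank`, and the padding character `X` are assumptions made for illustration; this summary does not describe how the authors prepared their inputs.

```python
from typing import List, Tuple


def candidate_sites(
    sequence: str, flank: int = 7, pad: str = "X"
) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue, window) for every Ser/Thr.

    The window holds `flank` residues on each side of the candidate site.
    Positions near the termini are padded with `pad` so that all windows
    have the same length (2 * flank + 1). These settings are hypothetical.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    sites = []
    for i, residue in enumerate(seq):
        if residue in ("S", "T"):
            # In `padded`, residue i sits at index i + flank, so the slice
            # starting at i is centred on the candidate site.
            window = padded[i : i + 2 * flank + 1]
            sites.append((i + 1, residue, window))
    return sites


# Toy peptide, not taken from the study's dataset.
for position, residue, window in candidate_sites("MASTK", flank=2):
    print(position, residue, window)
```

Each window, or the full sequence together with the site position, could then be passed to a sequence model such as ESM-2 for per-site classification.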

Potential Biases

The traditional ML algorithms overfit because of the limited data.

Limitations

The model may struggle with identifying non-modified sites and has a high false positive rate.

Participant Demographics

Human proteins were the focus of the study.

Statistical Information

P-Value

0.21

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1371/journal.pone.0316215
