Identifying Harmful Genetic Variations Using Protein Sequences
Author Information
Author(s): Hu Jing, Yan Changhui
Primary Institution: Utah State University
Hypothesis
Can we classify non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), into disease-causing and neutral mutations using only protein sequence information?
Conclusion
The proposed method is a useful tool for the classification of SAPs, especially when the structure of the protein is not available.
Supporting Evidence
- The method achieved 82.6% accuracy in classifying SAPs.
- Using selected features, the decision tree method showed a Matthews Correlation Coefficient (MCC) of 0.607 in cross-validation.
- The method allows reliable predictions even when protein structures are not available.
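For readers unfamiliar with the two metrics quoted above, the sketch below shows how accuracy and the Matthews Correlation Coefficient are computed from a binary confusion matrix. The function names `accuracy` and `mcc` and the example counts are assumptions made for illustration; they are not the study's code or data.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to illustrate the calculation:
# tp = disease-causing SAPs predicted as disease-causing,
# tn = neutral SAPs predicted as neutral, fp and fn are the two error types.
tp, tn, fp, fn = 40, 45, 5, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when disease-causing and neutral SAPs are unevenly represented.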
Takeaway
This study created a way to tell if certain genetic changes are harmful just by looking at the protein sequences, even when we don't have the protein's structure.
Methodology
The study used a decision tree algorithm to classify SAPs based on 686 features derived from protein sequences.
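This summary does not list the 686 features, so the sketch below only illustrates the general idea of turning a SAP into a sequence-derived feature vector. Everything in it is an assumption for illustration: the function name `sap_features`, the window size, and the choice of features (amino acid composition around the mutation site plus the change in Kyte-Doolittle hydropathy) are stand-ins, not the study's feature set.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (a standard published scale).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sap_features(sequence, position, mutant, window=5):
    """Build an illustrative feature vector for one SAP.

    sequence: wild-type protein sequence (one-letter codes)
    position: 1-based index of the substituted residue
    mutant:   one-letter code of the replacement residue
    window:   residues taken on each side of the site (hypothetical default)
    """
    idx = position - 1
    if not 0 <= idx < len(sequence):
        raise ValueError("position is outside the sequence")
    wild = sequence[idx]
    if wild not in KD_HYDROPATHY or mutant not in KD_HYDROPATHY:
        raise ValueError("non-standard amino acid code")

    # Local sequence context, truncated at the protein termini.
    start = max(0, idx - window)
    neighborhood = sequence[start:idx + window + 1]

    # 20 features: fraction of each amino acid in the neighborhood.
    composition = [neighborhood.count(aa) / len(neighborhood)
                   for aa in AMINO_ACIDS]

    # 1 feature: hydropathy change caused by the substitution.
    hydropathy_change = KD_HYDROPATHY[mutant] - KD_HYDROPATHY[wild]

    return composition + [hydropathy_change]


# Made-up sequence: Tyr at position 5 replaced by Asp.
features = sap_features("MKTAYIAKQR", position=5, mutant="D", window=2)
print(len(features), features[-1])
```

Vectors like this, one per SAP with a disease-causing or neutral label, are the kind of input a decision tree learner takes.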
Limitations
Because the method relies solely on sequence data, it may not perform as well as structure-based approaches on SAPs for which structural information is available.
Statistical Information
Statistical Significance
p<0.05
Digital Object Identifier (DOI)