Identification of deleterious non-synonymous single nucleotide polymorphisms using sequence-derived information
2008

Identifying Harmful Genetic Variations Using Protein Sequences

Sample size: 3438
Evidence: moderate

Author Information

Author(s): Hu Jing, Yan Changhui

Primary Institution: Utah State University

Hypothesis

Can non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), be classified into disease-causing and neutral mutations using only protein sequence information?

Conclusion

The proposed method is a useful tool for the classification of SAPs, especially when the structure of the protein is not available.

Supporting Evidence

  • The method achieved 82.6% accuracy in classifying SAPs.
  • Using selected features, the decision tree method showed a Matthews Correlation Coefficient (MCC) of 0.607 in cross-validation.
  • The method allows reliable predictions even when protein structures are not available.
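For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how accuracy and the Matthews Correlation Coefficient are computed from a binary confusion matrix. The function names and the example counts are illustrative assumptions; they are not the study's code or its actual confusion matrix.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts; ranges from -1 to +1.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    tp, tn, fp, fn = 40, 45, 5, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when disease-causing and neutral SAPs are unevenly represented.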

Takeaway

This study created a way to tell if certain genetic changes are harmful just by looking at the protein sequences, even when we don't have the protein's structure.

Methodology

The study used a decision tree algorithm to classify SAPs, drawing on features selected from 686 candidates derived from protein sequences.
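To make "features derived from protein sequences" concrete, here is a hedged sketch of how a single SAP might be turned into numeric features using nothing but the sequence. The three features, the window size, and the function name `sap_features` are assumptions chosen for illustration; they are not the study's 686 features. The hydropathy values are the standard Kyte-Doolittle scale.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sap_features(sequence: str, position: int, mutant: str, window: int = 3) -> dict:
    """Encode one substitution as sequence-only features (illustrative).

    `position` is 1-based, as in conventional mutation notation.
    """
    idx = position - 1
    if not 0 <= idx < len(sequence):
        raise ValueError("position is outside the sequence")
    wild_type = sequence[idx]
    if wild_type not in KD_HYDROPATHY or mutant not in KD_HYDROPATHY:
        raise ValueError("non-standard amino acid")

    # Residues flanking the mutated site, excluding the site itself.
    start = max(0, idx - window)
    end = min(len(sequence), idx + window + 1)
    neighbours = sequence[start:idx] + sequence[idx + 1:end]
    hydrophobic = sum(1 for aa in neighbours if KD_HYDROPATHY.get(aa, 0.0) > 0)

    return {
        # Change in hydropathy caused by the substitution.
        "hydropathy_change": KD_HYDROPATHY[mutant] - KD_HYDROPATHY[wild_type],
        # Share of hydrophobic residues in the local sequence window.
        "window_hydrophobic_fraction": hydrophobic / len(neighbours) if neighbours else 0.0,
        # Where along the chain the substitution falls (0 to 1].
        "relative_position": position / len(sequence),
    }


if __name__ == "__main__":
    # Hypothetical 10-residue sequence with a Y5D substitution.
    print(sap_features("MKTAYIAKQR", 5, "D"))
```

A feature dictionary like this, computed for each labeled SAP, is the kind of input a decision tree learner would then be trained on.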

Limitations

Because the method relies solely on sequence data, it may not perform as well as structure-based approaches on SAPs for which structural information is available.

Statistical Information

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2105-9-297
