A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis
2008

A New Method for Protein Homology Detection Using Top-n-grams

Sample size: 4352

Evidence: moderate

Author Information

Author(s): Liu Bin, Wang Xiaolong, Lin Lei, Dong Qiwen, Wang Xuan

Primary Institution: Harbin Institute of Technology Shenzhen Graduate School

Hypothesis

Can Top-n-grams improve the performance of protein remote homology detection and fold recognition?

Conclusion

The Top-n-gram method significantly outperforms several existing approaches to protein remote homology detection, including N-gram and binary-profile methods.

Supporting Evidence

  • Top-n-grams improve prediction performance for remote homology detection.
  • The method outperforms traditional methods like N-grams and binary profiles.
  • Latent semantic analysis enhances the effectiveness of Top-n-grams.

Takeaway

This study introduces a new way to look at proteins that helps scientists find similarities between them better than before.

Methodology

The study uses Top-n-grams derived from protein sequence frequency profiles and applies SVM classifiers for detection.
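
The conversion from a frequency profile to Top-n-grams can be illustrated with a minimal sketch. It assumes that a Top-n-gram is formed by joining the n most frequent amino acids at each profile position, and that each protein is then summarized by how often each such "word" occurs. The function names, the toy profile, and the alphabetical tie-breaking rule are illustrative assumptions, not details taken from the publication. The SVM training step is omitted.

```python
from collections import Counter

# The 20 standard amino acids as one-letter codes, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def top_n_grams(profile, n=2):
    """Turn a frequency profile into one Top-n-gram "word" per position.

    profile: list of columns, each a dict mapping amino acid -> frequency.
    Amino acids missing from a column are treated as frequency 0.
    """
    words = []
    for column in profile:
        # Rank by descending frequency; ties are broken alphabetically
        # (an assumption made only so the sketch is deterministic).
        ranked = sorted(AMINO_ACIDS, key=lambda aa: (-column.get(aa, 0.0), aa))
        words.append("".join(ranked[:n]))
    return words


def bag_of_words(words):
    """Count how often each Top-n-gram occurs in one protein."""
    return Counter(words)


# Hypothetical three-position profile, for illustration only.
profile = [
    {"A": 0.6, "G": 0.3, "S": 0.1},
    {"L": 0.5, "I": 0.4, "V": 0.1},
    {"A": 0.7, "G": 0.2, "T": 0.1},
]

words = top_n_grams(profile, n=2)
counts = bag_of_words(words)
print(words)   # ['AG', 'LI', 'AG']
print(counts)
```

In a full pipeline, the per-protein counts would be arranged into fixed-length feature vectors over the Top-n-gram vocabulary and passed to an SVM classifier.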

Limitations

The method may not perform as well as Profile and SW-PSSM in some cases.

Statistical Information

P-Value

3e-9

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2105-9-510
