Automated protein subfamily identification and classification
2007

Sample size: 515 publication
Evidence: high

Author Information

Author(s): Duncan P. Brown, Nandini Krishnamurthy, Kimmen Sjölander

Primary Institution: University of California, Berkeley

Hypothesis

Can a computational pipeline improve the accuracy of protein subfamily identification and classification?

Conclusion

The SCI-PHY algorithm significantly enhances the classification of proteins into subfamilies, improving specificity and reducing errors in functional annotation.

Supporting Evidence

  • SCI-PHY subfamilies correspond closely to functional subtypes defined by experts.
  • Subfamily HMMs improve the separation between homologous and non-homologous proteins.
  • Extensive validation shows high specificity in classification of novel sequences.

Takeaway

This study created a computer program that helps scientists work out what different proteins do by grouping related proteins into subfamilies based on their similarities.

Methodology

The study used a computational pipeline that combines de novo subfamily identification with subfamily hidden Markov models (HMMs) for classifying novel sequences.
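
The classification step can be illustrated with a minimal sketch. It is not the SCI-PHY implementation: it assumes the subfamilies are already defined, that every sequence is already aligned to the same columns, and it replaces each subfamily HMM with a simple position-specific log-odds profile (no insert or delete states). The function names, the pseudocount, the uniform background, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids; gaps and other symbols are ignored when scoring.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Build per-column log-odds scores for one subfamily.

    Simplified stand-in for a subfamily HMM: match-state emissions only,
    scored against a uniform background (an assumption of this sketch).
    """
    length = len(aligned_seqs[0])
    background = 1.0 / len(ALPHABET)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_seqs if s[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score(profile, aligned_seq):
    """Sum the column log-odds for one aligned sequence."""
    return sum(col[aa] for col, aa in zip(profile, aligned_seq) if aa in col)


def classify(profiles, aligned_seq):
    """Return the best-scoring subfamily name and all subfamily scores."""
    scores = {name: score(p, aligned_seq) for name, p in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy, made-up alignments for two hypothetical subfamilies.
subfamilies = {
    "subfamily_A": ["ACDKL", "ACDKI", "ACEKL"],
    "subfamily_B": ["GWDRL", "GWDRV", "GWERL"],
}
profiles = {name: build_profile(seqs) for name, seqs in subfamilies.items()}

best, scores = classify(profiles, "ACDRL")
print(best, scores)
```

In practice the profiles would be replaced by real profile HMMs built with a dedicated tool, and a score cutoff would be needed so that non-homologous sequences are rejected rather than forced into the nearest subfamily.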

Limitations

The methods may be sensitive to alignment errors, and the definitions of subfamilies can be somewhat arbitrary.

Digital Object Identifier (DOI)

10.1371/journal.pcbi.0030160
