Comparing Statistical Measures for Protein Sequences
Author Information
Author(s): Dai Qi, Wang Tianming
Primary Institution: Dalian University of Technology
Hypothesis
Can statistical measures that exploit protein 'sequence space' improve the ability to classify protein sequences?
Conclusion
The study found that exploiting information from 'sequence space' significantly improves the classification ability of statistical measures for protein comparison.
Supporting Evidence
- Statistical measures based on protein 'sequence space' showed improved classification abilities.
- Alignment-based measures performed better on highly redundant data.
- The novel statistical measure gsm.k was found to be the most efficient among the measures tested.
- Phylogenetic analysis confirmed the reliability of the Gdis.k measure.
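This summary does not define gsm.k or Gdis.k. As a rough illustration only, assuming the '.k' suffix denotes a word length k (as in k-word, or k-mer, statistical measures), the Python sketch below compares proteins by their k-word frequency vectors and shows one hypothetical way to bring in related sequences: averaging the frequency vectors of a query and its homologs into a pooled profile. The function names and the pooling rule are assumptions for illustration, not the authors' definitions.

```python
from collections import Counter
import math


def kword_frequencies(seq, k):
    """Relative frequency of every overlapping k-word (k-mer) in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def pooled_frequencies(seqs, k):
    """Hypothetical 'sequence space' profile (an assumption, not the
    paper's method): the mean k-word frequency vector of a query
    sequence and its related sequences."""
    pooled = Counter()
    for s in seqs:
        for w, f in kword_frequencies(s, k).items():
            pooled[w] += f / len(seqs)
    return dict(pooled)


def euclidean_distance(fa, fb):
    """Euclidean distance between two sparse k-word frequency vectors;
    words missing from one vector count as frequency 0."""
    words = set(fa) | set(fb)
    return math.sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))
```

A distance between two single sequences would use kword_frequencies directly; a profile-based comparison would pass each protein together with its homologs to pooled_frequencies first and then compare the pooled vectors.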
Takeaway
This study shows that using related protein sequences can help scientists better compare proteins and understand their functions.
Methodology
The study compared various statistical measures for protein sequences using receiver operating characteristic (ROC) analysis on three different datasets.
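ROC analysis scores how well a measure separates related from unrelated sequence pairs. A minimal sketch, assuming each pair has a distance under the measure and a same-family label (the study's exact labeling scheme and ROC variant are not described here): sweep a threshold over the sorted distances, record the false- and true-positive rates at each step, and integrate the curve with the trapezoidal rule.

```python
def roc_points(distances, same_family):
    """ROC curve for a distance measure: a pair is predicted 'related'
    when its distance is <= the current threshold."""
    pos = sum(1 for label in same_family if label)
    neg = len(same_family) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one related and one unrelated pair")
    pairs = sorted(zip(distances, same_family))
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        d = pairs[i][0]
        # Consume all pairs tied at distance d together, so ties
        # produce a single diagonal step rather than an arbitrary order.
        while i < len(pairs) and pairs[i][0] == d:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 1.0 means every related pair is closer than every unrelated pair, and 0.5 corresponds to chance; comparing AUCs across measures on the same dataset is one way to rank their classification ability.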
Limitations
The study acknowledges that some statistical measures perform poorly on classification tasks, especially on less redundant datasets.
Digital Object Identifier (DOI)