Comparison study on k-word statistical measures for protein: From sequence to 'sequence space'
2008

Comparing Statistical Measures for Protein Sequences

Sample size: 121 publication

Evidence: moderate

Author Information

Author(s): Dai Qi, Wang Tianming

Primary Institution: Dalian University of Technology

Hypothesis

Can statistical measures based on protein 'sequence space' improve the classification ability of protein sequences?

Conclusion

The study found that exploiting the information in a protein's 'sequence space' significantly improves the classification ability of statistical measures for protein comparison.

Supporting Evidence

  • Statistical measures based on protein 'sequence space' showed improved classification abilities.
  • Alignment-based measures performed better on highly redundant data.
  • The novel statistical measure gsm.k was found to be the most efficient among the measures tested.
  • Phylogenetic analysis confirmed the reliability of the Gdis.k measure.
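To make the idea of a k-word statistical measure concrete, here is a minimal sketch in Python. It counts overlapping k-words in two sequences and compares the count vectors with a cosine-based dissimilarity. This is a generic illustration only: it is not the gsm.k or Gdis.k measure, whose definitions are not given in this summary, and the function names `kword_counts` and `cosine_dissimilarity` are hypothetical.

```python
from collections import Counter
from math import sqrt


def kword_counts(seq, k):
    """Count overlapping k-words (substrings of length k) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_dissimilarity(seq_a, seq_b, k=2):
    """Return 1 - cosine similarity of the two k-word count vectors.

    Illustrative stand-in for a k-word measure, not the paper's own formula.
    """
    counts_a = kword_counts(seq_a, k)
    counts_b = kword_counts(seq_b, k)
    # Only words present in both sequences contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # A sequence shorter than k has no k-words; treat it as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)
```

A value near 0 means the two sequences have nearly the same k-word composition; a value of 1 means they share no k-words.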

Takeaway

This study shows that using related protein sequences can help scientists better compare proteins and understand their functions.

Methodology

The study compared various statistical measures for protein sequences using ROC analysis on three different datasets.
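The ROC analysis mentioned above can be illustrated with a small, dependency-free sketch. It computes the area under the ROC curve (AUC) as the fraction of positive/negative pairs that a score ranks correctly, counting ties as one half. The function name `roc_auc` and the convention that a higher score means "more likely related" are assumptions made for this example; the study's exact evaluation protocol is not described in this summary.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: one similarity score per sequence pair (higher = more similar).
    labels: truthy if the pair is truly related (positive), falsy otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

An AUC of 1.0 means every related pair scores above every unrelated pair, while 0.5 corresponds to random ranking. For a dissimilarity measure, negate the scores before passing them in.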

Limitations

The study acknowledges that some statistical measures perform poorly on classification tasks, especially with less redundant data.

Digital Object Identifier (DOI)

10.1186/1471-2105-9-394
