Assessing the Universal Similarity Metric for Biological Data
Author Information
Author(s): Paolo Ferragina, Raffaele Giancarlo, Valentina Greco, Giovanni Manzini, Gabriel Valiente
Hypothesis
Can the Universal Similarity Metric (USM) and its approximations effectively classify biological sequences and structures?
Conclusion
The USM methodology is robust and competitive with existing techniques for classifying biological data.
Supporting Evidence
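- Of the three USM approximations, UCD (Universal Compression Dissimilarity) and NCD (Normalized Compression Distance) yield similar results, while CD (Compression Dissimilarity) performs worse.
- UPGMA outperforms Neighbor Joining (NJ) in classification tasks.
- PPMd and Gencompress are the best-performing compression algorithms.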
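To make the UPGMA result above more concrete, here is a minimal sketch of UPGMA (average-linkage) clustering over a table of pairwise distances. It is an illustration only, not the study's implementation: the function name `upgma`, the nested-tuple tree representation, and the toy distances are all assumptions introduced for this example.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a UPGMA tree as nested tuples (hypothetical helper).

    labels: list of item names.
    dist:   dict mapping frozenset({a, b}) to the distance between a and b.
    """
    # Active clusters: id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Distances between active clusters, keyed by the pair of cluster ids.
    d = {
        frozenset((i, j)): dist[frozenset((labels[i], labels[j]))]
        for i, j in combinations(range(len(labels)), 2)
    }
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the two closest clusters.
        pair = min(d, key=d.get)
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del d[pair]
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Toy distances (made up for illustration): A is close to B, C is close to D.
names = ["A", "B", "C", "D"]
demo = {frozenset(p): 0.9 for p in combinations(names, 2)}
demo[frozenset(("A", "B"))] = 0.1
demo[frozenset(("C", "D"))] = 0.2
tree = upgma(names, demo)
print(tree)
```

In a compression-based pipeline, the distance table would be filled with USM approximations computed between every pair of sequences, and the resulting tree would then be compared against a reference classification.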
Takeaway
This study tested a compression-based way to compare biological sequences and structures, the Universal Similarity Metric, and found that it works well even with large data sets.
Methodology
The study used 25 compression algorithms and all three approximations of the USM (UCD, NCD, and CD) to classify biological sequences and structures across six relevant data sets.
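The three approximations replace the uncomputable USM with the output size of a real compressor. The sketch below uses the formulas as they are commonly stated in the compression-distance literature; this summary does not spell them out, so treat them as an assumption. Python's built-in `zlib` stands in for the compressors in the study (PPMd and Gencompress are not in the standard library), and the function names and synthetic DNA strings are hypothetical.

```python
import random
import zlib


def csize(data, compress=zlib.compress):
    """Length in bytes of the compressed input; C(.) in the formulas below."""
    return len(compress(data))


def ucd(x, y, compress=zlib.compress):
    """UCD(x, y) = max(C(xy) - C(x), C(yx) - C(y)) / max(C(x), C(y))."""
    cx, cy = csize(x, compress), csize(y, compress)
    cxy, cyx = csize(x + y, compress), csize(y + x, compress)
    return max(cxy - cx, cyx - cy) / max(cx, cy)


def ncd(x, y, compress=zlib.compress):
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = csize(x, compress), csize(y, compress)
    return (csize(x + y, compress) - min(cx, cy)) / max(cx, cy)


def cd(x, y, compress=zlib.compress):
    """CD(x, y) = C(xy) / (C(x) + C(y))."""
    return csize(x + y, compress) / (csize(x, compress) + csize(y, compress))


# Synthetic data (made up for illustration): y is x with a few point
# mutations, z is an unrelated random sequence of the same length.
rng = random.Random(0)
x = bytes(rng.choice(b"ACGT") for _ in range(2000))
mutated = bytearray(x)
for pos in rng.sample(range(len(x)), 10):
    mutated[pos] = rng.choice(b"ACGT")
y = bytes(mutated)
z = bytes(rng.choice(b"ACGT") for _ in range(2000))

for name, f in (("UCD", ucd), ("NCD", ncd), ("CD", cd)):
    print(name, "related:", round(f(x, y), 3), "unrelated:", round(f(x, z), 3))
```

Any function that maps bytes to compressed bytes can be passed as `compress` (for example `bz2.compress` or `lzma.compress`), which mirrors how the study swaps compressors while keeping the distance formula fixed. Lower values mean the two inputs share more compressible structure.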
Limitations
Earlier assessments of the USM tested only a few data compressors and offered limited comparisons among USM approximations and existing methods.