Fast Approximate Hierarchical Clustering Using Similarity Heuristics

Sample size: 15521; publication; 10 minutes; Evidence: high

Author Information

Author(s): Kull Meelis, Vilo Jaak

Primary Institution: Institute of Computer Science, University of Tartu

Hypothesis

Can we develop a faster algorithm for agglomerative hierarchical clustering that maintains clustering quality?

Conclusion

The HappieClust algorithm is effective for large-scale gene expression analysis, producing results more than an order of magnitude faster than full agglomerative hierarchical clustering (AHC).

Supporting Evidence

The algorithm produced clustering results more than an order of magnitude faster than full agglomerative hierarchical clustering (AHC).
Quality measures indicated that the approximate clustering maintained biological relevance.
Using similarity heuristics to choose which distances to calculate gave better clustering quality than choosing them at random.

Takeaway

This study created a new way to group similar data quickly, which helps when dealing with large amounts of information, such as gene expression data.

Methodology

The study developed an approximate hierarchical clustering algorithm that uses similarity heuristics to limit the number of pairwise distances calculated.
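
The idea of limiting pairwise distance calculations can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance, randomly chosen pivots, and a hypothetical threshold `eps`, and the function names are invented for the example. Each point is described by its distances to a few pivots, and a pair becomes a candidate for an exact distance calculation only if those pivot distances differ by at most `eps` for every pivot. By the triangle inequality, any pair whose true distance is at most `eps` always passes this filter.

```python
import math
import random


def euclidean(a, b):
    """Exact Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidate_pairs(points, num_pivots=2, eps=0.5, seed=0):
    """Return index pairs (i, j), i < j, that look similar in pivot space.

    Assumption for this sketch: pivots are sampled uniformly at random
    from the data, and `eps` is a hand-picked threshold.
    """
    rng = random.Random(seed)
    pivots = rng.sample(points, num_pivots)

    # Describe every point by its distances to the pivots.
    pivot_dists = [[euclidean(p, q) for q in pivots] for p in points]

    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Keep the pair only if it is close with respect to every pivot.
            if all(abs(a - b) <= eps
                   for a, b in zip(pivot_dists[i], pivot_dists[j])):
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    selected = candidate_pairs(data)
    total = len(data) * (len(data) - 1) // 2
    print(f"{len(selected)} of {total} pairs selected for exact distances")
```

For brevity the sketch still loops over all pairs in pivot space. That is cheaper than computing exact distances when the points are high-dimensional, but a practical implementation would presumably use an index or sorting on the pivot distances to avoid the quadratic loop.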

Potential Biases

Potential bias in the selection of pivots could affect clustering quality.

Limitations

The algorithm's quality depends on the number of distances calculated and the choice of pivots.
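
Why the number of calculated distances matters can be illustrated with a small sketch. It assumes single linkage, which the summary above does not specify, and uses a hypothetical helper name. With single linkage, merging clusters using only a subset of pairwise distances amounts to processing the known pairs in order of increasing distance and joining the clusters they connect. If too few distances are known, some clusters can never be joined and the hierarchy stays incomplete.

```python
def single_linkage_from_pairs(n, weighted_pairs):
    """Single-linkage merges over n items using only the given distances.

    `weighted_pairs` is an iterable of (distance, i, j) triples; pairs that
    are not listed are treated as unknown. Returns the merges performed,
    as (i, j, distance) in merge order.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []
    for dist, i, j in sorted(weighted_pairs):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # The pair links two different clusters: merge them.
            parent[root_j] = root_i
            merges.append((i, j, dist))
    return merges


if __name__ == "__main__":
    known = [(1.0, 0, 1), (2.0, 2, 3), (5.0, 1, 2)]
    print(single_linkage_from_pairs(4, known))      # full hierarchy
    print(single_linkage_from_pairs(4, known[:2]))  # one merge missing
```

A complete hierarchy over `n` items needs `n - 1` merges; when the known distances do not connect all items, fewer merges are returned, which is one way the choice and number of calculated distances can limit the result.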

Digital Object Identifier (DOI)

10.1186/1756-0381-1-9
