Fast Approximate Hierarchical Clustering Using Similarity Heuristics
Author Information
Author(s): Kull Meelis, Vilo Jaak
Primary Institution: Institute of Computer Science, University of Tartu
Hypothesis
Can we develop a faster algorithm for agglomerative hierarchical clustering that maintains clustering quality?
Conclusion
The HappieClust algorithm is effective for large-scale gene expression analysis, producing clusterings far faster than full agglomerative hierarchical clustering.
Supporting Evidence
- The algorithm produced clustering results more than an order of magnitude faster than full agglomerative hierarchical clustering (AHC).
- Quality measures indicated that the approximate clustering maintained biological relevance.
- Choosing which distances to calculate with similarity heuristics gave better clustering quality than calculating distances between randomly chosen pairs.
Takeaway
This study created a new way to group similar data quickly, which is really helpful when dealing with lots of information, like genes.
Methodology
The study developed an approximate hierarchical clustering algorithm that uses similarity heuristics to limit the number of pairwise distances calculated.
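The idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the summary does not spell out the heuristic, so several things here are assumptions made for illustration: the function names, Euclidean distance, random pivot selection, a triangle-inequality lower bound used to rank pairs, and single linkage as the merging rule. For readability the sketch ranks every pair by the cheap bound; a real implementation would avoid enumerating all pairs.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    """Exact distance between two points (assumed metric for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pivot_candidate_pairs(points, n_pivots, budget, seed=0):
    """Return up to `budget` index pairs that look similar relative to the pivots."""
    rng = random.Random(seed)
    pivots = rng.sample(range(len(points)), n_pivots)
    # Only n * n_pivots exact distances are computed here:
    # each point gets a profile of its distances to the pivots.
    profiles = [[euclidean(p, points[q]) for q in pivots] for p in points]

    def lower_bound(i, j):
        # Triangle inequality: |d(x, p) - d(y, p)| <= d(x, y) for any pivot p,
        # so the largest gap over all pivots is a lower bound on d(x, y).
        return max(abs(a - b) for a, b in zip(profiles[i], profiles[j]))

    # Simplification: all pairs are ranked by the cheap bound.
    ranked = sorted(combinations(range(len(points)), 2),
                    key=lambda ij: lower_bound(*ij))
    return ranked[:budget]


def approximate_single_linkage(points, pairs):
    """Single-linkage merging that only looks at the supplied pairs."""
    # Exact distances are computed for the candidate pairs only.
    edges = sorted((euclidean(points[i], points[j]), i, j) for i, j in pairs)
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merges = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the pair joins two different clusters
            parent[ri] = rj
            merges.append((i, j, d))
    return merges


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
pairs = pivot_candidate_pairs(points, n_pivots=2, budget=6)
merges = approximate_single_linkage(points, pairs)
```

In this sketch, `budget` and `n_pivots` are the two knobs that trade speed against quality: a larger budget means more exact distances and a result closer to full AHC, while the pivots determine which pairs are considered at all.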
Potential Biases
Potential bias in the selection of pivots could affect clustering quality.
Limitations
The algorithm's quality depends on the number of distances calculated and the choice of pivots.