Evaluating Clustering Algorithms in Microarray Research
Author Information
Author(s): Garge Nikhil R, Page Grier P, Sprague Alan P, Gorman Bernard S, Allison David B
Primary Institution: University of Alabama at Birmingham
Hypothesis
The validity of clustering methods should be based on classifications that yield reproducible findings beyond chance levels.
Conclusion
The study found that all four clustering algorithms showed low stability scores, suggesting that microarray datasets may lack natural clustering structure.
Supporting Evidence
- All four clustering routines show increased stability with larger sample sizes.
- K-means and SOM showed a gradual increase in stability with increasing sample size.
- CLARA and Fuzzy C-means yielded low stability scores until sample sizes approached 30.
- Average stability never exceeded 0.55 for the four clustering routines, even at a sample size of 50.
Takeaway
This study compared four methods for grouping genes by similarity and found that they often fail to give reproducible results, especially when sample sizes are small.
Methodology
The study evaluated four clustering algorithms (K-means, SOM, CLARA, and Fuzzy C-means) on 37 microarray datasets, measuring the stability of clustering outputs using Cramér's V².
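To make the stability measure concrete, here is a minimal Python sketch of Cramér's V² computed between two cluster labelings of the same items, using the standard definition V² = χ² / (N · min(r − 1, c − 1)). The function name cramers_v_squared is hypothetical, and this summary does not describe how the study paired clusterings or averaged scores, so treat it as an illustration of the statistic rather than the authors' procedure.

```python
from collections import Counter


def cramers_v_squared(labels_a, labels_b):
    """Cramér's V² between two cluster labelings of the same items.

    Hypothetical helper for illustration; uses the standard definition
    V² = chi² / (N * min(r - 1, c - 1)), where r and c are the numbers
    of distinct clusters in each labeling.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")

    n = len(labels_a)
    row_totals = Counter(labels_a)               # cluster sizes in labeling A
    col_totals = Counter(labels_b)               # cluster sizes in labeling B
    observed = Counter(zip(labels_a, labels_b))  # contingency table cells

    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        raise ValueError("need at least two clusters in each labeling")

    # Pearson chi-square over every cell, including empty ones
    # (Counter returns 0 for a missing key).
    chi2 = 0.0
    for a, size_a in row_totals.items():
        for b, size_b in col_totals.items():
            expected = size_a * size_b / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    return chi2 / (n * k)


if __name__ == "__main__":
    # Same partition with different label names: perfect agreement.
    print(cramers_v_squared([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))
    # Labelings that are statistically independent: no agreement.
    print(cramers_v_squared([0, 0, 1, 1], [0, 1, 0, 1]))
```

The score ranges from 0 (the two labelings are independent) to 1 (they agree up to a renaming of clusters), which is the scale on which the reported ceiling of 0.55 should be read.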
Limitations
The algorithms studied may not be well suited to producing reliable results, and sample sizes typically used in microarray research may be too small.