Reproducible Clusters from Microarray Research: Whither?
2006

Evaluating Clustering Algorithms in Microarray Research

Sample size: 37 datasets

Evidence: low

Author Information

Author(s): Garge Nikhil R, Page Grier P, Sprague Alan P, Gorman Bernard S, Allison David B

Primary Institution: University of Alabama at Birmingham

Hypothesis

The validity of clustering methods should be based on classifications that yield reproducible findings beyond chance levels.

Conclusion

All four clustering algorithms yielded low stability scores, which suggests that microarray datasets may lack natural clustering structure.

Supporting Evidence

  • All four clustering routines show increased stability with larger sample sizes.
  • K-means and SOM showed a gradual increase in stability with increasing sample size.
  • CLARA and Fuzzy C-means yielded low stability scores until sample sizes approached 30.
  • Average stability never exceeded 0.55 for the four clustering routines, even at a sample size of 50.

Takeaway

This study looked at different ways to group genes based on their similarities and found that the methods used often don't give reliable results, especially with small datasets.

Methodology

The study evaluated four clustering algorithms (K-means, SOM, CLARA, and Fuzzy C-means) on 37 microarray datasets, measuring the stability of the clustering outputs with Cramér's V².
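
As a rough illustration of the stability measure named above, the sketch below computes Cramér's V² between two cluster labelings of the same items: the chi-square statistic of their contingency table divided by N times min(rows - 1, columns - 1). The function name cramers_v_squared and the idea of comparing two label lists directly are assumptions made for this example; the summary does not describe the study's exact resampling procedure, so this is not the authors' implementation.

```python
from collections import Counter


def cramers_v_squared(labels_a, labels_b):
    """Cramér's V² between two labelings of the same items (hypothetical helper).

    Returns a value in [0, 1]: 0 when the labelings are statistically
    independent, 1 when one is a relabeling of the other.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    row_totals = Counter(labels_a)             # cluster sizes in labeling A
    col_totals = Counter(labels_b)             # cluster sizes in labeling B
    observed = Counter(zip(labels_a, labels_b))  # contingency table cells
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k < 1:
        raise ValueError("each labeling needs at least two clusters")

    # Pearson chi-square over every cell, including empty ones.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected
    return chi2 / (n * k)


if __name__ == "__main__":
    run_1 = [0, 0, 1, 1, 2, 2]
    run_2 = [2, 2, 0, 0, 1, 1]  # same grouping, different label names
    print(cramers_v_squared(run_1, run_2))
```

Because the statistic depends only on the contingency table, it is insensitive to how clusters are numbered, which is what makes it usable for comparing outputs of separate clustering runs.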

Limitations

The algorithms studied may not be well suited to producing reliable results, and sample sizes typically used in microarray research may be too small.

Digital Object Identifier (DOI)

10.1186/1471-2105-6-S2-S10
