Model order selection for bio-molecular data clustering
2007

Sample size: 60; publication; 10 minutes; Evidence: high

Author Information

Author(s): Bertoni Alberto, Valentini Giorgio

Primary Institution: Università degli Studi di Milano

Hypothesis

Can a stability method based on randomized maps effectively determine the optimal number of clusters in bio-molecular data?

Conclusion

The proposed model order selection methods are competitive with existing algorithms and can detect multiple levels of structure in both synthetic and gene expression data.

Supporting Evidence

  • The proposed methods successfully detected the correct number of clusters in synthetic data.
  • Results from the leukemia data set indicated significant clustering at a high confidence level.
  • The method was able to identify multiple structures in gene expression data.

Takeaway

This study helps scientists find the right number of groups in complex biological data, such as gene expression profiles, by checking how stable the groupings remain when the data are randomly perturbed.

Methodology

The study uses a stability method based on randomized maps (random projections of the data) and a χ²-based statistical test to assess the reliability of clustering solutions.
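
The stability idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Gaussian random projections, a plain k-means clusterer, and a Jaccard similarity between clusterings, and every function name (random_projection, kmeans_labels, jaccard_similarity, stability) is hypothetical. The χ²-based test is not reproduced here; the sketch only computes a mean stability score for a candidate number of clusters k.

```python
import numpy as np


def random_projection(X, d, rng):
    """Project the rows of X onto d dimensions with a Gaussian random map."""
    R = rng.normal(scale=1.0 / np.sqrt(d), size=(X.shape[1], d))
    return X @ R


def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means with farthest-first initialisation (an assumption)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen centre.
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                new_centers[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def jaccard_similarity(a, b):
    """Jaccard index over pairs of points placed together by each clustering."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)  # each unordered pair once
    same_a, same_b = same_a[iu], same_b[iu]
    both = np.sum(same_a & same_b)
    either = np.sum(same_a | same_b)
    return 1.0 if either == 0 else float(both / either)


def stability(X, k, d, n_pairs, rng):
    """Mean similarity between clusterings of independently projected data."""
    sims = []
    for _ in range(n_pairs):
        a = kmeans_labels(random_projection(X, d, rng), k, rng)
        b = kmeans_labels(random_projection(X, d, rng), k, rng)
        sims.append(jaccard_similarity(a, b))
    return float(np.mean(sims))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: three well-separated Gaussian clusters in 50 dimensions.
    centers = rng.normal(scale=20.0, size=(3, 50))
    X = np.vstack([c + rng.normal(size=(30, 50)) for c in centers])
    for k in (2, 3, 4, 5):
        print(k, round(stability(X, k, d=10, n_pairs=5, rng=rng), 3))
```

Values of k whose clusterings stay nearly identical across projections (scores close to 1) are candidates for the true number of clusters. Several values of k may score highly, which is in line with the summary's remark that multiple levels of structure can be detected.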

Potential Biases

Bias may arise from the choice of random projections and from the inherent limitations of the clustering algorithms used.

Limitations

The method may not perform well if the data do not satisfy the normality assumptions or if the chosen clustering algorithm is not appropriate for the data.

Participant Demographics

The study involved synthetic data and gene expression data from leukemia and lymphoma samples.

Statistical Information

P-Value

p<0.0001

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2105-8-S2-S7
