Model Order Selection for Bio-Molecular Data Clustering
Author Information
Author(s): Bertoni Alberto, Valentini Giorgio
Primary Institution: Università degli Studi di Milano
Hypothesis
Can a stability method based on randomized maps effectively determine the optimal number of clusters in bio-molecular data?
Conclusion
The proposed model order selection methods are competitive with existing algorithms and can detect multiple levels of structure in both synthetic and gene expression data.
Supporting Evidence
- The proposed methods successfully detected the correct number of clusters in synthetic data.
- Results from the leukemia data set indicated significant clustering at a high confidence level.
- The method was able to identify multiple structures in gene expression data.
Takeaway
This study helps scientists choose the number of groups (clusters) in complex biological data, such as gene expression measurements, by checking how stable the clustering results remain when the data are randomly projected.
Methodology
The study uses a stability method based on randomized maps (random projections of the data) and a χ2-based statistical test to assess the reliability of the clustering solutions obtained for different numbers of clusters.
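The summary does not give algorithmic details, so the following is only a minimal sketch of a stability score based on random projections. It assumes Gaussian random projections as the randomized maps, k-means as the clustering algorithm, and the Rand index as the similarity between two clusterings. None of these choices, and none of the hypothetical helper names (random_projection, kmeans, rand_index, stability_scores), comes from the study. The underlying idea is that a number of clusters k that reflects real structure should produce clusterings that change little when the data are randomly projected.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_projection(data, target_dim, rng):
    """Map each row of `data` to `target_dim` coordinates with a Gaussian matrix.

    Entries are N(0, 1/target_dim), so squared distances are preserved in
    expectation. Assumed randomized map; the study's exact map may differ.
    """
    dim = len(data[0])
    scale = 1.0 / math.sqrt(target_dim)
    r = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)] for _ in range(dim)]
    return [[sum(x[i] * r[i][j] for i in range(dim)) for j in range(target_dim)]
            for x in data]


def kmeans(data, k, rng, n_init=5, max_iter=50):
    """Lloyd's k-means with k-means++ seeding; returns labels of the lowest-inertia run."""
    best_labels, best_inertia = None, math.inf
    for _ in range(n_init):
        # k-means++ seeding: each new center is drawn with probability proportional to D^2.
        centers = [list(rng.choice(data))]
        while len(centers) < k:
            d2 = [min(sq_dist(p, c) for c in centers) for p in data]
            threshold, acc = rng.uniform(0.0, sum(d2)), 0.0
            for p, w in zip(data, d2):
                acc += w
                if acc >= threshold:
                    break
            centers.append(list(p))
        labels = None
        for _ in range(max_iter):
            new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in data]
            if new_labels == labels:
                break  # assignments stopped changing: converged
            labels = new_labels
            for c in range(k):
                members = [p for p, lab in zip(data, labels) if lab == c]
                if members:
                    centers[c] = [sum(col) / len(members) for col in zip(*members)]
        inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    n, agree = len(a), 0
    for i in range(n):
        for j in range(i + 1, n):
            agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / (n * (n - 1) // 2)


def stability_scores(data, k, target_dim, trials=20, seed=0):
    """Rand index between the k-clustering of the original data and the
    k-clustering of each of `trials` random projections (assumed variant)."""
    rng = random.Random(seed)
    reference = kmeans(data, k, rng)
    return [rand_index(reference, kmeans(random_projection(data, target_dim, rng), k, rng))
            for _ in range(trials)]
```

To compare candidate numbers of clusters, one would call stability_scores(data, k, target_dim) for each k and compare the score distributions; consistently high scores suggest a stable k. Comparing the clusterings of two independently projected copies of the data, instead of the original against one projection, is an equally plausible variant.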
Potential Biases
Potential bias may arise from the choice of random projections and the inherent limitations of clustering algorithms.
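How much a random projection can bias the results depends largely on how far it reduces the dimension. One way to make that choice explicit is to compare the target dimension with the Johnson-Lindenstrauss bound and to measure the distortion a given projection actually introduces. The sketch below is illustrative only: it assumes Gaussian projections, jl_min_dim and max_distortion are hypothetical helper names, and the summary does not say how the study chose its target dimension.

```python
import math
import random


def jl_min_dim(n_samples, eps):
    """Target dimension from the Johnson-Lindenstrauss lemma (Dasgupta-Gupta form)
    that suffices to keep all pairwise squared distances of n_samples points
    within a factor (1 +/- eps) with positive probability."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be in (0, 1)")
    return math.ceil(4.0 * math.log(n_samples) / (eps ** 2 / 2.0 - eps ** 3 / 3.0))


def max_distortion(data, target_dim, seed=0):
    """Largest relative change |d_proj / d_orig - 1| of pairwise squared distances
    after one Gaussian random projection with N(0, 1/target_dim) entries."""
    rng = random.Random(seed)
    dim = len(data[0])
    scale = 1.0 / math.sqrt(target_dim)
    r = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)] for _ in range(dim)]
    proj = [[sum(x[i] * r[i][j] for i in range(dim)) for j in range(target_dim)]
            for x in data]
    worst = 0.0
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            d_orig = sum((a - b) ** 2 for a, b in zip(data[i], data[j]))
            d_proj = sum((a - b) ** 2 for a, b in zip(proj[i], proj[j]))
            if d_orig > 0:  # skip duplicate points
                worst = max(worst, abs(d_proj / d_orig - 1.0))
    return worst
```

For example, jl_min_dim(100, 0.5) returns 222. The bound is a sufficient condition rather than a necessary one, so smaller target dimensions can still work; max_distortion reports how much one specific projection actually distorts the data at hand.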
Limitations
The method may perform poorly if the data do not meet its normality assumptions or if the chosen clustering algorithms are not appropriate for the data.
Participant Demographics
The study involved synthetic data and gene expression data from leukemia and lymphoma samples.
Statistical Information
P-Value: p < 0.0001
Statistical Significance: p < 0.05
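The χ2-based test mentioned under Methodology is not specified in this summary. One plausible reading, sketched here purely as an assumption, is a χ2 test of homogeneity: for each candidate k, count how many random-projection trials give a similarity score above a threshold, then test whether these success proportions are equal across k at a significance level such as α = 0.05. The helper names (chi2_sf, homogeneity_test, reliable_ks), the 0.9 threshold, and the rule of repeatedly dropping the least stable k until the test no longer rejects are hypothetical illustrations, not the authors' published procedure.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in half-integer powers.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half + (i - 0.5) * math.log(half) - math.lgamma(i + 0.5))
    return total


def homogeneity_test(successes, trials):
    """Chi-square test that all proportions successes[i] / trials[i] are equal.
    Returns (statistic, p_value) with len(successes) - 1 degrees of freedom."""
    total_s, total_n = sum(successes), sum(trials)
    if total_s in (0, total_n):
        return 0.0, 1.0  # every group identical: no evidence against H0
    p_hat = total_s / total_n
    stat = 0.0
    for s, n in zip(successes, trials):
        exp_s, exp_f = n * p_hat, n * (1.0 - p_hat)
        stat += (s - exp_s) ** 2 / exp_s + ((n - s) - exp_f) ** 2 / exp_f
    return stat, chi2_sf(stat, len(successes) - 1)


def reliable_ks(scores_by_k, threshold=0.9, alpha=0.05):
    """Hypothetical selection rule: sort k by mean stability, then drop the least
    stable k until equal success rates are no longer rejected at level alpha.
    scores_by_k maps each k to a list of similarity scores in [0, 1]."""
    order = sorted(scores_by_k, key=lambda k: -sum(scores_by_k[k]) / len(scores_by_k[k]))
    while len(order) > 1:
        succ = [sum(s > threshold for s in scores_by_k[k]) for k in order]
        n = [len(scores_by_k[k]) for k in order]
        _, p_value = homogeneity_test(succ, n)
        if p_value >= alpha:
            break
        order.pop()  # remove the currently least stable k
    return order
```

Under this rule several values of k can survive together, which is one way such a procedure could report multiple levels of structure rather than a single best k.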
Digital Object Identifier (DOI)