Comparative Study of Clustering Methods for Cancer Gene Expression Data
Author Information
Author(s): de Souto Marcilio CP, Costa Ivan G, Araujo Daniel SA, Ludermir Teresa B, Schliep Alexander
Primary Institution: Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
Hypothesis
Which clustering methods perform best for analyzing cancer gene expression data?
Conclusion
The finite mixture of Gaussians and k-means methods showed the best performance in recovering the true structure of cancer gene expression data sets.
Supporting Evidence
- The finite mixture of Gaussians and k-means methods exhibited the best performance in recovering the true structure of the data sets.
- Hierarchical methods showed poorer recovery performance compared to other methods evaluated.
- A common group of benchmark data sets was provided for future comparisons of clustering methods.
Takeaway
This study compared different ways of grouping cancer gene expression samples and found that the finite mixture of Gaussians and k-means recover known cancer types better than hierarchical methods.
Methodology
The study compared seven clustering methods and four proximity measures using 35 cancer gene expression data sets.
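To make "recovering the true structure" concrete, the sketch below scores a clustering against known class labels with the corrected (adjusted) Rand index. This summary does not say which agreement index the study used, so the choice of index is an assumption here, and the labels and the function name `adjusted_rand_index` are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, cluster_labels):
    """Chance-corrected agreement between two partitions of the same samples."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("both labelings must cover the same samples")
    n = len(true_labels)
    if n < 2:
        raise ValueError("at least two samples are needed to compare pairs")

    # Contingency counts: samples shared by each (class, cluster) pair.
    cell_sizes = Counter(zip(true_labels, cluster_labels))
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)

    # Number of sample pairs placed together in each grouping.
    sum_cells = sum(comb(c, 2) for c in cell_sizes.values())
    sum_classes = sum(comb(c, 2) for c in class_sizes.values())
    sum_clusters = sum(comb(c, 2) for c in cluster_sizes.values())

    expected = sum_classes * sum_clusters / comb(n, 2)  # agreement expected by chance
    maximum = (sum_classes + sum_clusters) / 2          # agreement of identical partitions
    if maximum == expected:
        # Degenerate case, e.g. both partitions put every sample in one group.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical example: 8 samples from two known tumor classes.
true_classes = ["A", "A", "A", "A", "B", "B", "B", "B"]
perfect_clustering = [1, 1, 1, 1, 0, 0, 0, 0]  # same grouping, different label names
noisy_clustering = [0, 0, 0, 1, 1, 1, 1, 0]    # one sample misplaced in each class

print(adjusted_rand_index(true_classes, perfect_clustering))
print(adjusted_rand_index(true_classes, noisy_clustering))
```

A score of 1 means the clustering reproduces the known classes exactly, while scores near 0 mean agreement no better than chance. Under this assumption, comparing methods amounts to computing such a score for each method on each data set.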
Potential Biases
The reliance on specific clustering methods may introduce bias in interpreting the results.
Limitations
The study primarily focused on clustering methods and did not explore other potential factors affecting clustering performance.
Participant Demographics
The data sets included various cancer types from different tissues, but specific demographic details were not provided.
Statistical Information
P-Value
p<0.05
Statistical Significance
p<0.05