Clustering cancer gene expression data: a comparative study
2008

Comparative Study of Clustering Methods for Cancer Gene Expression Data

Sample size: 35 data sets

Evidence: high

Author Information

Author(s): de Souto Marcilio CP, Costa Ivan G, Araujo Daniel SA, Ludermir Teresa B, Schliep Alexander

Primary Institution: Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany

Hypothesis

Which clustering methods and proximity measures best recover the known class structure of cancer gene expression data?

Conclusion

The finite mixture of Gaussians and k-means methods showed the best performance in recovering the true structure of cancer gene expression data sets.

Supporting Evidence

  • The finite mixture of Gaussians and k-means methods exhibited the best performance in recovering the true structure of the data sets.
  • Hierarchical methods recovered the true structure less well than the other methods evaluated.
  • A common group of benchmark data sets was provided for future comparisons of clustering methods.

Takeaway

This study compared ways of grouping tumor samples by their gene expression profiles and found that finite mixtures of Gaussians and k-means recovered the known cancer types better than hierarchical clustering.

Methodology

The study compared seven clustering methods and four proximity measures using 35 cancer gene expression data sets.
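The evaluation idea above, clustering each data set and checking how well the resulting partition matches the known cancer classes, can be illustrated with a small sketch. It assumes plain k-means on squared Euclidean distance and scores agreement with the adjusted Rand index (ARI), a standard chance-corrected measure of agreement between two partitions. The toy data and all function names (`make_toy_data`, `kmeans`, `adjusted_rand_index`) are hypothetical; this is not the authors' code, and real expression data sets have thousands of genes per sample rather than two coordinates.

```python
"""Sketch: cluster samples, then score recovery of known classes with the ARI.
Hypothetical toy data; k-means and ARI are illustrative choices, not the study's code."""
import random
from collections import Counter


def make_toy_data(n_per_class=20, seed=1):
    """Three well-separated 2-D blobs standing in for three tumor classes (hypothetical)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            labels.append(label)
    return points, labels


def sq_dist(a, b):
    """Squared Euclidean distance, the proximity measure k-means optimizes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means; returns one cluster label per point."""
    # Deterministic farthest-point initialization: start from the first sample,
    # then repeatedly add the sample farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def comb2(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) / 2


def adjusted_rand_index(true_labels, pred_labels):
    """Chance-corrected pair-counting agreement; 1.0 means identical partitions."""
    n = len(true_labels)
    sum_ij = sum(comb2(v) for v in Counter(zip(true_labels, pred_labels)).values())
    sum_a = sum(comb2(v) for v in Counter(true_labels).values())
    sum_b = sum(comb2(v) for v in Counter(pred_labels).values())
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    X, y_true = make_toy_data()
    y_pred = kmeans(X, k=3)
    print("ARI vs known classes:", adjusted_rand_index(y_true, y_pred))
```

An ARI of 1 means the clusters match the known classes exactly (up to relabeling), while values near 0 indicate chance-level agreement. Other proximity measures would enter through `sq_dist`, although the k-means mean update is only well-motivated for squared Euclidean distance.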

Potential Biases

Because only a specific set of clustering methods was evaluated, the conclusions may not generalize to methods outside that set.

Limitations

The study primarily focused on clustering methods and did not explore other potential factors affecting clustering performance.

Participant Demographics

The data sets included various cancer types from different tissues, but specific demographic details were not provided.

Statistical Information

P-Value

p<0.05


Digital Object Identifier (DOI)

10.1186/1471-2105-9-497
