Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions
2008

Clustering DNA Methylation Data Using a New Algorithm

Sample size: 217 | Evidence: high

Author Information

Author(s): E. Houseman, B. C. Christensen, R. F. Yeh, C. J. Marsit, M. R. Karagas, M. Wrensch, H. H. Nelson, J. Wiemels, S. Zheng, J. K. Wiencke, K. T. Kelsey

Primary Institution: Harvard School of Public Health

Hypothesis

How can high-dimensional DNA methylation array data, whose values are proportions between 0 and 1, be clustered effectively?

Conclusion

The proposed method is an effective and computationally efficient way to cluster DNA methylation data.

Supporting Evidence

  • The proposed method outperformed nonparametric clustering approaches in simulations.
  • The method was computationally efficient compared to conventional mixture model methods.
  • Clusters identified were associated with tissue type and age.

Takeaway

The researchers created a new way to group DNA methylation data, helping scientists understand how genes are switched on or off in different tissues.

Methodology

The study used a recursive-partitioning algorithm built on a beta mixture model to cluster DNA methylation array data from 217 normal tissue samples.
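The recursive-partitioning idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each node fits a one-class and a two-class beta model, treating CpG sites as independent within a class. The two-class fit uses EM with a weighted method-of-moments M-step, which is a simplification; the original may estimate parameters differently. BIC is computed from these moment estimates as an approximation, a node is split only when BIC favours two classes, and samples are passed to children by hard assignment. The function names (`rpbmm`, `fit_one`, `fit_two`, `mom_params`), the `min_size` and `max_depth` stopping rules, and the `r`/`L`/`R` node labels are all hypothetical.

```python
import numpy as np
from scipy.special import betaln, logsumexp

EPS = 1e-6  # keeps log(x) and log(1 - x) finite for beta values at 0 or 1


def beta_loglik(x, a, b):
    """Per-sample log-likelihood, treating CpG sites as independent within a class."""
    return ((a - 1) * np.log(x) + (b - 1) * np.log1p(-x) - betaln(a, b)).sum(axis=1)


def mom_params(x, w):
    """Weighted method-of-moments beta parameters (a, b) for every column of x."""
    w = (w + 1e-12) / (w + 1e-12).sum()
    m = w @ x
    v = np.minimum(np.maximum(w @ (x - m) ** 2, 1e-10), 0.99 * m * (1 - m))
    k = m * (1 - m) / v - 1  # positive because v < m(1 - m)
    return m * k, (1 - m) * k


def fit_one(x):
    """Log-likelihood of a single-class beta model at its moment estimates."""
    return beta_loglik(x, *mom_params(x, np.ones(len(x)))).sum()


def fit_two(x, n_iter=50):
    """EM for a two-class beta mixture; returns posterior weights and log-likelihood."""
    # Deterministic start: split samples at the median of the first principal component.
    u = np.linalg.svd(x - x.mean(axis=0), full_matrices=False)[0][:, 0]
    resp = np.column_stack([u <= np.median(u), u > np.median(u)]).astype(float)
    for _ in range(n_iter):
        pi = resp.mean(axis=0)
        # M-step (mixing weights and moment estimates), then E-step (posterior weights).
        logp = np.column_stack([
            np.log(pi[j] + 1e-300) + beta_loglik(x, *mom_params(x, resp[:, j]))
            for j in range(2)
        ])
        ll = logsumexp(logp, axis=1)
        resp = np.exp(logp - ll[:, None])
    return resp, ll.sum()


def rpbmm(x, min_size=10, max_depth=4):
    """Recursively split samples while BIC prefers two beta classes over one."""
    x = np.clip(np.asarray(x, dtype=float), EPS, 1 - EPS)
    leaves = {}

    def split(idx, label, depth):
        sub = x[idx]
        n, d = sub.shape
        if n < 2 * min_size or depth >= max_depth:
            leaves[label] = idx
            return
        bic1 = -2 * fit_one(sub) + 2 * d * np.log(n)  # 2 parameters per site
        resp, ll2 = fit_two(sub)
        bic2 = -2 * ll2 + (4 * d + 1) * np.log(n)  # 2 classes plus one mixing weight
        hard = resp.argmax(axis=1)
        if bic2 >= bic1 or np.bincount(hard, minlength=2).min() < min_size:
            leaves[label] = idx  # stop: keep this node as a cluster
            return
        split(idx[hard == 0], label + "L", depth + 1)
        split(idx[hard == 1], label + "R", depth + 1)

    split(np.arange(len(x)), "r", 0)
    return leaves


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: 40 mostly unmethylated and 40 mostly methylated samples, 20 sites.
    demo = np.vstack([rng.beta(2, 8, (40, 20)), rng.beta(8, 2, (40, 20))])
    for name, members in rpbmm(demo).items():
        print(name, len(members))
```

Each leaf is a cluster, and its label records the path of splits (for example, `rLR` is the right child of the root's left child). Because every node only ever fits one- and two-class models, the sketch avoids fitting a full mixture for each candidate number of classes, which is one plausible reading of why the recursive approach is cheaper than conventional mixture-model fitting.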

Potential Biases

Potential biases from plate-to-plate variability were noted but not fully addressed.

Limitations

The study did not normalize across the different plates used in the laboratory analysis, which may have introduced technical variability.

Participant Demographics

The study included 217 normal tissue samples of various types, including blood, brain, and placenta, drawn from both adults and newborns.

Statistical Information

P-Value

<0.0001

Statistical Significance

p<0.0001
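The summary does not say which test produced p<0.0001 for the association between clusters and tissue type. As a hedged illustration only, a permutation test on a Pearson chi-square statistic is one common way to test such an association; `chi2_stat` and `permutation_pvalue` are hypothetical helpers, not the paper's analysis. With `n_perm` permutations the smallest attainable p-value is 1/(n_perm + 1), so reporting p<0.0001 this way needs at least 10,000 permutations.

```python
import numpy as np


def chi2_stat(a, b):
    """Pearson chi-square statistic for the contingency table of two label arrays."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    ai, bi = ai.ravel(), bi.ravel()
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)  # count samples in each (cluster, tissue) cell
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    return ((table - expected) ** 2 / expected).sum()


def permutation_pvalue(clusters, tissue, n_perm=10_000, seed=0):
    """Permutation p-value for association between cluster and tissue labels."""
    rng = np.random.default_rng(seed)
    observed = chi2_stat(clusters, tissue)
    # Shuffle cluster labels to break any link with tissue type, then compare.
    hits = sum(
        chi2_stat(rng.permutation(clusters), tissue) >= observed for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p strictly above 0
```

A call would look like `permutation_pvalue(cluster_labels, tissue_labels)`, where both arrays (one entry per sample) are hypothetical inputs; the same function could be reused for a categorical age grouping.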

Digital Object Identifier (DOI)

10.1186/1471-2105-9-365
