Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
2011

Improving ChIP-seq Data Analysis with Histone Modification Methods

Sample size: 9882 genes; Evidence: high

Author Information

Author(s): Hoang Stephen A, Xu Xiaojiang, Bekiranov Stefan

Primary Institution: University of Virginia Health System

Hypothesis

Can different methods for estimating ChIP-seq enrichment levels improve the performance of data mining and machine learning applications?

Conclusion

Using data across the entire gene body and incorporating the spatial distribution of enrichment improves the accuracy of model predictions in ChIP-seq data analysis.
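One way to picture "spatial distribution of enrichment" is a binned tag profile along the gene body. The sketch below is only an illustration under stated assumptions: the function name `binned_profile`, the choice of equal-width bins, and the tag positions are all hypothetical and are not taken from the study.

```python
def binned_profile(tag_positions, gene_start, gene_end, n_bins=10):
    """Count tags in n_bins equal-width bins spanning [gene_start, gene_end)."""
    if gene_end <= gene_start:
        raise ValueError("gene_end must be greater than gene_start")
    length = gene_end - gene_start
    counts = [0] * n_bins
    for pos in tag_positions:
        if gene_start <= pos < gene_end:
            # Integer arithmetic avoids floating-point edge cases at bin borders.
            idx = (pos - gene_start) * n_bins // length
            counts[idx] += 1
    return counts


# Hypothetical tag positions for a gene spanning 1000..2000 (illustration only).
tags = [1005, 1010, 1020, 1250, 1500, 1990, 2500]
profile = binned_profile(tags, 1000, 2000, n_bins=5)
total = sum(profile)  # tags falling inside the gene body
```

A single gene-wide count would collapse `profile` to `total`, discarding where along the gene the signal sits. For genes on the minus strand the profile would presumably need to be reversed so that the first bin always corresponds to the start of the gene.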

Supporting Evidence

  • Model-based methods of enrichment estimation improved accuracy over tag counting methods.
  • Incorporating data across the entire gene body enhanced model predictions.
  • The study utilized a dataset of 9882 genes for analysis.
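For orientation, a tag counting estimate can be sketched as a log ratio of ChIP tags to control tags inside a chosen window. This is a minimal sketch, not the study's procedure: the use of a control sample, the pseudocount, the 500 bp promoter window, and all coordinates are assumptions made for illustration, and library-size normalization is omitted.

```python
import math


def count_tags(tag_positions, start, end):
    """Number of tags whose position falls in the half-open window [start, end)."""
    return sum(1 for pos in tag_positions if start <= pos < end)


def log2_enrichment(chip_tags, control_tags, start, end, pseudocount=1.0):
    """log2 ratio of ChIP to control tag counts; the pseudocount avoids division by zero."""
    chip = count_tags(chip_tags, start, end)
    ctrl = count_tags(control_tags, start, end)
    return math.log2((chip + pseudocount) / (ctrl + pseudocount))


# Hypothetical tag positions and gene coordinates (illustration only).
chip = [950, 1000, 1020, 1100, 1800, 2600, 2900]
control = [980, 2000, 2950]
tss, tes = 1000, 3000  # assumed transcription start and end sites

# Same estimator, two window choices: around the start of the gene vs. the whole gene body.
promoter_score = log2_enrichment(chip, control, tss - 500, tss + 500)
gene_body_score = log2_enrichment(chip, control, tss, tes)
```

The two scores differ only in the window they summarize, which is the kind of choice the supporting evidence above contrasts.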

Takeaway

This study shows that to understand how genes are controlled, we need to look at the whole gene, not just its beginning, and at how the signal is spread out along it.

Methodology

The study compared various methods for estimating gene-wise ChIP-seq enrichment using the MARS regression algorithm on a dataset of histone modifications.
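As background on MARS (multivariate adaptive regression splines): a fitted MARS model is, in its simplest additive form, an intercept plus a weighted sum of hinge functions of the predictors. The sketch below only evaluates such a model for one predictor. The coefficients and knot are made up, the helper names are hypothetical, and no fitting (knot search or pruning) is performed, so it should not be read as the study's model.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) if direction is +1, max(0, knot - x) if -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate an additive MARS-style model; terms is a list of (coefficient, knot, direction)."""
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)


# Hypothetical model: x could be a gene-wise enrichment value, with a single knot at 1.0.
terms = [(2.0, 1.0, 1), (-0.5, 1.0, -1)]
low = mars_predict(0.0, 3.0, terms)   # below the knot, only the mirrored hinge is active
high = mars_predict(2.0, 3.0, terms)  # above the knot, only the forward hinge is active
```

Because each hinge is zero on one side of its knot, the model is piecewise linear, which lets it capture thresholds in how a response depends on enrichment.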

Potential Biases

Enrichment estimates may be biased by their dependence on genomic coordinates and by the presence of outliers.

Limitations

The study may not account for all types of histone modifications and their unique deposition patterns.

Participant Demographics

Human CD4+ T-cells were used for the ChIP-seq dataset.

Statistical Information

P-Value

<0.001

Digital Object Identifier (DOI)

10.1186/1756-0500-4-288
