The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis
2008

Improving Breast Cancer Data Analysis by Removing Bias

Sample size: 1107 | Evidence: high

Author Information

Author(s): Andrew H Sims, Graeme J Smethurst, Yvonne Hey, Michal J Okoniewski, Stuart D Pepper, Anthony Howell, Crispin J Miller, Robert B Clarke

Primary Institution: Applied Bioinformatics of Cancer Research Group, Breakthrough Research Unit, Edinburgh Cancer Research Centre

Hypothesis

Can systematic biases in breast cancer gene expression datasets be removed to improve meta-analysis and prognosis prediction?

Conclusion

By removing systematic biases, raw data from different gene expression datasets can be integrated, increasing statistical power and improving biological insight.

Supporting Evidence

  • The study demonstrated that systematic biases can be removed, allowing for the integration of datasets.
  • Combining datasets after bias correction led to improved prognostic predictions.
  • The largest gene expression dataset of primary breast tumors assembled at the time (1107 samples) was built from six studies.

Takeaway

This study shows that when we correct systematic biases in breast cancer data, we can combine information from many studies to get better predictions about how patients will do.

Methodology

The study used Affymetrix data to demonstrate the removal of systematic biases through batch mean-centering, allowing for the integration of multiple datasets.
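Batch mean-centering can be illustrated with a short sketch. It assumes a genes × samples matrix of log2-transformed expression values and one batch (dataset) label per sample. It then subtracts each gene's mean within each batch, so that every gene averages zero in every batch. The function name `batch_mean_center` and the toy data are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def batch_mean_center(log_expr, batches):
    """Hypothetical helper: mean-center each gene within each batch.

    log_expr: genes x samples array of log2 expression values (assumed layout)
    batches:  one batch/dataset label per sample (column)
    """
    log_expr = np.asarray(log_expr, dtype=float)
    batches = np.asarray(batches)
    centered = np.empty_like(log_expr)
    for b in np.unique(batches):
        cols = batches == b
        # Subtract the per-gene mean computed only over this batch's samples
        centered[:, cols] = log_expr[:, cols] - log_expr[:, cols].mean(axis=1, keepdims=True)
    return centered

# Toy example: batch B is batch A measured with a 4-fold multiplicative bias
raw_a = np.array([[100.0, 200.0, 400.0], [50.0, 50.0, 200.0]])
raw_b = raw_a * 4.0
log_expr = np.log2(np.hstack([raw_a, raw_b]))
batches = ["A"] * 3 + ["B"] * 3

centered = batch_mean_center(log_expr, batches)
print(np.allclose(centered[:, :3], centered[:, 3:]))  # True
```

After centering, the two batches become identical. On the log scale the 4-fold bias is a constant shift of 2, and that shift is removed along with the batch mean.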

Potential Biases

Systematic, multiplicative biases were present in all datasets, which could affect the results if not corrected.
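A short derivation shows why mean-centering on the log scale removes a multiplicative bias. It assumes the bias acts as a per-gene, per-batch scale factor $c_{gb} > 0$ on the true expression $x_{gj}$ of gene $g$ in sample $j$ of batch $b$ (notation introduced here for illustration). The batch contains $n_b$ samples.

$$
\begin{aligned}
y_{gj} &= c_{gb}\, x_{gj} \\
\log y_{gj} &= \log c_{gb} + \log x_{gj} \\
\overline{\log y}_{gb} &= \frac{1}{n_b}\sum_{j \in b} \log y_{gj} = \log c_{gb} + \overline{\log x}_{gb} \\
\log y_{gj} - \overline{\log y}_{gb} &= \log x_{gj} - \overline{\log x}_{gb}
\end{aligned}
$$

The factor $c_{gb}$ cancels, so the centered values do not depend on the bias. The same step also removes any genuine difference in a gene's mean expression between batches.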

Limitations

The datasets did not share a single definition of the follow-up endpoint, so none could be applied across all of them. Patient age and tumor size also varied between datasets.

Participant Demographics

The study included breast cancer patients with varying characteristics, including age and tumor size.

Statistical Information

P-Value

p<0.05


Digital Object Identifier (DOI)

10.1186/1755-8794-1-42
