Improving Breast Cancer Data Analysis by Removing Bias
Author Information
Author(s): Andrew H Sims, Graeme J Smethurst, Yvonne Hey, Michal J Okoniewski, Stuart D Pepper, Anthony Howell, Crispin J Miller, Robert B Clarke
Primary Institution: Applied Bioinformatics of Cancer Research Group, Breakthrough Research Unit, Edinburgh Cancer Research Centre
Hypothesis
Can systematic biases in breast cancer gene expression datasets be removed to improve meta-analysis and prognosis prediction?
Conclusion
By reconciling systematic biases, raw data from different gene expression datasets can be integrated, leading to improved statistical power and biological insights.
Supporting Evidence
- The study demonstrated that systematic biases can be removed, allowing for the integration of datasets.
- Combining datasets after bias correction led to improved prognostic predictions.
- The largest gene expression dataset of primary breast tumors was assembled from six studies.
Takeaway
This study shows that when we remove systematic biases between breast cancer datasets, we can combine information from many studies to get better predictions about how patients will do.
Methodology
The study used Affymetrix data to demonstrate the removal of systematic biases through batch mean-centering, allowing for the integration of multiple datasets.
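As a rough illustration of batch mean-centering, the sketch below subtracts each gene's mean within each dataset (batch) from that dataset's samples. It is a minimal example under stated assumptions, not the authors' pipeline: it assumes expression values are already on a log scale (so a multiplicative bias becomes an additive offset), and the function name `batch_mean_center`, the toy matrix, and the study labels are all hypothetical.

```python
import numpy as np


def batch_mean_center(expr, batches):
    """Center each gene to mean zero within each batch.

    expr    : array of shape (genes, samples), assumed log-scale expression
    batches : one batch label per sample (e.g. the study of origin)
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    if expr.shape[1] != batches.shape[0]:
        raise ValueError("need exactly one batch label per sample (column)")

    centered = np.empty_like(expr)
    for label in np.unique(batches):
        cols = batches == label  # boolean mask selecting this batch's samples
        # Per-gene mean over the samples of this batch only
        gene_means = expr[:, cols].mean(axis=1, keepdims=True)
        centered[:, cols] = expr[:, cols] - gene_means
    return centered


# Toy data (hypothetical): 2 genes x 4 samples from two studies, "A" and "B".
# Gene 0 is shifted upward by a constant offset in study B.
log_expr = np.array([
    [5.0, 6.0, 8.0, 9.0],
    [2.0, 4.0, 1.0, 3.0],
])
study = ["A", "A", "B", "B"]

centered = batch_mean_center(log_expr, study)
print(centered)
```

After centering, every gene has mean zero within each study, so the constant offset separating the two toy studies no longer dominates comparisons between samples. Note that this also removes any genuine difference in average expression between studies, which matters when the studies differ in patient mix.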
Potential Biases
Systematic, multiplicative biases were present in all datasets, which could affect the results if not corrected.
Limitations
The study could not use a single definition of follow-up endpoint across datasets, and there was variation in patient age and tumor size.
Participant Demographics
The study included breast cancer patients with varying characteristics, including age and tumor size.
Statistical Information
P-Value
p<0.05
Statistical Significance
p<0.05