Pooling Breast Cancer Datasets Improves Classification Performance
Author Information
Author(s): van Vliet Martin H, Reyal Fabien, Horlings Hugo M, van de Vijver Marc J, Reinders Marcel JT, Wessels Lodewyk FA
Primary Institution: Delft University of Technology
Hypothesis
Pooling breast cancer datasets improves classification performance and signature stability.
Conclusion
Pooling datasets results in more accurate classification and a convergence of signature genes.
Supporting Evidence
- Pooling datasets showed a synergetic effect on classification performance in 73% of cases.
- A significant positive correlation was found between the number of datasets pooled and the validation performance.
- The study advocates for analyzing new data within the context of a compendium rather than in isolation.
Takeaway
When scientists combine data from different studies about breast cancer, they can make better predictions about how patients will do.
Methodology
The study used double-loop cross-validation to evaluate the performance of classifiers trained on pooled datasets.
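As an illustration of the idea, the sketch below shows one way a double-loop (nested) cross-validation over a pooled dataset could look. It is not the authors' implementation. The nearest-centroid classifier, the mean-difference gene ranking, the candidate signature sizes, the fold counts, and the synthetic `toy_dataset` helper are all assumptions chosen to keep the example small. The inner loop picks a signature size using training samples only; the outer loop estimates performance on samples that played no part in that choice.

```python
import random
from statistics import mean

def make_folds(y, k, rng):
    """Deal shuffled sample indices into k folds, class by class (stratified)."""
    folds, pos = [[] for _ in range(k)], 0
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds

def split(X, y, test_idx):
    """Return (X_train, y_train, X_test, y_test) for the given test indices."""
    held_out = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in held_out]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

def class_mean(X, y, c, j):
    """Mean of feature j over the samples of class c."""
    return mean(x[j] for x, label in zip(X, y) if label == c)

def rank_features(X, y):
    """Rank features by absolute difference of class means (assumed criterion)."""
    score = [abs(class_mean(X, y, 1, j) - class_mean(X, y, 0, j))
             for j in range(len(X[0]))]
    return sorted(range(len(score)), key=lambda j: -score[j])

def accuracy(X_tr, y_tr, X_te, y_te, n_feats):
    """Train a nearest-centroid classifier on the top n_feats features, then test it."""
    feats = rank_features(X_tr, y_tr)[:n_feats]
    cents = {c: [class_mean(X_tr, y_tr, c, j) for j in feats] for c in (0, 1)}
    def predict(x):
        return min((0, 1), key=lambda c: sum((x[j] - m) ** 2
                                             for j, m in zip(feats, cents[c])))
    return sum(predict(x) == t for x, t in zip(X_te, y_te)) / len(y_te)

def double_loop_cv(X, y, sizes=(1, 2, 4), outer_k=3, inner_k=3, seed=0):
    """Outer loop estimates performance; inner loop selects the signature size."""
    rng = random.Random(seed)
    outer_scores = []
    for test_idx in make_folds(y, outer_k, rng):
        X_tr, y_tr, X_te, y_te = split(X, y, test_idx)
        inner_folds = make_folds(y_tr, inner_k, rng)
        def inner_score(size):
            # Uses training samples only, so the outer test fold stays unseen.
            return mean(accuracy(*split(X_tr, y_tr, f), size) for f in inner_folds)
        best_size = max(sizes, key=inner_score)
        outer_scores.append(accuracy(X_tr, y_tr, X_te, y_te, best_size))
    return mean(outer_scores)

def toy_dataset(n, n_feat, rng):
    """Hypothetical synthetic data: only feature 0 separates the two classes."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_feat)]
        row[0] += 3 * label
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(1)
Xa, ya = toy_dataset(20, 5, rng)   # stand-in for one study
Xb, yb = toy_dataset(20, 5, rng)   # stand-in for a second study
X_pool, y_pool = Xa + Xb, ya + yb  # pooling = concatenating the samples
print(double_loop_cv(X_pool, y_pool))
```

Pooling here is plain concatenation of samples, which presumes the datasets share the same features on a comparable scale; real expression datasets would typically need gene matching and normalization before this step.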
Potential Biases
Small sample sizes in the individual datasets may bias the results.
Limitations
The study may not account for heterogeneity among datasets, which could affect results.
Participant Demographics
The study analyzed data from multiple breast cancer datasets, but specific demographics were not detailed.
Statistical Information
P-Value
1.1e-24
Statistical Significance
p<0.05