Pooling breast cancer datasets has a synergetic effect on classification performance and improves signature stability

2008

Pooling Breast Cancer Datasets Improves Classification Performance

Sample size: 947

Evidence: high

Author Information

Author(s): van Vliet Martin H, Reyal Fabien, Horlings Hugo M, van de Vijver Marc J, Reinders Marcel JT, Wessels Lodewyk FA

Primary Institution: Delft University of Technology

Does pooling breast cancer datasets improve classification performance and signature stability?

Pooling datasets results in more accurate classification and a convergence of signature genes.

Pooling datasets showed a synergetic effect on classification performance in 73% of cases.
A significant positive correlation was found between the number of datasets pooled and the validation performance.
The study advocates for analyzing new data within the context of a compendium rather than in isolation.

When scientists combine data from different studies about breast cancer, they can make better predictions about how patients will do.

The study used double-loop cross-validation to evaluate the performance of classifiers trained on pooled datasets.
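To illustrate the general idea of double-loop (nested) cross-validation, here is a minimal sketch. It is not the authors' pipeline: the nearest-centroid classifier, the mean-difference gene ranking, the candidate signature sizes, and the synthetic data from the hypothetical helper toy_data are all assumptions chosen for brevity. The inner loop picks the signature size using only the outer-training samples, and the outer loop scores that choice on samples that played no part in the selection.

```python
import random
from statistics import mean

def make_folds(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def rank_genes(X, y, rows):
    """Rank genes by absolute difference of class means (an assumed criterion).
    Assumes both classes are present among the given rows."""
    pos = [X[i] for i in rows if y[i] == 1]
    neg = [X[i] for i in rows if y[i] == 0]
    n_genes = len(X[0])
    score = [abs(mean([r[g] for r in pos]) - mean([r[g] for r in neg]))
             for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -score[g])

def accuracy(X, y, train, test, genes):
    """Fit a nearest-centroid classifier on train, return accuracy on test."""
    cents = {}
    for label in (0, 1):
        members = [X[i] for i in train if y[i] == label]
        cents[label] = [mean([r[g] for r in members]) for g in genes]

    def predict(x):
        def dist(label):
            return sum((x[g] - c) ** 2 for g, c in zip(genes, cents[label]))
        return min(cents, key=dist)

    return mean([1.0 if predict(X[i]) == y[i] else 0.0 for i in test])

def double_loop_cv(X, y, sizes=(1, 2, 5), outer_k=5, inner_k=4, seed=0):
    """Return the outer-loop accuracy estimate, averaged over outer folds."""
    rng = random.Random(seed)
    outer_scores = []
    for test in make_folds(len(X), outer_k, rng):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        # Inner loop: choose the signature size on outer-training samples only.
        inner_acc = {s: [] for s in sizes}
        for fold in make_folds(len(train), inner_k, rng):
            val = [train[j] for j in fold]
            val_set = set(val)
            fit = [i for i in train if i not in val_set]
            ranked = rank_genes(X, y, fit)
            for s in sizes:
                inner_acc[s].append(accuracy(X, y, fit, val, ranked[:s]))
        best = max(sizes, key=lambda s: mean(inner_acc[s]))
        # Outer loop: re-rank on all outer-training samples, test on held-out fold.
        genes = rank_genes(X, y, train)[:best]
        outer_scores.append(accuracy(X, y, train, test, genes))
    return mean(outer_scores)

def toy_data(n=60, n_genes=20, seed=1):
    """Synthetic two-class data (hypothetical): the first 3 genes are informative."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(2.0 * label if g < 3 else 0.0, 1.0) for g in range(n_genes)]
         for label in y]
    return X, y

X, y = toy_data()
estimate = double_loop_cv(X, y)
print(f"double-loop CV accuracy estimate: {estimate:.2f}")
```

Because gene ranking and signature-size selection are repeated inside every outer fold, the held-out samples never influence the signature, which is what keeps the performance estimate from being optimistically biased. In a pooling setting, X and y would simply hold the concatenated samples of the pooled datasets.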

Small sample sizes in the individual datasets are a potential source of bias.

The study may not account for heterogeneity among datasets, which could affect results.

The study analyzed data from multiple breast cancer datasets, but specific demographics were not detailed.

P-value: 1.1e-24

Significance threshold: p<0.05
