Pooling breast cancer datasets has a synergetic effect on classification performance and improves signature stability
2008

Pooling Breast Cancer Datasets Improves Classification Performance

Sample size: 947 | Publication | 10 minutes | Evidence: high

Author Information

Author(s): van Vliet Martin H, Reyal Fabien, Horlings Hugo M, van de Vijver Marc J, Reinders Marcel JT, Wessels Lodewyk FA

Primary Institution: Delft University of Technology

Hypothesis

Does pooling breast cancer datasets improve classification performance and signature stability?

Conclusion

Pooling datasets results in more accurate classification and a convergence of signature genes.

Supporting Evidence

  • Pooling datasets showed a synergetic effect on classification performance in 73% of cases.
  • A significant positive correlation was found between the number of datasets pooled and the validation performance.
  • The study advocates for analyzing new data within the context of a compendium rather than in isolation.

Takeaway

When scientists combine data from different studies about breast cancer, they can make better predictions about how patients will do.

Methodology

The study used a double-loop cross-validation procedure to evaluate the performance of classifiers trained on pooled datasets.
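As a rough illustration of what double-loop (nested) cross-validation looks like, the sketch below uses an inner loop to pick a signature size and an outer loop to estimate performance on samples the inner loop never saw. Everything in it is an assumption made for illustration: the function names, the nearest-centroid classifier, the mean-difference feature ranking, the candidate signature sizes, and the synthetic toy data are hypothetical stand-ins, not the classifier, ranking criterion, or data used in the study.

```python
import random

def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def k_folds(y, k, rng):
    """Stratified split: deal the shuffled indices of each class across k folds."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, l in enumerate(y) if l == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def rank_features(X, y):
    """Order features by |mean(class 1) - mean(class 0)| (illustrative score only)."""
    scores = []
    for j in range(len(X[0])):
        m1 = avg([x[j] for x, l in zip(X, y) if l == 1])
        m0 = avg([x[j] for x, l in zip(X, y) if l == 0])
        scores.append(abs(m1 - m0))
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def accuracy(X_tr, y_tr, X_te, y_te, n_feats):
    """Select features and fit a nearest-centroid rule on the training part only."""
    feats = rank_features(X_tr, y_tr)[:n_feats]
    cent = {c: [avg([x[j] for x, l in zip(X_tr, y_tr) if l == c]) for j in feats]
            for c in (0, 1)}
    hits = 0
    for x, target in zip(X_te, y_te):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(feats, mu))
                for c, mu in cent.items()}
        hits += min(dist, key=dist.get) == target
    return hits / len(y_te)

def take(rows, idx):
    return [rows[i] for i in idx]

def nested_cv(X, y, candidates=(1, 2, 5), outer_k=5, inner_k=4, seed=0):
    """Outer loop estimates performance; inner loop tunes the signature size."""
    rng = random.Random(seed)
    outer_scores = []
    for test_idx in k_folds(y, outer_k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = take(X, train_idx), take(y, train_idx)
        inner_folds = k_folds(y_tr, inner_k, rng)

        def inner_score(n_feats):
            accs = []
            for val_idx in inner_folds:
                val = set(val_idx)
                fit_idx = [i for i in range(len(y_tr)) if i not in val]
                accs.append(accuracy(take(X_tr, fit_idx), take(y_tr, fit_idx),
                                     take(X_tr, val_idx), take(y_tr, val_idx),
                                     n_feats))
            return avg(accs)

        best_n = max(candidates, key=inner_score)  # chosen without the test fold
        outer_scores.append(accuracy(X_tr, y_tr, take(X, test_idx),
                                     take(y, test_idx), best_n))
    return avg(outer_scores)

def make_toy_data(n=60, n_features=20, seed=1):
    """Synthetic data: the first two features separate the classes, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(3.0 * label if j < 2 else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    print("Outer-loop accuracy estimate:", nested_cv(X, y))
```

The point of the two loops is that feature selection and tuning happen only on the training portion of each outer fold, so the outer-loop estimate is not inflated by information from the held-out samples.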

Potential Biases

Small sample sizes in the individual datasets may bias the results.

Limitations

The study may not account for heterogeneity among datasets, which could affect results.

Participant Demographics

The study analyzed data from multiple breast cancer datasets, but specific demographics were not detailed.

Statistical Information

P-Value

1.1e-24

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/1471-2164-9-375
