Biomarker discovery across annotated and unannotated microarray datasets using semi-supervised learning
2008

Improving Cancer Diagnosis with New Learning Method

Sample size: 5000 publication Evidence: moderate

Author Information

Author(s): Harris Cole, Ghaffari Noushin

Primary Institution: Exagen Diagnostics, Inc.

Hypothesis

Can combining labeled and unlabeled microarray datasets improve classifier robustness?

Conclusion

The study found that adding unannotated data significantly improves the accuracy of cancer classification models.

Supporting Evidence

  • Adding unlabeled samples significantly increased the mean accuracy of the models.
  • In the AML-ALL group, accuracy improved from ~40% to 100% with unlabeled data.
  • In CML, minimum accuracy improved from 0% to 11.11% with unlabeled samples.
  • For DLBCL, maximum accuracy increased from 90% to 100% by adding unlabeled samples.

Takeaway

This study shows that using both labeled and unlabeled data helps make better predictions about cancer.

Methodology

The study used a Genetic Algorithm for feature selection across labeled and unlabeled datasets.
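
As a rough illustration of this kind of approach, the sketch below runs a small genetic algorithm over binary feature masks. It is not the authors' implementation. The fitness function is an assumption made for this example: it combines nearest-centroid accuracy on the labeled samples with a bonus for how clearly the unlabeled samples fall near one class centroid, minus a small penalty per selected feature. All names (fitness, select_features, toy_row), the weights, and the toy data are hypothetical.

```python
import random


def centroid(rows, feats):
    """Mean of the selected features over a list of samples."""
    return [sum(r[f] for r in rows) / len(rows) for f in feats]


def dist(row, center, feats):
    """Squared Euclidean distance restricted to the selected features."""
    return sum((row[f] - c) ** 2 for f, c in zip(feats, center))


def fitness(mask, labeled, unlabeled):
    """Assumed fitness: labeled accuracy + unlabeled margin bonus - size penalty."""
    feats = [i for i, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0
    classes = sorted({y for _, y in labeled})
    centers = {c: centroid([x for x, y in labeled if y == c], feats) for c in classes}

    def predict(x):
        return min(classes, key=lambda c: dist(x, centers[c], feats))

    accuracy = sum(predict(x) == y for x, y in labeled) / len(labeled)
    # Margin between the two nearest centroids, averaged over unlabeled samples.
    margins = []
    for x in unlabeled:
        d = sorted(dist(x, centers[c], feats) for c in classes)
        margins.append((d[1] - d[0]) / (d[1] + d[0] + 1e-12))
    confidence = sum(margins) / len(margins) if margins else 0.0
    return accuracy + 0.5 * confidence - 0.01 * len(feats)


def select_features(labeled, unlabeled, n_features, pop_size=30,
                    generations=40, mutation_rate=0.05, seed=0):
    """Evolve binary masks; return the indices kept by the best mask found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, labeled, unlabeled)

    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
    best = max(population, key=score)
    return [i for i, bit in enumerate(best) if bit]


def toy_row(signal, i, n_features=6):
    """Synthetic sample: feature 0 carries the class signal, the rest do not."""
    return [signal + 0.1 * i] + [float((i + j) % 4) for j in range(1, n_features)]


# Synthetic data only; these values are not taken from the study.
labeled = ([(toy_row(0.0, i), "A") for i in range(8)]
           + [(toy_row(10.0, i), "B") for i in range(8)])
unlabeled = ([toy_row(0.0, i) for i in range(8, 12)]
             + [toy_row(10.0, i) for i in range(8, 12)])

selected = select_features(labeled, unlabeled, n_features=6)
print(selected)
```

On this toy data only feature 0 separates the two classes, so the selected set should include index 0. With real microarray data the mask would span thousands of genes, and the fitness function would need proper cross-validation rather than accuracy on the training samples.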

Potential Biases

Relying on datasets drawn from different sources may introduce bias.

Limitations

The method may not be applicable to all types of datasets, especially those with very different statistical distributions.

Digital Object Identifier (DOI)

10.1186/1471-2164-9-S2-S7
