Biomarker discovery across annotated and unannotated microarray datasets using semi-supervised learning
2008
Improving Cancer Diagnosis with a New Learning Method
Sample size: 5000
publication
Evidence: moderate
Author Information
Author(s): Harris Cole, Ghaffari Noushin
Primary Institution: Exagen Diagnostics, Inc.
Hypothesis
Can combining labeled and unlabeled microarray datasets improve classifier robustness?
Conclusion
The study found that adding unannotated (unlabeled) data significantly improved the accuracy of cancer classification models.
Supporting Evidence
- Adding unlabeled samples significantly increased the mean accuracy of the models.
- In the AML-ALL group, accuracy improved from ~40% to 100% when unlabeled samples were added.
- In the CML group, minimum accuracy improved from 0% to 11.11% when unlabeled samples were added.
- In the DLBCL group, maximum accuracy increased from 90% to 100% when unlabeled samples were added.
Takeaway
This study shows that using both labeled and unlabeled data helps make better predictions about cancer.
Methodology
The study used a genetic algorithm to select features (candidate biomarker genes) across labeled and unlabeled microarray datasets.
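This summary does not describe how the genetic algorithm encodes feature subsets or scores them, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a bit-mask encoding (one bit per feature), a nearest-centroid classifier, and a fitness that adds labeled accuracy to a margin score measuring how decisively unlabeled samples fall near one class centroid. The names `fitness`, `select_features`, and `make_toy`, the toy data, and all parameter values are hypothetical.

```python
import random

def centroid(rows, feats):
    # Mean of each selected feature over the given samples.
    return [sum(r[f] for r in rows) / len(rows) for f in feats]

def sq_dist(row, cen, feats):
    # Squared Euclidean distance restricted to the selected features.
    return sum((row[f] - c) ** 2 for f, c in zip(feats, cen))

def fitness(mask, labeled, labels, unlabeled):
    # Assumed fitness: labeled accuracy + unlabeled margin - size penalty.
    feats = [i for i, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0
    classes = sorted(set(labels))
    cents = {c: centroid([r for r, y in zip(labeled, labels) if y == c], feats)
             for c in classes}
    # Supervised term: nearest-centroid accuracy on the labeled samples.
    correct = sum(
        min(classes, key=lambda c: sq_dist(r, cents[c], feats)) == y
        for r, y in zip(labeled, labels)
    )
    acc = correct / len(labeled)
    # Unsupervised term: relative gap between the two nearest centroids,
    # averaged over unlabeled samples (larger means cleaner separation).
    margin = 0.0
    for r in unlabeled:
        d = sorted(sq_dist(r, cents[c], feats) for c in classes)
        margin += (d[1] - d[0]) / (d[1] + d[0] + 1e-12)
    margin /= max(len(unlabeled), 1)
    return acc + margin - 0.01 * len(feats)

def select_features(labeled, labels, unlabeled, n_features,
                    pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, labeled, labels, unlabeled)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    return [i for i, bit in enumerate(best) if bit]

def make_toy(n, rng):
    # Hypothetical data: features 0 and 1 carry the class signal, 2-5 are noise.
    rows, ys = [], []
    for i in range(n):
        y = i % 2
        row = [rng.gauss(3.0 * y, 0.5), rng.gauss(-3.0 * y, 0.5)]
        row += [rng.gauss(0.0, 1.0) for _ in range(4)]
        rows.append(row)
        ys.append(y)
    return rows, ys

data_rng = random.Random(1)
labeled_rows, labeled_ys = make_toy(20, data_rng)
unlabeled_rows, _ = make_toy(20, data_rng)   # labels discarded on purpose
selected = select_features(labeled_rows, labeled_ys, unlabeled_rows, n_features=6)
print("selected feature indices:", selected)
```

In this sketch the unlabeled samples influence only the fitness score, never the centroids themselves, which keeps the supervised and unsupervised terms easy to inspect separately. The small per-feature penalty is an arbitrary choice that nudges the search toward compact feature sets.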
Potential Biases
Combining datasets from different sources may introduce bias.
Limitations
The method may not apply to all types of datasets, especially those whose statistical distributions differ substantially.