Small Sample Issues for Microarray-Based Classification
Author Information
Author(s): Edward R. Dougherty
Primary Institution: Texas A&M University
Hypothesis
How do small sample sizes affect the design and performance of classifiers in microarray data analysis?
Conclusion
Small sample sizes significantly complicate the design and error estimation of classifiers based on microarray data.
Supporting Evidence
- Small samples can lead to a large number of gene sets with low error estimates, which may not reflect true classifier performance.
- Error estimation becomes biased and less reliable when using small sample sizes.
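- Constraining the classifier (restricting it to a simpler family) can reduce design error, but the best classifier within the constrained family may have a higher error than the best unconstrained classifier.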
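The first two points can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes pure-noise expression data (no gene carries class information, so the true error of any classifier is 0.5), a one-gene nearest-mean classifier, and resubstitution as the error estimator. The sample size, gene count, threshold, and function names are illustrative choices.

```python
import numpy as np

def apparent_errors(X, y):
    """Resubstitution error of a one-gene nearest-mean classifier, per gene."""
    m0 = X[y == 0].mean(axis=0)  # class-0 mean of each gene
    m1 = X[y == 1].mean(axis=0)  # class-1 mean of each gene
    # Predict class 1 when a sample is closer to the class-1 mean.
    pred = (np.abs(X - m1) < np.abs(X - m0)).astype(int)
    return (pred != y[:, None]).mean(axis=0)

rng = np.random.default_rng(0)
n_samples, n_genes = 20, 5000  # assumed sizes: few samples, many genes

# Pure noise: labels are unrelated to expression values.
X = rng.standard_normal((n_samples, n_genes))
y = np.array([0, 1] * (n_samples // 2))

errs = apparent_errors(X, y)
best = int(np.argmin(errs))
n_low = int((errs <= 0.2).sum())  # genes that merely look informative

# Evaluate the "best" gene on a large independent sample of the same noise.
n_test = 2000
x_test = rng.standard_normal(n_test)
y_test = np.array([0, 1] * (n_test // 2))
m0, m1 = X[y == 0, best].mean(), X[y == 1, best].mean()
pred_test = (np.abs(x_test - m1) < np.abs(x_test - m0)).astype(int)
test_err = float((pred_test != y_test).mean())

print(f"mean apparent error over all genes: {errs.mean():.3f}")
print(f"genes with apparent error <= 0.2:   {n_low}")
print(f"best gene: apparent error {errs[best]:.2f}, test error {test_err:.3f}")
```

Under these assumptions, the average apparent error falls below 0.5 and the selected gene looks much better on the training sample than on independent data, where its error stays near 0.5. Both effects come from estimating error on the same small sample used for design and selection.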
Takeaway
When scientists study gene expression with only a small number of samples, there is too little data to build and evaluate classifiers reliably, so diseases can be misclassified.
Methodology
The paper reviews issues related to classifier design, error estimation, and feature selection in the context of small sample sizes in microarray studies.
Potential Biases
The use of small samples can lead to classifiers that appear accurate but are actually misleading due to high variance in error estimates.
Limitations
The review discusses the challenges of small sample sizes, including biased error estimates and the difficulty of selecting features from a large set of variables.