The Illusion of Distribution-Free Small-Sample Classification in Genomics
Author Information
Author(s): Dougherty Edward R, Zollanvari Amin, Braga-Neto Ulisses M
Primary Institution: Texas A&M University
Hypothesis
Can classification rules in bioinformatics be effectively applied to small labeled data sets without making distributional assumptions?
Conclusion
The study concludes that meaningful distribution-free classification in high-throughput, small-sample biology is an illusion, because the error of the resulting classifiers cannot be estimated accurately.
Supporting Evidence
- Classification rules in bioinformatics often ignore the underlying feature-label distribution.
- Error estimation accuracy is crucial for the validity of classifiers.
- Without distributional assumptions, the accuracy of an error estimate on a small sample cannot be usefully characterized, so the estimate is essentially meaningless.
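The dependence of error estimation on sample size can be illustrated with a small simulation. The sketch below is not taken from the study: the two-class Gaussian model, the nearest-mean classifier, the leave-one-out estimator, and every function and parameter name (`simulate`, `n_per_class`, `delta`, and so on) are assumptions chosen for illustration. It compares the leave-one-out error estimate with the true error, which can be computed exactly only because the feature-label distribution is assumed known.

```python
import math
import numpy as np


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_nearest_mean(X, y):
    """Nearest-mean classifier written as a linear rule: predict 1 if w.x > c."""
    m0 = X[y == 0].mean(axis=0)
    m1 = X[y == 1].mean(axis=0)
    w = m1 - m0
    c = w @ (m0 + m1) / 2.0
    return w, c


def predict(w, c, X):
    return (X @ w > c).astype(int)


def true_error(w, c, mu0, mu1):
    """Exact error of the linear rule under the ASSUMED model:
    equal priors, class-conditional densities N(mu0, I) and N(mu1, I)."""
    norm = np.linalg.norm(w)
    e0 = phi((w @ mu0 - c) / norm)  # class-0 points labeled 1
    e1 = phi((c - w @ mu1) / norm)  # class-1 points labeled 0
    return 0.5 * (e0 + e1)


def loo_error(X, y):
    """Leave-one-out estimate: refit without sample i, then test on sample i."""
    n = len(y)
    mistakes = 0
    for i in range(n):
        keep = np.arange(n) != i
        w, c = fit_nearest_mean(X[keep], y[keep])
        mistakes += int(predict(w, c, X[i:i + 1])[0] != y[i])
    return mistakes / n


def simulate(n_per_class=10, dim=5, delta=1.0, reps=200, seed=0):
    """Draw `reps` small samples; return estimated errors, true errors, and RMS deviation."""
    rng = np.random.default_rng(seed)
    mu0 = np.zeros(dim)
    mu1 = np.zeros(dim)
    mu1[0] = delta  # classes differ only in the first feature
    y = np.repeat([0, 1], n_per_class)
    est = np.empty(reps)
    tru = np.empty(reps)
    for r in range(reps):
        X = np.vstack([
            rng.normal(mu0, 1.0, (n_per_class, dim)),
            rng.normal(mu1, 1.0, (n_per_class, dim)),
        ])
        w, c = fit_nearest_mean(X, y)
        tru[r] = true_error(w, c, mu0, mu1)
        est[r] = loo_error(X, y)
    rms = float(np.sqrt(np.mean((est - tru) ** 2)))
    return est, tru, rms


if __name__ == "__main__":
    for n in (10, 50):
        est, tru, rms = simulate(n_per_class=n)
        print(f"n per class = {n}: mean true error = {tru.mean():.3f}, "
              f"RMS deviation of estimate = {rms:.3f}")
```

Note that the RMS deviation here is computable only because a distribution was assumed. With real data and no model there is no `true_error` to compare against, which is the situation the bullets above describe.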
Takeaway
This study says that, with only a small amount of data, a classifier built without any assumptions about how the data are distributed cannot be trusted, because there is no reliable way to tell how often it makes mistakes.
Potential Biases
Without proper distributional assumptions, there is a risk of adopting classifiers that perform poorly, since their error estimates may not reflect their true performance.
Limitations
The study highlights the lack of distributional assumptions and the inadequacy of error estimation in small-sample settings.
Digital Object Identifier (DOI)