Which Is Better: Holdout or Full-Sample Classifier Design?

Sample size: 295 | Publication | Evidence: high

Author Information

Author(s): Marcel Brun, Xu Qian, Edward R Dougherty

Primary Institution: Translational Genomics Research Institute

Hypothesis

Is it better to design a classifier and estimate its error on the full sample or to design a classifier on a training subset and estimate its error on the holdout test subset?

Conclusion

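Full-sample design consistently outperforms holdout design: training on all available data yields classifiers with lower error than training on a subset and reserving the rest for testing.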

Supporting Evidence

Full-sample design provides better classifiers than holdout design.
Holdout design generally has a higher expected error bound than full-sample design paired with a full-sample error estimator.
The study uses a variety of classification rules including 3-nearest neighbor and linear discriminant analysis.

Takeaway

This study compares two ways to build and test a program that classifies data: using all the data for both building and testing, or splitting the data into one part for building and another for testing. It finds that using all the data is usually better.

Methodology

The study uses simulations to compare full-sample and holdout designs across various classification rules and data models.
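
The kind of simulation described above can be sketched as follows. This is a minimal illustration, not the study's actual code: the two-class Gaussian model, the sample size, the dimension, the class separation, and all function names (sample, fit_lda, error, compare_designs) are assumptions chosen for the example. It trains linear discriminant analysis once on the full sample and once on a training subset, then approximates each classifier's true error on a large independent set.

```python
import numpy as np

def sample(rng, n, dim, delta):
    """Draw n points from two balanced Gaussian classes (assumed model)."""
    y = np.arange(n) % 2                 # alternate labels so classes stay balanced
    x = rng.normal(size=(n, dim))        # identity covariance in both classes
    x[:, 0] += delta * y                 # class 1 mean is shifted along the first axis
    return x, y

def fit_lda(x, y):
    """Fit linear discriminant analysis with a pooled covariance estimate."""
    m0, m1 = x[y == 0].mean(axis=0), x[y == 1].mean(axis=0)
    centered = np.vstack([x[y == 0] - m0, x[y == 1] - m1])
    cov = centered.T @ centered / (len(x) - 2)
    w = np.linalg.solve(cov, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b

def error(model, x, y):
    """Fraction of points in (x, y) that the linear classifier mislabels."""
    w, b = model
    pred = (x @ w + b > 0).astype(int)
    return float(np.mean(pred != y))

def compare_designs(n=40, holdout_fraction=0.5, dim=5, delta=1.5, reps=300, seed=0):
    """Average true error of full-sample and holdout designs over many samples."""
    rng = np.random.default_rng(seed)
    # A large independent set stands in for the true error of each classifier.
    x_big, y_big = sample(rng, 5000, dim, delta)
    n_train = int(round(n * (1 - holdout_fraction)))
    full_true, hold_true, hold_est = [], [], []
    for _ in range(reps):
        x, y = sample(rng, n, dim, delta)
        full = fit_lda(x, y)                          # full-sample design
        hold = fit_lda(x[:n_train], y[:n_train])      # holdout design
        full_true.append(error(full, x_big, y_big))
        hold_true.append(error(hold, x_big, y_big))
        hold_est.append(error(hold, x[n_train:], y[n_train:]))  # holdout estimate
    return {
        "full_true_error": float(np.mean(full_true)),
        "holdout_true_error": float(np.mean(hold_true)),
        "holdout_estimate": float(np.mean(hold_est)),
    }

if __name__ == "__main__":
    print(compare_designs())
```

Under these assumptions, the full-sample classifier should show the lower average true error, because it is trained on twice as many points. The sketch does not reproduce the study's error bounds or its full-sample error estimators; it only illustrates the difference in classifier quality between the two designs.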

Potential Biases

Error estimates may be biased by how the data are split into training and testing subsets.

Limitations

The study primarily focuses on simulated data and may not fully capture real-world complexities.

Participant Demographics

The study includes data from 295 breast cancer patients, with 115 in the 'good prognosis' class and 180 in the 'poor prognosis' class.

Digital Object Identifier (DOI)

10.1155/2008/297945
