Understanding Classifier Errors in High-Dimensional Data
Author Information
Author(s): Blaise Hanczar, Jianping Hua, Edward R. Dougherty
Primary Institution: Texas A&M University
Hypothesis
How does the correlation between true and estimated classifier errors affect error estimation in high-dimensional settings?
Conclusion
High dimensionality tends to decrease the correlation between the true and estimated errors, and it degrades error estimation more through this decorrelation than through changes in the variance of the estimated error.
Supporting Evidence
- The true and estimated errors are more strongly correlated when the classifier uses a known feature set than when features are selected from the data or when all features are used.
- High dimensionality affects error estimation more by decorrelating the true and estimated errors than by changing their variance.
Takeaway
When we try to predict things from a very large number of measurements, our estimate of how often the predictions are wrong can be quite different from how often they really are wrong, and the two may barely move together.
Methodology
The study analyzed the correlation between true and estimated errors using synthetic and real data across various feature-selection methods and classification rules.
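The kind of experiment described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the study's actual protocol: the nearest-mean classifier, the t-score feature filter, the k-fold cross-validation estimator, the Gaussian data model, and every name and parameter value (`sample`, `pick_features`, `simulate`, `shift`, and so on) are hypothetical stand-ins chosen for brevity. The "true" error is approximated on a large independent test sample.

```python
import numpy as np

def sample(rng, n_per_class, dim, n_informative, shift):
    """Two Gaussian classes; only the first n_informative features differ in mean."""
    offset = np.zeros(dim)
    offset[:n_informative] = shift
    X = rng.normal(size=(2 * n_per_class, dim))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1] += offset
    return X, y

def pick_features(X, y, mode, n_keep):
    """Return feature indices for the three scenarios: known set, selection, all."""
    if mode == "all":
        return np.arange(X.shape[1])
    if mode == "known":
        # By construction in sample(), the informative features come first.
        return np.arange(n_keep)
    if mode == "selected":
        # Simple filter: keep the n_keep features with the largest |t-like score|.
        a, b = X[y == 0], X[y == 1]
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
        score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / se
        return np.argsort(score)[-n_keep:]
    raise ValueError(f"unknown mode: {mode}")

def train(X, y, mode, n_keep):
    """Nearest-mean classifier on the chosen features (assumed stand-in rule)."""
    feats = pick_features(X, y, mode, n_keep)
    return feats, X[y == 0][:, feats].mean(axis=0), X[y == 1][:, feats].mean(axis=0)

def error_rate(model, X, y):
    feats, m0, m1 = model
    Z = X[:, feats]
    pred = (((Z - m1) ** 2).sum(axis=1) < ((Z - m0) ** 2).sum(axis=1)).astype(int)
    return float(np.mean(pred != y))

def cv_error(X, y, mode, n_keep, k, rng):
    """k-fold cross-validation estimate; feature selection is redone in every fold."""
    folds = np.array_split(rng.permutation(len(y)), k)
    wrong = 0.0
    for i, fold in enumerate(folds):
        rest = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = train(X[rest], y[rest], mode, n_keep)
        wrong += error_rate(model, X[fold], y[fold]) * len(fold)
    return wrong / len(y)

def simulate(mode, n_trials=50, n_per_class=20, dim=200, n_keep=5,
             shift=0.8, k=5, seed=0):
    """Return per-trial true errors, estimated errors, and their Pearson correlation."""
    rng = np.random.default_rng(seed)
    true_err, est_err = np.empty(n_trials), np.empty(n_trials)
    for t in range(n_trials):
        X, y = sample(rng, n_per_class, dim, n_keep, shift)
        X_test, y_test = sample(rng, 1000, dim, n_keep, shift)  # large hold-out set
        true_err[t] = error_rate(train(X, y, mode, n_keep), X_test, y_test)
        est_err[t] = cv_error(X, y, mode, n_keep, k, rng)
    return true_err, est_err, float(np.corrcoef(true_err, est_err)[0, 1])

if __name__ == "__main__":
    for mode in ("known", "selected", "all"):
        true_err, est_err, r = simulate(mode)
        print(f"{mode:9s} mean true={true_err.mean():.3f} "
              f"mean est={est_err.mean():.3f} corr={r:.3f}")
```

Comparing the printed correlation across the three modes, alongside the spread of `est_err`, is one way to look at the decorrelation and variance effects separately. The numbers depend on the assumed data model and seed, so they should not be read as reproducing the study's results.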
Limitations
The study primarily focuses on feature selection, which may limit the applicability of findings to other scenarios.