Decorrelation of the True and Estimated Classifier Errors in High-Dimensional Settings
2007

Understanding Classifier Errors in High-Dimensional Data

Author Information

Author(s): Blaise Hanczar, Jianping Hua, Edward R. Dougherty

Primary Institution: Texas A&M University

Hypothesis

How does the correlation between true and estimated classifier errors affect error estimation in high-dimensional settings?

Conclusion

High dimensionality tends to reduce the correlation between the true and estimated errors, and this decorrelation degrades error estimation more than any change in variance does.

Supporting Evidence

  • The true and estimated errors are more strongly correlated when a known feature set is used than when features are selected from the data or when all features are used.
  • High dimensionality affects error estimation more through its decorrelating effect than through changes in the variance of the estimate.
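
One way to see why correlation matters separately from variance is the standard identity for the variance of a difference. The notation here is an assumption made for illustration and is not taken from the summary: \varepsilon is the true error, \hat{\varepsilon} is the estimated error, \sigma_{\varepsilon} and \sigma_{\hat{\varepsilon}} are their standard deviations, and \rho is their correlation coefficient.

```latex
\operatorname{Var}(\hat{\varepsilon} - \varepsilon)
  = \sigma_{\hat{\varepsilon}}^{2} + \sigma_{\varepsilon}^{2}
  - 2\,\rho\,\sigma_{\hat{\varepsilon}}\,\sigma_{\varepsilon}
```

With both variances held fixed, a smaller \rho makes the deviation between the estimated and true errors more variable, so a drop in correlation alone is enough to make the estimate less reliable.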

Takeaway

When we try to predict things from data with a very large number of features, our guess at how often the predictions are wrong can be very different from how often they actually are wrong.

Methodology

The study analyzed the correlation between true and estimated errors using synthetic and real data across various feature-selection methods and classification rules.
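
The kind of experiment described above can be sketched on synthetic data. The following is a minimal illustration under stated assumptions, not the study's actual protocol: it uses a two-class Gaussian model, a nearest-mean classifier, leave-one-out as the error estimator, and a large independent test set as a stand-in for the true error. All function names and parameter values are hypothetical, and no feature selection is performed.

```python
import numpy as np

def nearest_mean_fit(X, y):
    """Fit a nearest-mean classifier: return the mean of each class."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def nearest_mean_predict(model, X):
    """Assign each row of X to the class with the closer mean."""
    m0, m1 = model
    d0 = ((X - m0) ** 2).sum(axis=1)
    d1 = ((X - m1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def sample(rng, n, dim, n_informative, shift):
    """Draw n balanced points; only the first n_informative features differ by class."""
    y = np.arange(n) % 2
    X = rng.standard_normal((n, dim))
    X[:, :n_informative] += shift * y[:, None]
    return X, y

def loo_error(X, y):
    """Leave-one-out estimate of the classification error."""
    n = len(y)
    mistakes = 0
    for i in range(n):
        keep = np.arange(n) != i
        model = nearest_mean_fit(X[keep], y[keep])
        mistakes += int(nearest_mean_predict(model, X[i:i + 1])[0] != y[i])
    return mistakes / n

def error_correlation(dim, n_train=30, n_test=2000, n_informative=2,
                      shift=1.0, n_rep=200, seed=0):
    """Correlation between true and estimated error over repeated training samples."""
    rng = np.random.default_rng(seed)
    true_err = np.empty(n_rep)
    est_err = np.empty(n_rep)
    for r in range(n_rep):
        X, y = sample(rng, n_train, dim, n_informative, shift)
        X_test, y_test = sample(rng, n_test, dim, n_informative, shift)
        model = nearest_mean_fit(X, y)
        # Assumption: a large held-out sample approximates the true error.
        true_err[r] = np.mean(nearest_mean_predict(model, X_test) != y_test)
        est_err[r] = loo_error(X, y)
    rho = np.corrcoef(true_err, est_err)[0, 1]
    return rho, true_err, est_err

if __name__ == "__main__":
    # dim == n_informative plays the role of a known feature set;
    # larger dim adds pure-noise features (the "all features" case).
    for dim in (2, 20, 200):
        rho, _, _ = error_correlation(dim)
        print(f"dim={dim:4d}  correlation={rho:.3f}")
```

Running the loop at the bottom compares the correlation for a known feature set against settings padded with noise features. The values depend on the seed and the chosen parameters, so the sketch illustrates the procedure and does not reproduce the paper's results.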

Limitations

The study primarily focuses on feature selection, which may limit the applicability of findings to other scenarios.

Digital Object Identifier (DOI)

10.1155/2007/38473
