Optimal Nearest Centroid Classification from Genomic Data
Author Information
Author(s): Alan R. Dabney, John D. Storey
Primary Institution: Texas A&M University
Hypothesis
Can a new feature selection approach improve nearest centroid classification in high-dimensional genomic data?
Conclusion
The proposed method can outperform existing nearest centroid classifiers in clinical classification based on gene-expression microarrays.
Supporting Evidence
- The proposed method accounts for correlation between features when classifying samples.
- The study demonstrates improvements in prediction accuracy over existing methods.
- A greedy algorithm is used to estimate the optimal feature subset.
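As a rough illustration of how a nearest centroid rule can take feature correlation into account, the sketch below assigns each sample to the class whose centroid is closest in Mahalanobis distance under a pooled covariance estimate. The function names `fit_centroids` and `predict`, the pooled estimator, and the `ridge` term are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def fit_centroids(X, y):
    """Return class labels, per-class centroids, and the pooled covariance."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance: center each sample on its own class centroid.
    centered = X - centroids[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(y) - len(classes))
    return classes, centroids, cov

def predict(X, classes, centroids, cov, ridge=1e-6):
    """Assign each row of X to the centroid with the smallest Mahalanobis distance."""
    # Assumption of this sketch: a small ridge keeps the covariance invertible.
    prec = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    diffs = X[:, None, :] - centroids[None, :, :]   # shape (n, k, p)
    d2 = np.einsum("nkp,pq,nkq->nk", diffs, prec, diffs)
    return classes[np.argmin(d2, axis=1)]
```

With an identity covariance this reduces to the ordinary Euclidean nearest centroid rule, so the covariance term is what carries the correlation information.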
Takeaway
This study presents a new way to select the most informative features from high-dimensional data so that diseases can be classified more accurately.
Methodology
The study introduces a greedy algorithm for feature selection based on minimizing misclassification rates in nearest centroid classifiers.
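A minimal sketch of greedy forward selection for a nearest centroid classifier is given here. It assumes the misclassification rate is estimated by the error on a held-out validation set and that distances are Euclidean; the paper's own estimator may differ. The names `centroid_error` and `greedy_select` are hypothetical.

```python
import numpy as np

def centroid_error(X_train, y_train, X_val, y_val, features):
    """Validation error of a Euclidean nearest centroid rule on `features`."""
    classes = np.unique(y_train)
    Xt, Xv = X_train[:, features], X_val[:, features]
    centroids = np.array([Xt[y_train == c].mean(axis=0) for c in classes])
    d2 = ((Xv[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.mean(classes[np.argmin(d2, axis=1)] != y_val)

def greedy_select(X_train, y_train, X_val, y_val, max_features):
    """Add the feature that lowers the error most; stop when none improves it."""
    selected, best_err = [], float("inf")
    while len(selected) < max_features:
        candidates = [j for j in range(X_train.shape[1]) if j not in selected]
        if not candidates:
            break
        errs = [centroid_error(X_train, y_train, X_val, y_val, selected + [j])
                for j in candidates]
        k = int(np.argmin(errs))
        if errs[k] >= best_err:
            break  # no remaining feature improves the estimated error
        selected.append(candidates[k])
        best_err = errs[k]
    return selected, best_err
```

Each step evaluates every remaining feature once, so the search is cheap compared with testing all subsets, at the cost of possibly missing the globally best subset.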
Potential Biases
Bias may arise from estimating parameters in high-dimensional settings.
Limitations
In practice, the optimal feature selection cannot be applied exactly, because the class centroids and covariance matrices it depends on must be estimated from the data.
Participant Demographics
The study involved 83 samples from patients with small round blue cell tumors.