Optimality Driven Nearest Centroid Classification from Genomic Data
2007

Sample size: 83 samples
Evidence: moderate

Author Information

Author(s): Alan R. Dabney, John D. Storey

Primary Institution: Texas A&M University

Hypothesis

Can a new feature selection approach improve nearest centroid classification in high-dimensional genomic data?

Conclusion

The proposed method can outperform existing nearest centroid classifiers in clinical classification based on gene-expression microarrays.

Supporting Evidence

  • The proposed method incorporates correlation between features for better classification.
  • The study demonstrates improvements in prediction accuracy over existing methods.
  • A greedy algorithm is used to estimate the optimal feature subset.
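To illustrate the first point above, here is a minimal sketch of how accounting for correlation can change a nearest centroid decision. It is not the authors' implementation: the two-gene centroids, the sample, the covariance values, and the helper names (`inv2x2`, `mahalanobis_sq`, `nearest_centroid`) are all hypothetical, and the covariance is assumed known and shared by both classes.

```python
def inv2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(x, centroid, precision):
    """Squared distance (x - c)^T P (x - c), where P is the inverse covariance."""
    diff = [xi - ci for xi, ci in zip(x, centroid)]
    n = len(diff)
    return sum(diff[i] * precision[i][j] * diff[j]
               for i in range(n) for j in range(n))


def nearest_centroid(x, centroids, precision):
    """Return the label of the centroid with the smallest distance to x."""
    return min(centroids,
               key=lambda label: mahalanobis_sq(x, centroids[label], precision))


# Hypothetical two-gene example: two class centroids and one new sample.
centroids = {"A": [0.0, 0.0], "B": [2.0, 0.0]}
sample = [1.2, 0.5]

identity = [[1.0, 0.0], [0.0, 1.0]]    # ignores correlation (Euclidean distance)
correlated = [[1.0, 0.9], [0.9, 1.0]]  # assumed strong correlation between genes

label_plain = nearest_centroid(sample, centroids, inv2x2(identity))
label_corr = nearest_centroid(sample, centroids, inv2x2(correlated))
```

With these made-up numbers the sample is assigned to class B when correlation is ignored and to class A when it is taken into account, which is the kind of difference a correlation-aware classifier can exploit.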

Takeaway

This study presents a new way to select informative features from high-dimensional genomic data so that diseases can be classified more accurately.

Methodology

The study introduces a greedy algorithm for feature selection based on minimizing misclassification rates in nearest centroid classifiers.
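The greedy idea can be sketched as forward selection: start with no features and repeatedly add the one that lowers the misclassification rate the most. The sketch below is an illustration under stated assumptions, not the paper's procedure: it assumes two classes with equal priors and a shared Gaussian covariance, for which the error rate is Φ(−Δ/2) with Δ the Mahalanobis distance between centroids, and it treats the centroids and covariance as known. The function names and the three-feature example are hypothetical.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def error_rate(subset, mu_a, mu_b, cov):
    """Two-class error Phi(-Delta/2) using only the features in `subset`."""
    diff = [mu_a[i] - mu_b[i] for i in subset]
    sub_cov = [[cov[i][j] for j in subset] for i in subset]
    weights = solve(sub_cov, diff)  # inverse(sub_cov) @ diff
    delta_sq = sum(d * w for d, w in zip(diff, weights))
    return NormalDist().cdf(-0.5 * delta_sq ** 0.5)


def greedy_select(mu_a, mu_b, cov, k):
    """Add, one at a time, the feature that lowers the error rate the most."""
    selected, remaining = [], list(range(len(mu_a)))
    while remaining and len(selected) < k:
        best = min(remaining,
                   key=lambda f: error_rate(selected + [f], mu_a, mu_b, cov))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical example: features 0 and 1 are highly correlated, 2 is independent.
mu_a = [0.0, 0.0, 0.0]
mu_b = [2.0, 1.8, 1.0]
cov = [[1.0, 0.95, 0.0],
       [0.95, 1.0, 0.0],
       [0.0, 0.0, 1.0]]

chosen = greedy_select(mu_a, mu_b, cov, k=2)
```

In this example, ranking features one at a time by centroid separation would favor features 0 and 1, but feature 1 is nearly redundant given feature 0, so the greedy search picks features 0 and 2 instead. In practice the centroids and covariance would have to be estimated from data, which this sketch does not address.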

Potential Biases

Bias may arise from estimating model parameters in high-dimensional settings, where the number of features far exceeds the number of samples.

Limitations

In practice, optimal feature selection is limited by the need to estimate the class centroids and covariance matrices from the data.

Participant Demographics

The study involved 83 samples from patients with small round blue cell tumors.

Digital Object Identifier (DOI)

10.1371/journal.pone.0001002
