Impact of Missing Value Imputation on Classification for DNA Microarray Gene Expression Data—A Model-Based Study
2009

Sample size: 295; publication; 10 minutes; Evidence: moderate

Author Information

Author(s): Youting Sun, Ulisses Braga-Neto, Edward R. Dougherty

Primary Institution: Texas A&M University

Hypothesis

How do different missing value imputation methods affect classification accuracy in DNA microarray gene expression data?

Conclusion

Missing value imputation can improve classification accuracy under certain conditions, but it is not recommended at high missing value rates.

Supporting Evidence

  • Imputation is beneficial when noise is high and variance is low.
  • Classification accuracy improves with imputation at low missing value rates.
  • At high missing value rates, imputation methods can degrade performance.
  • Different imputation methods yield varying results based on dataset characteristics.
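One way to probe claims like these is to vary the missing value rate and watch hold-out accuracy. The sketch below is illustrative only and is not the study's protocol: the Gaussian two-class data, the uniform random masking, the gene-wise mean imputation, the nearest-centroid classifier, and every function name (`make_data`, `mask_and_mean_impute`, `fit_centroids`, `predict`, `holdout_accuracy`) are assumptions chosen for brevity.

```python
import random

def make_data(n_per_class, n_genes, shift, sd, rng):
    """Hypothetical two-class Gaussian data; class 1 is shifted by `shift` in every gene."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, sd) for _ in range(n_genes)])
            y.append(label)
    return X, y

def mask_and_mean_impute(X, mv_rate, rng):
    """Hide each entry with probability mv_rate, then fill it with its gene's observed mean."""
    masked = [[None if rng.random() < mv_rate else v for v in row] for row in X]
    means = []
    for g in range(len(X[0])):
        obs = [row[g] for row in masked if row[g] is not None]
        means.append(sum(obs) / len(obs) if obs else 0.0)
    return [[means[g] if v is None else v for g, v in enumerate(row)] for row in masked]

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class."""
    cents = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict(cents, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, cents[lab])))

def holdout_accuracy(mv_rate, seed=0):
    """Train on imputed data, test on clean held-out data (an assumed design)."""
    rng = random.Random(seed)
    X_train, y_train = make_data(25, 10, shift=0.6, sd=1.0, rng=rng)
    X_test, y_test = make_data(200, 10, shift=0.6, sd=1.0, rng=rng)
    cents = fit_centroids(mask_and_mean_impute(X_train, mv_rate, rng), y_train)
    hits = sum(predict(cents, x) == lab for x, lab in zip(X_test, y_test))
    return hits / len(y_test)

for rate in (0.0, 0.1, 0.3, 0.6):
    print(f"MV rate {rate:.1f}: hold-out accuracy = {holdout_accuracy(rate):.3f}")
```

The numbers from a toy run like this say nothing about the study's results; the point is only the shape of the experiment, in which the missing value rate is the variable being swept.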

Takeaway

Gene expression data often contain missing values. This study shows that filling in those gaps can help classification, but when too much is missing, filling them in can make the results worse.

Methodology

The study compares six imputation algorithms and measures their effect on classification accuracy, using synthetic data and real cancer datasets.
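To make the imputation step concrete, here is a minimal sketch of how an imputation method can be scored on synthetic data where the true values are known. Everything in it is an assumption for illustration: masking entries uniformly at random is a simplification and not necessarily how the study generates missing values, gene-wise mean imputation is a simple baseline that may or may not be among the six algorithms compared, and the function names are hypothetical.

```python
import math
import random

def make_synthetic(n_per_class, n_genes, shift, noise_sd, rng):
    """Two-class Gaussian data: class 1 is shifted by `shift` in every gene."""
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            samples.append([rng.gauss(label * shift, noise_sd) for _ in range(n_genes)])
            labels.append(label)
    return samples, labels

def mask_at_random(samples, mv_rate, rng):
    """Copy the data, replacing each entry by None with probability mv_rate."""
    return [[None if rng.random() < mv_rate else v for v in row] for row in samples]

def impute_gene_mean(masked):
    """Fill each missing entry with the mean of the observed values of that gene (column)."""
    means = []
    for g in range(len(masked[0])):
        observed = [row[g] for row in masked if row[g] is not None]
        # Fall back to 0.0 if a gene has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[g] if v is None else v for g, v in enumerate(row)] for row in masked]

def rmse_on_missing(truth, masked, imputed):
    """Root-mean-square error, computed only over the entries that were masked."""
    errs = [(t - i) ** 2
            for t_row, m_row, i_row in zip(truth, masked, imputed)
            for t, m, i in zip(t_row, m_row, i_row) if m is None]
    return math.sqrt(sum(errs) / len(errs)) if errs else 0.0

rng = random.Random(0)
truth, labels = make_synthetic(n_per_class=30, n_genes=20, shift=1.0, noise_sd=0.5, rng=rng)
for mv_rate in (0.05, 0.20, 0.50):
    masked = mask_at_random(truth, mv_rate, rng)
    imputed = impute_gene_mean(masked)
    err = rmse_on_missing(truth, masked, imputed)
    print(f"MV rate {mv_rate:.2f}: RMSE on masked entries = {err:.3f}")
```

Scoring only the masked entries keeps the error measure focused on what the imputation method actually had to estimate; a different imputer can be swapped in for `impute_gene_mean` without changing the rest of the loop.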

Potential Biases

The choice of imputation method can itself bias the classification results.

Limitations

The findings may not generalize to datasets whose characteristics differ from those tested.

Participant Demographics

The study involved tumor samples from 295 breast cancer patients and 71 prostate cancer patients.

Digital Object Identifier (DOI)

10.1155/2009/504069
