Impact of Missing Value Imputation on Classification for DNA Microarray Gene Expression Data
Author Information
Author(s): Youting Sun, Ulisses Braga-Neto, Edward R. Dougherty
Primary Institution: Texas A&M University
Hypothesis
How do different missing value imputation methods affect classification accuracy in DNA microarray gene expression data?
Conclusion
Missing value imputation can improve classification accuracy under certain conditions, but it is not recommended at high missing value rates.
Supporting Evidence
- Imputation is beneficial when noise is high and variance is low.
- Classification accuracy improves with imputation at low missing value rates.
- At high missing value rates, imputation methods can degrade performance.
- Different imputation methods yield varying results based on dataset characteristics.
Takeaway
When scientists study gene data, some measurements are often missing. This study shows that filling in those gaps can help, but when too much is missing, filling them in can make the results worse.
Methodology
The study compares six imputation algorithms and their effects on classification accuracy using synthetic and real cancer datasets.
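The kind of experiment described above can be sketched as follows. This is a minimal illustration, not the authors' protocol. It assumes entries go missing completely at random, and it uses gene-wise mean imputation and a nearest-centroid classifier as simple stand-ins, since this summary does not name the six algorithms or the classifiers that were compared. All function names and parameter values are hypothetical.

```python
import numpy as np

def make_data(rng, n_per_class=50, n_genes=20, shift=1.0, noise_sd=1.0):
    """Two-class synthetic expression matrix (samples x genes); assumed model."""
    x0 = rng.normal(0.0, noise_sd, size=(n_per_class, n_genes))
    x1 = rng.normal(shift, noise_sd, size=(n_per_class, n_genes))
    return np.vstack([x0, x1]), np.repeat([0, 1], n_per_class)

def mask_at_random(X, mv_rate, rng):
    """Copy X and set roughly a fraction mv_rate of its entries to NaN."""
    X_missing = X.copy()
    X_missing[rng.random(X.shape) < mv_rate] = np.nan
    return X_missing

def impute_gene_mean(X_missing, fallback=0.0):
    """Fill each NaN with the mean of the observed values of that gene (column).

    A gene with no observed values at all is filled with `fallback`.
    """
    observed = ~np.isnan(X_missing)
    counts = observed.sum(axis=0)
    sums = np.where(observed, X_missing, 0.0).sum(axis=0)
    means = np.where(counts > 0, sums / np.maximum(counts, 1), fallback)
    return np.where(observed, X_missing, means)

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Assign each test sample to the closer class centroid; return accuracy."""
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((dists.argmin(axis=1) == y_test).mean())

def accuracy_after_imputation(mv_rate, seed=0):
    """Mask and impute the training set, then score on a clean test set."""
    rng = np.random.default_rng(seed)
    X_train, y_train = make_data(rng)
    X_test, y_test = make_data(rng)
    X_imputed = impute_gene_mean(mask_at_random(X_train, mv_rate, rng))
    return nearest_centroid_accuracy(X_imputed, y_train, X_test, y_test)

if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.3, 0.6):
        acc = accuracy_after_imputation(rate)
        print(f"MV rate {rate:.1f}: accuracy {acc:.3f}")
```

Sweeping `mv_rate`, `noise_sd`, and the imputation function in a loop like this is one way to explore how the missing value rate and data characteristics interact. The numbers from this toy model should not be read as reproducing the study's results.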
Potential Biases
The choice of imputation method can itself influence the classification results, which is a potential source of bias.
Limitations
The study's findings may not generalize to all datasets, especially those with different characteristics than the ones tested.
Participant Demographics
The study involved tumor samples from 295 breast cancer patients and 71 prostate cancer patients.