Impact of Missing Data on Genetic Analysis
Author Information
Author(s): Pamela A McCaskie, Kim W Carter, Simon R McCaskie, Lyle J Palmer
Primary Institution: Western Australian Institute for Medical Research, University of Western Australia
Hypothesis
Increasing proportions of missing data would result in a decreased ability to detect regions of linkage disequilibrium and a concomitant reduction in power to detect haplotype associations.
Conclusion
Ignoring individuals with missing data affects the number of regions of linkage disequilibrium detected, but haplotype analysis remains robust to missing data up to a level of 10%.
Supporting Evidence
- LD analysis showed that excluding individuals with missing data affects the accuracy of mapping regions of LD.
- The number of pairwise comparisons exhibiting strong LD tended to increase as the proportion of missing data increased.
- Haplotype analysis was found to be robust to missing data up to a level of 10%.
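To make "pairwise comparisons exhibiting strong LD" concrete, the following minimal sketch computes the standard LD measures D, D' and r² for one pair of biallelic markers from phased haplotype counts. It is an illustration of the textbook definitions only; the function name `pairwise_ld` and its arguments are hypothetical, and nothing here is taken from JLIN or from the authors' own code.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    Arguments are counts of the four phased haplotypes, where the first
    letter is the allele at locus 1 and the second the allele at locus 2.
    Names are illustrative assumptions, not part of any published tool.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes observed")

    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at locus 2

    # D is the deviation of the observed haplotype frequency from the
    # frequency expected under independence of the two loci.
    d = p_AB - p_A * p_B

    variance_term = p_A * (1 - p_A) * p_B * (1 - p_B)
    if variance_term == 0:
        # At least one locus is monomorphic, so LD is undefined; report 0.
        return d, 0.0, 0.0

    # D' rescales |D| by its maximum attainable value given the allele
    # frequencies, which depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    r_squared = d * d / variance_term
    return d, d_prime, r_squared
```

Because these measures are estimated from whichever haplotypes remain after exclusion, dropping individuals changes the counts passed in, which is one way the set of marker pairs classified as being in strong LD can shift.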
Takeaway
If we lose some information when studying genes, it can change how we see connections between them, but we can still find useful patterns even with some missing pieces.
Methodology
Linkage disequilibrium was assessed using JLIN software, and haplotype analysis was performed using SIMHAP with varying levels of missing data.
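As a rough illustration of what "varying levels of missing data" combined with ignoring incomplete individuals can look like, the sketch below masks genotypes at random and then keeps only complete cases. It assumes genotypes are missing completely at random and independently per marker, which this summary does not state; the function names, the genotype coding and the missingness levels used are all hypothetical and are not taken from the study, JLIN or SIMHAP.

```python
import random


def mask_genotypes(genotypes, missing_rate, rng):
    """Return a copy of `genotypes` with entries set to None at random.

    `genotypes` is a list of individuals, each a list of genotype codes.
    Each entry is masked independently with probability `missing_rate`
    (an assumed missing-completely-at-random mechanism).
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be between 0 and 1")
    return [
        [None if rng.random() < missing_rate else g for g in person]
        for person in genotypes
    ]


def complete_cases(genotypes):
    """Keep only individuals with no missing genotype at any marker."""
    return [p for p in genotypes if all(g is not None for g in p)]


def expected_retained_fraction(missing_rate, n_markers):
    """Expected share of individuals kept under independent missingness."""
    return (1.0 - missing_rate) ** n_markers


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical data: 1000 individuals, 10 markers, genotypes coded 0/1/2.
    data = [[rng.choice([0, 1, 2]) for _ in range(10)] for _ in range(1000)]
    for rate in (0.0, 0.05, 0.10, 0.20):  # illustrative levels only
        kept = complete_cases(mask_genotypes(data, rate, rng))
        print(rate, len(kept), round(expected_retained_fraction(rate, 10), 3))
```

Under these assumptions the retained sample shrinks geometrically with the number of markers, so even modest per-genotype missingness can remove a large share of individuals from a complete-case analysis.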
Limitations
The study primarily focused on simulated data, which may not fully represent real-world scenarios.
Statistical Information
P-Value
p<0.05 for some haplotypes
Confidence Interval
95% confidence intervals widened as the proportion of missing data increased