The effect of missing data on linkage disequilibrium mapping and haplotype association analysis in the GAW14 simulated datasets
2005

Impact of Missing Data on Genetic Analysis

Publication Evidence: moderate

Author Information

Author(s): Pamela A McCaskie, Kim W Carter, Simon R McCaskie, Lyle J Palmer

Primary Institution: Western Australian Institute for Medical Research, University of Western Australia

Hypothesis

Increasing proportions of missing data would result in a decreased ability to detect regions of linkage disequilibrium and a concomitant reduction in power to detect haplotype associations.

Conclusion

Ignoring individuals with missing data affects the number of regions of linkage disequilibrium detected, but haplotype analysis remains robust to missing data up to a level of 10%.

Supporting Evidence

  • LD analysis showed that ignoring individuals with missing data affects the accuracy of mapping regions of LD.
  • The number of pair-wise comparisons exhibiting strong LD tended to increase as missing data increased.
  • Haplotype analysis was found to be robust to missing data up to a level of 10%.

Takeaway

If we lose some information when studying genes, it can change how we see connections between them, but we can still find useful patterns even with some missing pieces.

Methodology

Linkage disequilibrium was assessed using JLIN software, and haplotype analysis was performed using SIMHAP with varying levels of missing data.
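
The general idea of testing LD under increasing missingness can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the JLIN or SIMHAP procedure: genotypes are coded as 0/1/2 allele counts, missingness is applied completely at random, LD is approximated by the squared correlation of allele counts, and individuals missing either genotype are simply ignored. The function names, the toy simulator, and the missing-data rates are all hypothetical.

```python
import random

MISSING = None  # marker for an untyped genotype (an assumption of this sketch)


def mask_genotypes(genotypes, missing_rate, rng):
    """Copy the genotypes, setting each one to MISSING with probability missing_rate."""
    return [MISSING if rng.random() < missing_rate else g for g in genotypes]


def complete_case_r2(snp_a, snp_b):
    """Squared correlation of allele counts, ignoring individuals missing either SNP.

    Returns (r2, n_used); r2 is NaN if it cannot be computed.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not MISSING and b is not MISSING]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan"), n
    return cov * cov / (var_a * var_b), n


def simulate_linked_snps(n, rng, redraw_prob=0.1):
    """Toy data: SNP B copies SNP A, except that it is redrawn with probability redraw_prob."""
    snp_a = [rng.choice([0, 1, 1, 2]) for _ in range(n)]
    snp_b = [rng.choice([0, 1, 2]) if rng.random() < redraw_prob else a
             for a in snp_a]
    return snp_a, snp_b


rng = random.Random(14)
snp_a, snp_b = simulate_linked_snps(500, rng)

# Re-estimate LD at several illustrative missing-data rates.
results = {}
for rate in (0.0, 0.05, 0.10, 0.20):
    r2, n_used = complete_case_r2(mask_genotypes(snp_a, rate, rng),
                                  mask_genotypes(snp_b, rate, rng))
    results[rate] = (r2, n_used)
    print(f"missing={rate:.0%}  individuals used={n_used}  r2={r2:.3f}")
```

Each run shows how many individuals remain usable at each rate and how far the LD estimate moves from its complete-data value. Repeating the loop over many random seeds would give a sense of the variability that missingness introduces.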

Limitations

The analysis was based primarily on simulated data (the GAW14 datasets), which may not fully represent real-world scenarios.

Statistical Information

P-Value

p<0.05 for some haplotypes

Confidence Interval

95% confidence intervals widened as the proportion of missing data increased
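
A rough way to see why interval width grows with missing data, offered as a simplified illustration rather than the authors' derivation: assume individuals with missing genotypes are dropped entirely, so that a fraction m of the original n individuals is lost, and assume the standard error of an effect estimate scales as 1/sqrt(n). The symbols n, m and sigma are introduced here for illustration only.

```latex
\[
\text{CI half-width} \approx 1.96\,\frac{\sigma}{\sqrt{n(1-m)}},
\qquad
\frac{\text{width at } m}{\text{width at } 0} = \frac{1}{\sqrt{1-m}}
\]
\[
m = 0.10:\ \frac{1}{\sqrt{0.90}} \approx 1.054,
\qquad
m = 0.20:\ \frac{1}{\sqrt{0.80}} \approx 1.118
\]
```

Under these assumptions, losing 10% of individuals widens an interval by only about 5%, while losing 20% widens it by about 12%.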

Digital Object Identifier (DOI)

10.1186/1471-2156-6-S1-S151
