Evolving hard problems: Generating human genetics datasets with a complex etiology
2011

Generating Complex Genetic Datasets for Disease Research

Sample size: 800 | Publication | Evidence: high

Author Information

Author(s): Daniel S. Himmelstein, Casey S. Greene, Jason H. Moore

Primary Institution: Dartmouth Medical School

Hypothesis

Can we create datasets that reflect complex gene-disease interactions without relying on predefined genetic models?

Conclusion

The study developed a method for generating datasets with complex gene-disease relationships and used it to produce 76,600 datasets, which are publicly available for researchers to test new analysis methods.

Supporting Evidence

  • The method generated datasets that successfully minimized first-order associations while maximizing higher-order interactions.
  • The evolution strategy outperformed random searches in generating datasets with complex gene-disease relationships.
  • The resulting datasets are publicly available, enabling rigorous testing of new genetic analysis methods.
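To make "minimized first-order associations while maximizing higher-order interactions" concrete, the toy example below uses the classic XOR model of pure epistasis: neither SNP alone carries any information about disease status, but the pair determines it completely. Mutual information is used here as an assumed measure of association; the study's own fitness measure may differ, and the data and function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical toy data: 100 individuals, two SNPs coded 0/1 (a simplification
# of 0/1/2 genotypes), and disease status equal to the XOR of the two SNPs.
snp1, snp2, status = [], [], []
for a in (0, 1):
    for b in (0, 1):
        for _ in range(25):
            snp1.append(a)
            snp2.append(b)
            status.append(a ^ b)

# First-order signal: the strongest association of any single SNP with status.
first_order = max(mutual_information(snp1, status),
                  mutual_information(snp2, status))
# Second-order signal: the association of the SNP pair with status.
second_order = mutual_information(list(zip(snp1, snp2)), status)

print(f"best single-SNP MI: {first_order:.3f} bits")  # 0.000 bits
print(f"two-SNP MI:         {second_order:.3f} bits")  # 1.000 bits
```

A dataset scoring well under this kind of comparison is "hard" for methods that screen SNPs one at a time, which is the property the generated datasets were designed to have.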

Takeaway

The researchers generated a large collection of simulated genetic datasets to help scientists study how genes might work together to cause disease, without assuming any specific genetic model in advance.

Methodology

The study used an evolution strategy to generate datasets with complex gene-disease relationships by optimizing for high-order interactions while minimizing lower-order effects.
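The evolution strategy described above can be sketched as a (1+1) loop: mutate a candidate dataset, score it, and keep the child only if it scores at least as well as the parent. This is a simplified illustration, not the authors' implementation. The fitness function (two-SNP mutual information minus the best single-SNP mutual information), the fixed interaction order of two, the 0/1/2 genotype coding, the balanced case/control labels, the mutation rate, and all function names are assumptions.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def fitness(genotypes, status):
    """Hypothetical fitness: two-SNP signal minus the strongest one-SNP signal."""
    snp_a = [g[0] for g in genotypes]
    snp_b = [g[1] for g in genotypes]
    first_order = max(mutual_information(snp_a, status),
                      mutual_information(snp_b, status))
    # Genotype tuples are hashable, so the pair is treated as one variable.
    return mutual_information(genotypes, status) - first_order


def mutate(genotypes, rate, rng):
    """Redraw each genotype (0, 1 or 2) independently with probability `rate`."""
    return [tuple(rng.choice((0, 1, 2)) if rng.random() < rate else x for x in row)
            for row in genotypes]


def evolve(n_individuals=200, generations=500, rate=0.02, seed=0):
    """(1+1)-style evolution strategy over a two-SNP case/control dataset."""
    rng = random.Random(seed)
    # Balanced case/control labels stay fixed; only genotypes evolve.
    half = n_individuals // 2
    status = [0] * half + [1] * (n_individuals - half)
    parent = [(rng.choice((0, 1, 2)), rng.choice((0, 1, 2)))
              for _ in range(n_individuals)]
    parent_fit = fitness(parent, status)
    history = [parent_fit]
    for _ in range(generations):
        child = mutate(parent, rate, rng)
        child_fit = fitness(child, status)
        # Plus-selection: the child replaces the parent only if it is no worse.
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
        history.append(parent_fit)
    return parent, status, history


genotypes, status, history = evolve()
print(f"fitness: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because each child is derived from the current best dataset, improvements accumulate across generations; a random-search baseline would instead sample fresh datasets independently, which is the kind of comparison the study reports the evolution strategy winning.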

Potential Biases

The absence of recombination in the evolutionary algorithm may limit the diversity of generated datasets.
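Recombination (crossover) builds a new candidate by combining parts of two existing ones, whereas the published strategy relied on mutation alone. A minimal sketch of what a row-wise uniform crossover between two candidate datasets could look like, assuming each dataset is a list of per-individual genotype tuples; the function name and data representation are hypothetical and not part of the study.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Hypothetical recombination operator: take each individual's genotype
    row from one of the two parent datasets, chosen with equal probability."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must describe the same number of individuals")
    return [row_a if rng.random() < 0.5 else row_b
            for row_a, row_b in zip(parent_a, parent_b)]


rng = random.Random(42)
parent_a = [(0, 0), (1, 2), (2, 1), (0, 1)]
parent_b = [(2, 2), (0, 1), (1, 1), (2, 0)]
child = uniform_crossover(parent_a, parent_b, rng)
print(child)
```

Swapping whole rows keeps each individual's multi-locus genotype intact, so an interaction present in one parent's individuals is not broken apart by the operator itself.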

Limitations

The optimal mutation rates for different sample sizes were estimated rather than directly tested, so the rates used may not give the best results.

Statistical Information

P-Value

p < 0.001

Digital Object Identifier (DOI)

10.1186/1756-0381-4-21
