Cataloging Coding Sequence Variations in Human Genome Databases

Sample size: 1069 | Publication | Evidence: high

Author Information

Author(s): Won Hong-Hee, Kim Hee-Jin, Lee Kyung-A, Kim Jong-Won

Primary Institution: Samsung Biomedical Research Institute, Samsung Medical Center, Gangnam-Gu, Seoul, South Korea

Hypothesis

How can we systematically collect and curate variation data in human genome databases?

Conclusion

The study found that a notable fraction of coding sequence variations appear in both HGMD and dbSNP. This overlap means that caution is needed when interpreting the phenotypic relevance of variations listed in these databases.

Supporting Evidence

8.11% of coding variations from dbSNP are also found in HGMD.
4.36% of coding variations from HGMD are also found in dbSNP.
The proposed combination of prediction programs using Support Vector Machines (SVMs) outperformed each individual prediction program.
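The two overlap figures in the list above are not symmetric. Under exact matching, both figures count the same set of shared variations, but each divides that count by a different total (each database's own number of coding variations). Below is a minimal sketch of that calculation. It assumes each variation can be reduced to a hashable key. The `Variant` fields, the helper `overlap_fraction`, and the toy catalogs are hypothetical and do not reflect the study's matching rules or data.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical key for a coding variation; the study's actual matching criteria are not given here."""
    gene: str
    position: int
    ref_aa: str
    alt_aa: str


def overlap_fraction(source: Set[Variant], other: Set[Variant]) -> float:
    """Fraction of the variations in `source` that are also present in `other`."""
    if not source:
        return 0.0
    # Same intersection either way; only the denominator changes with `source`.
    return len(source & other) / len(source)


# Synthetic toy catalogs (not data from the study).
hgmd = {Variant("GENE1", 12, "R", "W"), Variant("GENE1", 45, "G", "D"),
        Variant("GENE2", 7, "L", "P"), Variant("GENE3", 101, "A", "V")}
dbsnp = {Variant("GENE1", 12, "R", "W"), Variant("GENE4", 3, "K", "E")}

print(f"{overlap_fraction(dbsnp, hgmd):.2%} of dbSNP coding variations are in HGMD")
print(f"{overlap_fraction(hgmd, dbsnp):.2%} of HGMD coding variations are in dbSNP")
```

With one shared variation, the toy catalogs give 1/2 for dbSNP and 1/4 for HGMD. The smaller catalog therefore shows the larger percentage, which is the same pattern as the 8.11% and 4.36% figures.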

Takeaway

This study examined changes in the protein-coding parts of human genes and found that some of these changes appear in more than one database. For that reason, a change's presence in any one database should be interpreted with caution.

Methodology

The study collected coding sequence variations from three databases (HGMD, dbSNP, and HapMap) and analyzed them with several bioinformatic prediction programs. It then combined the programs' predictions using Support Vector Machines (SVMs).
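This rough illustration shows one way to combine the outputs of several prediction programs with an SVM. It trains a linear SVM with the Pegasos stochastic sub-gradient method in plain Python. This summary does not describe the study's actual programs, features, kernel, or training data. The three score columns, the labels (+1 damaging, -1 neutral), and the helper names `train_linear_svm` and `predict` are therefore all hypothetical. The linear Pegasos solver is a swapped-in stand-in, not the authors' method.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 1000,
                     seed: int = 0) -> List[float]:
    """Train a linear SVM with the Pegasos stochastic sub-gradient method.

    Each row of X holds the scores that several (hypothetical) prediction
    programs assign to one variation; y holds labels in {-1, +1}. A constant
    1.0 feature is appended so the last weight acts as a (regularized) bias.
    """
    rng = random.Random(seed)
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(rows)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            # Shrink step from the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step, taken only when the margin is below 1.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (predicted damaging) or -1 (predicted neutral)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Synthetic scores from three hypothetical programs (higher = more damaging).
X = [[0.90, 0.80, 0.85], [0.80, 0.90, 0.70], [0.95, 0.70, 0.90], [0.70, 0.85, 0.80],
     [0.10, 0.20, 0.15], [0.20, 0.10, 0.30], [0.15, 0.30, 0.10], [0.30, 0.20, 0.20]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w = train_linear_svm(X, y)
print("weights (3 programs + bias):", [round(v, 3) for v in w])
print("new variation:", predict(w, [0.85, 0.75, 0.90]))
```

A linear model keeps each program's contribution visible as a single weight. A kernel SVM (for example, scikit-learn's `SVC`) would be the more usual choice if the programs' scores interact non-linearly.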

Potential Biases

Differences in the characteristics of the databases may bias the interpretation of concurrent variations, that is, variations that appear in more than one database.

Limitations

The study primarily focused on three databases and may not represent all coding sequence variations in the human genome.

Participant Demographics

The study analyzed variation records from databases rather than recruited participants, so no participant-level demographic information was analyzed.

Statistical Information

P-Value

p < 3.9 × 10⁻³³


Digital Object Identifier (DOI)

10.1371/journal.pone.0003575
