Cataloging Coding Sequence Variations in Human Genome Databases
Author Information
Author(s): Won Hong-Hee, Kim Hee-Jin, Lee Kyung-A, Kim Jong-Won
Primary Institution: Samsung Biomedical Research Institute, Samsung Medical Center, Gangnam-Gu, Seoul, South Korea
Hypothesis
How can we systematically collect and curate variation data in human genome databases?
Conclusion
The study found that coding sequence variations overlap between HGMD and dbSNP, which calls for caution when interpreting the phenotypic relevance of variations recorded in both databases.
Supporting Evidence
- 8.11% of coding variations from dbSNP are also found in HGMD.
- 4.36% of coding variations from HGMD are also found in dbSNP.
- The proposed combination of prediction programs, built with Support Vector Machines (SVM), outperformed each individual prediction program.
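The two overlap percentages above differ because the same set of shared variations is divided by a different database size in each case. The sketch below illustrates this with invented data. The variant keys, the `overlap_fractions` helper, and the tuple layout are assumptions made for illustration and do not come from the study.

```python
# Toy illustration only: every variant key below is invented, not a real
# HGMD or dbSNP record, and the key layout is an assumption.

def overlap_fractions(set_a, set_b):
    """Return (shared keys, share of set_a that is shared, share of set_b that is shared)."""
    a, b = set(set_a), set(set_b)
    shared = a & b
    frac_a = len(shared) / len(a) if a else 0.0
    frac_b = len(shared) / len(b) if b else 0.0
    return shared, frac_a, frac_b


# Hypothetical keys: (gene, coding position, reference allele, alternate allele).
dbsnp_like = {
    ("GENE1", 100, "A", "G"),
    ("GENE1", 250, "C", "T"),
    ("GENE2", 40, "G", "A"),
    ("GENE3", 7, "T", "C"),
}
hgmd_like = {
    ("GENE1", 100, "A", "G"),  # the only key present in both toy sets
    ("GENE1", 310, "G", "C"),
    ("GENE2", 41, "G", "A"),
    ("GENE2", 95, "C", "A"),
    ("GENE3", 12, "T", "G"),
    ("GENE4", 5, "A", "T"),
    ("GENE4", 66, "C", "G"),
    ("GENE5", 18, "G", "T"),
}

shared, frac_dbsnp, frac_hgmd = overlap_fractions(dbsnp_like, hgmd_like)
print(f"shared variations: {len(shared)}")
print(f"share of dbSNP-like set also in HGMD-like set: {frac_dbsnp:.2%}")
print(f"share of HGMD-like set also in dbSNP-like set: {frac_hgmd:.2%}")
```

In practice the result depends heavily on how a variation is normalized into a key (gene or transcript, coordinate system, allele representation), which is one reason overlap figures between databases should be read with care.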
Takeaway
This study looked at changes in human genes and found that some of the same changes are recorded in more than one database.
Methodology
The study analyzed coding sequence variations from three databases (HGMD, dbSNP, and HapMap) using bioinformatic programs and a combinatorial approach with Support Vector Machines.
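The idea of combining several prediction programs with a Support Vector Machine can be sketched as follows. This is a minimal illustration and not the authors' implementation: the summary does not name the programs, features, or kernel, so the code assumes three hypothetical predictors whose scores are centred on zero (positive means "predicted deleterious"), a linear SVM without an intercept, and training by plain stochastic subgradient descent on the hinge loss. All scores and labels are invented.

```python
# Toy illustration only: scores, labels, and predictor count are invented.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w by stochastic subgradient descent on lam/2*||w||^2 + hinge loss.

    X: list of score vectors (one score per hypothetical predictor).
    y: labels, +1 (disease-associated) or -1 (neutral).
    No intercept is fitted, on the assumption that scores are centred on zero.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:
                # Inside the margin: regularisation step plus hinge-loss step.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
            else:
                # Outside the margin: regularisation step only.
                w = [wj - lr * lam * wj for wj in w]
    return w


def predict(w, x):
    """Classify one score vector by the sign of the weighted sum."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def accuracy(predictions, labels):
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


# Each row holds the scores of three hypothetical predictors for one variant.
# Every predictor is wrong on two of the six variants when used alone.
X = [
    [0.8, 0.7, -0.2],
    [0.9, -0.1, 0.6],
    [-0.3, 0.8, 0.7],
    [-0.7, -0.8, 0.1],
    [-0.6, 0.2, -0.9],
    [0.2, -0.9, -0.6],
]
y = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X, y)

# Accuracy of each predictor alone (sign of its own score) versus the combination.
# Both are measured on the training data, so this is not a real evaluation.
individual_accuracy = [
    accuracy([1 if x[j] >= 0 else -1 for x in X], y) for j in range(len(X[0]))
]
combined_accuracy = accuracy([predict(w, x) for x in X], y)

print("individual predictors:", [round(a, 2) for a in individual_accuracy])
print("SVM combination:", combined_accuracy)
```

A real comparison would score held-out variations and would typically rely on an established SVM library; the hand-written trainer here only keeps the sketch self-contained.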
Potential Biases
Differences in the characteristics of the databases may bias the interpretation of concurrent variations.
Limitations
The study primarily focused on three databases and may not represent all coding sequence variations in the human genome.
Participant Demographics
The study analyzed variations drawn from databases and reports no specific demographic information on participants.
Statistical Information
P-Value
p < 3.9 × 10^-33
Statistical Significance
p < 3.9 × 10^-33