SAMQA: A Tool for Quality Assurance in High-Throughput Sequencing Data
Author Information
Author(s): Thomas Robinson, Sarah Killcoyne, Ryan Bressler, John Boyle
Primary Institution: Institute for Systems Biology
Hypothesis
Can the SAMQA tool effectively identify errors in population-scale sequence data?
Conclusion
The SAMQA toolset validates a minimum set of data quality standards across whole-genome and exome sequences.
Supporting Evidence
- SAMQA was applied to 324 exome and 42 whole-genome samples from The Cancer Genome Atlas colon and rectal adenocarcinoma (COAD/READ) data.
- Using high-performance computing, the tool identified poor-quality data before secondary analysis, in significantly less time.
- Technical tests automatically rejected samples that failed quality checks.
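The automatic rejection step can be pictured as a simple gate over per-sample test results. The sketch below is illustrative only: the summary does not state SAMQA's actual rejection criteria, so the `SampleQC` record, the `triage` function and the 5% `max_failure_rate` threshold are all hypothetical choices, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Hypothetical per-sample summary of technical test results."""
    sample_id: str
    total_reads: int
    failed_reads: int  # reads that failed at least one technical check

    @property
    def failure_rate(self) -> float:
        # A sample with no reads is treated as a complete failure.
        return self.failed_reads / self.total_reads if self.total_reads else 1.0


def triage(samples, max_failure_rate=0.05):
    """Split samples into (accepted, rejected) lists of sample IDs.

    max_failure_rate is an illustrative threshold, not a value from the paper.
    """
    accepted, rejected = [], []
    for s in samples:
        target = rejected if s.failure_rate > max_failure_rate else accepted
        target.append(s.sample_id)
    return accepted, rejected


samples = [SampleQC("S1", 1000, 10), SampleQC("S2", 1000, 200), SampleQC("S3", 0, 0)]
print(triage(samples))  # (['S1'], ['S2', 'S3'])
```

Rejecting before secondary analysis means downstream steps only ever see samples that passed the gate; in practice the threshold would be set per test and per data type.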
Takeaway
SAMQA helps scientists check whether DNA sequencing data are good enough to use, catching errors before the data are analyzed.
Methodology
The SAMQA tool uses a high-performance computing framework to run a series of technical and biological tests on sequenced read data.
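Technical tests on read data lend themselves to a map/reduce pattern: each alignment record is checked independently (map), and the per-record error counts are summed (reduce), so an HPC framework can spread the map step across many nodes. The single-process sketch below assumes input in the SAM text format and uses a few field-level checks taken from the SAM specification (field count, FLAG and MAPQ ranges, SEQ/QUAL length agreement, CIGAR query length). The function names and the choice of checks are hypothetical; they are not SAMQA's actual test suite, and biological tests are not covered.

```python
import re
from collections import Counter
from functools import reduce

# CIGAR operations; M, I, S, = and X consume bases of the read (query).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def _is_uint(s: str) -> bool:
    return s.isascii() and s.isdigit()


def check_record(line: str) -> Counter:
    """Map step (hypothetical): run technical checks on one SAM alignment line."""
    errors = Counter()
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:  # SAM requires 11 mandatory fields
        errors["too_few_fields"] += 1
        return errors
    flag, pos, mapq, cigar, seq, qual = (fields[i] for i in (1, 3, 4, 5, 9, 10))
    if not _is_uint(flag) or int(flag) > 65535:
        errors["bad_flag"] += 1
    if not _is_uint(pos):
        errors["bad_pos"] += 1
    if not _is_uint(mapq) or int(mapq) > 255:
        errors["bad_mapq"] += 1
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        errors["seq_qual_length_mismatch"] += 1
    if cigar != "*" and seq != "*":
        ops = CIGAR_OP.findall(cigar)
        if "".join(n + op for n, op in ops) != cigar:
            errors["malformed_cigar"] += 1
        elif sum(int(n) for n, op in ops if op in "MIS=X") != len(seq):
            errors["cigar_seq_length_mismatch"] += 1
    return errors


def summarize(lines):
    """Reduce step: skip '@' header lines and sum per-record error counts."""
    records = [l for l in lines if l and not l.startswith("@")]
    total = reduce(lambda a, b: a + b, map(check_record, records), Counter())
    return len(records), total


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",  # passes all checks
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGTA\tIIII",   # length mismatches
]
print(summarize(sam))
```

Because the map step has no shared state, the same per-record function could be distributed unchanged; only the reduce (summing counts) needs to gather results.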
Potential Biases
Biases may be introduced at multiple levels, including sample collection and sequencing methods.
Limitations
The tool may not detect every type of error, and judging biological plausibility still relies on expert analysis.
Participant Demographics
Samples were drawn from The Cancer Genome Atlas (TCGA), which covers many cancer types; the analyses reported here used colorectal (COAD/READ) samples.
Digital Object Identifier (DOI)