SAMQA: error classification and validation of high-throughput sequenced read data
2011

SAMQA: A Tool for Quality Assurance in High-Throughput Sequencing Data

Sample size: 366 samples (324 exome, 42 whole-genome) Evidence: moderate

Author Information

Author(s): Thomas Robinson, Sarah Killcoyne, Ryan Bressler, John Boyle

Primary Institution: Institute for Systems Biology

Hypothesis

Can the SAMQA tool effectively identify errors in population-scale sequence data?

Conclusion

The SAMQA toolset validates a minimum set of data quality standards across whole-genome and exome sequences.

Supporting Evidence

  • SAMQA was run on 324 exome and 42 whole-genome samples from the COAD/READ (colon and rectal adenocarcinoma) cancer data.
  • The tool identified poor-quality data before secondary analysis, and running it on a high-performance computing framework significantly reduced the time required.
  • Technical tests automatically rejected samples that failed quality checks.
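
The automatic rejection of failing samples can be pictured as a simple gate placed before secondary analysis. The Python sketch below is illustrative only: the `SampleResult` record, the `triage` function, the test names, and the rule that any single failed technical test rejects a sample are all assumptions, not details taken from SAMQA.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SampleResult:
    """Hypothetical per-sample record: test name -> True if the test passed."""
    sample_id: str
    test_outcomes: Dict[str, bool] = field(default_factory=dict)

    def failed_tests(self) -> List[str]:
        # Sorted so the report is stable regardless of insertion order.
        return sorted(name for name, passed in self.test_outcomes.items() if not passed)


def triage(results: List[SampleResult]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split samples into accepted IDs and rejected IDs with their failed tests.

    Assumed rule: one failed technical test is enough to reject a sample.
    """
    accepted: List[str] = []
    rejected: Dict[str, List[str]] = {}
    for result in results:
        failed = result.failed_tests()
        if failed:
            rejected[result.sample_id] = failed
        else:
            accepted.append(result.sample_id)
    return accepted, rejected


# Example with made-up sample IDs and test names.
batch = [
    SampleResult("sample-001", {"file_readable": True, "header_present": True}),
    SampleResult("sample-002", {"file_readable": True, "header_present": False}),
]
accepted, rejected = triage(batch)
```

Only the samples in `accepted` would move on to secondary analysis; `rejected` keeps the names of the failed tests so each rejection can be traced.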

Takeaway

SAMQA is a tool that helps scientists check whether their DNA sequencing data is good enough to use, catching mistakes before analysis begins.

Methodology

The SAMQA tool uses a high-performance computing framework to run a series of technical and biological tests on sequenced read data.
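
As a rough illustration of what running a series of technical tests on read data might look like, the following Python sketch checks SAM-formatted alignment lines for basic structural validity. The check names (`has_mandatory_fields`, `seq_matches_qual`) and the `validate_sam_lines` helper are assumptions made for illustration; they are not SAMQA's actual tests or API, and the sketch runs on a single machine rather than on a high-performance computing framework.

```python
from typing import Callable, Dict, Iterable, List


# Hypothetical record-level checks; SAMQA's real test suite is not reproduced here.
def has_mandatory_fields(fields: List[str]) -> bool:
    """A SAM alignment line carries at least 11 tab-separated mandatory fields."""
    return len(fields) >= 11


def seq_matches_qual(fields: List[str]) -> bool:
    """SEQ (column 10) and QUAL (column 11) must be equally long unless either is '*'."""
    if len(fields) < 11:
        return False
    seq, qual = fields[9], fields[10]
    return seq == "*" or qual == "*" or len(seq) == len(qual)


CHECKS: Dict[str, Callable[[List[str]], bool]] = {
    "has_mandatory_fields": has_mandatory_fields,
    "seq_matches_qual": seq_matches_qual,
}


def validate_sam_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Count failures per check, skipping blank lines and '@' header lines."""
    failures = {name: 0 for name in CHECKS}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        for name, check in CHECKS.items():
            if not check(fields):
                failures[name] += 1
    return failures


# Example input: one header, one valid record, one SEQ/QUAL mismatch, one truncated record.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tII",
    "r3\t0\tchr1",
]
failure_counts = validate_sam_lines(example)
```

Keeping each check as a small function in a dictionary makes it easy to add further tests without touching the loop, which is one plausible way to organise a test series of this kind.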

Potential Biases

Biases may be introduced at multiple levels, including sample collection and sequencing methods.

Limitations

The tool may not address all types of errors and relies on expert analysis for biological plausibility.

Participant Demographics

Samples were drawn from cancer genome data in The Cancer Genome Atlas, covering various cancer types.

Digital Object Identifier (DOI)

10.1186/1471-2164-12-419
