Improving DNA Sequencing Data Analysis
Author Information
Author(s): Jacques Rougemont, Arnaud Amzallag, Christian Iseli, Laurent Farinelli, Ioannis Xenarios, Felix Naef
Primary Institution: Ecole Polytechnique Fédérale de Lausanne (EPFL)
Hypothesis
Can a novel base calling algorithm improve the accuracy and efficiency of Solexa sequencing data processing?
Conclusion
The proposed method enhances genome coverage and increases the number of usable tags by an average of 15% compared to Solexa's pipeline.
Supporting Evidence
- The new algorithm increases the number of tags that map specifically onto reference genomes by about 15%.
- The method is implemented in a freely distributed software called Rolexa.
- Rolexa base calling significantly improves coverage for low-quality sequences.
Takeaway
This study created a new way to read DNA sequences that helps scientists get more useful information from their data, making it easier to study genes.
Methodology
The authors developed a base calling algorithm that uses model-based clustering and probability theory to identify ambiguous bases and to select the optimal sub-tag of each read.
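The two ideas in this methodology, calling ambiguous bases and choosing a sub-tag, can be illustrated with a minimal Python sketch. It is not Rolexa's implementation: it assumes the per-base probabilities are already available (the clustering step that would produce them is omitted), it encodes ambiguity with the standard IUPAC nucleotide codes, and the cumulative-probability rule, the `mass` and `min_info` thresholds, and all function names are hypothetical choices made for illustration.

```python
import math

# Standard IUPAC nucleotide codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H", frozenset("AGT"): "D",
    frozenset("CGT"): "B", frozenset("ACGT"): "N",
}


def entropy_bits(probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def call_base(probs, mass=0.9):
    """Hypothetical calling rule: take the most likely bases, in order,
    until their cumulative probability reaches `mass`, then report the
    IUPAC code for that set of bases."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for base, p in ranked:
        chosen.append(base)
        total += p
        if total >= mass:
            break
    return IUPAC[frozenset(chosen)]


def best_prefix_length(read_probs, min_info=1.0):
    """Hypothetical sub-tag score: each position contributes its information
    content (2 bits minus entropy) minus `min_info`; return the prefix
    length with the highest cumulative score."""
    best_len, best_score, score = 0, 0.0, 0.0
    for i, probs in enumerate(read_probs, start=1):
        score += (2.0 - entropy_bits(probs.values())) - min_info
        if score > best_score:
            best_len, best_score = i, score
    return best_len


# Made-up probabilities for a four-base read that degrades towards its end.
read = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    {"A": 0.02, "C": 0.48, "G": 0.02, "T": 0.48},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
called = "".join(call_base(p) for p in read)
sub_tag = called[:best_prefix_length(read)]
```

With these made-up inputs the third position is called as the two-base code for C or T and the fourth as fully ambiguous, and the sub-tag keeps only the two confident leading bases. Lowering `min_info` would keep more ambiguous positions in the sub-tag, which trades mapping specificity for read length.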
Potential Biases
Potential biases in the data acquisition process and the algorithm's assumptions may affect results.
Limitations
The method does not completely resolve the imbalance between complementary bases in the sequencing data.
Digital Object Identifier (DOI)