MACSE: A Tool for Aligning Coding Sequences with Frameshifts and Stop Codons
Author Information
Author(s): Vincent Ranwez, Sébastien Harispe, Frédéric Delsuc, Emmanuel J. P. Douzery
Primary Institution: Institut des Sciences de l'Evolution, UMR5554-CNRS, Université Montpellier II, Montpellier, France
Hypothesis
Can an algorithm align protein-coding nucleotide sequences while accounting for the frameshifts and stop codons that interrupt their open reading frames?
Conclusion
The MACSE program effectively aligns protein-coding sequences, including those with frameshifts and stop codons, improving the accuracy of multiple sequence alignments.
Supporting Evidence
- MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences without disrupting the underlying codon structure.
- MACSE has been shown to detect undocumented frameshifts in public database sequences.
- The program can also align high-throughput sequencing reads against reference coding sequences.
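
To make the two kinds of disruption mentioned above concrete, the following minimal Python sketch flags a sequence whose length is not a multiple of three (a possible frameshift) and lists stop codons that occur before the final codon. It is only an illustration and not MACSE's detection method, which works on the alignment itself: the function name `frame_problems` is hypothetical, the standard genetic code is assumed, and the sequence is assumed to start in frame 1.

```python
# Stop codons of the standard genetic code (assumption: no alternative code is in use).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_problems(seq):
    """Report simple reading-frame problems in a nucleotide sequence.

    Hypothetical helper for illustration: reads the sequence in frame 1,
    codon by codon, and ignores any trailing incomplete codon.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop in the last complete codon is treated as the normal terminator;
    # any earlier stop is reported by its 0-based codon index.
    internal_stops = [k for k, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    return {
        "length_not_multiple_of_3": len(seq) % 3 != 0,
        "internal_stop_codons": internal_stops,
    }


if __name__ == "__main__":
    print(frame_problems("ATGAAATAA"))     # intact toy coding sequence
    print(frame_problems("ATGTAAAAATAA"))  # stop codon at codon index 1
    print(frame_problems("ATGAAAATAA"))    # one inserted base changes the length
```

A check like this can only see the symptoms on a single sequence. Locating where a frameshift occurred requires comparison with related sequences, which is the problem the alignment addresses.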
Takeaway
MACSE is a computer program that helps scientists line up DNA sequences, even when there are mistakes in the sequences, so they can study them better.
Methodology
The study presents an algorithm that extends the classical Needleman-Wunsch algorithm to accommodate sequencing errors and biological deviations, implemented in the MACSE program for multiple sequence alignment.
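
For readers unfamiliar with the starting point, here is a minimal Python sketch of the classical Needleman-Wunsch global alignment of two nucleotide strings. It shows only the unextended algorithm: the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions, and nothing here reproduces MACSE's handling of frameshifts, stop codons, or multiple sequences.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Classical global alignment of strings a and b with a linear gap cost.

    Returns (score, aligned_a, aligned_b). Scoring parameters are
    illustrative assumptions, not values taken from the MACSE paper.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ATGC", "ATC"))
```

Because this version scores single nucleotides with a uniform gap cost, it is free to place gaps of any length anywhere, which is how purely nucleotide-level alignment can break the codon structure that the study aims to preserve.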
Limitations
The algorithm may not handle unexpected frameshifting substitutions optimally, and its computation time is longer than that of some existing methods.
Digital Object Identifier (DOI)