Efficient computation of absent words in genomic sequences
2008

publication Evidence: high

Author Information

Author(s): Julia Herold, Stefan Kurtz, Robert Giegerich

Primary Institution: Center of Biotechnology, Bielefeld University

Hypothesis

Can we develop a more efficient algorithm for computing absent words in genomic sequences?

Conclusion

The new algorithm computes absent words for the human genome in 10 minutes on standard hardware, using only 2.5 Mb of space.

Supporting Evidence

  • The algorithm efficiently computes unwords (the shortest absent words) of the human and mouse genomes.
  • For the human genome, the computation requires only 2.5 Mb of space.
  • The program can analyze large genomic datasets quickly.

Takeaway

This study created a new computer program that quickly finds words missing from DNA sequences, which can help scientists understand genomes better.

Methodology

The study developed a new algorithm and software for computing absent words without the need for large index structures such as suffix trees or suffix arrays.
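
To make the idea of index-free computation concrete, here is a minimal Python sketch of one way to find the shortest absent words of a DNA sequence. It is an illustration under stated assumptions, not the authors' implementation: the function names `shortest_absent_words` and `decode`, the `max_k` cutoff, and the choice to treat any non-ACGT character as a break between words are all hypothetical. For each word length k it keeps one flag per possible word (4^k flags) instead of building an index over the sequence.

```python
ALPHABET = "ACGT"
CODE = {base: i for i, base in enumerate(ALPHABET)}


def decode(value, k):
    """Turn a 2-bit-per-base integer back into a word of length k."""
    chars = []
    for _ in range(k):
        chars.append(ALPHABET[value & 3])
        value >>= 2
    return "".join(reversed(chars))


def shortest_absent_words(seq, max_k=12):
    """Return (k, words) for the smallest k at which some k-mer is absent.

    Assumption: characters other than A, C, G, T (e.g. N) break a word.
    Returns None if every word up to length max_k occurs in seq.
    """
    seq = seq.upper()
    for k in range(1, max_k + 1):
        seen = bytearray(4 ** k)   # one flag per possible word of length k
        mask = 4 ** k - 1          # keeps only the last k bases
        value = 0                  # rolling integer code of the current word
        valid = 0                  # consecutive valid bases ending here
        for ch in seq:
            code = CODE.get(ch)
            if code is None:       # ambiguous base: restart the window
                value = 0
                valid = 0
                continue
            value = ((value << 2) | code) & mask
            valid += 1
            if valid >= k:
                seen[value] = 1
        absent = [decode(i, k) for i in range(4 ** k) if not seen[i]]
        if absent:
            return k, absent
    return None
```

Calling `shortest_absent_words("ACGT")` returns length 2 together with the 13 dinucleotides that do not occur in that string. The sketch scans a single strand only and rescans the sequence once per word length, so it shows the principle rather than the performance reported in the paper.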

Limitations

The software was evaluated only on the tested genomes, so it may have limitations when handling datasets larger than those genome sizes.

Digital Object Identifier (DOI)

10.1186/1471-2105-9-167
