Gk Arrays: A New Way to Index Large Read Collections
Author Information
Author(s): Philippe Nicolas, Mikaël Salson, Thierry Lecroq, Martine Léonard, Thérèse Commes, Eric Rivals
Primary Institution: LIRMM, UMR 5506, CNRS and Université de Montpellier
Hypothesis
The study proposes a new data structure, Gk arrays, to efficiently index large collections of reads for bioinformatics analysis.
Conclusion
Gk arrays provide a versatile and efficient method for read analysis: they require less memory and answer queries faster than existing methods.
Supporting Evidence
- Gk arrays can index larger read collections in less memory than existing methods.
- The structure allows for fast querying of k-mers in various read analysis contexts.
- Gk arrays are available as a C++ library under a GPL-compliant license.
Takeaway
This study introduces a new tool that helps scientists quickly find information in large sets of DNA sequences, making it easier to study genes and other biological data.
Methodology
The study developed Gk arrays, a data structure that indexes reads in main memory and allows for efficient querying of k-mers.
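To make the idea of a k-mer query concrete, here is a minimal Python sketch. It is not the Gk array construction from the study and does not use the authors' C++ library; it swaps in a plain sorted list of k-mers searched by binary search. The function names (build_kmer_index, occurrences, reads_containing) and the toy reads are assumptions made up for illustration.

```python
from bisect import bisect_left, bisect_right


def build_kmer_index(reads, k):
    """Index every k-mer of every read (illustrative only, not a Gk array).

    Returns two parallel lists: the sorted k-mers, and for each k-mer
    the (read_id, offset) at which it starts.
    """
    entries = sorted(
        (read[i:i + k], read_id, i)
        for read_id, read in enumerate(reads)
        for i in range(len(read) - k + 1)
    )
    kmers = [entry[0] for entry in entries]
    positions = [(entry[1], entry[2]) for entry in entries]
    return kmers, positions


def occurrences(index, kmer):
    """Return all (read_id, offset) pairs where kmer occurs."""
    kmers, positions = index
    lo = bisect_left(kmers, kmer)   # first entry equal to kmer
    hi = bisect_right(kmers, kmer)  # one past the last entry equal to kmer
    return positions[lo:hi]


def reads_containing(index, kmer):
    """Return the sorted ids of the reads that contain kmer at least once."""
    return sorted({read_id for read_id, _ in occurrences(index, kmer)})


# Toy collection of three reads (hypothetical data), indexed with k = 3.
reads = ["ACGTAC", "GTACGT", "TTTT"]
index = build_kmer_index(reads, 3)

print(occurrences(index, "GTA"))       # [(0, 2), (1, 0)]
print(reads_containing(index, "TTT"))  # [2]
print(occurrences(index, "AAA"))       # []
```

This sketch stores a full copy of every k-mer, so it uses far more memory than a purpose-built index would; it only illustrates the kinds of questions such an index answers, for example which reads contain a given k-mer and at which positions.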
Limitations
The study does not address the question of read mapping and focuses solely on read indexing.