Querying large read collections in main memory: a versatile data structure
2011

Gk Arrays: A New Way to Index Large Read Collections

Publication evidence: high

Author Information

Author(s): Philippe Nicolas, Mikaël Salson, Thierry Lecroq, Martine Léonard, Thérèse Commes, Eric Rivals

Primary Institution: LIRMM, UMR 5506, CNRS and Université de Montpellier

Hypothesis

The study proposes a new data structure, Gk arrays, to efficiently index large collections of reads for bioinformatics analysis.

Conclusion

Gk arrays provide a versatile and efficient method for read analysis, requiring less memory and allowing for faster queries compared to existing methods.

Supporting Evidence

  • Gk arrays can handle larger read collections with less memory.
  • The structure allows for fast querying of k-mers in various read analysis contexts.
  • Gk arrays are available as a C++ library under a GPL-compliant license.
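One read-analysis context for k-mer queries is illustrated below. This example is an assumption and is not taken from the study. For each k-mer of a read, it counts how many reads in the collection contain that k-mer. A k-mer supported by very few reads inside an otherwise well-supported read is a common hint of a sequencing error. For brevity, a plain Python dict stands in for the index. The function names and the `min_reads` threshold rule are hypothetical.

```python
from collections import defaultdict


def reads_per_kmer(reads, k):
    """Map each k-mer to the set of read ids containing it (a dict stand-in for an index)."""
    support = defaultdict(set)
    for r, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            support[read[i:i + k]].add(r)
    return support


def support_profile(read, support, k):
    """For each k-mer of `read`, return how many reads in the collection contain it."""
    # .get() does not insert missing keys into the defaultdict; unseen k-mers count as 0.
    return [len(support.get(read[i:i + k], ())) for i in range(len(read) - k + 1)]


def weak_positions(read, support, k, min_reads):
    """Hypothetical rule: offsets whose k-mer occurs in fewer than `min_reads` reads."""
    return [i for i, c in enumerate(support_profile(read, support, k)) if c < min_reads]


reads = ["ACGTACGT", "ACGTACGT", "ACGAACGT"]
support = reads_per_kmer(reads, 4)
print(support_profile(reads[2], support, 4))  # [1, 1, 1, 1, 3]
print(weak_positions(reads[2], support, 4, 2))  # [0, 1, 2, 3]
```

In the usage lines, the third read differs from the others by one base, and only the k-mers overlapping that base have low support.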

Takeaway

This study introduces a new tool that helps scientists quickly find information in large sets of DNA sequences, making it easier to study genes and other biological data.

Methodology

The study developed Gk arrays, a data structure that indexes reads in main memory and allows for efficient querying of k-mers.
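The indexing idea can be sketched in Python. This is a minimal, hypothetical illustration that assumes a suffix-array-style design. Every k-mer start position in the reads is sorted by the k-mer it begins. As a result, all occurrences of a queried k-mer form one contiguous block, which binary search can locate. The class `KmerIndex` and its methods are illustrative only and are not the API of the Gk arrays C++ library.

```python
from bisect import bisect_left, bisect_right


class KmerIndex:
    """Simplified sorted-array k-mer index over a read collection (illustrative only)."""

    def __init__(self, reads, k):
        self.reads = list(reads)
        self.k = k
        # Every (read_id, offset) at which a full-length k-mer starts.
        positions = [
            (r, i)
            for r, read in enumerate(self.reads)
            for i in range(len(read) - k + 1)
        ]
        # Sort positions by their k-mer, like a suffix array cut off at depth k.
        positions.sort(key=self._kmer)
        self.positions = positions
        # Parallel list of k-mer strings so bisect can compare plain strings.
        self.keys = [self._kmer(p) for p in positions]

    def _kmer(self, pos):
        r, i = pos
        return self.reads[r][i:i + self.k]

    def _range(self, kmer):
        # All occurrences of `kmer` occupy one contiguous block of the sorted array.
        return bisect_left(self.keys, kmer), bisect_right(self.keys, kmer)

    def occurrences(self, kmer):
        """All (read_id, offset) positions where `kmer` occurs."""
        lo, hi = self._range(kmer)
        return sorted(self.positions[lo:hi])

    def count_occurrences(self, kmer):
        """Number of occurrences of `kmer` across the collection."""
        lo, hi = self._range(kmer)
        return hi - lo

    def reads_containing(self, kmer):
        """Sorted ids of the reads that contain `kmer` at least once."""
        lo, hi = self._range(kmer)
        return sorted({r for r, _ in self.positions[lo:hi]})


idx = KmerIndex(["ACGTAC", "GTACGT", "TTTT"], k=3)
print(idx.occurrences("ACG"))       # [(0, 0), (1, 2)]
print(idx.reads_containing("TTT"))  # [2]
```

For readability, this sketch stores a copy of every k-mer string next to its position. That costs far more memory than a compact index would. A memory-conscious variant would compare the query directly against the read text during the binary search.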

Limitations

The study does not address the question of read mapping and focuses solely on read indexing.

Digital Object Identifier (DOI)

10.1186/1471-2105-12-242
