ReadDB: Efficient Storage for Mapped Short Reads
Author Information
Author(s): Rolfe, P. Alexander; Gifford, David K.
Primary Institution: Massachusetts Institute of Technology
Hypothesis
ReadDB aims to provide an efficient storage solution for large collections of aligned high-throughput sequencing datasets.
Conclusion
ReadDB offers a high-performance solution for storing and accessing genome-aligned reads. Its query performance is substantially better than that of conventional file-based access, such as reading remote BAM or BigWig files.
Supporting Evidence
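- ReadDB query performance is comparable to local-disk access and three to five times faster than querying remote BAM or BigWig files.
- The theoretical query time for ReadDB is O(log(n) + m), where n is the number of stored reads and m is the number of reads returned by a query. Because lookup cost grows only logarithmically with n, ReadDB can scale to much larger datasets.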
- ReadDB provides fast and compact access to aligned short-read datasets where mismatch information is unnecessary.
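An O(log(n) + m) bound is what a range query over a position-sorted array gives: two binary searches find the window boundaries in O(log n), and copying the m hits inside the window costs O(m). The following is a minimal in-memory sketch. It assumes that hits are kept sorted by alignment position for each chromosome and strand, and it stores only positions (no sequence or mismatch data), which matches the compact-access point above. The class and method names (`SortedHitIndex`, `build`, `query`, `count`) are hypothetical and are not ReadDB's API. ReadDB's actual on-disk layout and client/server protocol are not shown.

```python
from array import array
from bisect import bisect_left, bisect_right


class SortedHitIndex:
    """Hypothetical sketch, not ReadDB's API: one sorted array of
    alignment positions per (chromosome, strand) key. Only positions
    are kept; read sequences and mismatch details are discarded."""

    def __init__(self):
        self._hits = {}

    def build(self, reads):
        # reads: iterable of (chrom, strand, position) tuples.
        buckets = {}
        for chrom, strand, pos in reads:
            buckets.setdefault((chrom, strand), []).append(pos)
        # Sort once at load time. 'l' is a C signed long (at least 32 bits),
        # so each position occupies a fixed-width slot with no per-read object.
        self._hits = {
            key: array("l", sorted(positions))
            for key, positions in buckets.items()
        }

    def _bounds(self, chrom, strand, start, end):
        positions = self._hits.get((chrom, strand))
        if positions is None:
            return None, 0, 0
        lo = bisect_left(positions, start)  # O(log n)
        hi = bisect_right(positions, end)   # O(log n)
        return positions, lo, hi

    def query(self, chrom, strand, start, end):
        """Return all positions p with start <= p <= end: O(log n + m)."""
        positions, lo, hi = self._bounds(chrom, strand, start, end)
        if positions is None:
            return []
        return positions[lo:hi].tolist()    # O(m) copy of the matching hits

    def count(self, chrom, strand, start, end):
        """Count hits in [start, end] without copying them: O(log n)."""
        _, lo, hi = self._bounds(chrom, strand, start, end)
        return hi - lo


index = SortedHitIndex()
index.build([
    ("chr1", "+", 500), ("chr1", "+", 100), ("chr1", "+", 300),
    ("chr1", "-", 250), ("chr2", "+", 120),
])
print(index.query("chr1", "+", 100, 300))  # [100, 300]
print(index.count("chr1", "+", 150, 600))  # 2
```

Storing each chromosome and strand as a flat array of fixed-width integers keeps the layout compact. Counting hits in a window, as needed for coverage histograms, requires only the two binary searches, so it costs O(log n) regardless of m.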
Takeaway
ReadDB is like a smart filing cabinet for DNA data that helps scientists quickly find and use important information without needing a lot of space.
Methodology
ReadDB was benchmarked against other storage methods, including local-disk access and remote BAM and BigWig files, using datasets of different sizes to evaluate its performance on genomic-data queries.
Limitations
ReadDB does not implement analysis algorithms or visualization tools itself.
Digital Object Identifier (DOI)