MQF and buffered MQF: quotient filters for efficient storage of k-mers with their counts and metadata
In: BMC Bioinformatics, vol. 22 (2021), no. 1, pp. 1-14
Abstract

Background: Specialized data structures are required for online algorithms to efficiently handle large sequencing datasets. The counting quotient filter (CQF), a compact hash table, can efficiently store k-mers with a skewed distribution.

Results: Here, we present the mixed-counters quotient filter (MQF) as a new variant of the CQF with novel counting and labeling systems. The new counting system adapts to a wider range of data distributions for increased space efficiency and is faster than the CQF for insertions and queries in most of the tested scenarios. A buffered version of the MQF can offload storage to disk, trading speed of insertions and queries for a significant memory reduction. The labeling system provides a flexible framework for assigning labels to member items while maintaining good data locality and a concise memory representation. These labels serve as a minimal perfect hash function but are approximately tenfold faster than BBhash, with no need to re-analyze the original data for further insertions or deletions.

Conclusions: The MQF is a flexible and efficient data structure that extends our ability to work with high-throughput sequencing data.
Title: | MQF and buffered MQF: quotient filters for efficient storage of k-mers with their counts and metadata |
---|---|
Author / Contributor: | Shokrof, Moustafa ; Brown, C. Titus ; Mansour, Tamer A. |
Journal: | BMC Bioinformatics, vol. 22 (2021), no. 1, pp. 1-14 |
Published: | BMC, 2021 |
Media type: | academicJournal |
ISSN: | 1471-2105 (print) |
DOI: | 10.1186/s12859-021-03996-x |