Linking BWT and XBW via Aho-Corasick Automaton: Applications to Run-Length Encoding ; Lien entre Transformée de Burrows-Wheeler (BWT) et BWT étendue (XBW) via l'automate d'Aho-Corasick : applications en compression par encodage des longueurs
In: CPM 2019 - 30th Annual Symposium on Combinatorial Pattern Matching, 2019
Online
Konferenz
Zugriff:
URN: urn:nbn:de:0030-drops-104955URL: http://drops.dagstuhl.de/opus/volltexte/2019/10495/ISBN ={978-3-95977-103-0},ISSN ={1868-8969}, ; International audience ; The boom of genomic sequencing makes compression of sets of sequences inescapable. This underlies the need for multi-string indexing data structures that helps compressing the data. The most prominent example of such data structures is the Burrows-Wheeler Transform (BWT), a reversible permutation of a text that improves its compressibility. A similar data structure, the eXtended Burrows-Wheeler Transform (XBW), is able to index a tree labelled with alphabet symbols. A link between a multi-string BWT and the Aho-Corasick automaton has already been found and led to a way to build a XBW from a multi-string BWT. We exhibit a stronger link between a multi-string BWT and a XBW by using the order of the concatenation in the multi-string. This bijective link has several applications: first, it allows one to build one data structure from the other; second, it enables one to compute an ordering of the input strings that optimises a Run-Length measure (i.e., the compressibility) of the BWT or of the XBW.
Titel: |
Linking BWT and XBW via Aho-Corasick Automaton: Applications to Run-Length Encoding ; Lien entre Transformée de Burrows-Wheeler (BWT) et BWT étendue (XBW) via l'automate d'Aho-Corasick : applications en compression par encodage des longueurs
|
---|---|
Autor/in / Beteiligte Person: | Cazaux, Bastien ; Rivals, Eric ; Méthodes et Algorithmes pour la Bioinformatique (MAB) ; Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM) ; Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS) ; Institut de Biologie Computationnelle (IBC) ; Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS) ; Support from the Institut de Biologie Computationnelle (ANR-11-BINF-0002).Eric Rivals: thanks the GEM Flagship project funded from Labex NUMEV (ANR-10-LABX-0020). ; University of Pisa ; Pisanti, Nadia ; Pissis, Solon P. ; ANR-11-BINF-0002,IBC,Institut de biologie Computationnelle(2011) ; ANR-10-LABX-0020,NUMEV,Digital and Hardware Solutions and Modeling for the Environement and Life Sciences(2010) |
Link: | |
Zeitschrift: | CPM 2019 - 30th Annual Symposium on Combinatorial Pattern Matching, 2019 |
Veröffentlichung: | HAL CCSD ; Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, 2019 |
Medientyp: | Konferenz |
DOI: | 10.4230/LIPIcs.CPM.2019.24 |
Schlagwort: |
|
Sonstiges: |
|