Compressing and Indexing Labeled Trees, with Applications
In: Journal of the Association for Computing Machinery, Jg. 57 (2010), Heft 1
Online
academicJournal
- print, 2 p.1/4
Zugriff:
Consider an ordered, static tree T where each node has a label from alphabet Σ. Tree T may be of arbitrary degree and shape. Our goal is designing a compressed storage scheme of T that supports basic navigational operations among the immediate neighbors of a node (i.e. parent, ith child, or any child with some label, ...) as well as more sophisticated path-based search operations over its labeled structure. We present a novel approach to this problem by designing what we call the XBW-transform of the tree in the spirit of the well-known Burrows-Wheeler transform for strings [ 1994]. The XBW-transform uses path-sorting to linearize the labeled tree T into two coordinated arrays, one capturing the structure and the other the labels. For the first time, by using the properties of the XBW-transform, our compressed indexes go beyond the information-theoretic lower bound, and support navigational and path-search operations over labeled trees within (near-)optimal time bounds and entropy-bounded space. Our XBW-transform is simple and likely to spur new results in the theory of tree compression and indexing, as well as interesting application contexts. As an example, we use the XBW-transform to design and implement a compressed index for XML documents whose compression ratio is significantly better than the one achievable by state-of-the-art tools, and its query time performance is order of magnitudes faster.
Titel: |
Compressing and Indexing Labeled Trees, with Applications
|
---|---|
Autor/in / Beteiligte Person: | FERRAGINA, Paolo ; LUCCIO, Fabrizio ; MANZINI, Giovanni ; MUTHUKRISHNAN, S |
Link: | |
Zeitschrift: | Journal of the Association for Computing Machinery, Jg. 57 (2010), Heft 1 |
Veröffentlichung: | New York, NY: Association for Computing Machinery, 2010 |
Medientyp: | academicJournal |
Umfang: | print, 2 p.1/4 |
ISSN: | 0004-5411 (print) |
Schlagwort: |
|
Sonstiges: |
|