Compact and hash based variants of the suffix array
Bulletin of the Polish Academy of Sciences: Technical Sciences (2017)
Abstract
Full-text indexing aims at building a data structure over a given text that can efficiently find arbitrary text patterns, ideally while requiring little space. We propose two full-text indexes inspired by the suffix array. The first, called SA-hash, augments the suffix array with a hash table that speeds up pattern searches by significantly narrowing the search interval before the binary search phase. The second, called FBCSA, is a compact data structure similar to Makinen's compact suffix array (MakCSA), but working on fixed-size blocks. Experiments on the widely used Pizza & Chili datasets show that SA-hash is about 2-3 times faster in pattern searches (counts) than the standard suffix array, at the cost of 0.2n-1.1n bytes of extra space, where n is the text length. FBCSA, in one of the presented variants, reduces the suffix array size by a factor of about 1.5-2 while coming close to the suffix array in search times, and it is faster than its competitors known from the literature, MakCSA and LCSA.
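The SA-hash idea of narrowing the search interval before binary search can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on assumptions not stated above: the hash table is keyed by the first k symbols of each suffix (a Python dict stands in for the hash table), the suffix array is built naively, and patterns shorter than k fall back to a plain binary search over the whole array. The names build_suffix_array, build_kgram_table and count are hypothetical.

```python
def build_suffix_array(text):
    # Naive construction (sort suffix start positions by suffix); fine for a sketch.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_kgram_table(text, sa, k):
    # Assumption: map each k-gram to the half-open interval [lo, hi) of
    # suffix-array ranks whose suffixes start with that k-gram.
    # Such suffixes are contiguous in lexicographic order.
    table = {}
    for rank, pos in enumerate(sa):
        if pos + k > len(text):
            continue  # suffix shorter than k has no full k-gram prefix
        key = text[pos:pos + k]
        lo = table[key][0] if key in table else rank
        table[key] = (lo, rank + 1)
    return table


def _bound(text, sa, lo, hi, pattern, strict):
    # Binary search on sa[lo:hi] comparing the first len(pattern) symbols.
    # strict=False -> first rank with prefix >= pattern (lower bound)
    # strict=True  -> first rank with prefix >  pattern (upper bound)
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (strict and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count(text, sa, table, k, pattern):
    # Count occurrences of pattern in text (hypothetical SA-hash-style query).
    if len(pattern) < k:
        lo, hi = 0, len(sa)  # fallback: search the whole suffix array
    else:
        interval = table.get(pattern[:k])
        if interval is None:
            return 0  # no suffix starts with this k-gram
        lo, hi = interval  # narrowed interval from the hash table
    left = _bound(text, sa, lo, hi, pattern, strict=False)
    right = _bound(text, sa, left, hi, pattern, strict=True)
    return right - left
```

For example, with text = "abracadabra", sa = build_suffix_array(text) and table = build_kgram_table(text, sa, 2), the call count(text, sa, table, 2, "abra") returns 2. The binary search then only touches the ranks of suffixes beginning with "ab", which is the source of the speedup the abstract reports, paid for by the extra space of the table.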
Keywords
string matching, full-text indexing, suffix array, compact indexes, hashing