Succinct Non-overlapping Indexing

Algorithmica (2019)

Abstract
Text indexing is a fundamental problem in computer science. The objective is to preprocess a text T so that, given a pattern P, we can efficiently find all starting positions (or simply, occurrences) of P in T. In some cases, additional restrictions are imposed. We consider two variants: the non-overlapping indexing problem and the range non-overlapping indexing problem. Given a text T of n characters, the non-overlapping indexing problem is defined as follows: preprocess T into a data structure such that, for any pattern P of |P| characters, we can report a set containing the maximum number of non-overlapping occurrences of P in T. Cohen and Porat (in: Algorithms and Computation, 20th International Symposium, ISAAC 2009, Honolulu, Hawaii, Proceedings, 2009) showed that by maintaining a linear-space index in which the suffix tree of T is augmented with an O(n)-word data structure, a query P can be answered in optimal time O(|P| + nocc), where nocc is the number of occurrences reported. We present the following new result. Let CSA (not necessarily a compressed suffix array) be an index of T that can compute (i) the suffix range of P in search(P) time, and (ii) a suffix array or an inverse suffix array value in t_SA time. By using CSA alone, we can answer a query P in search(P) + sort(nocc) + O(nocc · t_SA) time, where sort(k) denotes the time for sorting k numbers in {1, 2, …, n}. In the range non-overlapping indexing problem, two integers a and b, with b ≥ a, are provided as input along with the pattern P. The task is to report a set containing the maximum number of non-overlapping occurrences of P that lie within the range [a, b]. For any arbitrarily small positive constant ϵ, we present an O(n log^ϵ n)-word index with O(|P| + nocc_{a,b}) query time, where nocc_{a,b} is the number of occurrences reported. Our index improves upon the result of Cohen and Porat [6].
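To make the two problem statements concrete, the following is a minimal brute-force sketch in Python. It is not the index described in the abstract: it scans the text directly instead of using a suffix tree or CSA, and it is meant only to illustrate what a valid answer looks like. The function names `find_occurrences` and `max_non_overlapping` are hypothetical. Positions are assumed to be 1-based, and an occurrence is assumed to "lie within [a, b]" when all of its characters fall inside the range; the abstract does not spell out either convention. The sketch relies on the standard fact that, because all occurrences have the same length |P|, a greedy left-to-right scan over the sorted occurrences yields a maximum non-overlapping set.

```python
def find_occurrences(text, pattern):
    """Return all 1-based starting positions of pattern in text, in increasing order."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = text.find(pattern, start + 1)  # allow overlapping matches
    return positions


def max_non_overlapping(occurrences, m, a=None, b=None):
    """Greedily pick a maximum set of non-overlapping occurrences.

    occurrences: 1-based starting positions of a pattern of length m.
    a, b: optional range; when given, only occurrences lying entirely
          inside [a, b] are considered (an assumed interpretation).
    """
    chosen = []
    next_free = 0  # smallest position at which the next occurrence may start
    for pos in sorted(occurrences):
        if a is not None and pos < a:
            continue
        if b is not None and pos + m - 1 > b:
            continue
        if pos >= next_free:
            chosen.append(pos)
            next_free = pos + m
    return chosen


if __name__ == "__main__":
    text, pattern = "aaaaaa", "aa"
    occ = find_occurrences(text, pattern)
    print(occ)                                           # all occurrences
    print(max_non_overlapping(occ, len(pattern)))        # non-overlapping variant
    print(max_non_overlapping(occ, len(pattern), 2, 5))  # range variant on [2, 5]
```

Sorting the occurrences before the greedy pass mirrors the sort(nocc) term in the stated query bound, although this sketch sorts all occurrences rather than only those reported.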
Keywords
Succinct data structures, Range queries, Suffix trees, String algorithms