Indexing weighted sequences: Neat and efficient

Information and Computation (2020)

Abstract
A weighted sequence is a sequence of probability mass functions over a finite alphabet. A weighted index is a data structure constructed for a weighted sequence and a threshold 1/z that, given a string pattern, reports all positions where the pattern occurs in the weighted sequence with probability at least 1/z. We present an O(nz)-time construction of an O(nz)-sized weighted index for a weighted sequence of length n that answers queries in optimal time. The previous solution by Amir et al. (2008) required O(nz^2 log z) time and space. Our main tools are a construction of a family of ⌊z⌋ strings that carries the information about all the strings that occur in a weighted sequence, and a more straightforward solution to so-called property indexing. We present applications of our weighted index, in particular in approximate and general scenarios that were introduced by Biswas et al. (2016), and provide its implementation.
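To make the query semantics concrete, here is a minimal brute-force sketch in Python. It is not the index described in the abstract: it simply scans every position, so it takes time proportional to n times the pattern length per query. It assumes the usual convention that the occurrence probability of a pattern at a position is the product of the probabilities of its letters at the consecutive positions. The function names, the dictionary representation of each probability mass function, and the 0-based positions are all choices made for this illustration.

```python
from fractions import Fraction


def occurrence_probability(weighted_seq, pattern, start):
    """Product of the letter probabilities of `pattern` aligned at `start`.

    `weighted_seq` is a list of dicts mapping letters to probabilities
    (an illustrative representation, not taken from the paper).
    """
    prob = Fraction(1)
    for offset, letter in enumerate(pattern):
        # A letter missing from a position's dict has probability 0.
        prob *= weighted_seq[start + offset].get(letter, Fraction(0))
    return prob


def naive_occurrences(weighted_seq, pattern, z):
    """Report all 0-based positions where `pattern` occurs with
    probability at least 1/z, by checking every alignment."""
    threshold = Fraction(1) / Fraction(z)
    n, m = len(weighted_seq), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if occurrence_probability(weighted_seq, pattern, i) >= threshold
    ]


# A small weighted sequence of length 4 over the alphabet {a, b}.
example = [
    {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    {"a": Fraction(1)},
    {"a": Fraction(3, 4), "b": Fraction(1, 4)},
    {"b": Fraction(1)},
]
```

With z = 4, calling naive_occurrences(example, "ab", 4) checks the three alignments of "ab": at position 0 the probability is 1/2 * 0 = 0, at position 1 it is 1 * 1/4 = 1/4, and at position 2 it is 3/4 * 1 = 3/4, so positions 1 and 2 meet the threshold 1/4. Exact fractions are used so that comparisons against 1/z are not affected by floating-point rounding.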
Keywords
Weighted sequence,Position weight matrix (PWM),Text indexing,Suffix tree,Property indexing