Pattern Matching with Mismatches and Wildcards
CoRR(2024)
摘要
In this work, we address the problem of approximate pattern matching with
wildcards. Given a pattern P of length m containing D wildcards, a text
T of length n, and an integer k, our objective is to identify all
fragments of T within Hamming distance k from P.
Our primary contribution is an algorithm with runtime O(n+(D+k)(G+k)·
n/m) for this problem. Here, G ≤ D represents the number of maximal
wildcard fragments in P. We derive this algorithm by elaborating in a
non-trivial way on the ideas presented by [Charalampopoulos et al., FOCS'20]
for pattern matching with mismatches (without wildcards). Our algorithm
improves over the state of the art when D, G, and k are small relative to
n. For instance, if m = n/2, k=G=n^2/5, and D=n^3/5, our algorithm
operates in O(n) time, surpassing the Ω(n^6/5) time requirement of
all previously known algorithms.
In the case of exact pattern matching with wildcards (k=0), we present a
much simpler algorithm with runtime O(n+DG· n/m) that clearly illustrates
our main technical innovation: the utilisation of positions of P that do not
belong to any fragment of P with a density of wildcards much larger than
D/m as anchors for the sought (approximate) occurrences. Notably, our
algorithm outperforms the best-known O(nlog m)-time FFT-based algorithms of
[Cole and Hariharan, STOC'02] and [Clifford and Clifford, IPL'04] if DG =
o(mlog m).
We complement our algorithmic results with a structural characterization of
the k-mismatch occurrences of P. We demonstrate that in a text of length
O(m), these occurrences can be partitioned into O((D+k)(G+k)) arithmetic
progressions. Additionally, we construct an infinite family of examples with
Ω((D+k)k) arithmetic progressions of occurrences, leveraging a
combinatorial result on progression-free sets [Elkin, SODA'10].
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要