An Improved Hashing Approach for Biological Sequence to Solve Exact Pattern Matching Problems

Prince Mahmud,Anisur Rahman,Kamrul Hasan Talukder

APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING（2023）

引用 0|浏览5

暂无评分

摘要

Pattern matching algorithms have gained a lot of importance in computer science, primarily because they are used in various domains such as computational biology, video retrieval, intrusion detection systems, and fraud detection. Finding one or more patterns in a given text is known as pattern matching. Two important things that are used to judge how well exact pattern matching algorithms work are the total number of attempts and the character comparisons that are made during the matching process. The primary focus of our proposed method is reducing the size of both components wherever possible. Despite sprinting, hash-based pattern matching algorithms may have hash collisions. The Efficient Hashing Method (EHM) algorithm is improved in this research. Despite the EHM algorithm's effectiveness, it takes a lot of time in the preprocessing phase, and some hash collisions are generated. A novel hashing method has been proposed, which has reduced the preprocessing time and hash collision of the EHM algorithm. We devised the Hashing Approach for Pattern Matching (HAPM) algorithm by taking the best parts of the EHM and Quick Search (QS) algorithms and adding a way to avoid hash collisions. The preprocessing step of this algorithm combines the bad character table from the QS algorithm, the hashing strategy from the EHM algorithm, and the collision-reducing mechanism. To analyze the performance of our HAPM algorithm, we have used three types of datasets: E. coli, DNA sequences, and protein sequences. We looked at six algorithms discussed in the literature and compared our proposed method. The Hash-q with Unique FNG (HqUF) algorithm was only compared with E. coli and DNA datasets because it creates unique bits for DNA sequences. Our proposed HAPM algorithm also overcomes the problems of the HqUF algorithm. The new method beats older ones regarding average runtime, number of attempts, and character comparisons for long and short text patterns, though it did worse on some short patterns.

查看译文

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要