Enabling Fast and Memory-Efficient Acceleration for Pattern Matching Workloads: The Lightweight Automata Processing Engine

Lei Gong, Chao Wang, Haojun Xia, Xianglan Chen, Xi Li, Xuehai Zhou

IEEE Transactions on Computers (2023)

Abstract

A growing number of pattern matching applications employ finite automata as their basic processing model. These applications match tens to thousands of patterns against large amounts of data, which poses a great challenge to conventional processors. Hardware-based solutions have therefore emerged and achieve high-throughput automata processing. However, existing methods generally struggle to achieve both processing speed and storage efficiency. They are often too heavy to be integrated into a small chip and must rely on off-chip DRAM or other high-capacity memories even for some simple data sets, leading to potential area and power consumption issues. In this paper, we focus on building a more lightweight automata processing engine that can store the whole automata model in on-chip memory and run effectively and independently. We propose LAP, a lightweight automata processing engine. Powered by a novel automata model (A-DFA) and efficient packing algorithms, LAP achieves extremely high storage efficiency compared with traditional DFA. Meanwhile, we identify the key parallelization factors in the A-DFA model and propose a specialized microarchitecture with novel instructions to further accelerate the state transition process. As a result, LAP obtains a more effective trade-off between processing speed and storage efficiency. Evaluation results show that LAP achieves extremely high storage efficiency on simple data sets, exceeding IBM's RegX by 8x, and achieves significant improvements in processing speed ranging from 1.32x to 1.91x compared with previous lightweight hardware implementations. Moreover, LAP's hardware architecture scales well: an acceleration system with higher throughput can be built by increasing the number of cores. We prototype a 16-core system on a Xilinx ZC702 FPGA and a 64-core system on a Xilinx ZCU102 FPGA. The prototype system on the ZC702 achieves 3.5 GB/s throughput on average on simple data sets, and the prototype system on the ZCU102 obtains higher throughput and compute density on some of the large datasets in ANMLZoo compared with modern in-memory NFA-based solutions.

Keywords

Automata, Pattern matching, Hardware, Program processors, Computational modeling, Engines, Computer architecture, A-DFA, automata processor, FPGA, lightweight engine, pattern matching, regular expression
