PVE: A log parsing method based on VAE using embedding vectors

Information Processing & Management(2023)

引用 0|浏览12
暂无评分
摘要
Log parsing is a critical task that converts unstructured raw logs into structured data for downstream tasks. Existing methods often rely on manual string-matching rules to extract template tokens, leading to lower adaptability on different log datasets. To address this issue, we propose an automated log parsing method, PVE, which leverages Variational Auto-Encoder (VAE) to build a semi-supervised model for categorizing log tokens. Inspired by the observation that log template tokens often consist of words, we choose common words and their combinations to serve as training data to enhance the diversity of structure features of template tokens. Specifically, PVE constructs two types of embedding vectors, the sum embedding and the n-gram embedding, for each word and word combination. The structure features of template tokens can be learned by training VAE on these embeddings. PVE categorizes a token as a template token if it is similar to the training data when log parsing. To improve efficiency, we use the average similarity between token embedding and VAE samples to determine the token type, rather than the reconstruction error. Evaluations on 16 real-world log datasets demonstrate that our method has an average accuracy of 0.878, which outperforms comparison methods in terms of parsing accuracy and adaptability.
更多
查看译文
关键词
log,vae
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要