MicroNet: Operation Aware Root Cause Identification of Microservice System Anomalies

IEEE Transactions on Network and Service Management (2024)

Microservice architecture has been widely adopted in large-scale applications. However, the huge volume of data and the complex dependencies among microservices make it harder to ensure reliable performance and maintenance. Existing approaches still suffer from over-aggregation of data, interference from anomaly propagation, and neglect of component differences. To address these issues, this paper builds a root cause diagnosis framework at operation granularity, named MicroNet. Since operations are subfunctions of microservices, recorded as invocation purposes, we propose an operation-centric perspective to realize fine-grained data aggregation and operation-level anomaly backtracking. We decompose the diagnosis task into four phases: dependency graph construction, anomaly detection, anomaly evaluation, and culprit location. To construct the invocation dependency accurately, we propose the concept of a meta call, defined as the triple (caller, operation, callee), which is the smallest unit that can be aggregated. Based on the dependency graph, we quantify each operation's abnormality by analyzing the operation execution process, which allows propagated anomalies to be backtracked. Then, we customize a personalized PageRank algorithm that considers invocation latency and different invocation relationships simultaneously to identify the root cause. Our experimental evaluation on an open dataset shows that MicroNet can effectively locate root causes with 90% mean average precision, outperforming state-of-the-art methods.
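To make the meta call and the ranking step more concrete, the following is a minimal Python sketch, not the paper's implementation. It aggregates hypothetical trace records by the triple (caller, operation, callee) and then runs a textbook personalized PageRank over the resulting caller-to-callee graph. The sample spans, the use of mean latency as edge weight, the caller-to-callee edge direction, and the anomaly scores used as the preference vector are all assumptions made for illustration; the abstract does not specify these details.

```python
from collections import defaultdict

# Hypothetical trace records: (caller, operation, callee, latency_ms).
spans = [
    ("gateway", "GET /order", "order", 120.0),
    ("gateway", "GET /order", "order", 130.0),
    ("order", "query_stock", "inventory", 95.0),
    ("order", "query_stock", "inventory", 105.0),
    ("order", "get_user", "user", 5.0),
]


def aggregate_meta_calls(spans):
    """Mean latency per meta call, i.e. per (caller, operation, callee) triple."""
    groups = defaultdict(list)
    for caller, operation, callee, latency in spans:
        groups[(caller, operation, callee)].append(latency)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def personalized_pagerank(edges, preference, damping=0.85, iterations=100):
    """Standard personalized PageRank by power iteration.

    edges: {(src, dst): weight}; preference: {node: non-negative score}.
    """
    nodes = sorted({n for edge in edges for n in edge} | set(preference))
    total_pref = sum(preference.values())
    pref = {n: preference.get(n, 0.0) / total_pref for n in nodes}
    out_weight = defaultdict(float)
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = dict(pref)
    for _ in range(iterations):
        # Mass held by nodes without outgoing edges is redistributed
        # according to the preference vector.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        nxt = {n: (1 - damping + damping * dangling) * pref[n] for n in nodes}
        for (src, dst), w in edges.items():
            nxt[dst] += damping * rank[src] * w / out_weight[src]
        rank = nxt
    return rank


meta = aggregate_meta_calls(spans)

# Assumption: collapse meta calls into caller -> callee edges weighted by latency.
edges = defaultdict(float)
for (caller, _operation, callee), mean_latency in meta.items():
    edges[(caller, callee)] += mean_latency

# Assumption: per-service anomaly scores produced by an earlier detection phase.
anomaly_scores = {"gateway": 0.1, "order": 0.3, "inventory": 0.6, "user": 0.0}

rank = personalized_pagerank(dict(edges), anomaly_scores)
for service, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{service}: {score:.4f}")
```

Keeping the operation in the aggregation key is what distinguishes the meta call from a plain service-to-service edge: two operations between the same pair of services are aggregated separately, so a slow operation is not averaged away by a fast one.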
Microservice architecture, operation analysis, root cause location