Locating the Clues of Declining Success Rate of Service Calls

2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), 2020

Abstract
For online systems with massive user bases, providing services continuously and stably is vital to the business, which requires that service anomalies be located and resolved in a timely manner. As common IT infrastructure, various APM (Application Performance Management) systems and frameworks have been adopted to monitor every call request to a service. However, a call request may carry multidimensional attributes (e.g., City, ISP, Platform), each of which can take multiple values (e.g., ISP could be T-Mobile, CMCC, etc.). As a result, an anomaly such as a DSR (Declining Success Rate) of a service typically occurs under a particular combination of attribute values, and the potentially huge number of combinations makes locating the root cause of the anomaly a major challenge. In this paper, we propose a novel method, ImpAPTr (Impact Analysis based on Pruning Tree), to identify, in a timely manner, the combinations of dimensional attributes that serve as clues to the root cause of DSR anomalies. In an evaluation on a simulated dataset, ImpAPTr detects valid clues in milliseconds with an accuracy of 99.37% within the top 10 candidate results, 97.72% within the top 5, and 94.51% within the top 3, which outperforms previous approaches by a large margin. A field test on a dataset from a production environment indicates that ImpAPTr is able to detect valid clues within a few seconds.
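To make the combination problem concrete, the following is a minimal sketch, not the ImpAPTr algorithm itself, which the abstract does not spell out. It enumerates every attribute-value combination found in a handful of call records and tallies successes per combination. The records, the `calls` list, and the `success_rates` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical call records: (attributes, call succeeded?)
calls = [
    ({"City": "Beijing", "ISP": "CMCC", "Platform": "iOS"}, True),
    ({"City": "Beijing", "ISP": "CMCC", "Platform": "Android"}, False),
    ({"City": "Shanghai", "ISP": "CMCC", "Platform": "Android"}, False),
    ({"City": "Shanghai", "ISP": "T-Mobile", "Platform": "iOS"}, True),
]

def success_rates(calls):
    """Map each attribute-value combination to (successes, total calls)."""
    stats = defaultdict(lambda: [0, 0])
    for attrs, ok in calls:
        items = sorted(attrs.items())  # fixed attribute order for stable keys
        # Every non-empty subset of this call's attribute values is a candidate clue.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                stats[combo][0] += int(ok)
                stats[combo][1] += 1
    return {combo: (s, t) for combo, (s, t) in stats.items()}

rates = success_rates(calls)
for combo, (succ, total) in sorted(rates.items(), key=lambda kv: kv[1][0] / kv[1][1]):
    print(combo, f"{succ}/{total}")
```

Even with three attributes and two values each, four calls already produce 20 distinct combinations to inspect. This exhaustive enumeration grows exponentially with the number of attributes, which is the cost a pruning-based search aims to avoid.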
Keywords
On-line service, Continuity, Anomaly, Multiple attributes