KGroot: Enhancing Root Cause Analysis through Knowledge Graphs and Graph Convolutional Neural Networks
CoRR (2024)
Abstract
Fault localization is challenging in online microservice systems because of the
large volume and wide variety of monitoring data types and events, and because
of the complex interdependencies among services and components. Fault events in
services propagate and can trigger a cascade of alerts within a short period of
time. In industry, fault localization is typically conducted manually by
experienced personnel. This reliance on experience is unreliable and lacks
automation. During manual localization, information barriers between modules
make it difficult to align quickly when faults are urgent. This inefficiency
hinders stability assurance, whose goal is to minimize fault detection and
repair time. Although existing methods aim to automate the process, their
accuracy and efficiency are less than satisfactory.
The precision of fault localization results is of paramount importance, as it
underpins engineers' trust in the diagnostic conclusions, which are derived from
multiple perspectives and offer comprehensive insights. Therefore, a more
reliable method is required to automatically identify the associative
relationships among fault events and their propagation paths. To achieve this,
KGroot performs root cause analysis (RCA) by integrating knowledge graphs and
graph convolutional networks (GCNs), reasoning over event knowledge and the
correlations between events. Fault event knowledge graphs (FEKGs) are built from
historical data. When a failure event occurs, an online graph is constructed in
real time, and GCNs compare the similarity between each knowledge graph and the
online graph to pinpoint the fault type through a ranking strategy.
Comprehensive experiments demonstrate that KGroot can locate the root cause with
an accuracy of 93.5% … matches the level of real-time fault diagnosis in
industrial environments and significantly surpasses state-of-the-art RCA
baselines in both effectiveness and efficiency.
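The matching step described in the abstract (build event graphs, embed them with a GCN, score each historical graph against the online graph, rank fault types) can be illustrated with the minimal NumPy sketch below. It is not KGroot's implementation. Several pieces are assumptions for illustration only: the event vocabulary `EVENT_TYPES`, the helper names `build_event_graph`, `gcn_embed`, `cosine` and `rank_fault_types`, the 60-second temporal window used to draw edges, sum pooling, and cosine similarity as the score. The weights are untrained and randomly initialized, whereas KGroot trains its GCN. The propagation rule itself is the standard GCN form ReLU(D^-1/2 (A+I) D^-1/2 H W).

```python
import numpy as np

# Hypothetical event vocabulary; a real system would derive it from alert, metric and log types.
EVENT_TYPES = ["cpu_high", "mem_high", "latency_up", "error_rate_up", "pod_restart"]
INDEX = {name: i for i, name in enumerate(EVENT_TYPES)}


def build_event_graph(events, window=60.0):
    """Turn (timestamp, event_type) pairs into (adjacency, node features).

    Assumption: add an edge u -> v whenever v occurs after u within `window`
    seconds. This temporal heuristic stands in for KGroot's own relation
    identification, which is not described here.
    """
    n = len(EVENT_TYPES)
    adj = np.zeros((n, n))
    present = np.zeros(n)
    events = sorted(events)
    for i, (t_u, u) in enumerate(events):
        present[INDEX[u]] = 1.0
        for t_v, v in events[i + 1:]:
            if t_v - t_u > window:
                break  # events are time-sorted, so later ones are even farther away
            if u != v:
                adj[INDEX[u], INDEX[v]] = 1.0
    return adj, np.diag(present)  # one-hot features for observed events only


def gcn_embed(adj, feats, w1, w2):
    """Two GCN layers (ReLU hidden, linear output) followed by sum pooling."""
    a = np.maximum(adj, adj.T) + np.eye(adj.shape[0])  # symmetrize and add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))  # every degree >= 1 thanks to self-loops
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 (A+I) D^-1/2
    h = np.maximum(a_norm @ feats @ w1, 0.0)  # hidden layer with ReLU
    h = a_norm @ h @ w2  # linear output layer
    return h.sum(axis=0)  # one embedding vector per graph


def cosine(x, y):
    # Small epsilon guards against division by zero for an all-zero embedding.
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))


def rank_fault_types(online_events, knowledge_graphs, w1, w2):
    """Return (fault_type, similarity) pairs, most similar first."""
    online = gcn_embed(*build_event_graph(online_events), w1, w2)
    scores = {
        fault: cosine(online, gcn_embed(adj, feats, w1, w2))
        for fault, (adj, feats) in knowledge_graphs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Untrained weights shared by every graph (illustration only).
rng = np.random.default_rng(0)
w1 = rng.normal(size=(len(EVENT_TYPES), 16))
w2 = rng.normal(size=(16, 8))

# Hypothetical historical fault patterns standing in for FEKGs.
knowledge_graphs = {
    "cpu_exhaustion": build_event_graph([(0, "cpu_high"), (20, "latency_up"), (45, "error_rate_up")]),
    "memory_leak": build_event_graph([(0, "mem_high"), (30, "pod_restart")]),
}
online_events = [(1000, "cpu_high"), (1020, "latency_up"), (1045, "error_rate_up")]
ranking = rank_fault_types(online_events, knowledge_graphs, w1, w2)
print(ranking[0][0])  # → cpu_exhaustion
```

Because the online alerts repeat the `cpu_exhaustion` pattern with the same relative timing, both produce the same graph and therefore the same embedding, so that fault type ranks first. Sharing one set of GCN weights across all graphs is what makes their embeddings directly comparable.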