Anthropomorphic diagnosis of runtime hidden behaviors in OpenMP multi-threaded applications

Journal of Parallel and Distributed Computing (2023)

Abstract
Extreme-scale computing involves hundreds of millions of threads with multi-level parallelism running on large-scale hierarchical and heterogeneous hardware. Some OpenMP multi-threaded applications increasingly suffer from runtime hidden behaviors owing to shared resource contention as well as software- and hardware-related problems. Such hidden behaviors can result in failures and inefficiencies and are among the main challenges in system resiliency. To minimize their impact, one must quickly and accurately detect and diagnose the hidden behaviors that cause the failures. However, it is difficult to identify hidden behaviors in the dynamic and noisy data collected by OpenMP multi-threaded monitoring infrastructures. This paper presents an anthropomorphic diagnosis framework for hidden behaviors of OpenMP multi-threaded applications. In the framework, we first design injected heartbeat functions for OpenMP multi-threaded applications. Then, we leverage the heartbeat sequences to extract features of hidden behaviors. Finally, we develop a feature learning-based algorithm using heartbeat analysis, namely HSA, to diagnose hidden behaviors. We evaluate the framework on the NAS Parallel Benchmarks (NPB), the EPCC OpenMP micro-benchmark suite, and the Jacobi benchmark. The experimental results demonstrate that our framework successfully identifies 90.3% of the injected hidden behaviors of OpenMP multi-threaded applications while incurring low overhead.
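The abstract does not specify how heartbeat sequences are turned into features, so the following is only a minimal sketch of one plausible reading, not the paper's HSA algorithm. It assumes each thread emits a timestamp per heartbeat and summarizes the gaps between consecutive beats; the function names, the `slack` threshold, and the sample data are all hypothetical.

```python
from statistics import mean, median, pstdev


def heartbeat_features(timestamps):
    """Summarize one thread's heartbeat sequence by its inter-beat intervals."""
    if len(timestamps) < 2:
        raise ValueError("need at least two heartbeats")
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "mean_interval": mean(intervals),
        "std_interval": pstdev(intervals),
        "max_interval": max(intervals),
    }


def flag_suspect_threads(per_thread_timestamps, slack=3.0):
    """Return IDs of threads whose longest gap exceeds `slack` times the
    median of all threads' mean intervals (an assumed, simple rule)."""
    features = {
        tid: heartbeat_features(ts)
        for tid, ts in per_thread_timestamps.items()
    }
    baseline = median(f["mean_interval"] for f in features.values())
    return sorted(
        tid for tid, f in features.items()
        if f["max_interval"] > slack * baseline
    )


# Hypothetical heartbeat timestamps (seconds) for three threads;
# thread 2 stalls between its third and fourth beats.
beats = {
    0: [0.0, 1.0, 2.0, 3.0, 4.0],
    1: [0.0, 1.0, 2.0, 3.0, 4.0],
    2: [0.0, 1.0, 2.0, 7.0, 8.0],
}
print(flag_suspect_threads(beats))  # prints [2]
```

A fixed threshold like this stands in for the learned model the abstract describes; a real diagnosis step would presumably feed such interval features into a trained classifier instead.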
Key words
High performance computing, OpenMP, Machine learning, Heartbeat, Hidden behaviors