Detecting Performance Variance for Parallel Applications Without Source Code

IEEE Transactions on Parallel and Distributed Systems (2022)

For parallel applications, performance variance is a critical issue that can degrade performance and make application behavior difficult to explain. Users and application developers therefore need to be able to detect and diagnose performance variance. Previous detection methods either introduce so much overhead that they slow down applications, or rely on nontrivial source code analysis, which is impractical for production-run parallel systems. In this article, we propose Vapro, a framework for detecting and diagnosing performance variance in production-run parallel systems. Our method is based on the observation that most parallel programs contain code snippets that are executed repeatedly with a fixed workload, and these snippets can be used to detect performance variance. We present the State Transition Graph (STG) to track program execution and then perform lightweight workload analysis on the STG to locate performance variance. Vapro can identify these snippets at runtime, even without program source code. To diagnose the discovered variance, Vapro uses a progressive diagnosis method based on a hybrid model that combines variance breakdown and statistical analysis. According to the evaluation results, Vapro's performance overhead is only 1.38% on average. Vapro can identify performance variance in real applications caused by hardware issues, such as memory and IO problems. The standard deviation of the execution time is reduced by up to 73.5% when the identified variance is fixed. Vapro achieves 30.0% larger detection coverage than the state-of-the-art variance detection approach based on source code analysis.
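The detection idea summarized above (repeated snippets with a fixed workload, tracked through an STG) can be illustrated with a minimal Python sketch. This is not Vapro's implementation. The `Snippet` record, the choice of invocation points (for example, communication call sites) as STG states, the use of an instruction count as the workload proxy, the 5% workload tolerance, the minimum sample count, and the 0.8 normalized-performance threshold are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Snippet:
    """One execution of the code between two consecutive invocation points.

    Hypothetical record: `src`/`dst` name the invocation points (STG states),
    `workload` is an assumed proxy such as a retired-instruction count, and
    `elapsed` is wall-clock time in seconds.
    """
    src: str
    dst: str
    workload: int
    elapsed: float


def build_stg(trace):
    """Group snippet executions by STG edge (src -> dst)."""
    edges = defaultdict(list)
    for s in trace:
        edges[(s.src, s.dst)].append(s)
    return edges


def fixed_workload_groups(snippets, rel_tol=0.05):
    """Greedily cluster snippets whose workload is within rel_tol of the
    smallest workload in the current cluster (a simplifying assumption)."""
    groups = []
    for s in sorted(snippets, key=lambda s: s.workload):
        if groups and s.workload <= groups[-1][0].workload * (1 + rel_tol):
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups


def detect_variance(trace, rel_tol=0.05, min_samples=3, threshold=0.8):
    """Flag snippets whose normalized performance (fastest time in the same
    fixed-workload group divided by their own time) is below threshold."""
    flagged = []
    for edge, snippets in build_stg(trace).items():
        for group in fixed_workload_groups(snippets, rel_tol):
            if len(group) < min_samples:
                continue  # too few repeats to establish a baseline
            best = min(s.elapsed for s in group)
            for s in group:
                perf = best / s.elapsed
                if perf < threshold:
                    flagged.append((edge, s, perf))
    return flagged


# Usage with a hypothetical trace: four runs of one edge with nearly equal
# workload (one of them slowed down) and a single run of another edge.
trace = [
    Snippet("MPI_Send@a", "MPI_Recv@b", 1_000_000, 0.010),
    Snippet("MPI_Send@a", "MPI_Recv@b", 1_010_000, 0.011),
    Snippet("MPI_Send@a", "MPI_Recv@b", 1_005_000, 0.020),
    Snippet("MPI_Send@a", "MPI_Recv@b", 1_000_000, 0.010),
    Snippet("MPI_Recv@b", "MPI_Send@a", 50_000, 0.001),
]
flagged_example = detect_variance(trace)
for edge, s, perf in flagged_example:
    print(edge, s.elapsed, round(perf, 2))
```

Normalizing each execution against the fastest run in the same fixed-workload group means a slow snippet is flagged only relative to identical work, so legitimately different workloads on the same STG edge are never compared with one another.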
Keywords
Performance variance, anomaly detection, system noise, parallel computing