Proactive Failure-Aware Task Scheduling Framework For Cloud Computing

IEEE Access (2021)

Abstract
Cloud computing is a widely adopted platform for executing end users' tasks across many application types. In the cloud, an application task is prone to failure for several reasons, such as software bugs or exceptions and failures of the virtual or physical infrastructure. Cloud service providers are responsible for managing the availability of scheduled computing tasks in order to provide a high level of QoS to their customers. Protecting tasks against failure is challenging because the cloud environment is dynamic, heterogeneous, and widely distributed. Existing work in the literature focuses on task failure prediction and neglects the remedy (post-prediction) actions. In this work, we first study and analyze three publicly available large cluster datasets, from Google, Alibaba, and Trinity, to characterize task failure in cloud computing platforms. We then propose a failure-aware task scheduling framework that predicts the termination status of a given set of tasks at runtime and takes the appropriate remedy actions. The framework uses two deep learning models, an Artificial Neural Network (ANN) and a Convolutional Neural Network (CNN), for different prediction purposes. In addition, we formalize the action selection problem as an Integer Linear Programming (ILP) model and propose a heuristic optimization solution that aims to minimize both the failure probability of tasks and their resource usage. The results show that ANN and CNN achieve prediction accuracy of up to 94% and 92%, respectively, on the Google dataset. Moreover, on the Alibaba dataset the framework can protect up to 40% of the tasks predicted to fail by taking the appropriate remedy actions, and hence saves cluster resources such as CPU and RAM.
Keywords
Task analysis, Cloud computing, Random access memory, Predictive models, Software, Quality of service, Processor scheduling, Task failure prediction, Deep learning, Task scheduling
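
The abstract describes selecting remedy actions so as to reduce both task failure probability and resource usage, but it does not give the ILP formulation or the heuristic itself. As a rough illustration of the general idea only, the following is a minimal greedy sketch. The action names, the `risk_factor` and `cost` values, and the resource budget are all hypothetical assumptions, not taken from the paper, and this is not the authors' heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A hypothetical remedy action (not defined in the paper)."""
    name: str
    risk_factor: float  # assumed multiplier applied to the predicted failure probability
    cost: float         # assumed extra resource units the action consumes


def select_actions(fail_prob, actions, budget):
    """Greedily pick at most one action per task within a resource budget.

    Candidates are ranked by failure-probability reduction per unit cost,
    then applied in that order while the budget allows.
    """
    chosen = {task: None for task in fail_prob}
    candidates = []
    for task, p in fail_prob.items():
        for action in actions:
            gain = p * (1.0 - action.risk_factor)  # absolute risk reduction
            if action.cost > 0 and gain > 0:
                candidates.append((gain / action.cost, task, action))

    # Best ratio first; the sort is stable, so ties keep insertion order.
    candidates.sort(key=lambda c: c[0], reverse=True)

    remaining = budget
    for _, task, action in candidates:
        if chosen[task] is None and action.cost <= remaining:
            chosen[task] = action.name
            remaining -= action.cost
    return chosen, remaining


# Predicted failure probabilities, e.g. as produced by a classifier (made-up values).
predicted = {"t1": 0.9, "t2": 0.2, "t3": 0.6}
remedies = [Action("migrate", 0.5, 1.0), Action("replicate", 0.2, 3.0)]

plan, left = select_actions(predicted, remedies, budget=4.0)
print(plan, left)
```

A greedy ratio rule like this is a common baseline for budgeted selection problems; an exact ILP solver would be needed to guarantee an optimal assignment.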