Competition-based failure-aware scheduling for High-Throughput Computing systems on peer-to-peer networks

Carlos Pérez-Miguel, Alexander Mendiburu, Jose Miguel-Alonso

Cluster Computing (2015)

Abstract

In a High-Throughput Computing (HTC) system, failures and churn are an important performance limitation. The time consumed by tasks running on a node that suddenly fails (or leaves the system) is wasted. These aborted tasks are usually reinserted into the system for automatic re-execution, causing additional overhead. This problem has been partially addressed via fault-tolerant techniques such as checkpointing and replication; however, these solutions introduce overheads of their own. In this work, we present several failure-aware scheduling policies that aim to reduce wasted resources by matching each submitted task with the best node to run it, taking into consideration the (predicted) duration of the task and the (expected) survival time of the nodes. Experimentation through simulation, in the context of an HTC system built on top of a peer-to-peer network, confirms that our policies, compared to several state-of-the-art alternatives, distribute the workload more effectively and thereby achieve higher task throughput.
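
The matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' competition-based policy, which the abstract does not detail; it is a simple best-fit rule assumed here for illustration. The names `Node`, `expected_survival`, and `pick_node` are hypothetical, and both the task duration and the survival time are assumed to be available as estimates in the same time unit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    # Hypothetical node record; the survival estimate is assumed to be given.
    name: str
    expected_survival: float  # estimated remaining uptime, in seconds


def pick_node(task_duration: float, nodes: Sequence[Node]) -> Optional[Node]:
    """Best-fit rule (an assumption, not the paper's policy).

    Among nodes expected to outlive the task, choose the one with the least
    spare survival time, so long-lived nodes stay free for long tasks.
    If no node is expected to outlive the task, fall back to the
    longest-lived node. Returns None when there are no nodes.
    """
    candidates = [n for n in nodes if n.expected_survival >= task_duration]
    if not candidates:
        return max(nodes, key=lambda n: n.expected_survival, default=None)
    return min(candidates, key=lambda n: n.expected_survival - task_duration)


if __name__ == "__main__":
    pool = [Node("a", 100.0), Node("b", 500.0), Node("c", 5000.0)]
    chosen = pick_node(300.0, pool)
    print(chosen.name if chosen else "no node available")
```

Under this rule a short task is steered away from the most stable node, which reflects the abstract's goal of matching task duration to node survival time rather than always picking the most reliable node.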

Keywords

High-Throughput Computing, Peer to peer systems, Failure-aware scheduling
