Automating Job Monitoring System for an Ecosystem of High Performance Computing.

MEDES(2017)

Abstract
Many countries have founded national high performance computing centers to provide computational resources to their scientists upon request. These resources are often used inefficiently because job requests do not match actual usage, which leads to unnecessary resource consumption. In this paper, we present a method to monitor and manage High Performance Computing (HPC) resources more efficiently. HPC resources are usually managed by a Portable Batch System (PBS) acting as the Job Management System (JMS) for job scheduling and resource allocation. However, HPC resources are often tied up by inefficient job requests. For instance, a job may request four processors per node for two hours while actually using four processors per node for only one hour. The HPC resources therefore lose an hour of productivity, and the queues for job execution grow longer. The automated job monitoring system proposed in this paper scans all jobs on every HPC node and compares the conditions of each job request with preset criteria. If the conditions meet the criteria, the inefficient job is forcibly cancelled from the HPC queue. The results show that more HPC resources become available for executing other jobs in the queue, which saves resources in the HPC environment, stabilizes the HPC hardware, and promotes an HPC infrastructure ecosystem.
Keywords
Job Monitoring System, Resource-saving High Performance Computing Management System, Ineffective HPC Job detection
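
The monitoring loop described in the abstract (scan jobs, compare each request with preset criteria, cancel the jobs that match) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `JobSnapshot` fields, the function names, and the thresholds `min_utilization` and `grace_minutes` are all hypothetical, since the abstract does not state the actual criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobSnapshot:
    """One sampled view of a running job (hypothetical fields)."""
    job_id: str
    requested_cpus: int     # processors requested per node
    used_cpus: float        # processors actually busy, averaged over the sample
    elapsed_minutes: float  # how long the job has been running


def is_inefficient(job: JobSnapshot,
                   min_utilization: float = 0.5,
                   grace_minutes: float = 10.0) -> bool:
    """Return True if the job uses far less than it requested.

    Both thresholds are assumed values chosen for illustration.
    """
    # Skip jobs that have only just started and may still be warming up.
    if job.elapsed_minutes < grace_minutes:
        return False
    # Guard against malformed requests to avoid dividing by zero.
    if job.requested_cpus <= 0:
        return False
    return job.used_cpus / job.requested_cpus < min_utilization


def select_jobs_to_cancel(jobs: List[JobSnapshot]) -> List[str]:
    """Scan all sampled jobs and collect the IDs that meet the criteria."""
    return [job.job_id for job in jobs if is_inefficient(job)]


if __name__ == "__main__":
    sampled = [
        JobSnapshot("101", requested_cpus=4, used_cpus=3.8, elapsed_minutes=60),
        JobSnapshot("102", requested_cpus=4, used_cpus=0.4, elapsed_minutes=60),
        JobSnapshot("103", requested_cpus=4, used_cpus=0.0, elapsed_minutes=2),
    ]
    print(select_jobs_to_cancel(sampled))
```

In a real deployment the snapshots would come from the scheduler and the per-node monitors, and the selected IDs would be handed to the scheduler's job-deletion command (in PBS, `qdel`). The sketch only returns the IDs so that the decision logic can be tested without touching a live queue.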