Tools and Techniques for Managing Clusters for SciDAC Lattice QCD at Fermilab
Clinical Orthopaedics and Related Research(2003)
Abstract
Fermilab operates several clusters for lattice gauge computing, and minimal
manpower is available to manage them. We have written a number of tools and
developed techniques to cope with this task. We describe our tools, which use
the IPMI facilities of our systems for hardware management tasks such as
remote power control, remote system resets, and health monitoring. We discuss
our techniques, based on network booting, for installing and upgrading the
operating system on these computers and for reloading BIOS and other firmware.
Finally, we discuss our tools for parallel command processing and their use in
monitoring and administering the PBS batch queue system used on our clusters.
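
The abstract does not show the authors' parallel command tools, so the following is only a minimal sketch of the general idea: run one command on many nodes at once and collect each node's exit code and output. The function names (`ssh_argv`, `run_on_node`, `run_parallel`), the use of `ssh` as the transport, and the node names are all assumptions made for illustration, not details taken from the paper.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(node, command):
    """Build the argv that runs `command` on `node` (ssh is an assumed transport)."""
    return ["ssh", "-o", "BatchMode=yes", node, command]


def run_on_node(node, command, build_argv=ssh_argv, timeout=30):
    """Run one command for one node; return (node, exit code, combined output)."""
    try:
        proc = subprocess.run(
            build_argv(node, command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return node, proc.returncode, (proc.stdout + proc.stderr).strip()
    except (subprocess.TimeoutExpired, OSError) as exc:
        # A hung or unreachable node must not stall the whole sweep.
        return node, None, str(exc)


def run_parallel(nodes, command, build_argv=ssh_argv, max_workers=16, timeout=30):
    """Run `command` for every node concurrently; return {node: (code, output)}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda n: run_on_node(n, command, build_argv, timeout), nodes
        )
        return {node: (code, out) for node, code, out in results}


if __name__ == "__main__":
    # Hypothetical node names; replace with real hostnames.
    for node, (code, out) in run_parallel(["node001", "node002"], "uptime").items():
        print(node, code, out)
```

An exit code of `None` marks a node that timed out or could not be reached, which makes unhealthy nodes easy to pick out of a sweep. The `build_argv` parameter is a design choice of this sketch: it lets the same runner wrap a different per-node command line without changing the concurrency logic.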
Keywords
lattice QCD, cluster computing, operating system, power control