
Integrating a heterogeneous and shared computer cluster into grids

V. Büge, U. Felzmann, C. Jung, U. Kerzel, M. Kreps, G. Quast, A. Vest

semanticscholar (2006)

Abstract
Integrating existing computer clusters at universities into grids is quite a challenge, because these clusters are usually shared among many groups. As an example, the Linux cluster at the "Institut für Experimentelle Kernphysik" (IEKP), University of Karlsruhe, is shared between working groups of the high energy physics experiments AMS, CDF and CMS, and has successfully been integrated into the SAMGrid for CDF and the LHC Computing Grid LCG for CMS while it still supports local users. This shared usage of the cluster results in heterogeneous software environments, grid middleware and access policies. Within the LCG, the IEKP site realises the concept of a Tier-2/3 prototype centre. The installation procedure and setup of the LCG middleware has been modified according to the local conditions. With this dedicated configuration, the IEKP site offers the full grid functionality such as data transfer, CMS software installation and grid-based physics analyses. The need for prioritisation of certain user groups has been temporarily satisfied by supporting different Virtual Organisations within the same experiment. The virtualisation of the LCG components, which can improve the utilisation of resources and security aspects, will be implemented in the near future.

WHY DO UNIVERSITY GROUPS NEED GRID COMPUTING?

Current and future high energy physics experiments at the Tevatron or the LHC need to deal with huge data production rates and event sizes. Therefore, the demand for computing power and mass storage has significantly increased. For example, the CMS detector has a final recording rate of 225 MB per second [1] which has to be stored online for later processing and data reconstruction; a back-of-envelope estimate of the resulting yearly data volume is sketched at the end of this section. Corresponding Monte Carlo simulations have to be generated and stored as well. Currently, there are already:

• Simulated data in the LHC experiments, O(100 TB)
• Real data in the HEP experiments CDF, D0, H1, ZEUS, BaBar, etc., O(1 PB)

Processing power is widely available in the associated institutes, but the worldwide distributed datasets for analyses are only accessible using grid technologies. Therefore, the participating groups – particularly at universities – cope with these challenges using grid tools. This leads to an opportunistic or shared use of the resources between local users and grid users in the collaborating groups. The main benefits of integrating an institute's cluster into grids are:

• minimisation of idle times
• interception of peak loads, e.g. before conferences
• shared data storage
• shared deployment effort of common services

COMPUTING ENVIRONMENTS AT UNIVERSITIES

In this section, the peculiarities of a typical university cluster are described using the representative example of the Linux cluster at the Institut für Experimentelle Kernphysik (IEKP) at the University of Karlsruhe. In general, a typical university cluster has to cope with diverse challenges:

• A computer cluster at an institute usually has a heterogeneous structure in hardware, software, funding and ownership.
• Furthermore, it has to support multiple groups with different applications and sometimes conflicting interests.
• Typically, the cluster's infrastructural facilities have grown in the course of time. Thus, the cluster has developed its own characteristic history and resulting inhomogeneities.
• Moreover, a cluster is embedded in structures imposed by institute, faculty and university.

For these reasons, the integration of a university cluster into existing grids is not easy at all. Moreover, the idea of sharing resources is still not present in all minds.
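To make the scale of these numbers concrete, the following sketch turns the CMS recording rate quoted above into an approximate yearly raw-data volume. The assumed live time of 10^7 seconds of data taking per year is a common rule of thumb, not a figure taken from the paper.

```python
# Back-of-envelope estimate of the yearly raw-data volume implied by the
# CMS recording rate quoted in the text (225 MB per second).
# The live time of 1e7 seconds per year is an assumed rule of thumb,
# not a number taken from the paper.

recording_rate_mb_per_s = 225      # MB per second, from the text
live_seconds_per_year = 1e7        # assumed effective data-taking time per year

raw_volume_tb = recording_rate_mb_per_s * live_seconds_per_year / 1e6  # MB -> TB
raw_volume_pb = raw_volume_tb / 1e3                                    # TB -> PB

print(f"~{raw_volume_tb:.0f} TB per year (~{raw_volume_pb:.2f} PB)")
# -> ~2250 TB per year (~2.25 PB), before reconstruction output and Monte Carlo
```

At roughly 2 PB of raw data per year for a single experiment, before reconstruction output and simulation are added, it is clear why no single institute can host the full datasets locally.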
Representative example: The IEKP at Karlsruhe

As an example, the IEKP Linux cluster at Karlsruhe, called "EKPplus", has successfully been integrated into the Sequential Access via Meta-data Grid (SAMGrid) [2] for CDF and the LHC Computing Grid (LCG) [3] for CMS, while it still supports local users. This shared usage of the cluster leads to heterogeneous environments concerning

• software and hardware
• local and grid users
• access policies
• grid middleware

A detailed description of the integration of the IEKP Linux cluster into the LHC Computing Grid can be found in [4] and references therein.

The specification of the IEKP Linux cluster components, which is representative for many other institutes, is briefly described in the following:

• There are one or more portal machines for each experiment (3 for CDF, 1 for CMS and 1 for AMS).
• Five file servers provide a disk space of about 15 – 20 TB in total.
• The local batch system consists of 27 computing nodes with a total of 36 CPUs.

The EKPplus cluster is independent of the desktop cluster. Figure 1 depicts the overall architecture of the EKPplus cluster and the integration of the SAMGrid and LCG components.

Peculiarities of a typical university cluster

The main issues to be considered when setting up and running an institute's cluster are described in the following, where the IEKP cluster again serves as the example.

Network architecture

The inner network of the cluster hosts the computing nodes, several file servers and a dedicated cluster control machine. This control machine takes care of local users, manages the job queues for the batch system and provides the root file system for the computing nodes. The outer network consists of publicly accessible portals which serve as testbeds for the development of analysis software. Via multiple Ethernet cards, the portals are also connected to the inner private network. Thus, they offer access to the file servers and to the worker nodes (WNs) via the local batch system.

Network protocols and services

User accounts are exported by the cluster control machine to all nodes via the Network Information Service (NIS). File and root file systems are exported via the Network File System (NFS). Furthermore, the protocols GSIFTP and SRM [5] are supported.

Local batch system

The worker nodes are controlled by the open-source PBS/Torque [6] batch system running on the cluster control machine. Jobs are dispatched to the next free worker node by the flexible MAUI scheduler [7]. This system supports the fair-share principle and is able to manage both group and user fair share; a toy illustration of the idea is sketched at the end of this section. In addition to the local batch queues, one grid queue is configured for each Virtual Organisation (VO) supported by the LCG site. These queues are dedicated to jobs submitted via the grid and have the same names as the respective VOs.

Firewall

The grid components are placed behind the firewall of the institute. To allow external access to the services run by LCG, some ports of the firewall have to be opened for these dedicated hosts. The internal campus network is in general protected by the University's computing department; this protection is switched off for the IEKP cluster network.

Desktop cluster

The desktop cluster comprises the user workstations, which can be used as access points to the portal machines. The EKPplus cluster is connected to the IEKP desktop network by a 1 GBit link.
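The group and user fair share mentioned under "Local batch system" can be illustrated with a minimal sketch. This is not MAUI's actual algorithm or configuration; the group names, share targets and usage numbers are hypothetical, and the code only shows the basic idea that a group which has recently used less than its configured share of the cluster gets a priority boost.

```python
# Toy illustration of group fair share (NOT MAUI's actual algorithm).
# Group names and targets are hypothetical examples.

# Configured fair-share targets: fraction of the cluster each group should get.
GROUP_TARGETS = {"cms": 0.4, "cdf": 0.4, "ams": 0.2}

def fair_share_priority(group: str, recent_usage: dict[str, float]) -> float:
    """Return a priority boost: positive if the group used less than its target."""
    total = sum(recent_usage.values()) or 1.0
    actual_share = recent_usage.get(group, 0.0) / total
    return GROUP_TARGETS.get(group, 0.0) - actual_share

# Example: CMS has recently consumed most of the cluster, so CDF and AMS
# jobs receive a higher fair-share boost and move up the queue.
usage = {"cms": 700.0, "cdf": 250.0, "ams": 50.0}   # recent CPU hours (made up)
for group in GROUP_TARGETS:
    print(group, round(fair_share_priority(group, usage), 2))
# -> cms -0.3, cdf 0.15, ams 0.15
```

The same scheme can be applied per user within a group, which is what is meant above by managing both group and user fair share.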
Operating systems

The software on the portal machines and worker nodes is experiment dependent. The operating system on all machines is Linux, but the flavour is not identical on all components due to experiment-specific extensions and modifications. At the IEKP, the following operating systems are used:

• The CDF portals use a Linux distribution based on Fermi RedHat 7.3.
• On the CMS portal, the operating system Scientific Linux CERN (SLC) 3 is used.
• The underlying operating system for the cluster control machine is Scientific Linux CERN 3, as it is for the LCG hosts.
• On the worker nodes, a slightly modified release of Scientific Linux CERN 3 is used, since it is the only operating system under which all AMS, CDF and CMS software as well as the grid software runs at the moment.

Besides, the worker nodes run a 32-bit operating system; an upgrade to a 64-bit operating system is foreseen in the near future. It is worth mentioning that no major problems occurred running different Linux distributions on the same cluster.

GRID MIDDLEWARE REQUIREMENTS AND SITE-SPECIFIC GRID SERVICES

Grid middleware requirements

Integrating a computer cluster into grids requires that the grid middleware is able to adapt to the needs of the grid site, which are:

• Flexibility: The installation procedure and setup of the grid middleware should be modifiable according to local conditions.
• Interoperability: The grid middleware should be compatible with other grid middlewares.
• Dynamic: It must be possible to add or remove resources while the grid service is running.
• Encapsulation: Experiment and analysis software must be shielded from changes in the underlying grid environment.
• Level of abstraction: The access to computing and storage resources must be independent of their physical location and local setup (a sketch of this idea follows at the end of this section).

[Figure 1: Architecture of the EKPplus cluster – publicly accessible portals and access machines (e.g. ekplx1) in the outer network 129.13.133.0/255.255.255.192, file servers (e.g. ekpfs4) and worker nodes in the inner private network 192.168.101.0/255.255.255.0, connected to the Internet/grid through the firewall/NAT host ekpplusnat.]
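The "level of abstraction" requirement listed above can be illustrated with a minimal sketch: user code refers only to a logical file name, and a catalogue-style lookup chooses a physical replica. This mimics the role of a grid replica catalogue but is not the actual LCG catalogue interface; all logical names, hostnames and paths below are made up for the example.

```python
# Toy illustration of location-independent data access (not the LCG catalogue API).
# All logical file names, hostnames and paths are hypothetical.

REPLICAS = {
    "lfn:/cms/mc/ttbar/run001.root": [
        "srm://storage.example-tier1.org/cms/mc/ttbar/run001.root",
        "gsiftp://ekpfs.example-iekp.de/data/cms/ttbar/run001.root",
    ],
}

def resolve(lfn: str, prefer_local_domain: str = "example-iekp.de") -> str:
    """Pick a physical replica for a logical file name, preferring local storage."""
    replicas = REPLICAS.get(lfn)
    if not replicas:
        raise FileNotFoundError(f"no replica registered for {lfn}")
    local = [url for url in replicas if prefer_local_domain in url]
    return (local or replicas)[0]

print(resolve("lfn:/cms/mc/ttbar/run001.root"))
# -> the gsiftp replica on the (hypothetical) local file server
```

A user job only needs the logical name; whether the data is read from local storage or transferred from a remote site is decided by the lookup, which is exactly the independence from physical location that the requirement describes.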