Virtualization in MetaCentrum
VirtCloud: Virtual Network for User-controlled Virtual Clusters
Virtual cloud computing represents the next wave of virtualization and offers significant market opportunities by providing a simpler and much more pervasive platform for on-demand delivery of desktop and application services. Creating a virtual cloud is not trivial: security, storage, performance, networking, and elasticity are all hard to get right. Once achieved, however, it can offer many new services.
Key benefits
- Optimization - Server-side virtualization helps optimize data center resources.
- Security - Virtual cloud computing securely links users to applications and content resources, on demand, from any device. This allows both protecting the cluster from the outside world and protecting the world from the cluster (e.g., in the case of user-supplied virtual images). The system is driven by Grid middleware. While VirtCloud uses services of the backbone network, it is designed to run without run-time configuration of the core network.
- Accessibility - Virtualization makes it much easier to replicate, provision, and allocate resources in a multi-tenant environment while keeping the environments separated.
MetaCentrum, the Czech national Grid infrastructure, provides the VirtCloud system for interconnecting virtual clusters over a state-wide network, based on advanced features available in academic networks. The system supports dynamic creation of virtual clusters without run-time administrative privileges on the backbone core network, encapsulation of the clusters, controlled access to external resources for cluster hosts, full user access to the clusters, and optional publishing of the clusters.
A major part of MetaCentrum's computational resources is currently virtualised. Virtual machines are managed by Magrathea, a service MetaCentrum has designed and implemented. The virtual nature of the resources is mostly hidden from end users thanks to integration with the resource management system.
Virtualising computer clusters, the basic building blocks of Grid environments, also involves the interconnecting network infrastructure. Traditionally, the network is understood as a "fixed resource" in the Grid, an omnipresent substrate for data transfers. This view is not sufficient for virtual clusters: they are dynamically mapped to the physical infrastructure, and this mapping is indirectly controlled by the users by means of Grid middleware. While steps towards virtualising the network inside a cluster have already been taken by several groups, MetaCentrum focuses on building a virtualised networking infrastructure that scales well enough to interconnect clusters across wide-area networks and that performs up to the expectations of high-performance applications.
VirtCloud Implementation in MetaCentrum Using the CESNET2 Network
MetaCentrum, as the national Grid infrastructure, uses the Czech national research and education network CESNET2. The CESNET2 network provides DWDM interconnects among major cities in the Czech Republic, a production 10/40 Gbps IP backbone for normal traffic, as well as experimental services available to other projects. For traffic engineering of the IP backbone it uses Multi-Protocol Label Switching (MPLS). The MetaCentrum project has nodes in three cities in the Czech Republic: Prague (Praha), Brno, and Pilsen (Plzeň), all of them located close to a CESNET2 point of presence. The distances (over optical cable) are approximately 300 km between Prague and Brno and 100 km between Prague and Pilsen.
VirtCloud spans four levels:
- L2 core network - The following technologies have been identified as meeting the requirements of the VirtCloud L2 core network and can be implemented using the CESNET2 network:
- IEEE 802.1ad (QinQ), allowing encapsulation of 802.1q-tagged traffic into another 802.1q VLAN (see the frame sketch after this list)
- Virtual Private LAN Service (VPLS) technology for networks that use MPLS traffic engineering
- Cisco Xponder technology, using the Cisco 15454 platform to create a distributed switch based on dedicated DWDM optical circuit interconnects
- Site network - Each site uses a Layer 2 infrastructure implemented on a mix of Force10, Hewlett-Packard, and Cisco Ethernet switches. Each site has parallel uplinks to the public routed IP network, the Xponder network, and the VPLS network. Under normal circumstances the Xponder network is used for production, as it provides higher capacity: its traffic does not mix with the routed traffic on the MPLS backbone (which is shared with standard academic backbone traffic).
- Host configuration - Hosts deploy the Xen virtual machine monitor. The hypervisor domain manages the user domain virtual machines and provides network connectivity to them via an Ethernet bridge. The logical network interfaces of each user domain must be bridged to VLANs according to the user domain's membership in virtual clusters. Since users may even have administrator privileges in their virtual machines, the tagging must be performed by the hypervisor, out of the user's reach (see the bridging sketch after this list). Addressing of the user domain interfaces can be either IPv4 or IPv6 and is fully controlled by the user, who can use, e.g., private addresses and/or addresses from the user's own organisation in order to publish the cluster machines.
- VLAN life cycle implementation - VLAN allocation is controlled by a stateful service called SBF. Users initiate the building of a virtual cluster by submitting a special job to the PBS resource manager, which allocates a set of physical nodes to run the virtual cluster nodes and requests allocation of a VLAN number from SBF. SBF configures the active network elements and returns a VLAN number. PBS, in cooperation with Magrathea, then configures bridging in the Xen hypervisor domains and boots the requested virtual machine images. The configuration may be torn down by a time-out, by the user, and/or by an administrative action; it is then removed from all network elements and the VLAN number can be allocated to another virtual cluster. All these distributed operations must be performed as transactions so as not to leave the infrastructure in an undefined state (see the allocation sketch after this list).
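To make the QinQ option above concrete, the following is a minimal sketch of an 802.1ad-stacked frame built with the Python scapy library. The VLAN numbers (3001 as the backbone service tag, 42 as the site customer tag) and the addresses are illustrative assumptions only, not values used by VirtCloud.

```python
from scapy.all import ICMP, IP, Dot1Q, Ether

# The outer (service) tag is announced with the 802.1ad TPID 0x88a8,
# the inner (customer) tag keeps the plain 802.1q TPID 0x8100.
frame = (
    Ether(src="02:00:00:00:00:01", dst="02:00:00:00:00:02", type=0x88a8)
    / Dot1Q(vlan=3001, type=0x8100)  # S-VLAN carried across the L2 core
    / Dot1Q(vlan=42)                 # C-VLAN used inside the site network
    / IP(dst="192.0.2.10")
    / ICMP()
)

frame.show()               # dump the stacked headers
wire_bytes = bytes(frame)  # serialised frame as it would appear on the wire
```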
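The host-side bridging described in the Host configuration item can be driven from the hypervisor domain with standard Linux tools. The sketch below only illustrates the idea, assuming the iproute2 and brctl utilities, a trunk interface called eth1, and a Xen backend interface called vif12.0; these names are hypothetical and do not come from the actual VirtCloud scripts.

```python
import subprocess

def run(*cmd):
    """Run a command in dom0 and fail loudly if it does not succeed."""
    subprocess.run(cmd, check=True)

def attach_domain_to_vlan(trunk_if: str, vif: str, vlan_id: int) -> None:
    """Bridge a user domain's backend interface onto a VLAN-tagged uplink.

    The tagging happens entirely in the hypervisor domain, so a user with
    root access inside the virtual machine cannot change it.
    """
    tagged_if = f"{trunk_if}.{vlan_id}"
    bridge = f"xenbr{vlan_id}"

    # 802.1q sub-interface on the trunk towards the site network
    run("ip", "link", "add", "link", trunk_if, "name", tagged_if,
        "type", "vlan", "id", str(vlan_id))
    run("ip", "link", "set", tagged_if, "up")

    # per-cluster bridge in dom0
    run("brctl", "addbr", bridge)
    run("brctl", "addif", bridge, tagged_if)
    run("ip", "link", "set", bridge, "up")

    # plug the user domain's backend interface into the bridge
    run("brctl", "addif", bridge, vif)

if __name__ == "__main__":
    # hypothetical example: domain 12, first interface, cluster VLAN 3001
    attach_domain_to_vlan("eth1", "vif12.0", 3001)
```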
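The transactional VLAN life cycle can be illustrated by a small allocator in the spirit of SBF. This is a hedged sketch only: the class name, the VLAN range handling, and the configure/deconfigure callbacks are assumptions made for illustration and do not reflect the actual SBF interface.

```python
class VlanAllocator:
    """Minimal SBF-like allocator: hand out VLAN numbers and keep their state.

    configure/deconfigure stand in for whatever mechanism pushes the VLAN
    to the active network elements; they are assumed to raise on failure.
    """

    def __init__(self, first, last, configure, deconfigure):
        self.free = set(range(first, last + 1))
        self.in_use = {}  # vlan_id -> cluster name
        self.configure = configure
        self.deconfigure = deconfigure

    def allocate(self, cluster, elements):
        if not self.free:
            raise RuntimeError("no free VLAN numbers")
        vlan_id = min(self.free)
        done = []
        try:
            for element in elements:
                self.configure(element, vlan_id)
                done.append(element)
        except Exception:
            # roll back the partial configuration so the infrastructure
            # is never left in an undefined state
            for element in done:
                self.deconfigure(element, vlan_id)
            raise
        self.free.remove(vlan_id)
        self.in_use[vlan_id] = cluster
        return vlan_id

    def release(self, vlan_id, elements):
        for element in elements:
            self.deconfigure(element, vlan_id)
        del self.in_use[vlan_id]
        self.free.add(vlan_id)  # the number can be reused by another cluster
```

A caller corresponding to the PBS job would invoke allocate() when the cluster is built and release() when the configuration is torn down by a time-out, by the user, or by an administrative action.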
Access from/to the Virtual Clusters
Currently we provide two services for the virtual clusters: file system access and remote user access. Both are implemented in a similar way: the NFSv4 file servers, as well as the OpenVPN server used for remote access, have access to all the VLANs of all the virtual clusters, thus becoming part of them (a sketch of the per-VLAN interface setup follows below). The OpenVPN access implementation is very similar to what the Nimbus system uses for remote access.
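As a rough illustration of how such a service host can become part of every virtual cluster, the sketch below creates an 802.1q sub-interface for each cluster VLAN on the NFSv4/OpenVPN machine. The trunk interface name eth0 and the VLAN numbers are hypothetical; the actual deployment is described in the technical report referenced below.

```python
import subprocess

def join_cluster_vlans(trunk_if, vlan_ids):
    """Give the service host an 802.1q sub-interface for each cluster VLAN,
    so the NFSv4/OpenVPN server is reachable from inside every virtual cluster."""
    for vlan_id in vlan_ids:
        sub_if = f"{trunk_if}.{vlan_id}"
        subprocess.run(["ip", "link", "add", "link", trunk_if, "name", sub_if,
                        "type", "vlan", "id", str(vlan_id)], check=True)
        subprocess.run(["ip", "link", "set", sub_if, "up"], check=True)

if __name__ == "__main__":
    join_cluster_vlans("eth0", [3001, 3002, 3007])  # hypothetical cluster VLANs
```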
To learn more about the architecture of the system and its prototype implementation in MetaCentrum, see the Technical report.