Prescriptive examples

The Conseil Européen pour la Recherche Nucléaire (CERN), also known as the European Organization for Nuclear Research, provides particle accelerators and other infrastructure for high-energy physics research. As of 2011, CERN operated two compute centers in Europe, with plans to add a third.

Data center            Approximate capacity
Geneva, Switzerland    3.5 MW; 91000 cores; 120 PB HDD; 100 PB tape; 310 TB memory
Budapest, Hungary      2.5 MW; 20000 cores; 6 PB HDD

To support a growing number of compute-heavy users of experiments related to the Large Hadron Collider (LHC), CERN ultimately elected to deploy an OpenStack cloud using Scientific Linux and RDO. The effort aimed to simplify the management of the center's compute resources, with a view to doubling compute capacity through the addition of a data center in 2013 while maintaining the same levels of compute staff.

The CERN solution uses cells to segregate compute resources and to transparently scale between different data centers. This decision meant trading off support for security groups and live migration. In addition, some details, such as flavors, had to be replicated manually across cells. Despite these drawbacks, cells were determined to provide the required scale while exposing a single public API endpoint to users.

A compute cell was created for each of the two original data centers, and a third was created when a new data center was added in 2013. Each cell contains three availability zones to further segregate compute resources and at least three RabbitMQ message brokers configured as a cluster with mirrored queues for high availability.

The API cell, which resides behind an HAProxy load balancer, is located in the data center in Switzerland and directs API calls to compute cells using a customized variation of the cell scheduler. The customizations allow certain workloads to be directed to a specific data center, or to "all" data centers, with cell selection in the latter case determined by cell RAM availability (see the first sketch at the end of this section).

There is also some customization of the filter scheduler that handles placement within the cells (a hedged sketch of a project-aware filter follows this section):

ImagePropertiesFilter - Provides special handling depending on the guest operating system in use (Linux-based or Windows-based).
ProjectsToAggregateFilter - Provides special handling depending on the project the instance is associated with.
default_schedule_zones - Allows the selection of multiple default availability zones, rather than a single default.

The MySQL database server in each cell is managed by a central database team and configured in an active/passive configuration with a NetApp storage back end. Backups are performed every 6 hours.
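CERN's customized cell scheduler is not reproduced in this guide. The following is a minimal, hypothetical sketch of the RAM-based selection policy described above; the Cell record, its fields, and the example capacities are assumptions for illustration only.

```python
# Hypothetical sketch of the cell-selection policy described above: a workload
# is either pinned to a named data center, or scheduled across "all" data
# centers, in which case the cell with the most free RAM is chosen.
from collections import namedtuple

Cell = namedtuple('Cell', ['name', 'datacenter', 'free_ram_mb'])


def select_cell(cells, target_datacenter='all'):
    if target_datacenter != 'all':
        candidates = [c for c in cells if c.datacenter == target_datacenter]
    else:
        candidates = list(cells)
    if not candidates:
        raise ValueError('no cell available for %s' % target_datacenter)
    # Favor the cell with the most free RAM.
    return max(candidates, key=lambda c: c.free_ram_mb)


cells = [Cell('cell-geneva-1', 'geneva', 180000),
         Cell('cell-budapest-1', 'budapest', 420000)]
print(select_cell(cells).name)            # "all": picks the cell with the most free RAM
print(select_cell(cells, 'geneva').name)  # pinned to the Geneva data center
```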
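Likewise, CERN's actual filter code is not published here. Below is a hedged sketch of what a project-aware host filter can look like against the Havana-era filter interface (BaseHostFilter.host_passes); the _projects_allowed_on_host helper and its data source are hypothetical, and the real filter may differ in both policy and implementation.

```python
# Hedged sketch of a project-aware host filter, assuming the Havana-era
# nova filter interface. The lookup of allowed projects is hypothetical.
from nova.scheduler import filters


def _projects_allowed_on_host(host_state):
    # Hypothetical lookup: in a real filter this would typically come from
    # host aggregate metadata or another source of truth.
    return getattr(host_state, 'allowed_projects', None)


class ProjectsToAggregateFilter(filters.BaseHostFilter):
    """Pass only hosts whose aggregate allows the requesting project."""

    def host_passes(self, host_state, filter_properties):
        spec = filter_properties.get('request_spec', {})
        project_id = spec.get('instance_properties', {}).get('project_id')
        allowed = _projects_allowed_on_host(host_state)
        # A host with no project restriction accepts any project.
        if not allowed:
            return True
        return project_id in allowed
```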
Network architecture

To integrate with existing CERN networking infrastructure, customizations were made to legacy networking (nova-network) in the form of a driver that integrates with CERN's existing database for tracking MAC and IP address assignments. The driver considers the compute node that the scheduler placed an instance on and then selects a MAC address and IP from the pre-registered list associated with that node in the database. The database is then updated to reflect the instance to which the addresses were assigned.
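The driver itself is CERN-specific and not part of upstream OpenStack. The sketch below illustrates only the allocation flow described above; the table name, its columns, and the allocate_for_host helper are hypothetical.

```python
# Minimal sketch of the described allocation flow against a hypothetical
# address registry: pick a free (MAC, IP) pair pre-registered for the
# compute node, then record which instance received it.
import sqlite3


def allocate_for_host(conn, compute_node, instance_uuid):
    cur = conn.cursor()
    cur.execute(
        "SELECT mac, ip FROM address_registry "
        "WHERE compute_node = ? AND instance_uuid IS NULL LIMIT 1",
        (compute_node,),
    )
    row = cur.fetchone()
    if row is None:
        raise RuntimeError("no free addresses registered for %s" % compute_node)
    mac, ip = row
    cur.execute(
        "UPDATE address_registry SET instance_uuid = ? "
        "WHERE compute_node = ? AND mac = ?",
        (instance_uuid, compute_node, mac),
    )
    conn.commit()
    return mac, ip
```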
Storage architecture

The OpenStack Image Service is deployed in the API cell and configured to expose version 1 (V1) of the API. As a result, the image registry is also required. The storage back end in use is a 3 PB Ceph cluster. A small set of "golden" Scientific Linux 5 and 6 images is maintained, on top of which applications can in turn be placed using orchestration tools. Puppet is used for instance configuration management and customization, but increased use of Orchestration for deployment is expected.
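Since the Image Service is exposed over the V1 API, a golden image can be registered with the V1 client. The following is a minimal sketch assuming python-glanceclient; the endpoint, token, file name, image name, and qcow2 format are all placeholders.

```python
# Sketch: register a "golden" image against the Image Service V1 API.
# Endpoint, token, and image details are illustrative placeholders.
from glanceclient import Client

glance = Client('1', endpoint='http://<glance-api>:9292', token='<keystone-token>')

with open('slc6-golden.qcow2', 'rb') as image_data:
    image = glance.images.create(
        name='SLC6 golden image',
        disk_format='qcow2',
        container_format='bare',
        is_public=False,
        data=image_data,
    )

print(image.id, image.status)
```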
Monitoring

Although direct billing is not required, the Telemetry module is used to perform metering for the purposes of adjusting project quotas. A sharded, replicated MongoDB back end is used. To spread API load, instances of the nova-api service were deployed within the child cells for Telemetry to query against. This also meant that some supporting services, including keystone, glance-api, and glance-registry, needed to be configured in the child cells as well. Additional monitoring tools in use include Flume, Elasticsearch, Kibana, and the CERN-developed Lemon project.
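The guide does not describe how the quota adjustments are made. As one possible flow, the sketch below queries Telemetry statistics and updates a Compute quota, assuming the era's python-ceilometerclient and python-novaclient libraries; the credentials, project ID, and the 25% headroom policy are illustrative assumptions, not CERN's actual process.

```python
# Sketch of metering-driven quota adjustment, assuming python-ceilometerclient
# and python-novaclient. Credentials, thresholds, and the adjustment rule are
# illustrative only.
from ceilometerclient import client as ceilo_client
from novaclient import client as nova_client

creds = dict(os_username='admin', os_password='<password>',
             os_tenant_name='admin', os_auth_url='http://<keystone>:5000/v2.0')

ceilometer = ceilo_client.get_client('2', **creds)
nova = nova_client.Client('2', creds['os_username'], creds['os_password'],
                          creds['os_tenant_name'], auth_url=creds['os_auth_url'])

project_id = '<project-uuid>'

# Hourly vCPU statistics for the project; take the busiest hourly average.
stats = ceilometer.statistics.list(
    meter_name='vcpus',
    q=[{'field': 'project_id', 'op': 'eq', 'value': project_id}],
    period=3600)

if stats:
    peak_vcpus = max(s.avg for s in stats)
    current = nova.quotas.get(project_id)
    # Illustrative policy: keep roughly 25% headroom above observed peak usage.
    suggested_cores = int(peak_vcpus * 1.25)
    if suggested_cores > current.cores:
        nova.quotas.update(project_id, cores=suggested_cores)
```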
References

The authors of the Architecture Design Guide would like to thank CERN for publicly documenting their OpenStack deployment in these resources, which formed the basis for this chapter:

http://openstack-in-production.blogspot.fr
Deep dive into the CERN Cloud Infrastructure