Technical considerations

In a compute-focused OpenStack cloud, the type of instance workloads being provisioned heavily influences technical decision making. For example, use cases that demand many short-running jobs present different requirements than those that specify long-running jobs, even though both situations are considered "compute focused."

Public and private clouds require deterministic capacity planning to support elastic growth in order to meet user SLA expectations. Deterministic capacity planning is the path to predicting the effort and expense of making a given process consistently performant. This process is important because, when a service becomes a critical part of a user's infrastructure, the user's fate becomes wedded to the SLAs of the cloud itself. In cloud computing, a service’s performance is measured not by its average speed but by the consistency of its speed.

There are two aspects of capacity planning to consider: planning the initial deployment footprint, and planning its expansion to stay ahead of the demands of cloud users.

Planning the initial footprint for an OpenStack deployment is typically based on existing infrastructure workloads and estimates of expected uptake. The starting point is the core count of the cloud. By applying relevant ratios, the user can gather information about:

  The number of instances expected to be available concurrently: (overcommit fraction × cores) / virtual cores per instance
  How much storage is required: flavor disk size × number of instances

These ratios can be used to determine the amount of additional infrastructure needed to support the cloud. For example, consider a situation in which you require 1600 instances, each with 2 vCPU and 50 GB of storage.
Assuming the default overcommit rate of 16:1, working out the math provides the following equations:

  1600 = (16 × (number of physical cores)) / 2
  storage required = 50 GB × 1600

On the surface, the equations reveal the need for 200 physical cores and 80 TB of storage for /var/lib/nova/instances/. However, it is also important to look at patterns of usage to estimate the load that the API services, database servers, and queue servers are likely to encounter.

Consider, for example, the differences between a cloud that supports a managed web-hosting platform and one running integration tests for a development project that creates one instance per code commit. In the former, the heavy work of creating an instance happens only every few months, whereas the latter puts constant heavy load on the cloud controller. The average instance lifetime must be considered, as longer lifetimes generally mean less load on the cloud controller.

Aside from the creation and termination of instances, the impact of users accessing the service must be considered, particularly on nova-api and its associated database. Listing instances returns a great deal of information and, given the frequency with which users run this operation, a cloud with a large number of users can increase the load significantly. This can even occur unintentionally. For example, the OpenStack Dashboard instances tab refreshes the list of instances every 30 seconds, so leaving it open in a browser window can cause unexpected load.

Consideration of these factors can help determine how many cloud controller cores are required. A server with 8 CPU cores and 8 GB of RAM would be sufficient for up to a rack of compute nodes, given the above caveats.

Key hardware specifications are also crucial to the performance of user instances.
Be sure to consider budget and performance needs, including storage performance (spindles/core), memory availability (RAM/core), network bandwidth (Gbps/core), and overall CPU performance (CPU/core).

The cloud resource calculator is a useful tool for examining the impact of different hardware and instance loadouts. It is available at: https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods
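
The initial-footprint arithmetic described in this section can be reproduced with a short script. This is a minimal sketch, not part of any OpenStack tooling: the function names are hypothetical, and the inputs are the 1600-instance example from the text (2 vCPUs and 50 GB per instance, 16:1 overcommit).

```python
import math


def physical_cores_needed(instances, vcpus_per_instance, cpu_overcommit):
    """Invert instances = (overcommit x cores) / vcpus_per_instance.

    Rounds up, because a fractional physical core cannot be purchased.
    """
    return math.ceil(instances * vcpus_per_instance / cpu_overcommit)


def instance_storage_gb(instances, flavor_disk_gb):
    """Storage required = flavor disk size x number of instances."""
    return instances * flavor_disk_gb


# Values taken from the worked example in the text.
cores = physical_cores_needed(instances=1600, vcpus_per_instance=2, cpu_overcommit=16)
storage_gb = instance_storage_gb(instances=1600, flavor_disk_gb=50)

print(f"physical cores: {cores}")                  # physical cores: 200
print(f"storage: {storage_gb / 1000:.0f} TB")      # storage: 80 TB
```

These figures cover only the compute nodes. As the text notes, controller sizing depends on usage patterns that this arithmetic does not capture.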

Expansion planning

A key challenge faced when planning the expansion of cloud compute services is the elastic nature of cloud infrastructure demands. Previously, new users or customers would be forced to plan for and request the infrastructure they required ahead of time, allowing time for reactive procurement processes. Cloud computing users have come to expect the agility provided by having instant access to new resources as they are required. Consequently, capacity should be planned not only for typical usage but, more importantly, for sudden bursts in usage.

Planning for expansion can be a delicate balancing act. Planning too conservatively can lead to unexpected oversubscription of the cloud and dissatisfied users. Planning for cloud expansion too aggressively can lead to unexpected underutilization of the cloud and funds spent on operating infrastructure that is not being used efficiently.

The key is to carefully monitor the spikes and valleys in cloud usage over time. The intent is to measure the consistency with which services can be delivered, not the average speed or capacity of the cloud. Modeling capacity from these measurements enables users to determine the current and future capacity of the cloud more accurately.
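
One way to look at spikes and valleys rather than averages is to summarize usage samples by their peak, a high percentile, and their variability. The following is a minimal sketch under stated assumptions: the sample data, the capacity figure, and all function names are hypothetical, and the nearest-rank percentile is just one simple choice of method.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Return the nearest-rank percentile (pct in 1..100) of the samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def usage_summary(samples, capacity):
    """Summarize usage so that bursts are visible, not just the average."""
    mean = statistics.fmean(samples)
    peak = max(samples)
    return {
        "mean": mean,
        "p95": nearest_rank_percentile(samples, 95),
        "peak": peak,
        # Relative spread: higher values mean less consistent demand.
        "variability": statistics.pstdev(samples) / mean,
        # Capacity left over at the busiest observed moment.
        "peak_headroom": capacity - peak,
    }


# Hypothetical hourly vCPU usage with two bursts.
hourly_vcpus = [900, 950, 1000, 980, 2400, 1020, 990, 2600, 1010, 970]
summary = usage_summary(hourly_vcpus, capacity=3200)

for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

In this made-up data the mean is well under half of the peak, which is the situation the text warns about: sizing to the average would leave the bursts oversubscribed.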

CPU and RAM

(Adapted from: http://docs.openstack.org/openstack-ops/content/compute_nodes.html#cpu_choice)

In current generations, CPUs have up to 12 cores. If an Intel CPU supports Hyper-Threading, those 12 cores are presented as 24 logical cores. If a server is purchased that supports multiple CPUs, the number of cores is further multiplied. Hyper-Threading is Intel's proprietary simultaneous multi-threading implementation, used to improve parallelization on their CPUs. Consider enabling Hyper-Threading to improve the performance of multithreaded applications.

Whether the user should enable Hyper-Threading on a CPU depends upon the use case. For example, disabling Hyper-Threading can be beneficial in intense computing environments. Performance testing conducted by running local workloads with Hyper-Threading both on and off can help determine which is more appropriate in any particular case.

If the libvirt/KVM hypervisor driver is intended for use, then the CPUs used in the compute nodes must support virtualization by way of the VT-x extensions for Intel chips and the AMD-V extensions for AMD chips to provide full performance.

OpenStack enables the user to overcommit CPU and RAM on compute nodes. This allows an increase in the number of instances running on the cloud at the cost of reducing the performance of the instances. OpenStack Compute uses the following ratios by default:

  CPU allocation ratio: 16:1
  RAM allocation ratio: 1.5:1

The default CPU allocation ratio of 16:1 means that the scheduler allocates up to 16 virtual cores per physical core. For example, if a physical node has 12 cores, the scheduler sees 192 available virtual cores. With typical flavor definitions of 4 virtual cores per instance, this ratio would provide 48 instances on a physical node.
Similarly, the default RAM allocation ratio of 1.5:1 means that the scheduler allocates instances to a physical node as long as the total amount of RAM associated with the instances is less than 1.5 times the amount of RAM available on the physical node. For example, if a physical node has 48 GB of RAM, the scheduler allocates instances to that node until the sum of the RAM associated with the instances reaches 72 GB (such as nine instances, in the case where each instance has 8 GB of RAM). The appropriate CPU and RAM allocation ratio must be selected based on particular use cases.
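
The effect of the two allocation ratios can be checked with a small calculation. This is an illustrative sketch only: the function names are hypothetical and it models just the arithmetic described above, not the scheduler itself. It combines the 12-core and 48 GB examples into a single node to show which resource becomes the limit.

```python
def schedulable_vcpus(physical_cores, cpu_allocation_ratio=16.0):
    """Virtual cores the scheduler treats as available on a node."""
    return int(physical_cores * cpu_allocation_ratio)


def schedulable_ram_gb(physical_ram_gb, ram_allocation_ratio=1.5):
    """RAM (in GB) the scheduler treats as available on a node."""
    return physical_ram_gb * ram_allocation_ratio


def instances_per_node(physical_cores, physical_ram_gb, flavor_vcpus, flavor_ram_gb,
                       cpu_allocation_ratio=16.0, ram_allocation_ratio=1.5):
    """Instances of one flavor that fit, limited by the scarcer resource."""
    by_cpu = schedulable_vcpus(physical_cores, cpu_allocation_ratio) // flavor_vcpus
    by_ram = int(schedulable_ram_gb(physical_ram_gb, ram_allocation_ratio) // flavor_ram_gb)
    return min(by_cpu, by_ram)


print(schedulable_vcpus(12))        # 192 virtual cores from 12 physical cores
print(schedulable_vcpus(12) // 4)   # 48 instances by CPU alone (4 vCPUs each)
print(schedulable_ram_gb(48))       # 72.0 GB schedulable from 48 GB physical

# Assumed combination: a 12-core, 48 GB node running 4 vCPU / 8 GB instances.
print(instances_per_node(12, 48, flavor_vcpus=4, flavor_ram_gb=8))  # 9
```

On such a node RAM, not CPU, is the binding constraint, which is one reason the ratios need to be chosen together for the intended workload.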

Additional hardware

Certain use cases may benefit from exposure to additional devices on the compute node. Examples might include:

  High performance computing jobs that benefit from the availability of graphics processing units (GPUs) for general-purpose computing.
  Cryptographic routines that benefit from the availability of hardware random number generators to avoid entropy starvation.
  Database management systems that benefit from the availability of SSDs for ephemeral storage to minimize read/write time when it is required.

Host aggregates are used to group hosts that share similar characteristics, which can include hardware similarities. The addition of specialized hardware to a cloud deployment is likely to add to the cost of each node, so careful consideration must be given to whether all compute nodes, or just a subset that can be targeted using flavors, need the additional customization to support the desired workloads.
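
The idea of targeting a subset of specialized nodes through flavors can be modeled in a few lines. This is a conceptual sketch only, not OpenStack's scheduler code: the class, the function, the host names, and the metadata keys are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HostAggregate:
    """A named group of hosts that share characteristics (hypothetical model)."""
    name: str
    hosts: list
    metadata: dict = field(default_factory=dict)


def eligible_hosts(aggregates, required):
    """Return hosts in aggregates whose metadata has every required key/value."""
    hosts = set()
    for aggregate in aggregates:
        if all(aggregate.metadata.get(k) == v for k, v in required.items()):
            hosts.update(aggregate.hosts)
    return sorted(hosts)


aggregates = [
    HostAggregate("gpu-nodes", ["compute-01", "compute-02"], {"gpu": "true"}),
    HostAggregate("ssd-nodes", ["compute-02", "compute-03"], {"ssd": "true"}),
    HostAggregate("general", ["compute-04"]),
]

# A flavor that needs GPUs would carry a matching requirement.
gpu_flavor_requirements = {"gpu": "true"}
print(eligible_hosts(aggregates, gpu_flavor_requirements))
# ['compute-01', 'compute-02']
```

Only the hosts in the matching aggregate need the more expensive hardware, which is the cost trade-off the paragraph above describes.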

Utilization

Infrastructure-as-a-Service offerings, including OpenStack, use flavors to provide standardized views of virtual machine resource requirements that simplify the problem of scheduling instances while making the best use of the available physical resources.

In order to facilitate packing of virtual machines onto physical hosts, the default selection of flavors is constructed so that the second largest flavor is half the size of the largest flavor in every dimension. It has half the vCPUs, half the vRAM, and half the ephemeral disk space. The next largest flavor is half that size again. As a result, packing a server for general purpose computing might look conceptually something like this figure:

  [Figure: general purpose packed server]

On the other hand, a CPU-optimized packed server might look like the following figure:

  [Figure: CPU-optimized packed server]

These default flavors are well suited to typical loadouts for commodity server hardware. To maximize utilization, however, it may be necessary to customize the flavors or create new ones, to better align instance sizes to the available hardware.

Workload characteristics may also influence hardware choices and flavor configuration, particularly where they present different ratios of CPU versus RAM versus HDD requirements.

For more information on flavors, refer to: http://docs.openstack.org/openstack-ops/content/flavors.html
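
The halving rule for flavors, and why it makes packing easy, can be sketched as follows. The flavor names, the starting sizes, and the host dimensions are illustrative assumptions, not values taken from this document, and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    name: str
    vcpus: int
    ram_mb: int
    disk_gb: int


def halving_ladder(largest, smaller_names):
    """Build flavors where each one is half the previous in every dimension."""
    ladder = [largest]
    for name in smaller_names:
        prev = ladder[-1]
        ladder.append(Flavor(name, prev.vcpus // 2, prev.ram_mb // 2, prev.disk_gb // 2))
    return ladder


def fits_per_host(flavor, host_vcpus, host_ram_mb, host_disk_gb):
    """Instances of a flavor that fit on a host, limited by the scarcest dimension."""
    return min(host_vcpus // flavor.vcpus,
               host_ram_mb // flavor.ram_mb,
               host_disk_gb // flavor.disk_gb)


# Illustrative starting point and host size (assumptions for this sketch).
ladder = halving_ladder(Flavor("xlarge", 8, 16384, 160), ["large", "medium", "small"])

for flavor in ladder:
    count = fits_per_host(flavor, host_vcpus=32, host_ram_mb=65536, host_disk_gb=640)
    print(f"{flavor.name}: {flavor.vcpus} vCPU, {flavor.ram_mb} MB, "
          f"{flavor.disk_gb} GB -> {count} per host")
```

Because every dimension halves together, two instances of one flavor occupy exactly the space of one instance of the next size up, so mixed sizes leave no stranded capacity on a host whose resources follow the same proportions.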

Performance

The infrastructure of a cloud should not be shared, so that workloads can consume as many resources as are made available, and accommodations should be made to support large-scale workloads.

The duration of batch processing differs depending on the individual workloads that are launched. Time limits range from seconds to minutes to hours, and as a result it is difficult to predict when resources will be used, for how long, and even which resources will be used.

Security

The security considerations needed for this scenario are similar to those of the other scenarios discussed in this book.

A security domain comprises users, applications, servers, or networks that share common trust requirements and expectations within a system. Typically they have the same authentication and authorization requirements and users.

These security domains are:

  Public
  Guest
  Management
  Data

These security domains can be mapped individually to the installation, or they can be combined. For example, some deployment topologies combine both guest and data domains onto one physical network, whereas in other cases these networks are physically separated. In each case, the cloud operator should be aware of the appropriate security concerns. Security domains should be mapped out against the specific OpenStack deployment topology. The domains and their trust requirements depend upon whether the cloud instance is public, private, or hybrid.

The public security domain is an entirely untrusted area of the cloud infrastructure. It can refer to the Internet as a whole or simply to networks over which the user has no authority. This domain should always be considered untrusted.

Typically used for compute instance-to-instance traffic, the guest security domain handles compute data generated by instances on the cloud, but not services that support the operation of the cloud, such as API calls. Public cloud providers and private cloud providers who do not have stringent controls on instance use or who allow unrestricted Internet access to instances should consider this domain to be untrusted. Private cloud providers may want to consider this network as internal and therefore trusted only if they have controls in place to assert that they trust instances and all their tenants.

The management security domain is where services interact.
Sometimes referred to as the "control plane", the networks in this domain transport confidential data such as configuration parameters, user names, and passwords. In most deployments this domain is considered trusted.

The data security domain is concerned primarily with information pertaining to the storage services within OpenStack. Much of the data that crosses this network has high integrity and confidentiality requirements and, depending on the type of deployment, there may also be strong availability requirements. The trust level of this network is heavily dependent on deployment decisions and as such we do not assign this any default level of trust.

When deploying OpenStack in an enterprise as a private cloud, it is assumed to be behind a firewall and within the trusted network alongside existing systems. Users of the cloud are typically employees or trusted individuals who are bound by the security requirements set forth by the company. This tends to push most of the security domains towards a more trusted model. However, when deploying OpenStack in a public-facing role, no assumptions can be made and the attack vectors significantly increase. For example, the API endpoints and the software behind them will be vulnerable to potentially hostile entities wanting to gain unauthorized access or prevent access to services. This can result in loss of reputation and must be protected against through auditing and appropriate filtering.

Care must be taken when managing the users of the system, whether operating a public or a private cloud. The identity service allows for LDAP to be part of the authentication process. Including such systems in an OpenStack deployment may ease user management if they are integrated into existing systems.

It is strongly recommended that the API services are placed behind hardware that performs SSL termination.
API services transmit user names, passwords, and generated tokens between client machines and API endpoints and therefore must be secured.

More information on OpenStack Security can be found at http://docs.openstack.org/security-guide/

OpenStack components

Due to the nature of the workloads that will be used in this scenario, a number of components will be highly beneficial in a compute-focused cloud. This includes the typical OpenStack components:

  OpenStack Compute (nova)
  OpenStack Image Service (glance)
  OpenStack Identity (keystone)

Also consider several specialized components:

Orchestration module (heat)

It is safe to assume that, given the nature of the applications involved in this scenario, these will be heavily automated deployments. Making use of Orchestration will be highly beneficial in this case. Deploying a batch of instances and running an automated set of tests can be scripted; however, it makes sense to use the Orchestration module to handle all these actions.

Telemetry module (ceilometer)

Telemetry and the alarms it generates are required to support autoscaling of instances using Orchestration. Users that are not using the Orchestration module do not need to deploy the Telemetry module and may choose to use other external solutions to fulfill their metering and monitoring requirements. See also: http://docs.openstack.org/openstack-ops/content/logging_monitoring.html

OpenStack Block Storage (cinder)

Due to the burstable nature of the workloads and the applications and instances that will be used for batch processing, this cloud will utilize mainly memory or CPU, so the need for add-on storage to each instance is not a likely requirement. This does not mean that OpenStack Block Storage (cinder) will not be used in the infrastructure, but typically it will not be used as a central component.

Networking

When choosing a networking platform, ensure that it either works with all desired hypervisor and container technologies and their OpenStack drivers, or includes an implementation of an ML2 mechanism driver. Networking platforms that provide ML2 mechanism drivers can be mixed.