From bc4723c23af3fa08ef806ae108b5f58105cc90be Mon Sep 17 00:00:00 2001 From: daz Date: Wed, 15 Mar 2017 13:04:48 +1100 Subject: [PATCH] [arch-design] Minor edits to the compute design section Minor IA and heading edits, remove duplication, and update links. Change-Id: I88ee48c883cf04272822d81bbd5ee1c568ebef20 Implements: blueprint arch-design-pike --- doc/arch-design/source/design-compute.rst | 8 +- ...e-concepts.rst => design-compute-arch.rst} | 13 +- .../design-compute/design-compute-cpu.rst | 35 ++-- .../design-compute-hardware.rst | 162 ++++++------------ .../design-compute-hypervisor.rst | 24 ++- .../design-compute/design-compute-logging.rst | 54 +++--- .../design-compute-networking.rst | 4 +- .../design-compute-overcommit.rst | 19 +- .../design-compute/design-compute-storage.rst | 4 +- doc/arch-design/source/design-networking.rst | 6 +- 10 files changed, 138 insertions(+), 191 deletions(-) rename doc/arch-design/source/design-compute/{design-compute-concepts.rst => design-compute-arch.rst} (95%) diff --git a/doc/arch-design/source/design-compute.rst b/doc/arch-design/source/design-compute.rst index 2592a95be9..a5cd1546f9 100644 --- a/doc/arch-design/source/design-compute.rst +++ b/doc/arch-design/source/design-compute.rst @@ -1,11 +1,11 @@ -============= -Compute nodes -============= +=================== +Compute node design +=================== .. 
toctree:: :maxdepth: 3 - design-compute/design-compute-concepts + design-compute/design-compute-arch design-compute/design-compute-cpu design-compute/design-compute-hypervisor design-compute/design-compute-hardware diff --git a/doc/arch-design/source/design-compute/design-compute-concepts.rst b/doc/arch-design/source/design-compute/design-compute-arch.rst similarity index 95% rename from doc/arch-design/source/design-compute/design-compute-concepts.rst rename to doc/arch-design/source/design-compute/design-compute-arch.rst index 7781349cd7..05abd2f050 100644 --- a/doc/arch-design/source/design-compute/design-compute-concepts.rst +++ b/doc/arch-design/source/design-compute/design-compute-arch.rst @@ -1,5 +1,5 @@ ==================================== -Compute Server Architecture Overview +Compute server architecture overview ==================================== When designing compute resource pools, consider the number of processors, @@ -7,11 +7,12 @@ amount of memory, network requirements, the quantity of storage required for each hypervisor, and any requirements for bare metal hosts provisioned through ironic. -When architecting an OpenStack cloud, as part of the planning process, the -architect must not only determine what hardware to utilize but whether compute +When architecting an OpenStack cloud, as part of the planning process, you +must not only determine what hardware to utilize but whether compute resources will be provided in a single pool or in multiple pools or -availability zones. Will the cloud provide distinctly different profiles for -compute? +availability zones. You should consider if the cloud will provide distinctly +different profiles for compute. + For example, CPU, memory or local storage based compute nodes. For NFV or HPC based clouds, there may even be specific network configurations that should be reserved for those specific workloads on specific compute nodes. This @@ -83,7 +84,7 @@ the hardware layout of your compute nodes. 
and cause a potential increase in a noisy neighbor. Insufficient disk capacity could also have a negative effect on overall -performance including CPU and memory usage. Depending on the back-end +performance including CPU and memory usage. Depending on the back end architecture of the OpenStack Block Storage layer, capacity includes adding disk shelves to enterprise storage systems or installing additional Block Storage nodes. Upgrading directly attached storage diff --git a/doc/arch-design/source/design-compute/design-compute-cpu.rst b/doc/arch-design/source/design-compute/design-compute-cpu.rst index 5180b2dc28..3611e5b5b3 100644 --- a/doc/arch-design/source/design-compute/design-compute-cpu.rst +++ b/doc/arch-design/source/design-compute/design-compute-cpu.rst @@ -1,3 +1,5 @@ +.. _choosing-a-cpu: + ============== Choosing a CPU ============== @@ -9,9 +11,10 @@ and *AMD-v* for AMD chips. .. tip:: Consult the vendor documentation to check for virtualization support. For - Intel, read `“Does my processor support Intel® Virtualization Technology?” - `_. For AMD, read - `AMD Virtualization + Intel CPUs, see + `Does my processor support Intel® Virtualization Technology? + `_. For AMD CPUs, + see `AMD Virtualization `_. Your CPU may support virtualization but it may be disabled. Consult your BIOS documentation for how to enable CPU features. @@ -23,16 +26,17 @@ purchase a server that supports multiple CPUs, the number of cores is further multiplied. As of the Kilo release, key enhancements have been added to the -OpenStack code to improve guest performance. These improvements allow OpenStack -nova to take advantage of greater insight into a Compute host's physical layout -and therefore make smarter decisions regarding workload placement. -Administrators can use this functionality to enable smarter planning choices -for use cases like NFV (Network Function Virtualization) and HPC (High +OpenStack code to improve guest performance. 
These improvements allow the +Compute service to take advantage of greater insight into a compute host's +physical layout and therefore make smarter decisions regarding workload +placement. Administrators can use this functionality to enable smarter planning +choices for use cases like NFV (Network Function Virtualization) and HPC (High Performance Computing). -Considering NUMA is important when selecting CPU sizes and types, there are use -cases that use NUMA pinning to reserve host cores for OS processes. These -reduce the available CPU for workloads and protects the OS. +Considering non-uniform memory access (NUMA) is important when selecting CPU +sizes and types, as there are use cases that use NUMA pinning to reserve host +cores for operating system processes. These reduce the available CPU for +workloads and protect the operating system. .. tip:: @@ -55,11 +59,12 @@ reduce the available CPU for workloads and protects the OS. Additionally, CPU selection may not be one-size-fits-all across enterprises, but more of a list of SKUs that are tuned for the enterprise workloads. -A deeper discussion about NUMA can be found in `CPU topologies in the Admin -Guide `_. +For more information about NUMA, see `CPU topologies +`_ in +the Administrator Guide. -In order to take advantage of these new enhancements in OpenStack nova, Compute -hosts must be using NUMA capable CPUs. +In order to take advantage of these new enhancements in the Compute service, +compute hosts must be using NUMA-capable CPUs. .. 
tip:: diff --git a/doc/arch-design/source/design-compute/design-compute-hardware.rst b/doc/arch-design/source/design-compute/design-compute-hardware.rst index 1d2a18c1ec..a4e3ef69d8 100644 --- a/doc/arch-design/source/design-compute/design-compute-hardware.rst +++ b/doc/arch-design/source/design-compute/design-compute-hardware.rst @@ -1,8 +1,8 @@ -========================= +======================== Choosing server hardware -========================= +======================== -Consider the following factors when selecting compute (server) hardware: +Consider the following factors when selecting compute server hardware: * Server density A measure of how many servers can fit into a given measure of @@ -20,10 +20,6 @@ Consider the following factors when selecting compute (server) hardware: The relative cost of the hardware weighed against the total amount of capacity available on the hardware based on predetermined requirements. - -Compute (server) hardware selection -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Weigh these considerations against each other to determine the best design for the desired purpose. For example, increasing server density means sacrificing resource capacity or expandability. It also can decrease availability and @@ -32,103 +28,36 @@ expandability can increase cost but decrease server density. Decreasing cost often means decreasing supportability, availability, server density, resource capacity, and expandability. -The primary job of the OpenStack architect is to determine the requirements for -the cloud prior to constructing the cloud, and planning for expansion -and new features that may require different hardware. Planning for hardware -lifecycles is also the job of the architect. However, if the cloud is initially -built with near end of life, but cost effective hardware, then the -performance and capacity demand of new workloads will drive the purchase of -more modern hardware quicker. 
With individual harware components changing over -time, companies may prefer to manage configurations as stock keeping units -(SKU)s. This method provides an enterprise with a standard configuration unit -of compute (server) that can be placed in any IT service manager or vendor -supplied ordering system that can be triggered manually or through advanced -operational automations. This simplifies ordering, provisioning, and -activating additional compute resources. For example, there are plug-ins for -several commercial service management tools that enable integration with -hardware APIs. This configures and activates new compute resources from standby -hardware based on a standard configurations. Using this methodology, spare -hardware can be ordered for a datacenter and provisioned based on -capacity data derived from OpenStack Telemetry. +Determine the requirements for the cloud prior to constructing the cloud, +and plan for hardware lifecycles, expansion, and new features that may +require different hardware. + +If the cloud is initially built with near end-of-life but cost-effective +hardware, then the performance and capacity demand of new workloads will drive +the purchase of more modern hardware. With individual hardware components +changing over time, you may prefer to manage configurations as stock keeping +units (SKUs). This method provides an enterprise with a standard +configuration unit of compute (server) that can be placed in any IT service +manager or vendor supplied ordering system that can be triggered manually or +through advanced operational automations. This simplifies ordering, +provisioning, and activating additional compute resources. For example, there +are plug-ins for several commercial service management tools that enable +integration with hardware APIs. These configure and activate new compute +resources from standby hardware based on standard configurations. 
Using this +methodology, spare hardware can be ordered for a datacenter and provisioned +based on capacity data derived from OpenStack Telemetry. Compute capacity (CPU cores and RAM capacity) is a secondary consideration for selecting server hardware. The required server hardware must supply adequate -CPU sockets, additional CPU cores, and adequate RAM, and is discussed in detail -under the CPU selection secution. +CPU sockets, additional CPU cores, and adequate RAM. For more information, see +:ref:`choosing-a-cpu`. -However, there are also network and storage considerations for any compute -server. Network considerations are discussed in the -`network section -`_ -of this chapter. +In compute server architecture design, you must also consider network and +storage requirements. For more information on network considerations, see +:ref:`network-design`. - -Scaling your cloud -~~~~~~~~~~~~~~~~~~ - -For a compute-focused cloud, emphasis should be on server -hardware that can offer more CPU sockets, more CPU cores, and more RAM. -Network connectivity and storage capacity are less critical. - -When designing a OpenStack cloud compute server architecture, you must -consider whether you intend to scale up or scale out. Selecting a -smaller number of larger hosts, or a larger number of smaller hosts, -depends on a combination of factors: cost, power, cooling, physical rack -and floor space, support-warranty, and manageability. Typically, the scale out -model has been popular for OpenStack because it further reduces the number of -possible failure domains by spreading workloads across more infrastructure. -However, the downside is the cost of additional servers and the datacenter -resources needed to power, network, and cool them. 
- - -Hardware selection considerations -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Consider the following in selecting server hardware form factor suited for -your OpenStack design architecture: - -* Most blade servers can support dual-socket multi-core CPUs. To avoid - this CPU limit, select ``full width`` or ``full height`` blades. Be - aware, however, that this also decreases server density. For example, - high density blade servers such as HP BladeSystem or Dell PowerEdge - M1000e support up to 16 servers in only ten rack units. Using - half-height blades is twice as dense as using full-height blades, - which results in only eight servers per ten rack units. - -* 1U rack-mounted servers have the ability to offer greater server density - than a blade server solution, but are often limited to dual-socket, - multi-core CPU configurations. It is possible to place forty 1U servers - in a rack, providing space for the top of rack (ToR) switches, compared - to 32 full width blade servers. - - To obtain greater than dual-socket support in a 1U rack-mount form - factor, customers need to buy their systems from Original Design - Manufacturers (ODMs) or second-tier manufacturers. - - .. warning:: - - This may cause issues for organizations that have preferred - vendor policies or concerns with support and hardware warranties - of non-tier 1 vendors. - -* 2U rack-mounted servers provide quad-socket, multi-core CPU support, - but with a corresponding decrease in server density (half the density - that 1U rack-mounted servers offer). - -* Larger rack-mounted servers, such as 4U servers, often provide even - greater CPU capacity, commonly supporting four or even eight CPU - sockets. These servers have greater expandability, but such servers - have much lower server density and are often more expensive. - -* ``Sled servers`` are rack-mounted servers that support multiple - independent servers in a single 2U or 3U enclosure. 
These deliver - higher density as compared to typical 1U or 2U rack-mounted servers. - For example, many sled servers offer four independent dual-socket - nodes in 2U for a total of eight CPU sockets in 2U. - - -Other factors to consider -~~~~~~~~~~~~~~~~~~~~~~~~~ +Considerations when choosing hardware +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Here are some other factors to consider when selecting hardware for your compute servers. @@ -158,27 +87,27 @@ cooling. The number of hosts (or hypervisors) that can be fitted into a given metric (rack, rack unit, or floor tile) is another important method of sizing. Floor weight is an often overlooked consideration. -The data center floor must be able to support the -weight of the proposed number of hosts within a rack or set of -racks. These factors need to be applied as part of the host density -calculation and server hardware selection. + +The data center floor must be able to support the weight of the proposed number +of hosts within a rack or set of racks. These factors need to be applied as +part of the host density calculation and server hardware selection. Power and cooling density ------------------------- The power and cooling density requirements might be lower than with -blade,sled, or 1U server designs due to lower host density (by +blade, sled, or 1U server designs due to lower host density (by using 2U, 3U or even 4U server designs). For data centers with older infrastructure, this might be a desirable feature. Data centers have a specified amount of power fed to a given rack or -set of racks. Older data centers may have a power density as power -as low as 20 AMPs per rack, while more recent data centers can be -architected to support power densities as high as 120 AMP per rack. -The selected server hardware must take power density into account. +set of racks. 
Older data centers may have power densities as low as 20A per +rack, and current data centers can be designed to support power densities as +high as 120A per rack. The selected server hardware must take power density +into account. -Specific hardware concepts -~~~~~~~~~~~~~~~~~~~~~~~~~~ +Selecting hardware form factor +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Consider the following in selecting server hardware form factor suited for your OpenStack design architecture: @@ -221,3 +150,16 @@ your OpenStack design architecture: higher density as compared to typical 1U or 2U rack-mounted servers. For example, many sled servers offer four independent dual-socket nodes in 2U for a total of eight CPU sockets in 2U. + +Scaling your cloud +~~~~~~~~~~~~~~~~~~ + +When designing an OpenStack cloud compute server architecture, you must +decide whether you intend to scale up or scale out. Selecting a +smaller number of larger hosts, or a larger number of smaller hosts, +depends on a combination of factors: cost, power, cooling, physical rack +and floor space, support-warranty, and manageability. Typically, the scale out +model has been popular for OpenStack because it reduces the number of possible +failure domains by spreading workloads across more infrastructure. +However, the downside is the cost of additional servers and the datacenter +resources needed to power, network, and cool the servers. diff --git a/doc/arch-design/source/design-compute/design-compute-hypervisor.rst b/doc/arch-design/source/design-compute/design-compute-hypervisor.rst index 4ef7da2cb9..2cbc7da222 100644 --- a/doc/arch-design/source/design-compute/design-compute-hypervisor.rst +++ b/doc/arch-design/source/design-compute/design-compute-hypervisor.rst @@ -21,22 +21,20 @@ parity, documentation, and the level of community experience. As per the recent OpenStack user survey, KVM is the most widely adopted hypervisor in the OpenStack community. 
Besides KVM, there are many deployments -that run other hypervisors such as LXC, VMware, Xen and Hyper-V. However, these -hypervisors are either less used, are niche hypervisors, or have limited -functionality based on the more commonly used hypervisors. This is due to gaps -in feature parity. - -In addition, the nova configuration reference below details feature support for -hypervisors as well as ironic and Virtuozzo (formerly Parallels). - -The best information available to support your choice is found on the -`Hypervisor Support Matrix -`_ -and in the `configuration reference -`_. +that run other hypervisors such as LXC, VMware, Xen, and Hyper-V. However, +these hypervisors are either less used, are niche hypervisors, or have limited +functionality compared to more commonly used hypervisors. .. note:: It is also possible to run multiple hypervisors in a single deployment using host aggregates or cells. However, an individual compute node can run only a single hypervisor at a time. + +For more information about feature support for +hypervisors as well as ironic and Virtuozzo (formerly Parallels), see +`Hypervisor Support Matrix +`_ +and `Hypervisors +`_ +in the Configuration Reference. diff --git a/doc/arch-design/source/design-compute/design-compute-logging.rst b/doc/arch-design/source/design-compute/design-compute-logging.rst index 2c8e7d0f03..6ed6bad16a 100644 --- a/doc/arch-design/source/design-compute/design-compute-logging.rst +++ b/doc/arch-design/source/design-compute/design-compute-logging.rst @@ -5,12 +5,13 @@ Compute server logging The logs on the compute nodes, or any server running nova-compute (for example in a hyperconverged architecture), are the primary points for troubleshooting issues with the hypervisor and compute services. Additionally, operating system -logs can also provide useful information. However, as environments grow, the -amount of log data increases exponentially. 
Enabling debugging on either the -OpenStack services or the operating system logging further compounds the data -issues. +logs can also provide useful information. -Logging is detailed more fully in the `Operations Guide +As the cloud environment grows, the amount of log data increases exponentially. +Enabling debugging on either the OpenStack services or the operating system +further compounds the data issues. + +Logging is described in more detail in the `Operations Guide `_. However, it is an important design consideration to take into account before commencing operations of your cloud. @@ -18,7 +19,7 @@ operations of your cloud. OpenStack produces a great deal of useful logging information, but for the information to be useful for operations purposes, you should consider having a central logging server to send logs to, and a log parsing/analysis -system (such as Elastic Stack [formerly known as ELK]). +system such as Elastic Stack [formerly known as ELK]. Elastic Stack consists of mainly three components: Elasticsearch (log search and analysis), Logstash (log intake, processing and output) and Kibana (log @@ -35,28 +36,28 @@ Redis and Memcached. In newer versions of Elastic Stack, a file buffer called similar purpose but adds a "backpressure-sensitive" protocol when sending data to Logstash or Elasticsearch. -Many times, log analysis requires disparate logs of differing formats, Elastic -Stack (namely logstash) was created to take many different log inputs and then -transform them into a consistent format that elasticsearch can catalog and +Log analysis often requires disparate logs of differing formats. Elastic +Stack (namely Logstash) was created to take many different log inputs and +transform them into a consistent format that Elasticsearch can catalog and analyze. 
As seen in the image above, the process of ingestion starts on the -servers by logstash, is forwarded to the elasticsearch server for storage and -searching and then displayed via Kibana for visual analysis and interaction. +servers by Logstash, is forwarded to the Elasticsearch server for storage and +searching, and then displayed through Kibana for visual analysis and +interaction. -For instructions on installing Logstash, Elasticsearch and Kibana see `the -elastic stack documentation. +For instructions on installing Logstash, Elasticsearch and Kibana, see the +`Elasticsearch reference `_. There are some specific configuration parameters that are needed to -configure logstash for OpenStack. For example, in order to get logstash to -collect, parse and send the correct portions of log files and send them to the -elasticsearch server, you need to format the configuration file properly. There -are input, output and filter configurations. Input configurations tell logstash -who and what to recieve data from (log files/ -forwarders/filebeats/StdIn/Eventlog/etc.), output specifies where to put the -data, and filter configurations define the input contents to forward to the -output. +configure Logstash for OpenStack. For example, in order to get Logstash to +collect, parse, and send the correct portions of log files to the Elasticsearch +server, you need to format the configuration file properly. There +are input, output and filter configurations. Input configurations tell Logstash +where to receive data from (log files/forwarders/filebeats/StdIn/Eventlog), +output configurations specify where to put the data, and filter configurations +define the input contents to forward to the output. -The logstash filter performs intermediary processing on each event. Conditional +The Logstash filter performs intermediary processing on each event. Conditional filters are applied based on the characteristics of the input and the event. 
Some examples of filtering are: @@ -88,16 +89,15 @@ representation of events such as: These input, output and filter configurations are typically stored in :file:`/etc/logstash/conf.d` but may vary by linux distribution. Separate -configuration files should be created for different logging systems(syslog, -apache, OpenStack, etc.) +configuration files should be created for different logging systems such as +syslog, Apache, and OpenStack. General examples and configuration guides can be found on the Elastic `Logstash Configuration page -`_ +`_. OpenStack input, output and filter examples can be found at -`https://github.com/sorantis/elkstack/tree/master/elk/logstash -`_ +https://github.com/sorantis/elkstack/tree/master/elk/logstash. Once a configuration is complete, Kibana can be used as a visualization tool for OpenStack and system logging. This will allow operators to configure custom diff --git a/doc/arch-design/source/design-compute/design-compute-networking.rst b/doc/arch-design/source/design-compute/design-compute-networking.rst index 737541fa0c..494d3dafc0 100644 --- a/doc/arch-design/source/design-compute/design-compute-networking.rst +++ b/doc/arch-design/source/design-compute/design-compute-networking.rst @@ -1,6 +1,6 @@ -===================== +==================== Network connectivity -===================== +==================== The selected server hardware must have the appropriate number of network connections, as well as the right type of network connections, in order to diff --git a/doc/arch-design/source/design-compute/design-compute-overcommit.rst b/doc/arch-design/source/design-compute/design-compute-overcommit.rst index baa112a2b8..ea4deee06a 100644 --- a/doc/arch-design/source/design-compute/design-compute-overcommit.rst +++ b/doc/arch-design/source/design-compute/design-compute-overcommit.rst @@ -1,11 +1,11 @@ -============== -Overcommitting -============== +========================== +Overcommitting CPU and RAM +========================== 
OpenStack allows you to overcommit CPU and RAM on compute nodes. This -allows you to increase the number of instances you can have running on -your cloud, at the cost of reducing the performance of the instances. -OpenStack Compute uses the following ratios by default: +allows you to increase the number of instances running on your cloud at the +cost of reducing the performance of the instances. The Compute service uses the +following ratios by default: * CPU allocation ratio: 16:1 * RAM allocation ratio: 1.5:1 @@ -20,13 +20,13 @@ The formula for the number of virtual instances on a compute node is ``(OR*PC)/VC``, where: OR - CPU overcommit ratio (virtual cores per physical core) + CPU overcommit ratio (virtual cores per physical core) PC - Number of physical cores + Number of physical cores VC - Number of virtual cores per instance + Number of virtual cores per instance Similarly, the default RAM allocation ratio of 1.5:1 means that the scheduler allocates instances to a physical node as long as the total @@ -39,6 +39,7 @@ with the instances reaches 72 GB (such as nine instances, in the case where each instance has 8 GB of RAM). .. note:: + Regardless of the overcommit ratio, an instance can not be placed on any physical node with fewer raw (pre-overcommit) resources than the instance flavor requires. diff --git a/doc/arch-design/source/design-compute/design-compute-storage.rst b/doc/arch-design/source/design-compute/design-compute-storage.rst index c525478a61..3c0fd11d09 100644 --- a/doc/arch-design/source/design-compute/design-compute-storage.rst +++ b/doc/arch-design/source/design-compute/design-compute-storage.rst @@ -2,7 +2,7 @@ Instance storage solutions ========================== -As part of the architecture design for a compute cluster, you must specify some +As part of the architecture design for a compute cluster, you must specify storage for the disk on which the instantiated instance runs. 
There are three main approaches to providing temporary storage: @@ -122,7 +122,7 @@ from one physical host to another, a necessity for performing upgrades that require reboots of the compute hosts, but only works well with shared storage. -Live migration can also be done with nonshared storage, using a feature +Live migration can also be done with non-shared storage, using a feature known as *KVM live block migration*. While an earlier implementation of block-based migration in KVM and QEMU was considered unreliable, there is a newer, more reliable implementation of block-based live migration diff --git a/doc/arch-design/source/design-networking.rst b/doc/arch-design/source/design-networking.rst index 093d486f59..984c3ef494 100644 --- a/doc/arch-design/source/design-networking.rst +++ b/doc/arch-design/source/design-networking.rst @@ -1,8 +1,8 @@ .. _network-design: -============== -Network design -============== +========== +Networking +========== .. toctree:: :maxdepth: 2