[arch-design] Minor edits to the compute design section

Minor IA and heading edits, remove duplication, and
update links.

Change-Id: I88ee48c883cf04272822d81bbd5ee1c568ebef20
Implements: blueprint arch-design-pike
daz 2017-03-15 13:04:48 +11:00 committed by Brian Moss
parent 3cd73e20d0
commit bc4723c23a
10 changed files with 138 additions and 191 deletions


@ -1,11 +1,11 @@
=============
Compute nodes
=============
===================
Compute node design
===================
.. toctree::
:maxdepth: 3
design-compute/design-compute-concepts
design-compute/design-compute-arch
design-compute/design-compute-cpu
design-compute/design-compute-hypervisor
design-compute/design-compute-hardware


@ -1,5 +1,5 @@
====================================
Compute Server Architecture Overview
Compute server architecture overview
====================================
When designing compute resource pools, consider the number of processors,
@ -7,11 +7,12 @@ amount of memory, network requirements, the quantity of storage required for
each hypervisor, and any requirements for bare metal hosts provisioned
through ironic.
When architecting an OpenStack cloud, as part of the planning process, the
architect must not only determine what hardware to utilize but whether compute
When architecting an OpenStack cloud, as part of the planning process, you
must not only determine what hardware to utilize but whether compute
resources will be provided in a single pool or in multiple pools or
availability zones. Will the cloud provide distinctly different profiles for
compute?
availability zones. You should consider if the cloud will provide distinctly
different profiles for compute.
For example, CPU, memory or local storage based compute nodes. For NFV
or HPC based clouds, there may even be specific network configurations that
should be reserved for those specific workloads on specific compute nodes. This
@ -83,7 +84,7 @@ the hardware layout of your compute nodes.
and cause a potential increase in a noisy neighbor.
Insufficient disk capacity could also have a negative effect on overall
performance including CPU and memory usage. Depending on the back-end
performance including CPU and memory usage. Depending on the back end
architecture of the OpenStack Block Storage layer, capacity includes
adding disk shelves to enterprise storage systems or installing
additional Block Storage nodes. Upgrading directly attached storage


@ -1,3 +1,5 @@
.. _choosing-a-cpu:
==============
Choosing a CPU
==============
@ -9,9 +11,10 @@ and *AMD-v* for AMD chips.
.. tip::
Consult the vendor documentation to check for virtualization support. For
Intel, read `“Does my processor support Intel® Virtualization Technology?”
<http://www.intel.com/support/processors/sb/cs-030729.htm>`_. For AMD, read
`AMD Virtualization
Intel CPUs, see
`Does my processor support Intel® Virtualization Technology?
<http://www.intel.com/support/processors/sb/cs-030729.htm>`_. For AMD CPUs,
see `AMD Virtualization
<http://www.amd.com/en-us/innovations/software-technologies/server-solution/virtualization>`_.
Your CPU may support virtualization but it may be disabled. Consult your
BIOS documentation for how to enable CPU features.
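On a Linux host, the CPU flags listed in ``/proc/cpuinfo`` indicate whether
the extensions are exposed to the operating system: ``vmx`` corresponds to
Intel VT-x and ``svm`` to AMD-V. The following is a minimal sketch rather than
a definitive check. The helper name is hypothetical, and a missing flag can
also mean that the feature is disabled in the BIOS.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization flags found in /proc/cpuinfo text.

    Hypothetical helper for illustration: "vmx" means Intel VT-x and
    "svm" means AMD-V.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        # Flag lines look like "flags : fpu vme ... vmx ..."
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in ("vmx", "svm"))
    return found


# Example usage on a Linux compute host:
#     with open("/proc/cpuinfo") as handle:
#         print(virtualization_flags(handle.read()))
```

An empty result suggests that the host either lacks the extensions or has them
disabled in firmware.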
@ -23,16 +26,17 @@ purchase a server that supports multiple CPUs, the number of cores is further
multiplied.
As of the Kilo release, key enhancements have been added to the
OpenStack code to improve guest performance. These improvements allow OpenStack
nova to take advantage of greater insight into a Compute host's physical layout
and therefore make smarter decisions regarding workload placement.
Administrators can use this functionality to enable smarter planning choices
for use cases like NFV (Network Function Virtualization) and HPC (High
OpenStack code to improve guest performance. These improvements allow the
Compute service to take advantage of greater insight into a compute host's
physical layout and therefore make smarter decisions regarding workload
placement. Administrators can use this functionality to enable smarter planning
choices for use cases like NFV (Network Function Virtualization) and HPC (High
Performance Computing).
Considering NUMA is important when selecting CPU sizes and types, there are use
cases that use NUMA pinning to reserve host cores for OS processes. These
reduce the available CPU for workloads and protects the OS.
Considering non-uniform memory access (NUMA) is important when selecting CPU
sizes and types, as there are use cases that use NUMA pinning to reserve host
cores for operating system processes. This reduces the CPU available for
workloads and protects the operating system.
.. tip::
@ -55,11 +59,12 @@ reduce the available CPU for workloads and protects the OS.
Additionally, CPU selection may not be one-size-fits-all across enterprises,
but more of a list of SKUs that are tuned for the enterprise workloads.
A deeper discussion about NUMA can be found in `CPU topologies in the Admin
Guide <https://docs.openstack.org/admin-guide/compute-cpu-topologies.html>`_.
For more information about NUMA, see `CPU topologies
<https://docs.openstack.org/admin-guide/compute-cpu-topologies.html>`_ in
the Administrator Guide.
In order to take advantage of these new enhancements in OpenStack nova, Compute
hosts must be using NUMA capable CPUs.
To take advantage of these new enhancements in the Compute service,
compute hosts must use NUMA-capable CPUs.
.. tip::


@ -1,8 +1,8 @@
=========================
========================
Choosing server hardware
=========================
========================
Consider the following factors when selecting compute (server) hardware:
Consider the following factors when selecting compute server hardware:
* Server density
A measure of how many servers can fit into a given measure of
@ -20,10 +20,6 @@ Consider the following factors when selecting compute (server) hardware:
The relative cost of the hardware weighed against the total amount of
capacity available on the hardware based on predetermined requirements.
Compute (server) hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Weigh these considerations against each other to determine the best design for
the desired purpose. For example, increasing server density means sacrificing
resource capacity or expandability. It also can decrease availability and
@ -32,103 +28,36 @@ expandability can increase cost but decrease server density. Decreasing cost
often means decreasing supportability, availability, server density, resource
capacity, and expandability.
The primary job of the OpenStack architect is to determine the requirements for
the cloud prior to constructing the cloud, and planning for expansion
and new features that may require different hardware. Planning for hardware
lifecycles is also the job of the architect. However, if the cloud is initially
built with near end of life, but cost effective hardware, then the
performance and capacity demand of new workloads will drive the purchase of
more modern hardware quicker. With individual harware components changing over
time, companies may prefer to manage configurations as stock keeping units
(SKU)s. This method provides an enterprise with a standard configuration unit
of compute (server) that can be placed in any IT service manager or vendor
supplied ordering system that can be triggered manually or through advanced
operational automations. This simplifies ordering, provisioning, and
activating additional compute resources. For example, there are plug-ins for
several commercial service management tools that enable integration with
hardware APIs. This configures and activates new compute resources from standby
hardware based on a standard configurations. Using this methodology, spare
hardware can be ordered for a datacenter and provisioned based on
capacity data derived from OpenStack Telemetry.
Determine the requirements for the cloud before constructing it, and plan for
hardware lifecycles, expansion, and new features that may require different
hardware.
If the cloud is initially built with near end-of-life but cost-effective
hardware, then the performance and capacity demand of new workloads will drive
the purchase of more modern hardware. With individual hardware components
changing over time, you may prefer to manage configurations as stock keeping
units (SKUs). This method provides an enterprise with a standard
configuration unit of compute (server) that can be placed in any IT service
manager or vendor-supplied ordering system that can be triggered manually or
through advanced operational automations. This simplifies ordering,
provisioning, and activating additional compute resources. For example, there
are plug-ins for several commercial service management tools that enable
integration with hardware APIs. These configure and activate new compute
resources from standby hardware based on a standard configuration. Using this
methodology, spare hardware can be ordered for a datacenter and provisioned
based on capacity data derived from OpenStack Telemetry.
Compute capacity (CPU cores and RAM capacity) is a secondary consideration for
selecting server hardware. The required server hardware must supply adequate
CPU sockets, additional CPU cores, and adequate RAM, and is discussed in detail
under the CPU selection secution.
CPU sockets, additional CPU cores, and adequate RAM. For more information, see
:ref:`choosing-a-cpu`.
However, there are also network and storage considerations for any compute
server. Network considerations are discussed in the
`network section
<https://docs.openstack.org/draft/arch-design-draft/design-networking.html>`_
of this chapter.
In compute server architecture design, you must also consider network and
storage requirements. For more information on network considerations, see
:ref:`network-design`.
Scaling your cloud
~~~~~~~~~~~~~~~~~~
For a compute-focused cloud, emphasis should be on server
hardware that can offer more CPU sockets, more CPU cores, and more RAM.
Network connectivity and storage capacity are less critical.
When designing a OpenStack cloud compute server architecture, you must
consider whether you intend to scale up or scale out. Selecting a
smaller number of larger hosts, or a larger number of smaller hosts,
depends on a combination of factors: cost, power, cooling, physical rack
and floor space, support-warranty, and manageability. Typically, the scale out
model has been popular for OpenStack because it further reduces the number of
possible failure domains by spreading workloads across more infrastructure.
However, the downside is the cost of additional servers and the datacenter
resources needed to power, network, and cool them.
Hardware selection considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider the following in selecting server hardware form factor suited for
your OpenStack design architecture:
* Most blade servers can support dual-socket multi-core CPUs. To avoid
this CPU limit, select ``full width`` or ``full height`` blades. Be
aware, however, that this also decreases server density. For example,
high density blade servers such as HP BladeSystem or Dell PowerEdge
M1000e support up to 16 servers in only ten rack units. Using
half-height blades is twice as dense as using full-height blades,
which results in only eight servers per ten rack units.
* 1U rack-mounted servers have the ability to offer greater server density
than a blade server solution, but are often limited to dual-socket,
multi-core CPU configurations. It is possible to place forty 1U servers
in a rack, providing space for the top of rack (ToR) switches, compared
to 32 full width blade servers.
To obtain greater than dual-socket support in a 1U rack-mount form
factor, customers need to buy their systems from Original Design
Manufacturers (ODMs) or second-tier manufacturers.
.. warning::
This may cause issues for organizations that have preferred
vendor policies or concerns with support and hardware warranties
of non-tier 1 vendors.
* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
but with a corresponding decrease in server density (half the density
that 1U rack-mounted servers offer).
* Larger rack-mounted servers, such as 4U servers, often provide even
greater CPU capacity, commonly supporting four or even eight CPU
sockets. These servers have greater expandability, but such servers
have much lower server density and are often more expensive.
* ``Sled servers`` are rack-mounted servers that support multiple
independent servers in a single 2U or 3U enclosure. These deliver
higher density as compared to typical 1U or 2U rack-mounted servers.
For example, many sled servers offer four independent dual-socket
nodes in 2U for a total of eight CPU sockets in 2U.
Other factors to consider
~~~~~~~~~~~~~~~~~~~~~~~~~
Considerations when choosing hardware
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here are some other factors to consider when selecting hardware for your
compute servers.
@ -158,27 +87,27 @@ cooling. The number of hosts (or hypervisors) that can be fitted
into a given metric (rack, rack unit, or floor tile) is another
important method of sizing. Floor weight is an often overlooked
consideration.
The data center floor must be able to support the
weight of the proposed number of hosts within a rack or set of
racks. These factors need to be applied as part of the host density
calculation and server hardware selection.
The data center floor must be able to support the weight of the proposed number
of hosts within a rack or set of racks. These factors need to be applied as
part of the host density calculation and server hardware selection.
Power and cooling density
-------------------------
The power and cooling density requirements might be lower than with
blade,sled, or 1U server designs due to lower host density (by
blade, sled, or 1U server designs due to lower host density (by
using 2U, 3U or even 4U server designs). For data centers with older
infrastructure, this might be a desirable feature.
Data centers have a specified amount of power fed to a given rack or
set of racks. Older data centers may have a power density as power
as low as 20 AMPs per rack, while more recent data centers can be
architected to support power densities as high as 120 AMP per rack.
The selected server hardware must take power density into account.
set of racks. Older data centers may have power densities as low as 20A per
rack, and current data centers can be designed to support power densities as
high as 120A per rack. The selected server hardware must take power density
into account.
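The effect of power density on host count can be estimated with simple
arithmetic. The sketch below is illustrative only: the 208 V feed, the 400 W
per-host draw, and the 80% derating factor are assumptions chosen for the
example, not figures from this guide. Only the 20A and 120A per-rack values
come from the text above.

```python
import math


def hosts_per_rack(rack_amps, volts, watts_per_host, derating=0.8):
    """Estimate how many hosts a rack's power feed can support.

    All inputs other than the amperage are hypothetical planning values.
    """
    # Usable power is the feed capacity reduced by a safety margin.
    usable_watts = rack_amps * volts * derating
    return math.floor(usable_watts / watts_per_host)


# Assumed values: 208 V feed and 400 W per host.
older_datacenter = hosts_per_rack(20, 208, 400)
newer_datacenter = hosts_per_rack(120, 208, 400)
```

Under these assumptions, the older data center becomes power-bound long before
a rack is physically full, which favors lower-density 2U or larger servers.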
Specific hardware concepts
~~~~~~~~~~~~~~~~~~~~~~~~~~
Selecting hardware form factor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider the following in selecting server hardware form factor suited for
your OpenStack design architecture:
@ -221,3 +150,16 @@ your OpenStack design architecture:
higher density as compared to typical 1U or 2U rack-mounted servers.
For example, many sled servers offer four independent dual-socket
nodes in 2U for a total of eight CPU sockets in 2U.
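The form factor trade-offs in the list above can be compared as CPU sockets per
rack unit. This is a rough sketch that uses only the server counts and socket
counts quoted in the list. The function name and the dictionary are
illustrative, and the calculation ignores rack space taken by ToR switches and
other shared equipment.

```python
def sockets_per_rack_unit(servers, sockets_per_server, rack_units):
    """Return CPU sockets per rack unit for a group of servers."""
    return servers * sockets_per_server / rack_units


# (servers, sockets per server, rack units occupied), taken from the list above.
FORM_FACTORS = {
    "half-height blades": (16, 2, 10),
    "1U rack-mounted": (1, 2, 1),
    "2U rack-mounted": (1, 4, 2),
    "sled servers": (4, 2, 2),
}

density = {
    name: sockets_per_rack_unit(*shape) for name, shape in FORM_FACTORS.items()
}
```

Socket density is only one input. Weigh it against the power, cooling, and
vendor considerations described in this section.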
Scaling your cloud
~~~~~~~~~~~~~~~~~~
When designing an OpenStack cloud compute server architecture, you must
decide whether you intend to scale up or scale out. Selecting a
smaller number of larger hosts, or a larger number of smaller hosts,
depends on a combination of factors: cost, power, cooling, physical rack
and floor space, support-warranty, and manageability. Typically, the scale out
model has been popular for OpenStack because it reduces the number of possible
failure domains by spreading workloads across more infrastructure.
However, the downside is the cost of additional servers and the datacenter
resources needed to power, network, and cool the servers.


@ -21,22 +21,20 @@ parity, documentation, and the level of community experience.
As per the recent OpenStack user survey, KVM is the most widely adopted
hypervisor in the OpenStack community. Besides KVM, there are many deployments
that run other hypervisors such as LXC, VMware, Xen and Hyper-V. However, these
hypervisors are either less used, are niche hypervisors, or have limited
functionality based on the more commonly used hypervisors. This is due to gaps
in feature parity.
In addition, the nova configuration reference below details feature support for
hypervisors as well as ironic and Virtuozzo (formerly Parallels).
The best information available to support your choice is found on the
`Hypervisor Support Matrix
<https://docs.openstack.org/developer/nova/support-matrix.html>`_
and in the `configuration reference
<https://docs.openstack.org/ocata/config-reference/compute/hypervisors.html>`_.
that run other hypervisors such as LXC, VMware, Xen, and Hyper-V. However,
these hypervisors are either less used, are niche hypervisors, or have limited
functionality compared to more commonly used hypervisors.
.. note::
It is also possible to run multiple hypervisors in a single
deployment using host aggregates or cells. However, an individual
compute node can run only a single hypervisor at a time.
For more information about feature support for
hypervisors as well as ironic and Virtuozzo (formerly Parallels), see
`Hypervisor Support Matrix
<https://docs.openstack.org/developer/nova/support-matrix.html>`_
and `Hypervisors
<https://docs.openstack.org/ocata/config-reference/compute/hypervisors.html>`_
in the Configuration Reference.


@ -5,12 +5,13 @@ Compute server logging
The logs on the compute nodes, or any server running nova-compute (for example
in a hyperconverged architecture), are the primary points for troubleshooting
issues with the hypervisor and compute services. Additionally, operating system
logs can also provide useful information. However, as environments grow, the
amount of log data increases exponentially. Enabling debugging on either the
OpenStack services or the operating system logging further compounds the data
issues.
logs can also provide useful information.
Logging is detailed more fully in the `Operations Guide
As the cloud environment grows, the amount of log data increases exponentially.
Enabling debugging on either the OpenStack services or the operating system
further compounds the data issues.
Logging is described in more detail in the `Operations Guide
<http://docs.openstack.org/ops-guide/ops-logging-monitoring.html>`_. However,
it is an important design consideration to take into account before commencing
operations of your cloud.
@ -18,7 +19,7 @@ operations of your cloud.
OpenStack produces a great deal of useful logging information, but for
the information to be useful for operations purposes, you should consider
having a central logging server to send logs to, and a log parsing/analysis
system (such as Elastic Stack [formerly known as ELK]).
system such as Elastic Stack [formerly known as ELK].
Elastic Stack consists of mainly three components: Elasticsearch (log search
and analysis), Logstash (log intake, processing and output) and Kibana (log
@ -35,28 +36,28 @@ Redis and Memcached. In newer versions of Elastic Stack, a file buffer called
similar purpose but adds a "backpressure-sensitive" protocol when sending data
to Logstash or Elasticsearch.
Many times, log analysis requires disparate logs of differing formats, Elastic
Stack (namely logstash) was created to take many different log inputs and then
transform them into a consistent format that elasticsearch can catalog and
Log analysis often requires disparate logs of differing formats. Elastic
Stack (namely Logstash) was created to take many different log inputs and
transform them into a consistent format that Elasticsearch can catalog and
analyze. As seen in the image above, the process of ingestion starts on the
servers by logstash, is forwarded to the elasticsearch server for storage and
searching and then displayed via Kibana for visual analysis and interaction.
servers by Logstash, is forwarded to the Elasticsearch server for storage and
searching, and then displayed through Kibana for visual analysis and
interaction.
For instructions on installing Logstash, Elasticsearch and Kibana see `the
elastic stack documentation.
For instructions on installing Logstash, Elasticsearch and Kibana, see the
`Elasticsearch reference
<https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html>`_.
There are some specific configuration parameters that are needed to
configure logstash for OpenStack. For example, in order to get logstash to
collect, parse and send the correct portions of log files and send them to the
elasticsearch server, you need to format the configuration file properly. There
are input, output and filter configurations. Input configurations tell logstash
who and what to recieve data from (log files/
forwarders/filebeats/StdIn/Eventlog/etc.), output specifies where to put the
data, and filter configurations define the input contents to forward to the
output.
configure Logstash for OpenStack. For example, in order to get Logstash to
collect, parse, and send the correct portions of log files to the Elasticsearch
server, you need to format the configuration file properly. There
are input, output and filter configurations. Input configurations tell Logstash
where to receive data from (log files/forwarders/filebeats/StdIn/Eventlog),
output configurations specify where to put the data, and filter configurations
define the input contents to forward to the output.
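To make the three stages concrete, the following Python sketch models them
outside of Logstash: the input is a list of raw lines, the filter parses each
line and drops unwanted events, and the output is a list of structured events.
It is not Logstash configuration syntax. The regular expression assumes a line
layout of timestamp, process ID, level, logger name, optional bracketed request
context, and message; adjust it to the format your services actually emit. All
names in the block are hypothetical.

```python
import re

# Assumed line layout:
# "2017-03-15 13:04:48.123 4321 INFO nova.compute.manager [req-abc] message"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"(?:\[(?P<context>[^\]]*)\] )?(?P<message>.*)$"
)


def parse_line(line):
    """Filter stage, part 1: turn one raw line into a structured event."""
    match = LINE_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    event = match.groupdict()
    event["pid"] = int(event["pid"])
    return event


def run_pipeline(lines, drop_levels=("DEBUG",)):
    """Input -> filter -> output, with the output modeled as a list."""
    events = []
    for line in lines:
        event = parse_line(line)
        # Filter stage, part 2: skip unparseable lines and unwanted levels.
        if event is None or event["level"] in drop_levels:
            continue
        events.append(event)
    return events
```

In a real deployment, the same parsing and dropping logic lives in the filter
section of the Logstash configuration file, and the output section forwards
the events to Elasticsearch.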
The logstash filter performs intermediary processing on each event. Conditional
The Logstash filter performs intermediary processing on each event. Conditional
filters are applied based on the characteristics of the input and the event.
Some examples of filtering are:
@ -88,16 +89,15 @@ representation of events such as:
These input, output and filter configurations are typically stored in
:file:`/etc/logstash/conf.d` but may vary by Linux distribution. Separate
configuration files should be created for different logging systems(syslog,
apache, OpenStack, etc.)
configuration files should be created for different logging systems such as
syslog, Apache, and OpenStack.
General examples and configuration guides can be found on the Elastic `Logstash
Configuration page
<https://www.elastic.co/guide/en/logstash/current/configuration-file-structure.html>`_
<https://www.elastic.co/guide/en/logstash/current/configuration-file-structure.html>`_.
OpenStack input, output and filter examples can be found at
`https://github.com/sorantis/elkstack/tree/master/elk/logstash
<https://github.com/sorantis/elkstack/tree/master/elk/logstash>`_
https://github.com/sorantis/elkstack/tree/master/elk/logstash.
Once a configuration is complete, Kibana can be used as a visualization tool
for OpenStack and system logging. This will allow operators to configure custom


@ -1,6 +1,6 @@
=====================
====================
Network connectivity
=====================
====================
The selected server hardware must have the appropriate number of network
connections, as well as the right type of network connections, in order to


@ -1,11 +1,11 @@
==============
Overcommitting
==============
==========================
Overcommitting CPU and RAM
==========================
OpenStack allows you to overcommit CPU and RAM on compute nodes. This
allows you to increase the number of instances you can have running on
your cloud, at the cost of reducing the performance of the instances.
OpenStack Compute uses the following ratios by default:
allows you to increase the number of instances running on your cloud at the
cost of reducing the performance of the instances. The Compute service uses the
following ratios by default:
* CPU allocation ratio: 16:1
* RAM allocation ratio: 1.5:1
@ -39,6 +39,7 @@ with the instances reaches 72 GB (such as nine instances, in the case
where each instance has 8 GB of RAM).
.. note::
Regardless of the overcommit ratio, an instance cannot be placed
on any physical node with fewer raw (pre-overcommit) resources than
the instance flavor requires.
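The arithmetic behind these rules can be sketched as follows. This is a
simplified model for planning purposes, not the scheduler's actual code. The
host with 12 cores and 48 GB of RAM is an assumption: 48 GB is chosen because
48 GB multiplied by the default 1.5 RAM ratio gives the 72 GB limit mentioned
above. The function names are hypothetical.

```python
def can_place(flavor_vcpus, flavor_ram_gb, host_cores, host_ram_gb,
              used_vcpus, used_ram_gb, cpu_ratio=16.0, ram_ratio=1.5):
    """Return True if one more instance fits on the host."""
    # The flavor must fit within the raw (pre-overcommit) resources.
    if flavor_vcpus > host_cores or flavor_ram_gb > host_ram_gb:
        return False
    # The running total must stay within the overcommitted capacity.
    return (used_vcpus + flavor_vcpus <= host_cores * cpu_ratio
            and used_ram_gb + flavor_ram_gb <= host_ram_gb * ram_ratio)


def count_instances(host_cores, host_ram_gb, flavor_vcpus, flavor_ram_gb):
    """Count how many identical instances fit at the default ratios."""
    count = used_vcpus = used_ram_gb = 0
    while can_place(flavor_vcpus, flavor_ram_gb, host_cores, host_ram_gb,
                    used_vcpus, used_ram_gb):
        count += 1
        used_vcpus += flavor_vcpus
        used_ram_gb += flavor_ram_gb
    return count


# Assumed host: 12 cores, 48 GB RAM. Assumed flavor: 1 vCPU, 8 GB RAM.
nine = count_instances(12, 48, 1, 8)
# A 64 GB flavor never fits on a 48 GB host, even though 64 < 72.
none = count_instances(12, 48, 1, 64)
```

With the assumed host, RAM is the limiting resource: nine 8 GB instances reach
the 72 GB ceiling while using only 9 of the 192 schedulable vCPUs.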


@ -2,7 +2,7 @@
Instance storage solutions
==========================
As part of the architecture design for a compute cluster, you must specify some
As part of the architecture design for a compute cluster, you must specify
storage for the disk on which the instantiated instance runs. There are three
main approaches to providing temporary storage:
@ -122,7 +122,7 @@ from one physical host to another, a necessity for performing upgrades
that require reboots of the compute hosts, but only works well with
shared storage.
Live migration can also be done with nonshared storage, using a feature
Live migration can also be done with non-shared storage, using a feature
known as *KVM live block migration*. While an earlier implementation of
block-based migration in KVM and QEMU was considered unreliable, there
is a newer, more reliable implementation of block-based live migration


@ -1,8 +1,8 @@
.. _network-design:
==============
Network design
==============
==========
Networking
==========
.. toctree::
:maxdepth: 2