openstack-manuals/doc/arch-design/compute_focus/section_architecture_compute_focus.xml
Thomas Kaergel b0337e125c replace 'metrics' with 'meters' wherever it makes sense
Replaced 'metrics' with 'meters' in all contexts, excluding the
metric system or  where the term is used as name.

Change-Id: If4c32dfe92c28a2079a485a6aec1d61c7b9999a1
Closes-Bug: #1446518
2015-06-16 13:49:56 +02:00


<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="arch-design-architecture-hardware">
<?dbhtml stop-chunking?>
<title>Architecture</title>
<para>The hardware selection covers three areas:</para>
<itemizedlist>
<listitem>
<para>Compute</para>
</listitem>
<listitem>
<para>Network</para>
</listitem>
<listitem>
<para>Storage</para>
</listitem>
</itemizedlist>
<para>
An OpenStack cloud with extreme demands on processor and memory
resources is compute-focused, and requires hardware that
can handle these demands. This can mean choosing hardware which might
not perform as well on storage or network capabilities. In a
compute-focused architecture, storage and networking load a
data set into the computational cluster, but are not otherwise in heavy
demand.
</para>
<para>
Consider the following factors when selecting compute (server) hardware:
</para>
<variablelist>
<varlistentry>
<term>Server density</term>
<listitem>
<para>A measure of how many servers can fit into a
given amount of physical space, such as a rack unit (U).</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Resource capacity</term>
<listitem>
<para>The number of CPU cores, how much RAM, or how
much storage a given server delivers.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Expandability</term>
<listitem>
<para>The number of additional resources you can add to a
server before it reaches its limit.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Cost</term>
<listitem>
<para>The relative purchase price of the hardware weighted
against the level of design effort needed to build the system.</para>
</listitem>
</varlistentry>
</variablelist>
<para>Weigh these considerations against each other to determine the
best design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing resource
capacity and expandability can increase cost but decreases server density.
Decreasing cost can mean decreasing supportability, server density,
resource capacity, and expandability.</para>
<para>A compute-focused cloud should have an emphasis on server hardware
that can offer more CPU sockets, more CPU cores, and more RAM. Network
connectivity and storage capacity are less critical. The hardware must
provide enough network connectivity and storage
capacity to meet minimum user requirements, but they are not the primary
consideration.</para>
<para>Some server hardware form factors suit a compute-focused architecture
better than others. CPU and RAM capacity have the highest priority. Some
considerations for selecting hardware:</para>
<itemizedlist>
<listitem>
<para>Most blade servers support at most dual-socket, multi-core CPUs. To
avoid this CPU limit, select "full width" or "full height" blades.
Be aware, however, that this also decreases server density. For example,
high density blade servers such as HP BladeSystem or Dell PowerEdge
M1000e support up to 16 half-height servers in only ten rack units.
Half-height blades are twice as dense as full-height blades,
which fit only eight servers per ten rack units.</para>
</listitem>
<listitem>
<para>1U rack-mounted servers that occupy only a single rack
unit may offer greater server density than a blade server
solution. It is possible to place forty 1U servers in a rack, while still
providing space for the top of rack (ToR) switches, compared to 32
full-width blade servers. However, as of the Icehouse release, 1U servers
from the major vendors have only dual-socket, multi-core CPU
configurations. To obtain greater than dual-socket support in a 1U
rack-mount form factor, purchase systems from original
design manufacturers (ODMs) or second-tier manufacturers.</para>
</listitem>
<listitem>
<para>2U rack-mounted servers provide quad-socket, multi-core CPU
support, but with a corresponding decrease in server density (half
the density that 1U rack-mounted servers offer).</para>
</listitem>
<listitem>
<para>Larger rack-mounted servers, such as 4U servers, often provide
even greater CPU capacity, commonly supporting four or even eight CPU
sockets. These servers have greater expandability, but such servers
have much lower server density and are often more expensive.</para>
</listitem>
<listitem>
<para>"Sled servers" are rack-mounted servers that support multiple
independent servers in a single 2U or 3U enclosure. These deliver higher
density as compared to typical 1U or 2U rack-mounted servers. For
example, many sled servers offer four independent dual-socket
nodes in 2U for a total of eight CPU sockets in 2U. However, the
dual-socket limitation on individual nodes may not be sufficient to
offset their additional cost and configuration complexity.</para>
</listitem>
</itemizedlist>
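<para>The density trade-offs above can be made concrete with a short
calculation. The following Python sketch is illustrative only: the 42U
rack height, the 2U reserved for ToR switches, and the per-form-factor
figures (including quad-socket full-height blades) are assumptions based
on the examples in this section, not vendor specifications.</para>
<programlisting language="python"><![CDATA[
# Illustrative form factors: (rack units per enclosure, servers per enclosure,
# CPU sockets per server). These figures are assumptions that mirror the
# examples in the text; they are not vendor specifications.
FORM_FACTORS = {
    "half-height blade": (10, 16, 2),
    "full-height blade": (10, 8, 4),
    "1U rack server": (1, 1, 2),
    "2U rack server": (2, 1, 4),
    "2U four-node sled": (2, 4, 2),
}

RACK_UNITS = 42      # assumed rack height
RESERVED_UNITS = 2   # assumed space reserved for ToR switches


def rack_capacity(form_factor):
    """Return (servers, sockets) that fit in one rack for a form factor."""
    units, servers, sockets = FORM_FACTORS[form_factor]
    enclosures = (RACK_UNITS - RESERVED_UNITS) // units
    total_servers = enclosures * servers
    return total_servers, total_servers * sockets


for name in FORM_FACTORS:
    servers, sockets = rack_capacity(name)
    print(f"{name}: {servers} servers, {sockets} CPU sockets per rack")
]]></programlisting>
<para>Under these assumptions, a rack of 1U servers holds forty hosts and
a rack of full-height blades holds 32. Comparing sockets per rack rather
than servers per rack shows how a lower server density can still deliver
more CPU capacity.</para>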
<para>Consider these facts when choosing server hardware for a
compute-focused OpenStack design architecture:</para>
<variablelist>
<varlistentry>
<term>Instance density</term>
<listitem>
<para>In a compute-focused architecture, instance density is
lower, which means CPU and RAM over-subscription ratios are
also lower. You require more hosts to support the anticipated
scale due to instance density being lower, especially if the
design uses dual-socket hardware designs.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Host density</term>
<listitem>
<para>Another option to address the higher host count
of dual-socket designs is to use a quad-socket
platform. Taking this approach decreases host density,
which increases rack count. This configuration may
affect the network requirements, the number of power connections, and
possibly the cooling requirements.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Power and cooling density</term>
<listitem>
<para>The power and cooling density
requirements for 2U, 3U or even 4U server designs might be lower
than for blade, sled, or 1U server designs because of lower host
density. For data centers with older infrastructure, this may
be a desirable feature.</para>
</listitem>
</varlistentry>
</variablelist>
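<para>The relationship between over-subscription ratios, instance
density, and host count can be estimated with a few lines of arithmetic.
The following Python sketch is a rough estimate under stated assumptions:
the instance size, host sizes, and ratios are hypothetical, and the
calculation ignores CPU and memory reserved for the hypervisor.</para>
<programlisting language="python"><![CDATA[
import math


def hosts_required(instances, vcpus_per_instance, ram_gb_per_instance,
                   host_cores, host_ram_gb, cpu_ratio, ram_ratio):
    """Estimate the number of hosts needed for a given workload.

    cpu_ratio and ram_ratio are the over-subscription ratios
    (virtual resources allocated per physical resource).
    """
    by_cpu = (host_cores * cpu_ratio) // vcpus_per_instance
    by_ram = (host_ram_gb * ram_ratio) // ram_gb_per_instance
    instances_per_host = int(min(by_cpu, by_ram))
    if instances_per_host < 1:
        raise ValueError("a single instance does not fit on this host")
    return math.ceil(instances / instances_per_host)


# Hypothetical workload: 1000 instances with 4 vCPUs and 16 GB RAM each.
# Assumed dual-socket host: 24 cores, 256 GB RAM.
# Assumed quad-socket host: 48 cores, 512 GB RAM.
dual = hosts_required(1000, 4, 16, 24, 256, cpu_ratio=2.0, ram_ratio=1.0)
quad = hosts_required(1000, 4, 16, 48, 512, cpu_ratio=2.0, ram_ratio=1.0)
print(dual, quad)
]]></programlisting>
<para>With these hypothetical numbers, the quad-socket design needs half
as many hosts as the dual-socket design. Lowering either ratio reduces the
number of instances per host and raises the host count accordingly.</para>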
<para>When designing a compute-focused OpenStack architecture, you must
consider whether you intend to scale up or scale out.
Selecting a smaller number of larger hosts, or a
larger number of smaller hosts, depends on a combination of factors:
cost, power, cooling, physical rack and floor space, support-warranty,
and manageability.</para>
<section xml:id="storage-hardware-selection">
<title>Storage hardware selection</title>
<para>For a compute-focused OpenStack architecture, storage hardware
is not a primary consideration. Nonetheless, there are several factors
to consider:</para>
<variablelist>
<varlistentry>
<term>Cost</term>
<listitem>
<para>The overall cost of the solution plays a major role
in what storage architecture and storage hardware you select.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Performance</term>
<listitem>
<para>The performance of the storage solution is important; you can
measure it by observing the latency of storage I/O
requests. In a compute-focused OpenStack cloud, storage latency
can be a major consideration. In some compute-intensive
workloads, minimizing the delays that the CPU experiences while
fetching data from storage can significantly improve
the overall performance of the application.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Scalability</term>
<listitem>
<para>Scalability refers to the performance of a storage solution
as it expands to its maximum size. A solution that performs
well in small configurations but has degrading
performance as it expands is not scalable. On
the other hand, a solution that continues to perform well at
maximum expansion is scalable.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Expandability</term>
<listitem>
<para>Expandability refers to the overall ability of
a storage solution to grow. A solution that expands to 50 PB is
more expandable than a solution that only expands to 10 PB.
Note that this metric is related to, but different
from, scalability, which is a measure of the solution's
performance as it expands.</para>
</listitem>
</varlistentry>
</variablelist>
<para>For a compute-focused OpenStack cloud, latency of storage is a
major consideration. Using solid-state disks (SSDs) to minimize
latency for instance storage reduces CPU delays related to storage
and improves performance. Consider using RAID
controller cards in compute hosts to improve the performance of the
underlying disk subsystem.</para>
<para>Evaluate solutions against the key factors above when considering
your storage architecture. This determines if a scale-out solution
such as Ceph or GlusterFS is suitable, or if a single, highly expandable,
scalable, centralized storage array is better. If a centralized
storage array suits the requirements, the array vendor determines the
hardware. You can build a storage array using commodity hardware with
Open Source software, but you require people with expertise to build
such a system. Conversely, a scale-out storage solution that uses
direct-attached storage (DAS) in the servers may be an appropriate
choice. If so, then the server hardware must
support the storage solution.</para>
<para>The following lists some of the potential impacts that may affect a
particular storage architecture, and the corresponding storage hardware,
of a compute-focused OpenStack cloud:</para>
<variablelist>
<varlistentry>
<term>Connectivity</term>
<listitem>
<para>Ensure connectivity matches the storage solution requirements.
If you select a centralized storage array, determine how the
hypervisors should connect to the storage array. Connectivity
can affect latency and thus performance, so ensure that the network
characteristics minimize latency to boost the overall
performance of the design.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Latency</term>
<listitem>
<para>Determine if the use case has consistent or
highly variable latency.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Throughput</term>
<listitem>
<para>To improve overall performance, ensure that you optimize the
throughput of the storage solution. While a compute-focused cloud does
not usually have major data I/O to and from storage, this is still an
important factor to consider.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Server Hardware</term>
<listitem>
<para>If the solution uses DAS, this impacts the server hardware choice,
host density, instance density, power density, OS-hypervisor, and
management tools.</para>
</listitem>
</varlistentry>
</variablelist>
<para>When instances must be highly available or capable of migration
between hosts, use a shared storage file system
to store instance ephemeral data. This ensures that
compute services can run uninterrupted in the event of a node
failure.</para>
</section>
<section xml:id="selecting-networking-hardware-arch">
<title>Selecting networking hardware</title>
<para>Some of the key considerations for networking hardware selection
include:</para>
<variablelist>
<varlistentry>
<term>Port count</term>
<listitem>
<para>The design requires networking hardware that
has the requisite port count.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Port density</term>
<listitem>
<para>The required port count affects the physical space that a
network design requires.
A switch that can provide 48 10 GbE ports in 1U has a much higher
port density than a switch that provides 24 10 GbE ports in 2U. A
higher port density is better, as it leaves more rack space for
compute or storage components. You must also consider fault
domains and power density. Higher density switches are more
expensive, so weigh them against actual needs; it is important not to
design the network beyond requirements.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Port speed</term>
<listitem>
<para>The networking hardware must support the proposed
network speed, for example: 1 GbE, 10 GbE, 40 GbE, or 100
GbE.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Redundancy</term>
<listitem>
<para>User requirements for high availability and cost considerations
influence the level of network hardware redundancy you require.
You can achieve network redundancy by adding
redundant power supplies or paired switches. If this is a
requirement, the hardware must support this configuration.
User requirements determine if you require a completely redundant network
infrastructure.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Power requirements</term>
<listitem>
<para>Ensure that the physical data center
provides the necessary power for the selected network hardware. This
is not an issue for top of rack (ToR) switches, but may be an issue
for spine switches in a leaf and spine fabric, or end of row (EoR)
switches.</para>
</listitem>
</varlistentry>
</variablelist>
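<para>Port count and port density requirements follow from the number of
hosts in each rack. The following Python sketch is a minimal estimate
under assumed values: the hosts per rack, the number of network interfaces
per host, and the uplink ports reserved on each switch are all
hypothetical.</para>
<programlisting language="python"><![CDATA[
import math


def port_density(ports, rack_units):
    """Ports delivered per rack unit of switch space."""
    return ports / rack_units


def tor_switches_needed(hosts_per_rack, nics_per_host, switch_ports,
                        uplink_ports, redundant=True):
    """Estimate the number of ToR switches per rack.

    With redundant=True, at least two switches are returned so that
    bonded host interfaces can be split across a switch pair.
    """
    usable = switch_ports - uplink_ports
    if usable < 1:
        raise ValueError("no ports left for hosts after reserving uplinks")
    needed = math.ceil(hosts_per_rack * nics_per_host / usable)
    return max(needed, 2) if redundant else needed


print(port_density(48, 1))   # 48 ports in 1U
print(port_density(24, 2))   # 24 ports in 2U
# Hypothetical rack: 40 hosts with 2 interfaces each, 48-port switches,
# 4 ports per switch reserved for uplinks.
print(tor_switches_needed(40, 2, switch_ports=48, uplink_ports=4))
]]></programlisting>
<para>The same calculation can be repeated with different switch models to
compare how much rack space remains for compute hosts.</para>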
<para>We recommend designing the network architecture using
a scalable network model that makes it easy to add capacity and
bandwidth. A good example of such a model is the leaf-spine model. In
this type of network design, it is possible to easily add
bandwidth as well as scale out to additional racks of gear. It is
important to select network hardware that supports the required
port count, port speed, and port density while also allowing for future
growth as workload demands increase. It is also important to evaluate
where in the network architecture it is valuable to provide redundancy.
Increased network availability and redundancy come at a cost; therefore,
we recommend weighing the cost against the benefit gained from
deploying redundant network switches and using bonded
interfaces at the host level.</para>
</section>
<section xml:id="software-selection-arch">
<title>Software selection</title>
<para>Consider your selection of software for a compute-focused
OpenStack cloud in the following areas:</para>
<itemizedlist>
<listitem>
<para>Operating system (OS) and hypervisor</para>
</listitem>
<listitem>
<para>OpenStack components</para>
</listitem>
<listitem>
<para>Supplemental software</para>
</listitem>
</itemizedlist>
<para>Design decisions made in each of these areas impact the rest
of the OpenStack architecture design.</para>
</section>
<section xml:id="os-and-hypervisor-arch">
<title>Operating system and hypervisor</title>
<para>The selection of operating system (OS) and hypervisor has a
significant impact on the end point design. Selecting a particular
operating system and hypervisor could affect server hardware selection.
The node, networking, and storage hardware must support the selected
combination. For example, if the design uses Link Aggregation
Control Protocol (LACP), the hypervisor must support it.</para>
<para>OS and hypervisor selection impact the following areas:</para>
<variablelist>
<varlistentry>
<term>Cost</term>
<listitem>
<para>Selecting a commercially supported hypervisor such as
Microsoft Hyper-V results in a different cost model from
choosing a community-supported, open source hypervisor like KVM
or Xen. Even within the ranks of open source solutions, choosing
one solution over another can impact cost due
to support contracts. On the other hand, business or application
requirements might dictate a specific or commercially supported
hypervisor.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Supportability</term>
<listitem>
<para>Staff require appropriate training and knowledge to support the
selected OS and hypervisor combination. Consideration of training
costs may impact the design.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Management tools</term>
<listitem>
<para>The management tools used for Ubuntu and
KVM differ from the management tools for VMware vSphere.
Although OpenStack supports both OS and hypervisor combinations,
the choice of tool impacts the rest of the design.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Scale and performance</term>
<listitem>
<para>Ensure that selected OS and hypervisor
combinations meet the appropriate scale and performance
requirements. The chosen architecture must meet the targeted
instance-host ratios with the selected OS-hypervisor
combination.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Security</term>
<listitem>
<para>Ensure that the design can accommodate the regular
installation of application security patches while
maintaining the required workloads. The frequency of security
patches for the proposed OS-hypervisor combination has an
impact on performance, and the patch installation process can
affect maintenance windows.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Supported features</term>
<listitem>
<para>Determine what features of OpenStack you require.
The choice of features often determines the selection of the
OS-hypervisor combination. Certain features are only available with
specific OSs or hypervisors. If the features you require are
not available, modify the design to meet user requirements.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Interoperability</term>
<listitem>
<para>Consider the ability of the selected OS-hypervisor combination
to interoperate or co-exist with other OS-hypervisors, or with
other software solutions in the overall design. Operational and
troubleshooting tools for one OS-hypervisor combination may differ
from the tools for another OS-hypervisor combination. The design
must address whether the two sets of tools need to interoperate.</para>
</listitem>
</varlistentry>
</variablelist>
</section>
<section xml:id="openstack-components-arch">
<title>OpenStack components</title>
<para>The selection of OpenStack components has a significant impact
on the overall design. Certain components are omnipresent, for example
the compute and image services, but others, such as the orchestration
module, may not be present. Omitting heat does not typically have a
significant impact on the overall design. However, if the architecture
uses a replacement for OpenStack Object Storage for its storage
component, this could have significant impacts on the rest of the
design.</para>
<para>For a compute-focused OpenStack design architecture, the
following components may be present:</para>
<itemizedlist>
<listitem>
<para>Identity (keystone)</para>
</listitem>
<listitem>
<para>Dashboard (horizon)</para>
</listitem>
<listitem>
<para>Compute (nova)</para>
</listitem>
<listitem>
<para>Object Storage (swift, Ceph, or a commercial solution)</para>
</listitem>
<listitem>
<para>Image (glance)</para>
</listitem>
<listitem>
<para>Networking (neutron)</para>
</listitem>
<listitem>
<para>Orchestration (heat)</para>
</listitem>
</itemizedlist>
<para>A compute-focused design is less likely to include OpenStack Block
Storage because persistent block storage is not
a significant requirement for the expected workloads. However,
there may be some situations where performance requirements call for
a block storage component to improve data I/O.</para>
<para>The exclusion of certain OpenStack components might also limit the
functionality of other components. If a design opts to
include the Orchestration module but excludes the Telemetry module, then
the design cannot take advantage of Orchestration's auto-scaling
functionality, as this relies on information from Telemetry.</para>
</section>
<section xml:id="supplemental-software">
<title>Supplemental software</title>
<para>While OpenStack is a fairly complete collection of software
projects for building a platform for cloud services, there are
invariably additional pieces of software that you might add
to an OpenStack design.</para>
<section xml:id="networking-software-arch">
<title>Networking software</title>
<para>OpenStack Networking provides a wide variety of networking services
for instances. There are many additional networking software packages
that might be useful to manage the OpenStack components themselves.
Some examples include software to provide load balancing,
network redundancy protocols, and routing daemons. The
<citetitle>OpenStack High Availability Guide</citetitle> (<link
xlink:href="http://docs.openstack.org/high-availability-guide/content">http://docs.openstack.org/high-availability-guide/content</link>)
describes some of these software packages in more detail.
</para>
<para>For a compute-focused OpenStack cloud, the OpenStack infrastructure
components must be highly available. If the design does not
include hardware load balancing, you must add networking software packages
like HAProxy.</para>
</section>
<section xml:id="management-software-arch">
<title>Management software</title>
<para>The selected supplemental software solution affects
the overall OpenStack cloud design. This includes software for
providing clustering, logging, monitoring, and alerting.</para>
<para>The availability requirements of the design are the main factor
in deciding whether to include clustering software, such as Corosync or
Pacemaker. Therefore, the required availability of the cloud
infrastructure and the complexity of supporting the configuration after
deployment both affect the inclusion of these software packages. The
<citetitle>OpenStack High Availability Guide</citetitle> provides more
details on the installation and configuration of Corosync and Pacemaker.
</para>
<para>Operational considerations determine the requirements for logging,
monitoring, and alerting. Each of these sub-categories includes
various options. For example, in the logging sub-category
consider Logstash, Splunk, Log Insight, or some other log
aggregation-consolidation tool. Store logs in a centralized
location to ease analysis of the data. Log
data analytics engines can also provide automation and issue
notification by alerting and
attempting to remediate some of the more commonly known issues.</para>
<para>If you require any of these software packages, then the design
must account for the additional resource consumption.
Some other potential design impacts include:</para>
<itemizedlist>
<listitem>
<para>OS-hypervisor combination: ensure that the selected logging,
monitoring, or alerting tools support the proposed OS-hypervisor
combination.</para>
</listitem>
<listitem>
<para>Network hardware: the logging, monitoring, and alerting software
must support the network hardware selection.</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="database-software-arch">
<title>Database software</title>
<para>A large majority of OpenStack components require access to
back-end database services to store state and configuration
information. Select an appropriate back-end database that
satisfies the availability and fault tolerance requirements of the
OpenStack services. OpenStack services support connecting
to any database that the SQLAlchemy Python drivers support;
however, most common database deployments make use of MySQL or some
variation of it. We recommend that you make the database that provides
back-end services within a compute-focused cloud highly
available. Some of the more common software solutions include Galera,
MariaDB, and MySQL with multi-master replication.</para>
</section>
</section>
</section>