[arch-design] Migrate cloud architecture examples
1. Migrate and tidy up cloud architecture examples from the current guide
2. Migrate figures
3. Add placeholder sections for new content

Change-Id: I290f555f6e0cd4200deccb4d705127d99e61c343
Partial-Bug: #1548176
Implements: blueprint archguide-mitaka-reorg

doc/arch-design-draft/source/arch-examples-compute.rst (new file)
@@ -0,0 +1,126 @@

=============================
Compute-focused cloud example
=============================

The Conseil Européen pour la Recherche Nucléaire (CERN), also known as
the European Organization for Nuclear Research, provides particle
accelerators and other infrastructure for high-energy physics research.

As of 2011, CERN operated these two compute centers in Europe with plans
to add a third.

+-----------------------+------------------------+
| Data center           | Approximate capacity   |
+=======================+========================+
| Geneva, Switzerland   | - 3.5 megawatts        |
|                       |                        |
|                       | - 91000 cores          |
|                       |                        |
|                       | - 120 PB HDD           |
|                       |                        |
|                       | - 100 PB tape          |
|                       |                        |
|                       | - 310 TB memory        |
+-----------------------+------------------------+
| Budapest, Hungary     | - 2.5 megawatts        |
|                       |                        |
|                       | - 20000 cores          |
|                       |                        |
|                       | - 6 PB HDD             |
+-----------------------+------------------------+

To support a growing number of compute-heavy users of experiments
related to the Large Hadron Collider (LHC), CERN ultimately elected to
deploy an OpenStack cloud using Scientific Linux and RDO. This effort
aimed to simplify the management of the center's compute resources with
a view to doubling compute capacity through the addition of a data
center in 2013 while maintaining the same levels of compute staff.

The CERN solution uses :term:`cells <cell>` for segregation of compute
resources and for transparently scaling between different data centers.
This decision meant trading off support for security groups and live
migration. In addition, some details, such as flavors, must be manually
replicated across cells. In spite of these drawbacks, cells provide the
required scale while exposing a single public API endpoint to users.

CERN created a compute cell for each of the two original data centers
and created a third when it added a new data center in 2013. Each cell
contains three availability zones to further segregate compute resources
and at least three RabbitMQ message brokers configured for clustering
with mirrored queues for high availability.

The API cell, which resides behind a HAProxy load balancer, is in the
data center in Switzerland and directs API calls to compute cells using
a customized variation of the cell scheduler. The customizations allow
certain workloads to route to a specific data center or to all data
centers, with cell RAM availability determining cell selection in the
latter case.

.. figure:: figures/Generic_CERN_Example.png

There is also some customization of the filter scheduler that handles
placement within the cells; a sketch of one such filter follows the
list:

ImagePropertiesFilter
    Provides special handling depending on the guest operating system in
    use (Linux-based or Windows-based).

ProjectsToAggregateFilter
    Provides special handling depending on which project the instance is
    associated with.

default_schedule_zones
    Allows the selection of multiple default availability zones, rather
    than a single default.
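
A minimal sketch of what a custom, project-aware filter of this kind
could look like is shown below. This is illustrative only and is not
CERN's actual code; the ``filter_tenant_id`` metadata key and the class
name are examples, and the exact base-class location and method
signature vary slightly between OpenStack releases.

.. code-block:: python

   from nova.scheduler import filters


   class ProjectsToAggregateFilter(filters.BaseHostFilter):
       """Only pass hosts whose aggregates allow the requesting project."""

       def host_passes(self, host_state, spec_obj):
           project_id = spec_obj.project_id
           for aggregate in host_state.aggregates:
               # Aggregates are tagged with a comma-separated list of
               # project IDs under an agreed metadata key.
               allowed = aggregate.metadata.get('filter_tenant_id', '')
               if project_id in allowed.split(','):
                   return True
           # No aggregate explicitly allows this project on this host.
           return False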

A central database team manages the MySQL database server in each cell
in an active/passive configuration with a NetApp storage back end.
Backups run every 6 hours.

Network architecture
~~~~~~~~~~~~~~~~~~~~

To integrate with existing networking infrastructure, CERN made
customizations to legacy networking (nova-network) in the form of a
driver that integrates with CERN's existing database for tracking MAC
and IP address assignments.

The driver facilitates selection of a MAC address and IP for new
instances based on the compute node where the scheduler places the
instance.

The driver considers the compute node where the scheduler placed an
instance and selects a MAC address and IP from the pre-registered list
associated with that node in the database. The database is then updated
to reflect the assignment of those addresses to that instance.
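
The sketch below illustrates the idea of the driver's allocation step.
It is not CERN's driver; the table layout and function name are invented
for the example, and SQLite stands in for CERN's network database.

.. code-block:: python

   import sqlite3


   def allocate_address(db_path, compute_node, instance_uuid):
       """Pick a free, pre-registered (MAC, IP) pair for the target node."""
       conn = sqlite3.connect(db_path)
       try:
           row = conn.execute(
               "SELECT mac, ip FROM registered_addresses "
               "WHERE compute_node = ? AND instance_uuid IS NULL LIMIT 1",
               (compute_node,)).fetchone()
           if row is None:
               raise LookupError("no free address for %s" % compute_node)
           mac, ip = row
           # Record the assignment so the pair is not handed out twice.
           conn.execute(
               "UPDATE registered_addresses SET instance_uuid = ? "
               "WHERE mac = ? AND ip = ?", (instance_uuid, mac, ip))
           conn.commit()
           return mac, ip
       finally:
           conn.close()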

Storage architecture
~~~~~~~~~~~~~~~~~~~~

CERN deploys the OpenStack Image service in the API cell and configures
it to expose version 1 (V1) of the API. This also requires the image
registry. The storage back end in use is a 3 PB Ceph cluster.

CERN maintains a small set of Scientific Linux 5 and 6 images onto which
orchestration tools can place applications. Puppet manages instance
configuration and customization.

Monitoring
~~~~~~~~~~

CERN does not require direct billing, but uses the Telemetry service to
perform metering for the purposes of adjusting project quotas. CERN uses
a sharded, replicated MongoDB back end. To spread API load, CERN
deploys instances of the nova-api service within the child cells for
Telemetry to query against. This also requires the configuration of
supporting services such as keystone, glance-api, and glance-registry in
the child cells.

.. figure:: figures/Generic_CERN_Architecture.png

Additional monitoring tools in use include
`Flume <http://flume.apache.org/>`_,
`Elasticsearch <http://www.elasticsearch.org/>`_,
`Kibana <http://www.elasticsearch.org/overview/kibana/>`_, and the
CERN-developed `Lemon <http://lemon.web.cern.ch/lemon/index.shtml>`_
project.

doc/arch-design-draft/source/arch-examples-general.rst (new file)
@@ -0,0 +1,85 @@

=====================
General cloud example
=====================

An online classified advertising company wants to run web applications
consisting of Tomcat, Nginx, and MariaDB in a private cloud. To meet
policy requirements, the cloud infrastructure will run in their own data
center. The company has predictable load requirements, but requires
scaling to cope with nightly increases in demand. Their current
environment does not have the flexibility to align with their goal of
running an open source API environment. The current environment consists
of the following:

* Between 120 and 140 installations of Nginx and Tomcat, each with 2
  vCPUs and 4 GB of RAM

* A three-node MariaDB and Galera cluster, each node with 4 vCPUs and
  8 GB RAM

The company runs hardware load balancers and multiple web applications
serving their websites, and orchestrates environments using combinations
of scripts and Puppet. The website generates large amounts of log data
daily that requires archiving.

The solution would consist of the following OpenStack components:

* A firewall, switches, and load balancers on the public facing network
  connections.

* OpenStack Controller services running Image, Identity, and Networking,
  combined with support services such as MariaDB and RabbitMQ,
  configured for high availability on at least three controller nodes.

* OpenStack Compute nodes running the KVM hypervisor.

* OpenStack Block Storage for use by compute instances that require
  persistent storage (such as databases for dynamic sites).

* OpenStack Object Storage for serving static objects (such as images).

.. figure:: figures/General_Architecture3.png

Running up to 140 web instances and the small number of MariaDB
instances requires 292 vCPUs available, as well as 584 GB RAM. On a
typical 1U server using dual-socket hex-core Intel CPUs with
Hyperthreading, and assuming 2:1 CPU overcommit ratio, this would
require 8 OpenStack Compute nodes.
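
The arithmetic behind these figures can be checked with a short
calculation. The per-node numbers below (12 physical cores doubled by
Hyperthreading, with the 2:1 overcommit applied to hardware threads) are
assumptions used to reproduce the estimate; the raw quotient is 7 nodes,
so the 8 quoted above presumably includes a node of headroom.

.. code-block:: python

   import math

   web_vcpus, web_ram = 140 * 2, 140 * 4    # Nginx/Tomcat instances
   db_vcpus, db_ram = 3 * 4, 3 * 8          # MariaDB/Galera nodes

   total_vcpus = web_vcpus + db_vcpus       # 292 vCPUs
   total_ram = web_ram + db_ram             # 584 GB

   threads_per_node = 2 * 6 * 2             # dual socket, hex core, HT
   vcpus_per_node = threads_per_node * 2    # 2:1 CPU overcommit

   print(total_vcpus, total_ram)                      # 292 584
   print(math.ceil(total_vcpus / vcpus_per_node))     # 7 (+1 for headroom)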

The web application instances run from local storage on each of the
OpenStack Compute nodes. The web application instances are stateless,
meaning that any of the instances can fail and the application will
continue to function.

MariaDB server instances store their data on shared enterprise storage,
such as NetApp or SolidFire devices. If a MariaDB instance fails, the
storage is expected to be re-attached to another instance and rejoined
to the Galera cluster.

Logs from the web application servers are shipped to OpenStack Object
Storage for processing and archiving.

Additional capabilities can be realized by moving static web content to
be served from OpenStack Object Storage containers, and backing the
OpenStack Image service with OpenStack Object Storage.

.. note::

   Increasing the use of OpenStack Object Storage means that network
   bandwidth needs to be taken into consideration. Running OpenStack
   Object Storage with network connections offering 10 GbE or better
   connectivity is advised.

Leveraging the Orchestration and Telemetry services is also worth
considering when providing auto-scaling, orchestrated web application
environments. Defining the web applications in a
:term:`Heat Orchestration Template (HOT)`
negates the reliance on the current scripted Puppet solution.

OpenStack Networking can be used to control hardware load balancers
through the use of plug-ins and the Networking API. This allows users to
control hardware load balancer pools and instances as members in these
pools, but their use in production environments must be carefully
weighed against current stability.

doc/arch-design-draft/source/arch-examples-hybrid.rst (new file)
@@ -0,0 +1,154 @@

=====================
Hybrid cloud examples
=====================

Hybrid cloud environments are designed for these use cases:

* Bursting workloads from private to public OpenStack clouds
* Bursting workloads from private to public non-OpenStack clouds
* High availability across clouds (for technical diversity)

This chapter provides examples of environments that address
each of these use cases.

Bursting to a public OpenStack cloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Company A's data center is running low on capacity.
It is not possible to expand the data center in the foreseeable future.
In order to accommodate the continuously growing need for
development resources in the organization,
Company A decides to use resources in the public cloud.

Company A has an established data center with a substantial amount
of hardware. Migrating the workloads to a public cloud is not feasible.

The company has an internal cloud management platform that directs
requests to the appropriate cloud, depending on the local capacity.
This is a custom in-house application written for this specific purpose.

This solution is depicted in the figure below:

.. figure:: figures/Multi-Cloud_Priv-Pub3.png
   :width: 100%

This example shows two clouds with a Cloud Management
Platform (CMP) connecting them. This guide does not
discuss a specific CMP, but describes how the Orchestration and
Telemetry services handle, manage, and control workloads.

The private OpenStack cloud has at least one controller and at least
one compute node. It includes metering using the Telemetry service.
The Telemetry service captures the load increase and the CMP
processes the information. If there is available capacity,
the CMP uses the OpenStack API to call the Orchestration service.
This creates instances on the private cloud in response to user requests.
When capacity is not available on the private cloud, the CMP issues
a request to the Orchestration service API of the public cloud.
This creates the instance on the public cloud.
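
A highly simplified sketch of that burst decision is shown below. The
helper names and the 80% threshold are hypothetical; a real CMP would
read the utilization figures from Telemetry and call the Orchestration
API of whichever cloud it selects.

.. code-block:: python

   PRIVATE, PUBLIC = "private-openstack", "public-openstack"


   def choose_cloud(private_cpu_util, threshold=0.80):
       """Burst to the public cloud once private utilization is too high."""
       return PRIVATE if private_cpu_util < threshold else PUBLIC


   def launch_stack(cloud, stack_name):
       # Placeholder for a heat stack-create call against the chosen
       # cloud's Orchestration API endpoint.
       print("creating stack %s on %s" % (stack_name, cloud))


   # Telemetry reports 92% average CPU use on the private cloud, so the
   # new workload is created on the public cloud instead.
   launch_stack(choose_cloud(0.92), "web-tier")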

In this example, Company A does not direct the deployments to an
external public cloud due to concerns regarding resource control,
security, and increased operational expense.

Bursting to a public non-OpenStack cloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The second example examines bursting workloads from the private cloud
into a non-OpenStack public cloud using Amazon Web Services (AWS)
to take advantage of additional capacity and to scale applications.

The following diagram demonstrates an OpenStack-to-AWS hybrid cloud:

.. figure:: figures/Multi-Cloud_Priv-AWS4.png
   :width: 100%

Company B states that its developers are already using AWS
and do not want to change to a different provider.

If the CMP is capable of connecting to an external cloud
provider with an appropriate API, the workflow process remains
the same as in the previous scenario.
The actions the CMP takes, such as monitoring loads and
creating new instances, stay the same.
However, the CMP performs these actions in the public cloud
using applicable API calls.

If the public cloud is AWS, the CMP would use the
EC2 API to create a new instance and assign an Elastic IP.
It can then add that IP to HAProxy in the private cloud.
The CMP can also reference AWS-specific
tools such as CloudWatch and CloudFormation.
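
A sketch of the AWS side of that workflow, using the boto3 library, is
shown below. The AMI ID, instance type, and region are placeholders, and
error handling plus the HAProxy update on the private-cloud side are
omitted.

.. code-block:: python

   import boto3

   ec2 = boto3.client("ec2", region_name="eu-west-1")

   # Launch one instance in the public cloud.
   run = ec2.run_instances(ImageId="ami-00000000", InstanceType="m4.large",
                           MinCount=1, MaxCount=1)
   instance_id = run["Instances"][0]["InstanceId"]
   ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

   # Allocate an Elastic IP and attach it to the new instance; this is
   # the address the CMP would then add to the HAProxy pool.
   eip = ec2.allocate_address(Domain="vpc")
   ec2.associate_address(InstanceId=instance_id,
                         AllocationId=eip["AllocationId"])
   print("bursted instance reachable at", eip["PublicIp"])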

Several open source toolkits for building CMPs are
available and can handle this kind of translation.
Examples include ManageIQ, jClouds, and JumpGate.

High availability and disaster recovery
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Company C requires their local data center to be able to
recover from failure. Some of the workloads currently in
use are running on their private OpenStack cloud.
Protecting the data involves Block Storage, Object Storage,
and a database. The architecture supports the failure of
large components of the system while ensuring that the
system continues to deliver services.
While the services remain available to users, the failed
components are restored in the background based on standard
best practice data replication policies.
To achieve these objectives, Company C replicates data to
a second cloud in a geographically distant location.
The following diagram describes this system:

.. figure:: figures/Multi-Cloud_failover2.png
   :width: 100%

This example includes two private OpenStack clouds connected with a CMP.
The source cloud, OpenStack Cloud 1, includes a controller and
at least one instance running MySQL. It also includes at least
one Block Storage volume and one Object Storage volume.
This means that data is available to the users at all times.
The details of the method for protecting each of these sources
of data differ.

Object Storage relies on the replication capabilities of
the Object Storage provider.
Company C enables OpenStack Object Storage so that it creates
geographically separated replicas that take advantage of this feature.
The company configures storage so that at least one replica
exists in each cloud. In order to make this work, the company
configures a single array spanning both clouds with OpenStack Identity.
Using Federated Identity, the array talks to both clouds, communicating
with OpenStack Object Storage through the Swift proxy.

For Block Storage, the replication is a little more difficult,
and involves tools outside of OpenStack itself.
The OpenStack Block Storage volume is not set as the drive itself
but as a logical object that points to a physical back end.
Disaster recovery is configured for Block Storage for
synchronous backup for the highest level of data protection,
but asynchronous backup could have been set as an alternative
that is not as latency sensitive.
For asynchronous backup, the Block Storage API makes it possible
to export the data and also the metadata of a particular volume,
so that it can be moved and replicated elsewhere.
More information can be found here:
https://blueprints.launchpad.net/cinder/+spec/cinder-backup-volume-metadata-support.
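
A sketch of driving that export with python-cinderclient follows. The
authentication details and volume ID are placeholders and the exact
client setup depends on the deployment; the point is the
``backups.create`` / ``backups.export_record`` pair, which produces a
record that can be imported on the recovery cloud.

.. code-block:: python

   from cinderclient import client
   from keystoneauth1 import loading, session

   loader = loading.get_plugin_loader("password")
   auth = loader.load_from_options(
       auth_url="https://cloud1.example.com:5000/v3",
       username="admin", password="secret", project_name="admin",
       user_domain_id="default", project_domain_id="default")
   cinder = client.Client("3", session=session.Session(auth=auth))

   # Back up the database volume, then export the backup's metadata
   # record so it can be imported into the second cloud.
   backup = cinder.backups.create("VOLUME_UUID", name="db-volume-backup")
   record = cinder.backups.export_record(backup.id)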

The synchronous backups create an identical volume in both
clouds and choose the appropriate flavor so that each cloud
has an identical back end. This is done by creating volumes
through the CMP. After this is configured, a solution
involving DRBD synchronizes the physical drives.

The database component is backed up using synchronous backups.
MySQL does not support geographically diverse replication,
so disaster recovery is provided by replicating the file itself.
As it is not possible to use Object Storage as the back end of
a database like MySQL, Swift replication is not an option.
Company C decides not to store the data on another geo-tiered
storage system, such as Ceph, as Block Storage.
This would have given another layer of protection.
Another option would have been to store the database on an OpenStack
Block Storage volume and back it up like any other Block Storage
volume.

doc/arch-design-draft/source/arch-examples-multi-site.rst (new file)
@@ -0,0 +1,192 @@

=========================
Multi-site cloud examples
=========================

There are multiple ways to build a multi-site OpenStack installation,
based on the needs of the intended workloads. Below are example
architectures based on different requirements. These examples are meant
as a reference, and not a hard and fast rule for deployments. Use the
previous sections of this chapter to assist in selecting specific
components and implementations based on your needs.

A large content provider needs to deliver content to customers that are
geographically dispersed. The workload is very sensitive to latency and
needs a rapid response to end users. After reviewing the user, technical,
and operational considerations, it is determined beneficial to build a
number of regions local to the customer's edge. Rather than build a few
large, centralized data centers, the intent of the architecture is to
provide a pair of small data centers in locations that are closer to the
customer. In this use case, spreading the application out allows for a
different kind of horizontal scaling than a traditional compute workload
would use. The intent is to scale by creating more copies of the
application in closer proximity to the users that need it most, in order
to ensure faster response time to user requests. This provider deploys
two data centers at each of the four chosen regions.

The implications of this design are based around the method of placing
copies of resources in each of the remote regions. Swift objects, Glance
images, and block storage need to be manually replicated into each
region. This may be beneficial for some systems, such as the case of a
content service, where only some of the content needs to exist in some
but not all regions. A centralized Keystone is recommended to ensure
authentication and that access to the API endpoints is easily
manageable.

It is recommended that you install an automated DNS system such as
Designate. Application administrators need a way to manage the mapping
of which application copy exists in each region and how to reach it,
unless an external Dynamic DNS system is available. Designate assists by
making the process automatic and by populating the records in each
region's zone.
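
For illustration, the record population that Designate enables could be
driven through openstacksdk as sketched below; the cloud name, zone, and
address are placeholders.

.. code-block:: python

   import openstack

   conn = openstack.connect(cloud="region-east")   # entry from clouds.yaml

   zone = conn.dns.find_zone("app.example.com.")
   # Publish an A record for the application copy running in this region.
   conn.dns.create_recordset(zone, name="shop.app.example.com.",
                             type="A", ttl=300,
                             records=["203.0.113.10"])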

Telemetry for each region is also deployed, as each region may grow
differently or be used at a different rate. Ceilometer collects each
region's meters from each of the controllers and reports them back to a
central location. This is useful both to the end user and the
administrator of the OpenStack environment. The end user will find this
method useful, as it makes it possible to determine if certain locations
are experiencing higher load than others, and take appropriate action.
Administrators also benefit by possibly being able to forecast growth
per region, rather than expanding the capacity of all regions
simultaneously, therefore maximizing the cost-effectiveness of the
multi-site design.

One of the key decisions in running this infrastructure is whether or
not to provide a redundancy model. Two types of redundancy and high
availability models can be implemented in this configuration. The first
type is the availability of central OpenStack components. Keystone can
be made highly available in three central data centers that host the
centralized OpenStack components. This prevents a loss of any one of the
regions causing an outage in service. It also has the added benefit of
being able to run a central storage repository as a primary cache for
distributing content to each of the regions.

The second redundancy type is the edge data center itself. A second data
center in each of the edge regional locations houses a second region
near the first region. This ensures that the application does not suffer
degraded performance in terms of latency and availability.

:ref:`ms-customer-edge` depicts the solution designed to have both a
centralized set of core data centers for OpenStack services and paired edge
data centers:

.. _ms-customer-edge:

.. figure:: figures/Multi-Site_Customer_Edge.png

   **Multi-site architecture example**

Geo-redundant load balancing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A large-scale web application has been designed with cloud principles in
mind. The application is designed to provide service to an application
store, on a 24/7 basis. The company has a typical two-tier architecture
with a web front end servicing the customer requests, and a NoSQL
database back end storing the information.

Recently, there have been several outages at a number of major public
cloud providers that affected applications running out of a single
geographical location. The design therefore should mitigate the chance
of a single site causing an outage for the business.

The solution would consist of the following OpenStack components:

* A firewall, switches, and load balancers on the public facing network
  connections.

* OpenStack Controller services running Networking, dashboard, Block
  Storage, and Compute running locally in each of the three regions.
  The Identity service, Orchestration service, Telemetry service, Image
  service, and Object Storage service can be installed centrally, with
  nodes in each of the regions providing a redundant OpenStack
  Controller plane throughout the globe.

* OpenStack Compute nodes running the KVM hypervisor.

* OpenStack Object Storage for serving static objects such as images
  can be used to ensure that all images are standardized across all the
  regions, and replicated on a regular basis.

* A distributed DNS service available to all regions that allows for
  dynamic update of DNS records of deployed instances.

* A geo-redundant load balancing service can be used to service the
  requests from the customers based on their origin.

An autoscaling heat template can be used to deploy the application in
the three regions (see the sketch after this list). This template
includes:

* Web servers, running Apache.

* Appropriate ``user_data`` to populate the central DNS servers upon
  instance launch.

* Appropriate Telemetry alarms that maintain the state of the
  application and allow for handling of region or instance failure.
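
The sketch below shows one way to push the same template to all three
regions through openstacksdk. The region names, template URL, and
parameters are placeholders, not part of the reference architecture.

.. code-block:: python

   import openstack

   REGIONS = ["region-east", "region-west", "region-central"]
   TEMPLATE_URL = "http://config.example.com/webapp-autoscaling.yaml"

   for region in REGIONS:
       conn = openstack.connect(cloud="mycloud", region_name=region)
       # Create the autoscaling stack (web servers, user_data hooks, and
       # Telemetry alarms) in this region.
       conn.orchestration.create_stack(
           name="webapp-%s" % region,
           template_url=TEMPLATE_URL,
           parameters={"dns_zone": "app.example.com."})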

Another autoscaling Heat template can be used to deploy a distributed
MongoDB shard over the three locations, with the option of storing
required data on a globally available Swift container. According to the
usage and load on the database server, additional shards can be
provisioned according to the thresholds defined in Telemetry.

Two data centers would have been sufficient had the requirements been
met. However, three regions are selected here to avoid abnormal load on
a single region in the event of a failure.

Orchestration is used because of its built-in functionality for
autoscaling and auto-healing in the event of increased load. Additional
configuration management tools, such as Puppet or Chef, could also have
been used in this scenario, but were not chosen since Orchestration had
the appropriate built-in hooks into the OpenStack cloud, whereas the
other tools were external and not native to OpenStack. In addition,
external tools were not needed since this deployment scenario was
straightforward.

OpenStack Object Storage is used here to serve as a back end for the
Image service since it is the most suitable solution for a globally
distributed storage solution with its own replication mechanism.
Home-grown solutions could also have been used, including the handling
of replication, but were not chosen because Object Storage is already an
integral part of the infrastructure and a proven solution.

An external load balancing service was used rather than the LBaaS in
OpenStack because the solution in OpenStack is not redundant and does
not have any awareness of geo-location.

.. _ms-geo-redundant:

.. figure:: figures/Multi-site_Geo_Redundant_LB.png

   **Multi-site geo-redundant architecture**

Location-local service
~~~~~~~~~~~~~~~~~~~~~~

A common use for a multi-site OpenStack deployment is creating a Content
Delivery Network. An application that uses a location-local architecture
requires low network latency and proximity to the user to provide an
optimal user experience and reduce the cost of bandwidth and transit.
The content resides on sites closer to the customer, instead of a
centralized content store that requires utilizing higher-cost
cross-country links.

This architecture includes a geo-location component that directs user
requests to the closest possible node. In this scenario, 100% redundancy
of content across every site is a goal rather than a requirement, with
the intent to maximize the amount of content available within a minimum
number of network hops for end users. Despite these differences, the
storage replication configuration has significant overlap with that of a
geo-redundant load balancing use case.

In :ref:`ms-shared-keystone`, the location-aware application utilizing
this multi-site OpenStack installation launches web server or content
serving instances on the compute cluster in each site. Requests from
clients are first sent to a global services load balancer that
determines the location of the client, then routes the request to the
closest OpenStack site where the application completes the request.

.. _ms-shared-keystone:

.. figure:: figures/Multi-Site_shared_keystone1.png

   **Multi-site shared keystone architecture**

doc/arch-design-draft/source/arch-examples-network.rst (new file)
@@ -0,0 +1,166 @@

==============================
Network-focused cloud examples
==============================

An organization designs a large-scale web application with cloud
principles in mind. The application scales horizontally in a bursting
fashion and generates a high instance count. The application requires an
SSL connection to secure data and must not lose connection state to
individual servers.

The figure below depicts an example design for this workload. In this
example, a hardware load balancer provides SSL offload functionality and
connects to tenant networks in order to reduce address consumption. This
load balancer links to the routing architecture as it services the VIP
for the application. The router and load balancer use the GRE tunnel ID
of the application's tenant network and an IP address within the tenant
subnet but outside of the address pool. This is to ensure that the load
balancer can communicate with the application's HTTP servers without
requiring the consumption of a public IP address.

Because sessions persist until closed, the routing and switching
architecture provides high availability. Switches mesh to each
hypervisor and to each other, and also provide an MLAG implementation to
ensure that layer-2 connectivity does not fail. Routers use VRRP and
fully mesh with switches to ensure layer-3 connectivity. Since GRE
provides an overlay network, Networking is present and uses the Open
vSwitch agent in GRE tunnel mode. This ensures all devices can reach all
other devices and that you can create tenant networks for private
addressing links to the load balancer.

.. figure:: figures/Network_Web_Services1.png

A web service architecture has many options and optional components. Due
to this, it can fit into a large number of other OpenStack designs. A
few key components, however, need to be in place to handle the nature of
most web-scale workloads. You require the following components:

* OpenStack Controller services (Image, Identity, Networking, and
  supporting services such as MariaDB and RabbitMQ)

* OpenStack Compute running the KVM hypervisor

* OpenStack Object Storage

* Orchestration service

* Telemetry service

Beyond the normal Identity, Compute, Image service, and Object Storage
components, we recommend the Orchestration service component to handle
the proper scaling of workloads to adjust to demand. Due to the
requirement for auto-scaling, the design includes the Telemetry service.
Web services tend to be bursty in load, have very defined peak and
valley usage patterns and, as a result, benefit from automatic scaling
of instances based upon traffic. At a network level, a split network
configuration works well with databases residing on private tenant
networks since these do not emit a large quantity of broadcast traffic
and may need to interconnect to some databases for content.

Load balancing
~~~~~~~~~~~~~~

Load balancing spreads requests across multiple instances. This workload
scales well horizontally across large numbers of instances. This enables
instances to run without publicly routed IP addresses and instead to
rely on the load balancer to provide a globally reachable service. Many
of these services do not require direct server return. This aids in
address planning and utilization at scale since only the virtual IP
(VIP) must be public.

Overlay networks
~~~~~~~~~~~~~~~~

The overlay functionality design includes OpenStack Networking in Open
vSwitch GRE tunnel mode. In this case, the layer-3 external routers pair
with VRRP, and switches pair with an implementation of MLAG to ensure
that you do not lose connectivity with the upstream routing
infrastructure.

Performance tuning
~~~~~~~~~~~~~~~~~~

Network-level tuning for this workload is minimal. Quality-of-Service
(QoS) applies to these workloads for a middle-ground Class Selector
depending on existing policies. It is higher than a best effort queue
but lower than an Expedited Forwarding or Assured Forwarding queue.
Since this type of application generates larger packets with
longer-lived connections, you can optimize bandwidth utilization for
long-duration TCP. Normal bandwidth planning applies here with regards
to benchmarking a session's usage multiplied by the expected number of
concurrent sessions with overhead.

Network functions
~~~~~~~~~~~~~~~~~

Network functions is a broad category that encompasses workloads
supporting the rest of a system's network. These workloads tend to
consist of large amounts of small packets that are very short lived,
such as DNS queries or SNMP traps. These messages need to arrive quickly
and do not deal well with packet loss, as there can be a very large
volume of them. There are a few extra considerations to take into
account for this type of workload, and this can change a configuration
all the way down to the hypervisor level. For an application that
generates 10 TCP sessions per user with an average bandwidth of 512
kilobytes per second per flow and an expected user count of ten thousand
concurrent users, the expected bandwidth plan is approximately 4.88
gigabits per second.
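
The quoted figure can be reproduced as below, but only under an
interpretation that is not spelled out in the text: the 512 is read as
the aggregate per-user rate in kilobits per second (roughly 51 kilobits
per second per flow) and binary prefixes are used. Taken literally as
512 kilobytes per second on each of 10 flows, the total would be far
larger.

.. code-block:: python

   per_user_kbit = 512          # assumed aggregate kilobits/s per user
   users = 10000

   total_gbit = per_user_kbit * users / 1024.0 / 1024.0
   print(round(total_gbit, 2))  # 4.88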

The supporting network for this type of configuration needs to have
low latency and evenly distributed availability. This workload benefits
from having services local to the consumers of the service. Use a
multi-site approach as well as deploying many copies of the application
to handle load as close as possible to consumers. Since these
applications function independently, they do not warrant running
overlays to interconnect tenant networks. Overlays also have the
drawback of performing poorly with rapid flow setup and may incur too
much overhead with large quantities of small packets, and therefore we
do not recommend them.

QoS is desirable for some workloads to ensure delivery. DNS has a major
impact on the load times of other services and needs to be reliable and
provide rapid responses. Configure rules in upstream devices to apply a
higher Class Selector to DNS to ensure faster delivery or a better spot
in queuing algorithms.

Cloud storage
~~~~~~~~~~~~~

Another common use case for OpenStack environments is providing a
cloud-based file storage and sharing service. You might consider this a
storage-focused use case, but its network-side requirements make it a
network-focused use case.

For example, consider a cloud backup application. This workload has two
specific behaviors that impact the network. Because this workload is an
externally-facing service and an internally-replicating application, it
has both :term:`north-south<north-south traffic>` and
:term:`east-west<east-west traffic>` traffic considerations:

north-south traffic
    When a user uploads and stores content, that content moves into the
    OpenStack installation. When users download this content, the
    content moves out from the OpenStack installation. Because this
    service operates primarily as a backup, most of the traffic moves
    southbound into the environment. In this situation, it benefits you
    to configure a network to be asymmetrically downstream because the
    traffic that enters the OpenStack installation is greater than the
    traffic that leaves the installation.

east-west traffic
    Likely to be fully symmetric. Because replication originates from
    any node and might target multiple other nodes algorithmically, it
    is less likely for this traffic to have a larger volume in any
    specific direction. However, this traffic might interfere with
    north-south traffic.

.. figure:: figures/Network_Cloud_Storage2.png

This application prioritizes the north-south traffic over east-west
traffic: the north-south traffic involves customer-facing data.

The network design in this case is less dependent on availability and
more dependent on being able to handle high bandwidth. As a direct
result, it is beneficial to forgo redundant links in favor of bonding
those connections. This increases available bandwidth. It is also
beneficial to configure all devices in the path, including OpenStack, to
generate and pass jumbo frames.

doc/arch-design-draft/source/arch-examples-specialized.rst (new file)
@@ -0,0 +1,42 @@

=================
Specialized cases
=================

.. toctree::
   :maxdepth: 2

   specialized-multi-hypervisor.rst
   specialized-networking.rst
   specialized-software-defined-networking.rst
   specialized-desktop-as-a-service.rst
   specialized-openstack-on-openstack.rst
   specialized-hardware.rst
   specialized-single-site.rst
   specialized-add-region.rst
   specialized-scaling-multiple-cells.rst

Although OpenStack architecture designs have been described
in seven major scenarios outlined in other sections
(compute focused, network focused, storage focused, general
purpose, multi-site, hybrid cloud, and massively scalable),
there are a few use cases that do not fit into these categories.
This section discusses these specialized cases and provides some
additional details and design considerations for each use case:

* :doc:`Specialized networking <specialized-networking>`:
  describes running networking-oriented software that may involve reading
  packets directly from the wire or participating in routing protocols.
* :doc:`Software-defined networking (SDN)
  <specialized-software-defined-networking>`:
  describes both running an SDN controller from within OpenStack
  as well as participating in a software-defined network.
* :doc:`Desktop-as-a-Service <specialized-desktop-as-a-service>`:
  describes running a virtualized desktop environment in a cloud
  (:term:`Desktop-as-a-Service`).
  This applies to private and public clouds.
* :doc:`OpenStack on OpenStack <specialized-openstack-on-openstack>`:
  describes building a multi-tiered cloud by running OpenStack
  on top of an OpenStack installation.
* :doc:`Specialized hardware <specialized-hardware>`:
  describes the use of specialized hardware devices from within
  the OpenStack environment.

doc/arch-design-draft/source/arch-examples-storage.rst (new file)
@@ -0,0 +1,143 @@

==============================
Storage-focused cloud examples
==============================

Storage-focused architecture depends on specific use cases. This section
discusses three example use cases:

* An object store with a RESTful interface

* Compute analytics with parallel file systems

* High performance database

The example below shows a REST interface without a high performance
requirement.

Swift is a highly scalable object store that is part of the OpenStack
project. This diagram explains the example architecture:

.. figure:: figures/Storage_Object.png

The example REST interface, presented as a traditional object store
running on traditional spindles, does not require a high performance
caching tier.

This example uses the following components:

Network:

* 10 GbE horizontally scalable spine-leaf back-end storage and
  front-end network.

Storage hardware:

* 10 storage servers each with 12x4 TB disks equaling 480 TB total
  space with approximately 160 TB of usable space after replicas.

Proxy:

* 3x proxies

* 2x10 GbE bonded front end

* 2x10 GbE back-end bonds

* Approximately 60 Gb of total bandwidth to the back-end storage
  cluster

.. note::

   It may be necessary to implement a third-party caching layer for some
   applications to achieve suitable performance.

Compute analytics with Data processing service
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Analytics of large data sets are dependent on the performance of the
storage system. Clouds using storage systems such as the Hadoop
Distributed File System (HDFS) have inefficiencies which can cause
performance issues.

One potential solution to this problem is the implementation of storage
systems designed for performance. Parallel file systems have previously
filled this need in the HPC space and are suitable for large-scale
performance-oriented systems.

OpenStack has integration with Hadoop to manage the Hadoop cluster
within the cloud. The following diagram shows an OpenStack store with a
high performance requirement:

.. figure:: figures/Storage_Hadoop3.png

The hardware requirements and configuration are similar to those of the
high performance database example below. In this case, the architecture
uses Ceph's Swift-compatible REST interface, with features that allow
for connecting a caching pool to accelerate the presented pool.

High performance database with Database service
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Databases are a common workload that benefits from high performance
storage back ends. Although enterprise storage is not a requirement,
many environments have existing storage that an OpenStack cloud can use
as a back end. You can create a storage pool to provide block devices
with OpenStack Block Storage for instances as well as object interfaces.
In this example, the database I/O requirements are high and demand
storage presented from a fast SSD pool.

A storage system presents a LUN backed by a set of SSDs using a
traditional storage array with OpenStack Block Storage integration or a
storage platform such as Ceph or Gluster.

This system can provide additional performance. For example, in the
database example below, a portion of the SSD pool can act as a block
device to the database server. In the high performance analytics
example, the inline SSD cache layer accelerates the REST interface.

.. figure:: figures/Storage_Database_+_Object5.png

In this example, Ceph presents a Swift-compatible REST interface, as
well as block-level storage from a distributed storage cluster. It is
highly flexible and has features that enable reduced cost of operations,
such as self-healing and auto-balancing. Using erasure coded pools is a
suitable way of maximizing the amount of usable space.

.. note::

   There are special considerations around erasure coded pools. For
   example, they have higher computational requirements and limitations
   on the operations allowed on an object; erasure coded pools do not
   support partial writes.

Using Ceph as an applicable example, a potential architecture would have
the following requirements:

Network:

* 10 GbE horizontally scalable spine-leaf back-end storage and
  front-end network

Storage hardware:

* 5 storage servers for the caching layer, each with 24x1 TB SSDs

* 10 storage servers each with 12x4 TB disks, which equals 480 TB total
  space with approximately 160 TB of usable space after 3 replicas

REST proxy:

* 3x proxies

* 2x10 GbE bonded front end

* 2x10 GbE back-end bonds

* Approximately 60 Gb of total bandwidth to the back-end storage
  cluster

Using an SSD cache layer, you can present block devices directly to
hypervisors or instances. The REST interface can also use the SSD cache
systems as an inline cache.

doc/arch-design-draft/source/arch-examples.rst (new file)
@@ -0,0 +1,14 @@

===========================
Cloud architecture examples
===========================

.. toctree::
   :maxdepth: 2

   arch-examples-general.rst
   arch-examples-compute.rst
   arch-examples-storage.rst
   arch-examples-network.rst
   arch-examples-multi-site.rst
   arch-examples-hybrid.rst
   arch-examples-specialized.rst

@@ -1,9 +0,0 @@ (deleted file)
=====================
Example architectures
=====================

.. toctree::
   :maxdepth: 2

Binary files added:
doc/arch-design-draft/source/figures/Compute_NSX.png
doc/arch-design-draft/source/figures/General_Architecture3.png
doc/arch-design-draft/source/figures/Generic_CERN_Example.png
doc/arch-design-draft/source/figures/Multi-Cloud_Priv-AWS4.png
doc/arch-design-draft/source/figures/Multi-Cloud_Priv-Pub3.png
doc/arch-design-draft/source/figures/Multi-Cloud_failover2.png
doc/arch-design-draft/source/figures/Network_Cloud_Storage2.png
doc/arch-design-draft/source/figures/Network_Web_Services1.png
doc/arch-design-draft/source/figures/Specialized_Hardware2.png
doc/arch-design-draft/source/figures/Specialized_OOO.png
doc/arch-design-draft/source/figures/Specialized_SDN_hosted.png
doc/arch-design-draft/source/figures/Specialized_VDI1.png
doc/arch-design-draft/source/figures/Storage_Hadoop3.png
doc/arch-design-draft/source/figures/Storage_Object.png

@@ -32,7 +32,7 @@ Contents
    high-availability.rst
    security-requirements.rst
    legal-requirements.rst
-   example-architectures.rst
+   arch-examples.rst
    common/app_support.rst
    common/glossary.rst

doc/arch-design-draft/source/specialized-add-region.rst (new file)
@@ -0,0 +1,5 @@

=====================
Adding another region
=====================

.. TODO

@@ -0,0 +1,47 @@

====================
Desktop-as-a-Service
====================

Virtual Desktop Infrastructure (VDI) is a service that hosts
user desktop environments on remote servers. This application
is very sensitive to network latency and requires a high
performance compute environment. Traditionally these types of
services do not use cloud environments because few clouds
support such a demanding workload for user-facing applications.
As cloud environments become more robust, vendors are starting
to provide services that provide virtual desktops in the cloud.
OpenStack may soon provide the infrastructure for these types of
deployments.

Challenges
~~~~~~~~~~

Designing an infrastructure that is suitable to host virtual
desktops is a very different task from that of most virtual workloads.
For example, the design must consider:

* Boot storms, when a high volume of logins occur in a short period of time
* The performance of the applications running on virtual desktops
* Operating systems and their compatibility with the OpenStack hypervisor

Broker
~~~~~~

The connection broker determines which remote desktop host
users can access. Medium and large scale environments require a broker
since its service represents a central component of the architecture.
The broker is a complete management product, and enables automated
deployment and provisioning of remote desktop hosts.

Possible solutions
~~~~~~~~~~~~~~~~~~

There are a number of commercial products currently available that
provide a broker solution. However, no native OpenStack projects
provide broker services.
Not providing a broker is also an option, but managing this manually
would not suffice for a large scale, enterprise solution.

Diagram
~~~~~~~

.. figure:: figures/Specialized_VDI1.png
43
doc/arch-design-draft/source/specialized-hardware.rst
Normal file
@ -0,0 +1,43 @@
====================
Specialized hardware
====================

Certain workloads require specialized hardware devices that have
significant virtualization or sharing challenges. Applications such as
load balancers, highly parallel brute force computing, and
direct-to-wire networking may need capabilities that basic OpenStack
components do not provide.

Challenges
~~~~~~~~~~

Some applications need access to hardware devices to either improve
performance or provide capabilities beyond virtual CPU, RAM, network,
or storage. These can be a shared resource, such as a cryptography
processor, or a dedicated resource, such as a Graphics Processing Unit
(GPU). OpenStack can provide some of these, while others may need
extra work.

Solutions
~~~~~~~~~

To provide cryptography offloading to a set of instances, you can use
Image service configuration options. For example, assign the
cryptography chip to a device node in the guest. The OpenStack Command
Line Reference contains further information on configuring this
solution in the section `Image service property keys
<http://docs.openstack.org/cli-reference/glance.html#image-service-property-keys>`_.
A challenge, however, is that this option allows all guests using the
configured images to access the hypervisor cryptography device.
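As a minimal sketch, the following exposes a host entropy source to
guests through an image property. It assumes a hypothetical image named
``crypto-enabled-image`` and uses the documented ``hw_rng_model``
property; the property key for your particular cryptography device may
differ.

.. code-block:: console

   # Expose the host random number generator (one example of a host
   # crypto device) to all guests booted from this image.
   $ openstack image set --property hw_rng_model=virtio crypto-enabled-image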
If you require direct access to a specific device, PCI pass-through
enables you to dedicate the device to a single instance per
hypervisor. You must define a flavor that specifically requests the
PCI device so that the scheduler places instances properly. More
information regarding PCI pass-through, including instructions for
implementing and using it, is available at
`https://wiki.openstack.org/wiki/Pci_passthrough <https://wiki.openstack.org/wiki/Pci_passthrough#How_to_check_PCI_status_with_PCI_api_patches>`_.
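As a sketch, the following flavor requests one device matching a PCI
alias named ``gpu-k2``. The alias name is hypothetical and must match a
PCI alias that you have already configured in ``nova.conf`` on the
controller and compute nodes.

.. code-block:: console

   # Create a flavor and request one PCI device matching the "gpu-k2" alias.
   $ openstack flavor create --ram 8192 --disk 80 --vcpus 8 pci.large
   $ openstack flavor set pci.large --property "pci_passthrough:alias"="gpu-k2:1"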
.. figure:: figures/Specialized_Hardware2.png
   :width: 100%
@ -0,0 +1,78 @@
========================
Multi-hypervisor example
========================

A financial company needs to migrate its applications from a
traditional, virtualized environment to an API-driven, orchestrated
environment. The new environment needs multiple hypervisors since many
of the company's applications have strict hypervisor requirements.

Currently, the company's vSphere environment runs 20 VMware ESXi
hypervisors. These hypervisors support 300 instances of various sizes.
Approximately 50 of these instances must run on ESXi. The remaining
250 or so have more flexible requirements.

The financial company decides to manage the overall system with a
common OpenStack platform.

.. figure:: figures/Compute_NSX.png
   :width: 100%

Architecture planning teams decided to run a host aggregate containing
KVM hypervisors for the general purpose instances. A separate host
aggregate targets instances requiring ESXi.

Images in the OpenStack Image service have particular hypervisor
metadata attached. When a user requests a certain image, the instance
spawns on the relevant aggregate.
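The following is a minimal sketch of that arrangement, assuming the
``AggregateImagePropertiesIsolation`` scheduler filter is enabled and
using hypothetical aggregate and image names:

.. code-block:: console

   # Group the ESXi compute hosts into their own aggregate.
   $ openstack aggregate create --zone nova vmware-hosts
   $ openstack aggregate set --property hypervisor_type=vmware vmware-hosts

   # Tag the image so instances booted from it land on that aggregate.
   $ openstack image set --property hypervisor_type=vmware esxi-only-image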
Images for ESXi use the VMDK format. You can convert QEMU disk images
to VMDK (VMFS flat disks), which can be thin, thick, zeroed-thick, or
eager-zeroed-thick. After you export a VMFS thin disk from VMFS to the
OpenStack Image service (a non-VMFS location), it becomes a
preallocated flat disk. This impacts the transfer time from the
OpenStack Image service to the data store since transfers require
moving the full preallocated flat disk rather than the thin disk.
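As an illustration, a QCOW2 source image can be converted to VMDK with
``qemu-img``; the file names here are placeholders:

.. code-block:: console

   # Convert a QCOW2 source image to VMDK for use with ESXi.
   $ qemu-img convert -f qcow2 -O vmdk ubuntu-trusty.qcow2 ubuntu-trusty.vmdk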
The VMware host aggregate compute nodes communicate with vCenter
rather than spawning directly on a hypervisor. The vCenter then
requests scheduling for the instance to run on an ESXi hypervisor.

This functionality requires that VMware Distributed Resource Scheduler
(DRS) is enabled on a cluster and set to **Fully Automated**. vSphere
requires shared storage because DRS uses vMotion, which is a service
that relies on shared storage.

This solution to the company's migration uses shared storage to
provide Block Storage capabilities to the KVM instances while also
providing vSphere storage. The new environment provides this storage
functionality using a dedicated data network. The compute hosts should
have dedicated NICs to support the dedicated data network. vSphere
supports OpenStack Block Storage, which presents storage from a VMFS
datastore to an instance. For the financial company, Block Storage in
their new architecture supports both hypervisors.

OpenStack Networking provides network connectivity in this new
architecture, with the VMware NSX plug-in driver configured. Legacy
networking (nova-network) supports both hypervisors in this new
architecture example, but has limitations. Specifically, vSphere with
legacy networking does not support security groups. The new
architecture uses VMware NSX as a part of the design. When users
launch an instance within either of the host aggregates, VMware NSX
ensures the instance attaches to the appropriate overlay-based logical
networks.

The architecture planning teams also consider OpenStack Compute
integration. When running vSphere in an OpenStack environment, the
nova-compute service communicates with vCenter, which appears as a
single large hypervisor representing the entire ESXi cluster. Multiple
nova-compute instances can represent multiple ESXi clusters and can
connect to multiple vCenter servers. If the process running
nova-compute crashes, the connection to the corresponding vCenter
server, and to the ESXi clusters behind it, is severed and you cannot
provision further instances on that vCenter, even if you enable high
availability. You must monitor the nova-compute service connected to
vSphere carefully for any disruptions as a result of this failure
point.
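A minimal sketch of that integration on a proxy compute node might look
like the following, assuming the ``crudini`` utility is available (you
can also edit ``nova.conf`` directly); the vCenter address, credentials,
and cluster name are placeholders:

.. code-block:: console

   # Point a nova-compute service at a vCenter-managed ESXi cluster.
   $ crudini --set /etc/nova/nova.conf DEFAULT compute_driver vmwareapi.VMwareVCDriver
   $ crudini --set /etc/nova/nova.conf vmware host_ip 192.0.2.30
   $ crudini --set /etc/nova/nova.conf vmware host_username administrator@vsphere.local
   $ crudini --set /etc/nova/nova.conf vmware host_password VMWARE_PASSWORD
   $ crudini --set /etc/nova/nova.conf vmware cluster_name production-cluster-1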
32
doc/arch-design-draft/source/specialized-networking.rst
Normal file
@ -0,0 +1,32 @@
==============================
Specialized networking example
==============================

Some applications that interact with a network require specialized
connectivity. For example, applications such as a looking glass
require the ability to connect to a BGP peer, and route participant
applications may need to join a network at layer 2.

Challenges
~~~~~~~~~~

Connecting specialized network applications to their required
resources alters the design of an OpenStack installation.
Installations that rely on overlay networks are unable to support a
routing participant, and may also block layer-2 listeners.

Possible solutions
~~~~~~~~~~~~~~~~~~

Deploying an OpenStack installation using OpenStack Networking with a
provider network allows direct layer-2 connectivity to an upstream
networking device. This design provides the layer-2 connectivity
required to communicate via the Intermediate System to Intermediate
System (IS-IS) protocol or to pass packets controlled by an OpenFlow
controller. Using the multiple layer-2 plug-in with an agent such as
:term:`Open vSwitch` allows a private connection through a VLAN
directly to a specific port in a layer-3 device. This allows a BGP
point-to-point link to join the autonomous system. Avoid using
layer-3 plug-ins as they divide the broadcast domain and prevent
router adjacencies from forming.
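A minimal sketch of such a provider network, using the neutron
command-line client and hypothetical physical network and VLAN values,
might look like this:

.. code-block:: console

   # Create a VLAN provider network mapped directly to the upstream switch.
   $ neutron net-create peering-net \
     --provider:network_type vlan \
     --provider:physical_network physnet1 \
     --provider:segmentation_id 201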
@ -0,0 +1,70 @@
======================
OpenStack on OpenStack
======================

In some cases, users may run OpenStack nested on top of another
OpenStack cloud. This scenario describes how to manage and provision
complete OpenStack environments on instances running on hypervisors
and servers that an underlying OpenStack environment controls.

Public cloud providers can use this technique to manage the upgrade
and maintenance process on complete OpenStack environments. Developers
and those testing OpenStack can also use this technique to provision
their own OpenStack environments on available OpenStack Compute
resources, whether public or private.

Challenges
~~~~~~~~~~

The network aspect of deploying a nested cloud is the most complicated
aspect of this architecture. Because the bare metal cloud owns all the
hardware, you must expose VLANs to the physical ports on which the
underlying cloud runs, and you must also expose them to the nested
levels. Alternatively, you can use the network overlay technologies on
the OpenStack environment running on the host OpenStack environment to
provide the required software defined networking for the deployment.

Hypervisor
~~~~~~~~~~

In this example architecture, consider which approach you should take
to provide a nested hypervisor in OpenStack. This decision influences
which operating systems you can use for the nested OpenStack
deployments.

Possible solutions: deployment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Deployment of a full stack can be challenging, but you can mitigate
this difficulty by creating a Heat template that deploys the entire
stack or by using a configuration management system. After creating
the Heat template, you can automate the deployment of additional
stacks.
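As a sketch, launching another copy of the nested cloud from such a
template could then be as simple as the following, where the template
file and stack name are hypothetical:

.. code-block:: console

   # Deploy one more nested OpenStack environment from an existing template.
   $ openstack stack create -t nested-openstack.yaml nested-cloud-02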
The OpenStack-on-OpenStack project (:term:`TripleO`) addresses this
issue. Currently, however, the project does not completely cover
nested stacks. For more information, see
https://wiki.openstack.org/wiki/TripleO.

Possible solutions: hypervisor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the case of running TripleO, the underlying OpenStack cloud deploys
the Compute nodes as bare-metal servers. You then deploy OpenStack on
these bare-metal Compute servers with the appropriate hypervisor, such
as KVM.

In the case of running smaller OpenStack clouds for testing purposes,
where performance is not a critical factor, you can use QEMU instead.
It is also possible to run a KVM hypervisor in an instance (see
http://davejingtian.org/2014/03/30/nested-kvm-just-for-fun/), though
this is not a supported configuration and could be a complex solution
for such a use case.
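If you experiment with nested KVM, you can check whether the physical
host exposes nested virtualization before building on it. The commands
below assume an Intel host; use ``kvm_amd`` on AMD hardware:

.. code-block:: console

   # "Y" (or "1") means nested virtualization is enabled on the host.
   $ cat /sys/module/kvm_intel/parameters/nested

   # Enable it persistently, then reload the module.
   $ echo "options kvm_intel nested=1" | sudo tee /etc/modprobe.d/kvm_intel.conf
   $ sudo modprobe -r kvm_intel && sudo modprobe kvm_intel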
Diagram
~~~~~~~

.. figure:: figures/Specialized_OOO.png
   :width: 100%
@ -0,0 +1,5 @@
======================
Scaling multiple cells
======================

.. TODO
5
doc/arch-design-draft/source/specialized-single-site.rst
Normal file
@ -0,0 +1,5 @@
==================================================
Single site architecture with OpenStack Networking
==================================================

.. TODO
@ -0,0 +1,46 @@
===========================
Software-defined networking
===========================

Software-defined networking (SDN) is the separation of the data plane
and the control plane. SDN is a popular method of managing and
controlling packet flows within networks. SDN uses overlays or
directly controlled layer-2 devices to determine flow paths, and as
such presents challenges to a cloud environment. Some designers may
wish to run their controllers within an OpenStack installation. Others
may wish to have their installations participate in an SDN-controlled
network.

Challenges
~~~~~~~~~~

SDN is a relatively new concept that is not yet standardized, so SDN
systems come in a variety of different implementations. Because of
this, a truly prescriptive architecture is not feasible. Instead,
examine the differences between an existing and a planned OpenStack
design and determine where potential conflicts and gaps exist.

Possible solutions
~~~~~~~~~~~~~~~~~~

If an SDN implementation requires layer-2 access because it directly
manipulates switches, we do not recommend running an overlay network
or a layer-3 agent. If the controller resides within an OpenStack
installation, it may be necessary to build an ML2 plug-in and schedule
the controller instances to connect to tenant VLANs so that they can
talk directly to the switch hardware. Alternatively, depending on the
external device support, use a tunnel that terminates at the switch
hardware itself.
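For the hosted-controller case, a minimal sketch is to attach the
controller instances to the VLAN network that reaches the switches.
The network, image, and flavor names below are hypothetical:

.. code-block:: console

   # Boot an SDN controller instance on the VLAN that faces the switch fabric.
   $ net_id=$(openstack network show controller-vlan -f value -c id)
   $ openstack server create --image sdn-controller --flavor m1.large \
     --nic net-id=$net_id sdn-controller-1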
Diagram
~~~~~~~

OpenStack hosted SDN controller:

.. figure:: figures/Specialized_SDN_hosted.png

OpenStack participating in an SDN controller network:

.. figure:: figures/Specialized_SDN_external.png