Merge "[arch-design] Convert massively scalable to RST"
commit 366d989652
@@ -6,6 +6,7 @@ Massively scalable

   :maxdepth: 2

   user-requirements-massively-scalable.rst
   tech-considerations-massively-scalable.rst

A massively scalable architecture is a cloud implementation
that is either a very large deployment, such as a commercial

@@ -0,0 +1,107 @@
Technical considerations
~~~~~~~~~~~~~~~~~~~~~~~~

Repurposing an existing OpenStack environment to be massively scalable is a
formidable task. When building a massively scalable environment from the
ground up, ensure you build the initial deployment with the same principles
and choices that apply as the environment grows. For example, a good approach
is to deploy the first site as a multi-site environment. This enables you to
use the same deployment and segregation methods as the environment grows to
separate locations across dedicated links or wide area networks. In a
hyperscale cloud, scale trumps redundancy. Modify applications with this in
mind, relying on the scale and homogeneity of the environment to provide
reliability rather than redundant infrastructure provided by non-commodity
hardware solutions.

Infrastructure segregation
--------------------------

OpenStack services support massive horizontal scale. Be aware that this is
not the case for the entire supporting infrastructure. This is particularly a
problem for the database management systems and message queues that OpenStack
services use for data storage and remote procedure call communications.

Traditional clustering techniques typically provide high availability and some
additional scale for these environments. In the quest for massive scale,
however, you must take additional steps to relieve the performance pressure on
these components in order to prevent them from negatively impacting the
overall performance of the environment. Keep all of the components in balance
so that, if the massively scalable environment fails, all components are near
maximum capacity rather than a single component causing the failure.

Regions segregate completely independent installations that are linked only by
a shared Identity service and, optionally, a shared Dashboard installation.
Services have separate API endpoints for each region, and include separate
database and queue installations. This exposes some awareness of the
environment's fault domains to users and gives them the ability to ensure some
degree of application resiliency, while also requiring them to specify which
region each action applies to.

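As an illustrative sketch (the region names here are assumptions, not values
defined by this guide), users choose the fault domain for an action by
selecting a region, typically through the ``OS_REGION_NAME`` environment
variable or the client's ``--os-region-name`` option:

.. code-block:: console

   $ export OS_REGION_NAME=RegionOne
   $ openstack server list    # acts on the RegionOne installation
   $ export OS_REGION_NAME=RegionTwo
   $ openstack server list    # acts on the separate RegionTwo installation
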
Environments operating at massive scale typically need their regions or sites
subdivided further without exposing the requirement to specify the failure
domain to the user. This provides the ability to further divide the
installation into failure domains while also providing a logical unit for
maintenance and the addition of new hardware. At hyperscale, instead of adding
single compute nodes, administrators can add entire racks or even groups of
racks at a time, with each new addition of nodes exposed via one of the
segregation concepts mentioned herein.

:term:`Cells <cell>` provide the ability to subdivide the compute portion of
an OpenStack installation, including regions, while still exposing a single
endpoint. Each region has an API cell along with a number of compute cells
where the workloads actually run. Each cell has its own database and message
queue setup (ideally clustered), providing the ability to subdivide the load
on these subsystems, improving overall performance.

Each compute cell provides a complete compute installation, with full
database and queue installations, a scheduler, a conductor, and multiple
compute hosts. The cells scheduler handles placement of user requests from the
single API endpoint to a specific cell from those available. The normal filter
scheduler then handles placement within the cell.

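The following is a minimal sketch of that layout using the cells (v1)
configuration: an API cell that terminates user requests and a compute cell
that runs workloads. The cell names are illustrative assumptions, and each
cell would point at its own database and message queue as described above.

.. code-block:: ini

   # nova.conf in the API cell (illustrative cell name "api")
   [cells]
   enable = True
   name = api
   cell_type = api

   # nova.conf in a compute cell (illustrative cell name "cell1")
   [cells]
   enable = True
   name = cell1
   cell_type = compute
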
Unfortunately, Compute is the only OpenStack service that provides good
support for cells. In addition, cells do not adequately support some standard
OpenStack functionality such as security groups and host aggregates. Due to
their relative newness and specialized use, cells receive relatively little
testing in the OpenStack gate. Despite these issues, cells play an important
role in well-known OpenStack installations operating at massive scale, such as
those at CERN and Rackspace.

Host aggregates
---------------

Host aggregates enable partitioning of OpenStack Compute deployments into
logical groups for load balancing and instance distribution. You can also use
host aggregates to further partition an availability zone. For example, a
cloud might use host aggregates to partition an availability zone into groups
of hosts that either share common resources, such as storage and network, or
have a special property, such as trusted computing hardware. You cannot target
host aggregates explicitly. Instead, select instance flavors that map to host
aggregate metadata. These flavors target host aggregates implicitly.

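The sketch below illustrates that implicit mapping. The aggregate, host, and
flavor names are assumptions for illustration, and it presumes the scheduler
is configured with the ``AggregateInstanceExtraSpecsFilter`` so that the
flavor extra spec is matched against the aggregate metadata.

.. code-block:: console

   $ openstack aggregate create --property ssd=true ssd-hosts
   $ openstack aggregate add host ssd-hosts compute-ssd-01
   $ openstack flavor create --vcpus 4 --ram 8192 --disk 80 ssd.large
   $ openstack flavor set ssd.large \
       --property aggregate_instance_extra_specs:ssd=true

Instances booted with the ``ssd.large`` flavor then land only on hosts in the
``ssd-hosts`` aggregate, without users referencing the aggregate itself.
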
Availability zones
------------------

Availability zones provide another mechanism for subdividing an installation
or region. They are, in effect, host aggregates exposed for (optional)
explicit targeting by users.

Unlike cells, availability zones do not have their own database server or
queue broker but represent an arbitrary grouping of compute nodes. Typically,
nodes are grouped into availability zones using a shared failure domain based
on a physical characteristic such as a shared power source or physical network
connections. Users can target exposed availability zones; however, this is not
a requirement. Alternatively, the operator can configure a default
availability zone so that, when users do not specify a zone, instances are
scheduled to a zone other than the default nova availability zone.

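As a brief sketch (the zone, aggregate, host, flavor, and image names are
illustrative assumptions), an operator creates an availability zone by
attaching it to a host aggregate, and users can then target it explicitly at
boot time:

.. code-block:: console

   $ openstack aggregate create --zone az-power-a rack-a
   $ openstack aggregate add host rack-a compute-a-01
   $ openstack server create --flavor m1.medium --image cirros \
       --availability-zone az-power-a demo-instance

For the alternative approach, the ``default_schedule_zone`` option in
``nova.conf`` names the availability zone used when a request does not specify
one.
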
Segregation example
-------------------

In this example the cloud is divided into two regions, one for each site, with
two availability zones in each based on the power layout of the data centers.
A number of host aggregates enable targeting of virtual machine instances
using flavors that require special capabilities shared by the target hosts,
such as SSDs, 10 GbE networks, or GPU cards.

.. figure:: /figures/Massively_Scalable_Cells_regions_azs.png
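
To make the example concrete, a single request in this layout can pin an
instance to one site, one power-based zone, and one capability aggregate at
the same time. The region, zone, flavor, and image names below are
illustrative assumptions matching the description above, not values defined
by this guide.

.. code-block:: console

   $ export OS_REGION_NAME=RegionOne
   $ openstack server create --flavor ssd.large --image ubuntu-14.04 \
       --availability-zone az-power-a app-server-01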