[ha-guide] Cleaning up old files and edits to structure
1. Removing empty files from the guide
2. Restructuring information to avoid unnecessary files

Change-Id: I2570e7fd9d75bae121b33449db94306f783bd19b
Implements: blueprint ha-guide-todos

parent 509e22b8ce
commit 5b3f15200d
@ -1,9 +0,0 @@
-==============================================
-Configuring high availability on compute nodes
-==============================================
-
-The `Newton Installation Tutorials and Guides
-<http://docs.openstack.org/project-install-guide/newton/>`_
-provide instructions for installing multiple compute nodes.
-To make the compute nodes highly available, you must configure the
-environment to include multiple instances of the API and other services.
@ -2,8 +2,54 @@
Configuring the compute node
============================

-.. toctree::
-   :maxdepth: 2
-
-   compute-node-ha-api.rst
-   instance-ha.rst
+The `Newton Installation Tutorials and Guides
+<http://docs.openstack.org/project-install-guide/newton/>`_
+provide instructions for installing multiple compute nodes.
+To make the compute nodes highly available, you must configure the
+environment to include multiple instances of the API and other services.
+
+Configuring high availability for instances
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As of September 2016, the OpenStack High Availability community is
+designing and developing an official and unified way to provide high
+availability for instances. We are developing automatic
+recovery from failures of hardware or hypervisor-related software on
+the compute node, or other failures that could prevent instances from
+functioning correctly, such as issues with a cinder volume I/O path.
+
+More details are available in the `user story
+<http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html>`_
+co-authored by OpenStack's HA community and `Product Working Group
+<https://wiki.openstack.org/wiki/ProductTeam>`_ (PWG), where this feature is
+identified as missing functionality in OpenStack, which
+should be addressed with high priority.
+
+Existing solutions
+~~~~~~~~~~~~~~~~~~
+
+The architectural challenges of instance HA and several currently
+existing solutions were presented in `a talk at the Austin summit
+<https://www.openstack.org/videos/video/high-availability-for-pets-and-hypervisors-state-of-the-nation>`_,
+for which `slides are also available <http://aspiers.github.io/openstack-summit-2016-austin-compute-ha/>`_.
+
+The code for three of these solutions can be found online at the following
+links:
+
+* `a mistral-based auto-recovery workflow
+  <https://github.com/gryf/mistral-evacuate>`_, by Intel
+* `masakari <https://launchpad.net/masakari>`_, by NTT
+* `OCF RAs
+  <http://aspiers.github.io/openstack-summit-2016-austin-compute-ha/#/ocf-pros-cons>`_,
+  as used by Red Hat and SUSE
+
+Current upstream work
+~~~~~~~~~~~~~~~~~~~~~
+
+Work is in progress on a unified approach, which combines the best
+aspects of existing upstream solutions. More details are available on
+`the HA VMs user story wiki
+<https://wiki.openstack.org/wiki/ProductTeam/User_Stories/HA_VMs>`_.
+
+To get involved with this work, see the section on the
+:doc:`ha-community`.
@ -8,9 +8,66 @@
.. toctree::
   :maxdepth: 2

+   intro-ha-arch-pacemaker.rst
   controller-ha-pacemaker.rst
   controller-ha-vip.rst
   controller-ha-haproxy.rst
   controller-ha-memcached.rst
   controller-ha-identity.rst
   controller-ha-telemetry.rst
+
+Overview of highly available controllers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+OpenStack is a set of services exposed to the end users
+as HTTP(s) APIs. Additionally, for your own internal usage, OpenStack
+requires an SQL database server and AMQP broker. The physical servers,
+where all the components are running, are called controllers.
+This modular OpenStack architecture allows you to duplicate all the
+components and run them on different controllers.
+By making all the components redundant, it is possible to make
+OpenStack highly available.
+
+In general, we can divide all the OpenStack components into three categories:
+
+- OpenStack APIs: HTTP(s) stateless services written in Python,
+  easy to duplicate and mostly easy to load balance.
+
+- The SQL relational database server provides stateful storage consumed by
+  other components. Supported databases are MySQL, MariaDB, and PostgreSQL.
+  Making the SQL database redundant is complex.
+
+- :term:`Advanced Message Queuing Protocol (AMQP)` provides the OpenStack
+  internal stateful communication service.
+
+Common deployment architectures
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We recommend two primary architectures for making OpenStack highly available.
+
+The architectures differ in the sets of services managed by the
+cluster.
+
+Both use a cluster manager, such as Pacemaker or Veritas, to
+orchestrate the actions of the various services across a set of
+machines. Because we are focused on FOSS, we refer to these as
+Pacemaker architectures.
+
+Traditionally, Pacemaker has been positioned as an all-encompassing
+solution. However, as OpenStack services have matured, they are
+increasingly able to run in an active/active configuration and
+gracefully tolerate the disappearance of the APIs on which they
+depend.
+
+With this in mind, some vendors are restricting Pacemaker's use to
+services that must operate in an active/passive mode (such as
+``cinder-volume``), those with multiple states (for example, Galera), and
+those with complex bootstrapping procedures (such as RabbitMQ).
+
+The majority of services, needing no real orchestration, are handled
+by systemd on each node. This approach avoids the need to coordinate
+service upgrades or location changes with the cluster and has the
+added advantage of more easily scaling beyond Corosync's 16-node
+limit. However, it will generally require the addition of an
+enterprise monitoring solution such as Nagios or Sensu for those
+wanting centralized failure reporting.
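The split described in the last paragraph (a handful of cluster-managed
services, with everything else supervised by systemd) can be summarized in a
small sketch. The service names and the grouping below are illustrative
examples drawn from the text above, not a complete deployment inventory.

.. code-block:: python

   # Sketch of the service split described above: a cluster manager
   # (Pacemaker) for the few services that need real orchestration,
   # plain systemd for everything else. Illustrative only.

   PACEMAKER_MANAGED = {
       "cinder-volume",  # must run active/passive
       "galera",         # multi-state resource
       "rabbitmq",       # complex bootstrapping procedure
   }

   SYSTEMD_MANAGED = {
       "nova-api",
       "glance-api",
       "keystone",
       "neutron-server",
   }


   def manager_for(service: str) -> str:
       """Return which component is expected to supervise a given service."""
       return "pacemaker" if service in PACEMAKER_MANAGED else "systemd"


   if __name__ == "__main__":
       for svc in sorted(PACEMAKER_MANAGED | SYSTEMD_MANAGED):
           print(f"{svc:15s} -> {manager_for(svc)}")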
@ -2,20 +2,16 @@
HA community
============

-Weekly IRC meetings
-~~~~~~~~~~~~~~~~~~~
-
The OpenStack HA community holds `weekly IRC meetings
<https://wiki.openstack.org/wiki/Meetings/HATeamMeeting>`_ to discuss
a range of topics relating to HA in OpenStack. Everyone interested is
encouraged to attend. The `logs of all previous meetings
<http://eavesdrop.openstack.org/meetings/ha/>`_ are available to read.

-Contacting the community
-~~~~~~~~~~~~~~~~~~~~~~~~
-
You can contact the HA community directly in `the #openstack-ha
channel on Freenode IRC <https://wiki.openstack.org/wiki/IRC>`_, or by
-sending mail to `the openstack-dev mailing list
+sending mail to the `openstack-dev
<https://wiki.openstack.org/wiki/Mailing_Lists#Future_Development>`_
-with the ``[HA]`` prefix in the ``Subject`` header.
+or `openstack-docs
+<http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-docs>`_
+mailing list with the ``[HA]`` prefix in the ``Subject`` header.
@ -19,9 +19,6 @@ This guide documents OpenStack Newton, Mitaka, and Liberty releases.
`bug list <https://bugs.launchpad.net/openstack-manuals/>`_.
Please help where you are able.

-Contents
-~~~~~~~~
-
.. toctree::
   :maxdepth: 2

@ -50,8 +47,3 @@ Glossary
   :maxdepth: 1

   common/glossary.rst
-
-Search in this guide
-~~~~~~~~~~~~~~~~~~~~
-
-* :ref:`search`
@ -1,46 +0,0 @@
-========================================
-Configure high availability of instances
-========================================
-
-As of September 2016, the OpenStack High Availability community is
-designing and developing an official and unified way to provide high
-availability for instances. We are developing automatic
-recovery from failures of hardware or hypervisor-related software on
-the compute node, or other failures that could prevent instances from
-functioning correctly, such as, issues with a cinder volume I/O path.
-
-More details are available in the `user story
-<http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html>`_
-co-authored by OpenStack's HA community and `Product Working Group
-<https://wiki.openstack.org/wiki/ProductTeam>`_ (PWG), where this feature is
-identified as missing functionality in OpenStack, which
-should be addressed with high priority.
-
-Existing solutions
-~~~~~~~~~~~~~~~~~~
-
-The architectural challenges of instance HA and several currently
-existing solutions were presented in `a talk at the Austin summit
-<https://www.openstack.org/videos/video/high-availability-for-pets-and-hypervisors-state-of-the-nation>`_,
-for which `slides are also available <http://aspiers.github.io/openstack-summit-2016-austin-compute-ha/>`_.
-
-The code for three of these solutions can be found online at the following
-links:
-
-* `a mistral-based auto-recovery workflow
-  <https://github.com/gryf/mistral-evacuate>`_, by Intel
-* `masakari <https://launchpad.net/masakari>`_, by NTT
-* `OCF RAs
-  <http://aspiers.github.io/openstack-summit-2016-austin-compute-ha/#/ocf-pros-cons>`_,
-  as used by Red Hat and SUSE
-
-Current upstream work
-~~~~~~~~~~~~~~~~~~~~~
-
-Work is in progress on a unified approach, which combines the best
-aspects of existing upstream solutions. More details are available on
-`the HA VMs user story wiki
-<https://wiki.openstack.org/wiki/ProductTeam/User_Stories/HA_VMs>`_.
-
-To get involved with this work, please see the section on the
-:doc:`ha-community`.
@ -1,4 +0,0 @@
-
-==========================================
-Overview of highly available compute nodes
-==========================================
@ -1,208 +0,0 @@
-==========================
-High availability concepts
-==========================
-
-High availability systems seek to minimize the following issues:
-
-#. System downtime: Occurs when a user-facing service is unavailable
-   beyond a specified maximum amount of time.
-
-#. Data loss: Accidental deletion or destruction of data.
-
-Most high availability systems guarantee protection against system downtime
-and data loss only in the event of a single failure.
-However, they are also expected to protect against cascading failures,
-where a single failure deteriorates into a series of consequential failures.
-Many service providers guarantee a :term:`Service Level Agreement (SLA)`
-including uptime percentage of computing service, which is calculated based
-on the available time and system downtime excluding planned outage time.
-
-Redundancy and failover
-~~~~~~~~~~~~~~~~~~~~~~~
-
-High availability is implemented with redundant hardware
-running redundant instances of each service.
-If one piece of hardware running one instance of a service fails,
-the system can then failover to use another instance of a service
-that is running on hardware that did not fail.
-
-A crucial aspect of high availability
-is the elimination of single points of failure (SPOFs).
-A SPOF is an individual piece of equipment or software
-that causes system downtime or data loss if it fails.
-In order to eliminate SPOFs, check that mechanisms exist for redundancy of:
-
-- Network components, such as switches and routers
-
-- Applications and automatic service migration
-
-- Storage components
-
-- Facility services such as power, air conditioning, and fire protection
-
-In the event that a component fails and a back-up system must take on
-its load, most high availability systems will replace the failed
-component as quickly as possible to maintain necessary redundancy. This
-way time spent in a degraded protection state is minimized.
-
-Most high availability systems fail in the event of multiple
-independent (non-consequential) failures. In this case, most
-implementations favor protecting data over maintaining availability.
-
-High availability systems typically achieve an uptime percentage of
-99.99% or more, which roughly equates to less than an hour of
-cumulative downtime per year. In order to achieve this, high
-availability systems should keep recovery times after a failure to
-about one to two minutes, sometimes significantly less.
-
-OpenStack currently meets such availability requirements for its own
-infrastructure services, meaning that an uptime of 99.99% is feasible
-for the OpenStack infrastructure proper. However, OpenStack does not
-guarantee 99.99% availability for individual guest instances.
-
-This document discusses some common methods of implementing highly
-available systems, with an emphasis on the core OpenStack services and
-other open source services that are closely aligned with OpenStack.
-
-You will need to address high availability concerns for any applications
-software that you run on your OpenStack environment. The important thing is
-to make sure that your services are redundant and available.
-How you achieve that is up to you.
-
-Stateless versus stateful services
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The following are the definitions of stateless and stateful services:
-
-Stateless service
-  A service that provides a response after your request
-  and then requires no further attention.
-  To make a stateless service highly available,
-  you need to provide redundant instances and load balance them.
-  OpenStack services that are stateless include ``nova-api``,
-  ``nova-conductor``, ``glance-api``, ``keystone-api``,
-  ``neutron-api``, and ``nova-scheduler``.
-
-Stateful service
-  A service where subsequent requests to the service
-  depend on the results of the first request.
-  Stateful services are more difficult to manage because a single
-  action typically involves more than one request. Providing
-  additional instances and load balancing does not solve the problem.
-  For example, if the horizon user interface reset itself every time
-  you went to a new page, it would not be very useful.
-  OpenStack services that are stateful include the OpenStack database
-  and message queue.
-  Making stateful services highly available can depend on whether you choose
-  an active/passive or active/active configuration.
-
-Active/passive versus active/active
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Stateful services can be configured as active/passive or active/active,
-which are defined as follows:
-
-:term:`active/passive configuration`
-  Maintains a redundant instance
-  that can be brought online when the active service fails.
-  For example, OpenStack writes to the main database
-  while maintaining a disaster recovery database that can be brought online
-  if the main database fails.
-
-  A typical active/passive installation for a stateful service maintains
-  a replacement resource that can be brought online when required.
-  Requests are handled using a :term:`virtual IP address (VIP)` that
-  facilitates returning to service with minimal reconfiguration.
-  A separate application (such as Pacemaker or Corosync) monitors
-  these services, bringing the backup online as necessary.
-
-:term:`active/active configuration`
-  Each service also has a backup but manages both the main and
-  redundant systems concurrently.
-  This way, if there is a failure, the user is unlikely to notice.
-  The backup system is already online and takes on increased load
-  while the main system is fixed and brought back online.
-
-  Typically, an active/active installation for a stateless service
-  maintains a redundant instance, and requests are load balanced using
-  a virtual IP address and a load balancer such as HAProxy.
-
-  A typical active/active installation for a stateful service includes
-  redundant services, with all instances having an identical state. In
-  other words, updates to one instance of a database update all other
-  instances. This way a request to one instance is the same as a
-  request to any other. A load balancer manages the traffic to these
-  systems, ensuring that operational systems always handle the
-  request.
-
-Clusters and quorums
-~~~~~~~~~~~~~~~~~~~~
-
-The quorum specifies the minimal number of nodes
-that must be functional in a cluster of redundant nodes
-in order for the cluster to remain functional.
-When one node fails and failover transfers control to other nodes,
-the system must ensure that data and processes remain sane.
-To determine this, the contents of the remaining nodes are compared
-and, if there are discrepancies, a majority rules algorithm is implemented.
-
-For this reason, each cluster in a high availability environment should
-have an odd number of nodes and the quorum is defined as more than a half
-of the nodes.
-If multiple nodes fail so that the cluster size falls below the quorum
-value, the cluster itself fails.
-
-For example, in a seven-node cluster, the quorum should be set to
-``floor(7/2) + 1 == 4``. If quorum is four and four nodes fail simultaneously,
-the cluster itself would fail, whereas it would continue to function, if
-no more than three nodes fail. If split to partitions of three and four nodes
-respectively, the quorum of four nodes would continue to operate the majority
-partition and stop or fence the minority one (depending on the
-no-quorum-policy cluster configuration).
-
-And the quorum could also have been set to three, just as a configuration
-example.
-
-.. note::
-
-   We do not recommend setting the quorum to a value less than ``floor(n/2) + 1``
-   as it would likely cause a split-brain in a face of network partitions.
-
-   When four nodes fail simultaneously, the cluster would continue to function as
-   well. But if split to partitions of three and four nodes respectively, the
-   quorum of three would have made both sides to attempt to fence the other and
-   host resources. Without fencing enabled, it would go straight to running
-   two copies of each resource.
-
-   This is why setting the quorum to a value less than ``floor(n/2) + 1`` is
-   dangerous. However it may be required for some specific cases, such as a
-   temporary measure at a point it is known with 100% certainty that the other
-   nodes are down.
-
-When configuring an OpenStack environment for study or demonstration purposes,
-it is possible to turn off the quorum checking. Production systems should
-always run with quorum enabled.
-
-
-Single-controller high availability mode
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-OpenStack supports a single-controller high availability mode
-that is managed by the services that manage highly available environments
-but is not actually highly available because
-no redundant controllers are configured to use for failover.
-This environment can be used for study and demonstration
-but is not appropriate for a production environment.
-
-It is possible to add controllers to such an environment
-to convert it into a truly highly available environment.
-
-High availability is not for every user. It presents some challenges.
-High availability may be too complex for databases or
-systems with large amounts of data. Replication can slow large systems
-down. Different setups have different prerequisites. Read the guidelines
-for each setup.
-
-.. important::
-
-   High availability is turned off as the default in OpenStack setups.
@ -1,77 +0,0 @@
-========================================
-Overview of highly available controllers
-========================================
-
-OpenStack is a set of multiple services exposed to the end users
-as HTTP(s) APIs. Additionally, for your own internal usage, OpenStack
-requires an SQL database server and AMQP broker. The physical servers,
-where all the components are running, are called controllers.
-This modular OpenStack architecture allows you to duplicate all the
-components and run them on different controllers.
-By making all the components redundant it is possible to make
-OpenStack highly available.
-
-In general we can divide all the OpenStack components into three categories:
-
-- OpenStack APIs: These are HTTP(s) stateless services written in python,
-  easy to duplicate and mostly easy to load balance.
-
-- SQL relational database server provides stateful type consumed by other
-  components. Supported databases are MySQL, MariaDB, and PostgreSQL.
-  Making SQL database redundant is complex.
-
-- :term:`Advanced Message Queuing Protocol (AMQP)` provides OpenStack
-  internal stateful communication service.
-
-Network components
-~~~~~~~~~~~~~~~~~~
-
-[TODO Need discussion of network hardware, bonding interfaces,
-intelligent Layer 2 switches, routers and Layer 3 switches.]
-
-The configuration uses static routing without
-Virtual Router Redundancy Protocol (VRRP)
-or similar techniques implemented.
-
-[TODO Need description of VIP failover inside Linux namespaces
-and expected SLA.]
-
-See :doc:`networking-ha` for more information about configuring
-Networking for high availability.
-
-Common deployment architectures
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-We recommend two primary architectures for making OpenStack highly available.
-
-The architectures differ in the sets of services managed by the
-cluster.
-
-Both use a cluster manager, such as Pacemaker or Veritas, to
-orchestrate the actions of the various services across a set of
-machines. Because we are focused on FOSS, we refer to these as
-Pacemaker architectures.
-
-Traditionally, Pacemaker has been positioned as an all-encompassing
-solution. However, as OpenStack services have matured, they are
-increasingly able to run in an active/active configuration and
-gracefully tolerate the disappearance of the APIs on which they
-depend.
-
-With this in mind, some vendors are restricting Pacemaker's use to
-services that must operate in an active/passive mode (such as
-``cinder-volume``), those with multiple states (for example, Galera), and
-those with complex bootstrapping procedures (such as RabbitMQ).
-
-The majority of services, needing no real orchestration, are handled
-by Systemd on each node. This approach avoids the need to coordinate
-service upgrades or location changes with the cluster and has the
-added advantage of more easily scaling beyond Corosync's 16 node
-limit. However, it will generally require the addition of an
-enterprise monitoring solution such as Nagios or Sensu for those
-wanting centralized failure reporting.
-
-.. toctree::
-   :maxdepth: 1
-
-   intro-ha-arch-pacemaker.rst
@ -1,3 +0,0 @@
-======================================
-High availability for other components
-======================================
@ -1,12 +0,0 @@
-=====================================
-Overview of high availability storage
-=====================================
-
-Making the Block Storage (cinder) API service highly available in
-active/active mode involves:
-
-* Configuring Block Storage to listen on the VIP address
-
-* Managing the Block Storage API daemon with the Pacemaker cluster manager
-
-* Configuring OpenStack services to use this IP address
@ -2,12 +2,207 @@
Introduction to OpenStack high availability
===========================================

-.. toctree::
-   :maxdepth: 2
-
-   intro-ha-concepts.rst
-   intro-ha-controller.rst
-   intro-ha-storage.rst
-   intro-ha-compute.rst
-   intro-ha-other.rst
-
+High availability systems seek to minimize the following issues:
+
+#. System downtime: Occurs when a user-facing service is unavailable
+   beyond a specified maximum amount of time.
+
+#. Data loss: Accidental deletion or destruction of data.
+
+Most high availability systems guarantee protection against system downtime
+and data loss only in the event of a single failure.
+However, they are also expected to protect against cascading failures,
+where a single failure deteriorates into a series of consequential failures.
+Many service providers guarantee a :term:`Service Level Agreement (SLA)`
+that includes an uptime percentage for the computing service, calculated
+from the available time and system downtime, excluding planned outage time.
+
+Redundancy and failover
+~~~~~~~~~~~~~~~~~~~~~~~
+
+High availability is implemented with redundant hardware
+running redundant instances of each service.
+If one piece of hardware running one instance of a service fails,
+the system can then fail over to use another instance of the service
+that is running on hardware that did not fail.
+
+A crucial aspect of high availability
+is the elimination of single points of failure (SPOFs).
+A SPOF is an individual piece of equipment or software
+that causes system downtime or data loss if it fails.
+In order to eliminate SPOFs, check that mechanisms exist for redundancy of:
+
+- Network components, such as switches and routers
+
+- Applications and automatic service migration
+
+- Storage components
+
+- Facility services such as power, air conditioning, and fire protection
+
+In the event that a component fails and a backup system must take on
+its load, most high availability systems will replace the failed
+component as quickly as possible to maintain the necessary redundancy.
+This way, the time spent in a degraded protection state is minimized.
+
+Most high availability systems fail in the event of multiple
+independent (non-consequential) failures. In this case, most
+implementations favor protecting data over maintaining availability.
+
+High availability systems typically achieve an uptime percentage of
+99.99% or more, which roughly equates to less than an hour of
+cumulative downtime per year. In order to achieve this, high
+availability systems should keep recovery times after a failure to
+about one to two minutes, sometimes significantly less.
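As a quick sanity check on the numbers above, the downtime budget implied by
an uptime percentage can be computed directly. The sketch below is
illustrative arithmetic only; the 99.99% figure is the one quoted in the text.

.. code-block:: python

   # Yearly downtime budget implied by an uptime percentage (illustrative).

   HOURS_PER_YEAR = 365.25 * 24


   def downtime_budget_minutes(uptime_percent: float) -> float:
       """Return the yearly downtime budget, in minutes, for a given uptime."""
       unavailable_fraction = 1.0 - uptime_percent / 100.0
       return unavailable_fraction * HOURS_PER_YEAR * 60.0


   if __name__ == "__main__":
       for uptime in (99.9, 99.99, 99.999):
           print(f"{uptime}% -> {downtime_budget_minutes(uptime):.0f} minutes/year")
       # 99.99% works out to roughly 53 minutes of cumulative downtime per
       # year, consistent with "less than an hour" above.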
+
+OpenStack currently meets such availability requirements for its own
+infrastructure services, meaning that an uptime of 99.99% is feasible
+for the OpenStack infrastructure proper. However, OpenStack does not
+guarantee 99.99% availability for individual guest instances.
+
+This document discusses some common methods of implementing highly
+available systems, with an emphasis on the core OpenStack services and
+other open source services that are closely aligned with OpenStack.
+
+You will need to address high availability concerns for any application
+software that you run in your OpenStack environment. The important thing is
+to make sure that your services are redundant and available.
+How you achieve that is up to you.
+
+Stateless versus stateful services
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following are the definitions of stateless and stateful services:
+
+Stateless service
+  A service that provides a response after your request
+  and then requires no further attention.
+  To make a stateless service highly available,
+  you need to provide redundant instances and load balance them.
+  OpenStack services that are stateless include ``nova-api``,
+  ``nova-conductor``, ``glance-api``, ``keystone-api``,
+  ``neutron-api``, and ``nova-scheduler``.
+
+Stateful service
+  A service where subsequent requests to the service
+  depend on the results of the first request.
+  Stateful services are more difficult to manage because a single
+  action typically involves more than one request. Providing
+  additional instances and load balancing does not solve the problem.
+  For example, if the horizon user interface reset itself every time
+  you went to a new page, it would not be very useful.
+  OpenStack services that are stateful include the OpenStack database
+  and message queue.
+  Making stateful services highly available can depend on whether you choose
+  an active/passive or active/active configuration.
+
+Active/passive versus active/active
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Stateful services can be configured as active/passive or active/active,
+which are defined as follows:
+
+:term:`active/passive configuration`
+  Maintains a redundant instance
+  that can be brought online when the active service fails.
+  For example, OpenStack writes to the main database
+  while maintaining a disaster recovery database that can be brought online
+  if the main database fails.
+
+  A typical active/passive installation for a stateful service maintains
+  a replacement resource that can be brought online when required.
+  Requests are handled using a :term:`virtual IP address (VIP)` that
+  facilitates returning to service with minimal reconfiguration.
+  A separate application (such as Pacemaker or Corosync) monitors
+  these services, bringing the backup online as necessary.
+
+:term:`active/active configuration`
+  Each service also has a backup but manages both the main and
+  redundant systems concurrently.
+  This way, if there is a failure, the user is unlikely to notice.
+  The backup system is already online and takes on increased load
+  while the main system is fixed and brought back online.
+
+  Typically, an active/active installation for a stateless service
+  maintains a redundant instance, and requests are load balanced using
+  a virtual IP address and a load balancer such as HAProxy.
+
+  A typical active/active installation for a stateful service includes
+  redundant services, with all instances having an identical state. In
+  other words, updates to one instance of a database update all other
+  instances. This way, a request to one instance is the same as a
+  request to any other. A load balancer manages the traffic to these
+  systems, ensuring that operational systems always handle the
+  request.
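The load-balancing pattern described above can be illustrated with a small
client-side sketch: because a stateless API instance can answer any request,
a client (or a load balancer such as HAProxy sitting behind the VIP) can
simply try another instance when one fails. The endpoint URLs below are
hypothetical placeholders, not part of any OpenStack default.

.. code-block:: python

   # Minimal illustration of failing over between redundant stateless API
   # instances. A real deployment puts HAProxy and a VIP in front of the
   # instances instead; the URLs here are placeholders.

   import urllib.request

   API_INSTANCES = [
       "http://controller1:8774/",
       "http://controller2:8774/",
       "http://controller3:8774/",
   ]


   def call_any_instance(path: str, timeout: float = 2.0) -> bytes:
       """Try each redundant instance in turn; return the first response."""
       last_error = None
       for base in API_INSTANCES:
           try:
               with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                   return resp.read()
           except OSError as exc:
               last_error = exc  # this instance is unreachable; try the next
       raise RuntimeError(f"all instances failed: {last_error}")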
+
+Clusters and quorums
+~~~~~~~~~~~~~~~~~~~~
+
+The quorum specifies the minimal number of nodes
+that must be functional in a cluster of redundant nodes
+in order for the cluster to remain functional.
+When one node fails and failover transfers control to other nodes,
+the system must ensure that data and processes remain sane.
+To determine this, the contents of the remaining nodes are compared
+and, if there are discrepancies, a majority-rules algorithm is implemented.
+
+For this reason, each cluster in a high availability environment should
+have an odd number of nodes, and the quorum is defined as more than half
+of the nodes.
+If multiple nodes fail so that the cluster size falls below the quorum
+value, the cluster itself fails.
+
+For example, in a seven-node cluster, the quorum should be set to
+``floor(7/2) + 1 == 4``. If quorum is four and four nodes fail simultaneously,
+the cluster itself would fail, whereas it would continue to function if
+no more than three nodes fail. If the cluster is split into partitions of
+three and four nodes respectively, the four-node majority partition would
+retain quorum and continue to operate, stopping or fencing the minority
+partition (depending on the no-quorum-policy cluster configuration).
+
+As a configuration example, the quorum could also have been set to three.
+
+.. note::
+
+   We do not recommend setting the quorum to a value less than
+   ``floor(n/2) + 1``, as it would likely cause a split-brain in the face
+   of network partitions.
+
+   With a quorum of three, the cluster would also continue to function when
+   four nodes fail simultaneously. However, if the cluster is split into
+   partitions of three and four nodes respectively, a quorum of three would
+   make both sides attempt to fence the other and host the resources.
+   Without fencing enabled, the cluster would go straight to running two
+   copies of each resource.
+
+   This is why setting the quorum to a value less than ``floor(n/2) + 1`` is
+   dangerous. However, it may be required for some specific cases, such as a
+   temporary measure at a point when it is known with 100% certainty that the
+   other nodes are down.
+
+When configuring an OpenStack environment for study or demonstration purposes,
+it is possible to turn off the quorum checking. Production systems should
+always run with quorum enabled.
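The ``floor(n/2) + 1`` rule and the seven-node example above can be checked
with a few lines of Python. This is a sketch of the arithmetic only, not of
any cluster software.

.. code-block:: python

   # Quorum arithmetic for the seven-node example described above.


   def quorum(cluster_size: int) -> int:
       """Recommended quorum value: floor(n/2) + 1."""
       return cluster_size // 2 + 1


   def has_quorum(partition_size: int, quorum_value: int) -> bool:
       """A partition keeps running only if it still reaches quorum."""
       return partition_size >= quorum_value


   if __name__ == "__main__":
       n = 7
       q = quorum(n)                # floor(7/2) + 1 == 4
       print(f"cluster of {n}: quorum = {q}")
       # A 3/4 split: only the four-node partition retains quorum.
       print(has_quorum(4, q), has_quorum(3, q))   # True False
       # With the quorum lowered to 3, both sides of a 3/4 split qualify,
       # which is the split-brain risk the note above warns about.
       print(has_quorum(4, 3), has_quorum(3, 3))   # True True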
+
+
+Single-controller high availability mode
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+OpenStack supports a single-controller high availability mode
+that is managed by the services used to manage highly available
+environments. However, it is not actually highly available, because
+no redundant controllers are configured for failover.
+This environment can be used for study and demonstration
+but is not appropriate for a production environment.
+
+It is possible to add controllers to such an environment
+to convert it into a truly highly available environment.
+
+High availability is not for every user: it presents some challenges.
+High availability may be too complex for databases or
+systems with large amounts of data, and replication can slow large systems
+down. Different setups have different prerequisites. Read the guidelines
+for each setup.
+
+.. important::
+
+   High availability is turned off by default in OpenStack setups.
@ -1,11 +0,0 @@
-==========================
-Run Networking LBaaS agent
-==========================
-
-Currently, no native feature is provided to make the LBaaS agent highly
-available using the default plug-in HAProxy. A common way to make HAProxy
-highly available is to use the VRRP (Virtual Router Redundancy Protocol).
-
-Unfortunately, this is not yet implemented in the LBaaS HAProxy plug-in.
-
-[TODO: update this section.]
@ -1,12 +0,0 @@
-=============================
-Run Networking metadata agent
-=============================
-
-Currently, no native feature is available to make this service highly
-available. At this time, the active/passive solution exists to run the
-neutron metadata agent in failover mode with Pacemaker.
-
-[TODO: Update this information.
-Can this service now be made HA in active/active mode
-or do we need to pull in the instructions
-to run this service in active/passive mode?]
@ -2,57 +2,34 @@
Configuring the networking services
===================================

+.. toctree::
+   :maxdepth: 2
+
+   networking-ha-dhcp.rst
+   networking-ha-l3.rst
+
Configure networking on each node. See the basic information
about configuring networking in the *Networking service*
section of the
`Install Tutorials and Guides <http://docs.openstack.org/project-install-guide/newton>`_,
depending on your distribution.

-Notes from planning outline:
-
-- Rather than configuring neutron here,
-  we should simply mention physical network HA methods
-  such as bonding and additional node/network requirements
-  for L3HA and DVR for planning purposes.
-- Neutron agents should be described for active/active;
-  deprecate single agent's instances case.
-- For Kilo and beyond, focus on L3HA and DVR.
-- Link to `OpenStack Networking Guide <http://docs.openstack.org/networking-guide/>`_
-  for configuration details.
-
-[TODO: Verify that the active/passive
-network configuration information from
-`<http://docs.openstack.org/high-availability-guide/content/s-neutron-server.html>`_
-should not be included here.
-
-`LP1328922 <https://bugs.launchpad.net/openstack-manuals/+bug/1328922>`_
-and
-`LP1349398 <https://bugs.launchpad.net/openstack-manuals/+bug/1349398>`_
-are related.]
-
OpenStack network nodes contain:

- :doc:`Networking DHCP agent<networking-ha-dhcp>`
-- Networking L2 agent.
-  Note that the L2 agent cannot be distributed and highly available.
-  Instead, it must be installed on each data forwarding node
-  to control the virtual network drivers
-  such as Open vSwitch or Linux Bridge.
-  One L2 agent runs per node and controls its virtual interfaces.
- :doc:`Neutron L3 agent<networking-ha-l3>`
-- :doc:`Neutron metadata agent<networking-ha-metadata>`
-- :doc:`Neutron LBaaS agent<networking-ha-lbaas>`
+- Networking L2 agent
+
+.. note::
+
+   The L2 agent cannot be distributed and highly available.
+   Instead, it must be installed on each data forwarding node
+   to control the virtual network driver such as Open vSwitch
+   or Linux Bridge. One L2 agent runs per node and controls its
+   virtual interfaces.

.. note::

-   For Liberty, we do not have the standalone network nodes in general.
-   We usually run the Networking services on the controller nodes.
-   In this guide, we use the term "network nodes" for convenience.
-
-.. toctree::
-   :maxdepth: 2
-
-   networking-ha-dhcp.rst
-   networking-ha-l3.rst
-   networking-ha-metadata.rst
-   networking-ha-lbaas.rst
+   For Liberty, you cannot have standalone network nodes.
+   The Networking services run on the controller nodes.
+   In this guide, the term ``network nodes`` is used for convenience.
@ -2,6 +2,12 @@
Database (Galera Cluster) for high availability
===============================================

+.. toctree::
+   :maxdepth: 2
+
+   shared-database-configure.rst
+   shared-database-manage.rst
+
The first step is to install the database that sits at the heart of the
cluster. To implement high availability, run an instance of the database on
each controller node and use Galera Cluster to provide replication between
@ -24,9 +30,3 @@ There are three implementations of Galera Cluster available to you:
In addition to Galera Cluster, you can also achieve high availability
through other database options, such as PostgreSQL, which has its own
replication system.
-
-.. toctree::
-   :maxdepth: 2
-
-   shared-database-configure.rst
-   shared-database-manage.rst
@ -57,13 +57,3 @@ it supports `live migration
<http://docs.openstack.org/admin-guide/compute-live-migration-usage.html>`_
of VMs with ephemeral drives. LVM only supports live migration of
volume-backed VMs.
-
-Remote backup facilities
-------------------------
-
-[TODO: Add discussion of remote backup facilities
-as an alternate way to secure ones data.
-Include brief mention of key third-party technologies
-with links to their documentation]
@ -9,3 +9,12 @@ Configuring storage
   storage-ha-block.rst
   storage-ha-file-systems.rst
   storage-ha-backend.rst
+
+Making the Block Storage (cinder) API service highly available in
+active/active mode involves:
+
+* Configuring Block Storage to listen on the VIP address
+
+* Managing the Block Storage API daemon with the Pacemaker cluster manager
+
+* Configuring OpenStack services to use this IP address
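As a quick illustration of the first and last bullet points, a basic
reachability check against the VIP could look like the sketch below. The VIP
address is a placeholder for your environment; 8776 is the conventional Block
Storage API port.

.. code-block:: python

   # Check that the Block Storage (cinder) API answers on the virtual IP.
   # The VIP below is a placeholder; adjust it for your environment.

   import socket

   VIP = "10.0.0.100"        # hypothetical virtual IP managed by the cluster
   CINDER_API_PORT = 8776    # conventional Block Storage API port


   def vip_listening(host: str, port: int, timeout: float = 3.0) -> bool:
       """Return True if something accepts TCP connections on host:port."""
       try:
           with socket.create_connection((host, port), timeout=timeout):
               return True
       except OSError:
           return False


   if __name__ == "__main__":
       print("cinder API reachable on VIP:", vip_listening(VIP, CINDER_API_PORT))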