
The keepalived architecture
===========================

High availability strategies
----------------------------

The following diagram shows a very simplified view of the different strategies used to achieve high availability for the OpenStack services:

[Figure: high availability strategies for the OpenStack services]

Depending on the method used to communicate with the service, one of the following availability strategies is used:

  - Keepalived, for the HAProxy instances (see the configuration sketch after this list).
  - Access via an HAProxy virtual IP, for services such as HTTPd that are accessed via a TCP socket that can be load balanced.
  - Built-in application clustering, when available from the application. Galera is one example of this.
  - Starting up one instance of the service on several controller nodes, when they can coexist and coordinate by other means. RPC in nova-conductor is one example of this.
  - No high availability, when the service can only work in active/passive mode.
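
For example, a minimal keepalived configuration for the first strategy might
look like the following sketch. The interface name (``eth0``), virtual IP
(``10.0.0.100``), and check command are illustrative assumptions, not values
prescribed by this guide:

.. code-block:: none

   # /etc/keepalived/keepalived.conf -- illustrative values only
   # The check script exits 0 only while an haproxy process is running.
   vrrp_script chk_haproxy {
       script "/usr/bin/killall -0 haproxy"
       interval 2
   }

   # Claim the HAProxy virtual IP and release it if the check fails.
   vrrp_instance VI_1 {
       interface eth0
       state MASTER
       virtual_router_id 51
       priority 101
       track_script {
           chk_haproxy
       }
       virtual_ipaddress {
           10.0.0.100/24
       }
   }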

There are known issues with cinder-volume that make it advisable to run it as active/passive for now. For details, see: https://blueprints.launchpad.net/cinder/+spec/cinder-volume-active-active-support.

While multiple neutron LBaaS agents will be running, each agent manages a set of load balancers that cannot be failed over to another node.

Architecture limitations
------------------------

This architecture has some inherent limitations that should be kept in mind during deployment and daily operations. The following sections describe these limitations.

  1. Keepalived and network partitions

    In the event of a network partition, two or more nodes running keepalived may claim to hold the same VIP, which can lead to undesired behavior. Since keepalived uses VRRP over multicast to elect a master (VIP owner), a network partition in which keepalived nodes cannot communicate results in the VIP existing on more than one node. When the network partition is resolved, the duplicate VIPs should also be resolved. This network partition problem with VRRP is a known limitation of this architecture.
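
    During a suspected partition, duplicate VIP ownership can be spotted by
    checking each controller for the address; during normal operation only the
    current keepalived master should list it. The interface and address below
    are the same illustrative values used in the configuration sketch above:

    .. code-block:: console

       $ ip addr show dev eth0 | grep 10.0.0.100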

  2. Cinder-volume as a single point of failure

    There are currently concerns over the ability of the cinder-volume service to run as a fully active-active service. This is being worked on during the Mitaka timeframe; see: https://blueprints.launchpad.net/cinder/+spec/cinder-volume-active-active-support. Thus, cinder-volume will run on only one of the controller nodes, even though it is configured on all of them. If the node running cinder-volume fails, the service should be started on a surviving controller node.
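
    As an illustration only, on an RDO or RHEL OSP controller this typically
    means starting the service unit by hand on a surviving node. The unit name
    below assumes the RDO packaging:

    .. code-block:: console

       # systemctl start openstack-cinder-volume
       # systemctl status openstack-cinder-volume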

  3. Neutron-lbaas-agent as a single point of failure

    The current design of the neutron LBaaS agent using the HAProxy driver does not allow high availability for the project load balancers. The neutron-lbaas-agent service will be enabled and running on all controllers, allowing load balancers to be distributed across all nodes. However, a controller node failure stops all load balancers running on that node until the service is recovered or the affected load balancers are manually removed and created again.
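
    For example, with LBaaS v2 and the neutron CLI (both assumed here, and the
    load balancer and subnet names are placeholders), recreating an affected
    load balancer could look like the following. Any listeners, pools, and
    members attached to it would have to be recreated as well:

    .. code-block:: console

       $ neutron lbaas-loadbalancer-delete mylb
       $ neutron lbaas-loadbalancer-create --name mylb private-subnet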

  4. Service monitoring and recovery required

    An external service monitoring infrastructure is required to check the health of the OpenStack services and notify operators of any failure. This architecture does not provide any such facility, so the OpenStack deployment needs to be integrated with an existing monitoring environment.
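
    As a trivial illustration of the kind of probe such an infrastructure runs
    (the endpoint and port are assumptions, and a monitoring system such as
    Nagios or Zabbix would normally perform this check for you):

    .. code-block:: console

       $ curl --fail --silent http://10.0.0.100:5000/v3/ >/dev/null \
         || echo "Identity API unreachable through the VIP"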

  5. Manual recovery after a full cluster restart

    Some support services used by RDO or RHEL OSP use their own form of application clustering. Usually, these services maintain a cluster quorum that may be lost if all cluster nodes restart at the same time, for example during a power outage. Each service then requires its own procedure to regain quorum, as illustrated below for Galera.
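
    As one example, regaining quorum for a Galera cluster after a full outage
    usually means bootstrapping from the most advanced node: inspect
    ``grastate.dat`` on every node to find the highest ``seqno``, bootstrap on
    that node only, then start MariaDB normally on the remaining nodes. The
    commands below assume a systemd-based MariaDB Galera installation that
    ships the ``galera_new_cluster`` helper:

    .. code-block:: console

       # cat /var/lib/mysql/grastate.dat
       # galera_new_cluster
       # systemctl start mariadb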

If you find any or all of these limitations concerning, you are encouraged to refer to the :ref:`Pacemaker HA architecture <intro-ha-arch-pacemaker>` instead.