[[s-rabbitmq]]
==== Highly available RabbitMQ

RabbitMQ is the default AMQP server used by many OpenStack services. Making the RabbitMQ service highly available involves

* configuring a DRBD device for use by RabbitMQ,
* configuring RabbitMQ to use a data directory residing on that DRBD device,
* selecting and assigning a virtual IP address (VIP) that can freely float between cluster nodes,
* configuring RabbitMQ to listen on that IP address,
* managing all resources, including the RabbitMQ daemon itself, with the Pacemaker cluster manager.

NOTE: There is an alternative method of configuring RabbitMQ for high availability. That approach, known as http://www.rabbitmq.com/ha.html[active-active mirrored queues], happens to be the one preferred by the RabbitMQ developers; however, it has shown less than ideal consistency and reliability in OpenStack clusters. Thus, at the time of writing, the Pacemaker/DRBD based approach remains the recommended one for OpenStack environments, although this may change in the near future as RabbitMQ active-active mirrored queues mature.

===== Configuring DRBD

The Pacemaker based RabbitMQ server requires a DRBD resource from which it mounts the +/var/lib/rabbitmq+ directory. In this example, the DRBD resource is simply named +rabbitmq+:

.+rabbitmq+ DRBD resource configuration (+/etc/drbd.d/rabbitmq.res+)
----
include::includes/rabbitmq.res[]
----

This resource uses an underlying local disk (in DRBD terminology, a _backing device_) named +/dev/data/rabbitmq+ on both cluster nodes, +node1+ and +node2+. Normally, this would be an LVM Logical Volume specifically set aside for this purpose. The DRBD +meta-disk+ is +internal+, meaning DRBD-specific metadata is being stored at the end of the +disk+ device itself. The device is configured to communicate between IPv4 addresses 10.0.42.100 and 10.0.42.254, using TCP port 7701. Once enabled, it will map to a local DRBD block device with the device minor number 1, that is, +/dev/drbd1+.
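
As a rough illustration of the resource described above, the following shell sketch prints a DRBD 8.3-style definition assembled from the values in the text. The function name +render_rabbitmq_res+ is hypothetical, and which address belongs to +node1+ and which to +node2+ is an assumption; the actual +rabbitmq.res+ include may differ.

```shell
#!/bin/sh
# Sketch only: prints a resource definition built from the values in the text.
# render_rabbitmq_res is a hypothetical helper, not a DRBD tool.
render_rabbitmq_res() {
  cat <<'EOF'
resource rabbitmq {
  device    minor 1;                 # maps to /dev/drbd1
  disk      "/dev/data/rabbitmq";    # backing device (e.g. an LVM volume)
  meta-disk internal;                # metadata stored at the end of the disk
  on node1 {
    address ipv4 10.0.42.100:7701;   # assumption: node1 uses .100
  }
  on node2 {
    address ipv4 10.0.42.254:7701;   # assumption: node2 uses .254
  }
}
EOF
}

render_rabbitmq_res
```

The output could be reviewed and then saved as +/etc/drbd.d/rabbitmq.res+ on both nodes; the host names in the +on+ sections have to match each node's actual host name.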

Enabling a DRBD resource is explained in detail in http://www.drbd.org/users-guide-8.3/s-first-time-up.html[the DRBD User's Guide]. In brief, the proper sequence of commands is this:

----
drbdadm create-md rabbitmq <1>
drbdadm up rabbitmq <2>
drbdadm -- --force primary rabbitmq <3>
----

<1> Initializes DRBD metadata and writes the initial set of metadata to +/dev/data/rabbitmq+. Must be completed on both nodes.
<2> Creates the +/dev/drbd1+ device node, _attaches_ the DRBD device to its backing store, and _connects_ the DRBD node to its peer. Must be completed on both nodes.
<3> Kicks off the initial device synchronization, and puts the device into the +primary+ (readable and writable) role. See http://www.drbd.org/users-guide-8.3/ch-admin.html#s-roles[Resource roles] (from the DRBD User's Guide) for a more detailed description of the primary and secondary roles in DRBD. Must be completed _on one node only,_ namely the one where you are about to continue with creating your filesystem.

===== Creating a file system

Once the DRBD resource is running and in the primary role (and potentially still in the process of running the initial device synchronization), you may proceed with creating the filesystem for RabbitMQ data. XFS is the generally recommended filesystem:

----
mkfs -t xfs /dev/drbd1
----

You may also use the alternate device path for the DRBD device, which may be easier to remember as it includes the self-explanatory resource name:

----
mkfs -t xfs /dev/drbd/by-res/rabbitmq
----

Once completed, you may safely return the device to the secondary role. Any ongoing device synchronization will continue in the background:

----
drbdadm secondary rabbitmq
----

===== Preparing RabbitMQ for Pacemaker high availability

In order for Pacemaker monitoring to function properly, you must ensure that RabbitMQ's +.erlang.cookie+ files are identical on all nodes, regardless of whether DRBD is mounted there or not.
The simplest way of doing so is to take an existing +.erlang.cookie+ from one of your nodes, copy it to the RabbitMQ data directory on the other node, and also copy it to the DRBD-backed filesystem. Because a DRBD device can only be mounted on a node where it is in the primary role, promote it before mounting and demote it again afterwards:

----
node1:# scp -p /var/lib/rabbitmq/.erlang.cookie node2:/var/lib/rabbitmq/
node1:# drbdadm primary rabbitmq
node1:# mount /dev/drbd/by-res/rabbitmq /mnt
node1:# cp -a /var/lib/rabbitmq/.erlang.cookie /mnt
node1:# umount /mnt
node1:# drbdadm secondary rabbitmq
----

===== Adding RabbitMQ resources to Pacemaker

You may now proceed with adding the Pacemaker configuration for RabbitMQ resources. Connect to the Pacemaker cluster with +crm configure+, and add the following cluster resources:

----
include::includes/pacemaker-rabbitmq.crm[]
----

This configuration creates

* +p_ip_rabbitmq+, a virtual IP address for use by RabbitMQ (192.168.42.100),
* +p_fs_rabbitmq+, a Pacemaker managed filesystem mounted to +/var/lib/rabbitmq+ on whatever node currently runs the RabbitMQ service,
* +ms_drbd_rabbitmq+, the _master/slave set_ managing the +rabbitmq+ DRBD resource,
* a service +group+ and +order+ and +colocation+ constraints to ensure resources are started on the correct nodes, and in the correct sequence.

+crm configure+ supports batch input, so you may copy and paste the above into your live Pacemaker configuration, and then make changes as required. For example, you may enter +edit p_ip_rabbitmq+ from the +crm configure+ menu and edit the resource to match your preferred virtual IP address.

Once completed, commit your configuration changes by entering +commit+ from the +crm configure+ menu. Pacemaker will then start the RabbitMQ service, and the resources it depends on, on one of your nodes.

===== Configuring OpenStack services for highly available RabbitMQ

Your OpenStack services must now point their RabbitMQ configuration to the highly available, virtual cluster IP address, rather than to a RabbitMQ server's physical IP address as you normally would.

For OpenStack Image, for example, if your RabbitMQ service IP address is 192.168.42.100 as in the configuration explained here, you would use the following line in your OpenStack Image API configuration file (+glance-api.conf+):

----
rabbit_host = 192.168.42.100
----

No other changes are necessary to your OpenStack configuration. If the node currently hosting your RabbitMQ experiences a problem necessitating service failover, your OpenStack services may experience a brief RabbitMQ interruption, as they would in the event of a network hiccup, and then continue to run normally.
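
If several service configuration files need the same change, the +rabbit_host+ edit described above could be scripted. The following is a minimal sketch, assuming an INI-style file that already contains an active +rabbit_host+ line; the helper name +set_rabbit_host+ is hypothetical and not part of OpenStack or RabbitMQ.

```shell
#!/bin/sh
# Hypothetical helper, not part of OpenStack or RabbitMQ.
# Usage: set_rabbit_host CONFIG_FILE VIP
set_rabbit_host() {
  conf="$1"
  vip="$2"
  pattern='^[[:space:]]*rabbit_host[[:space:]]*=.*$'

  # Refuse to guess which section the option belongs in if it is not set yet.
  if ! grep -q "$pattern" "$conf"; then
    echo "no active rabbit_host line in $conf; edit it by hand" >&2
    return 1
  fi

  # Rewrite through a temporary file (works with any POSIX sed), then copy
  # the result back so the original file keeps its owner and permissions.
  sed "s|$pattern|rabbit_host = $vip|" "$conf" > "$conf.tmp" &&
    cat "$conf.tmp" > "$conf" &&
    rm -f "$conf.tmp"
}
```

For example, +set_rabbit_host /etc/glance/glance-api.conf 192.168.42.100+ would point OpenStack Image at the virtual IP address used in this section (the file path is an assumption and depends on your distribution). The affected service still needs to be restarted to pick up the change.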