Set RabbitMQ ha-promote-on-shutdown=always

Changes the default value of `rabbitmq_ha_promote_on_shutdown` to
`"always"`.

We are seeing issues with RabbitMQ failing to recover automatically when
nodes are restarted. https://www.rabbitmq.com/ha.html#cluster-shutdown

Rather than waiting for operator intervention, it is better to allow
recovery to happen, even if that means we may lose some messages.
A few failed and timed-out operations are better than a totally broken
cloud. This is achieved using ha-promote-on-shutdown=always.
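As a minimal sketch, assuming a standard kolla-ansible deployment where operator overrides are kept in `/etc/kolla/globals.yml`, the settings involved look like this. The new default only takes effect when mirrored (HA) queues are enabled, and the explicit `rabbitmq_ha_promote_on_shutdown` line is redundant once this change lands; it is shown only to make the value visible.

```yaml
# /etc/kolla/globals.yml (assumed location of operator overrides)

# Mirrored queues must be enabled for the promote-on-shutdown
# setting to have any effect.
om_enable_rabbitmq_high_availability: true

# Default after this change: promote a mirror on controlled shutdown
# even if it is not fully synchronised.
rabbitmq_ha_promote_on_shutdown: "always"
```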

Note that when a node failure is detected, this is already the default
behaviour (`ha-promote-on-failure`) from RabbitMQ 3.7.5 onwards:
https://www.rabbitmq.com/ha.html#promoting-unsynchronised-mirrors

Related-Bug: #1954925
Change-Id: I484a81163f703fa27112df22473d657e2a9ab964
Matt Crees 2023-02-07 09:56:43 +00:00
parent 0d42110e01
commit a87810db7e
2 changed files with 12 additions and 1 deletion

@@ -88,7 +88,7 @@ rabbitmq_cluster_partition_handling: "pause_minority"
 # The rabbitmq default for ha queues is "when-synced"
 # More details see:
 # https://www.rabbitmq.com/ha.html#promoting-unsynchronised-mirrors
-rabbitmq_ha_promote_on_shutdown:
+rabbitmq_ha_promote_on_shutdown: "always"
 # The number of rabbitmq replicas should follow this advice:
 # https://www.rabbitmq.com/ha.html#replication-factor
 # This means, if you have three rabbit nodes, we request two

@@ -0,0 +1,11 @@
+---
+upgrade:
+  - |
+    The RabbitMQ variable `rabbitmq_ha_promote_on_shutdown` now defaults to
+    `"always"`. This only has an effect if
+    `om_enable_rabbitmq_high_availability` is set to `True`. When
+    `ha-promote-on-shutdown` is set to `always`, queue mirrors are promoted on
+    shutdown even if they aren't fully synced. This means that we value
+    availability over the risk of losing some messages. Note that the contents
+    of the RabbitMQ definitions.json are now changed, meaning RabbitMQ
+    containers will be restarted on the next deploy/upgrade.
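For operators who would rather not promote unsynchronised mirrors, a minimal override sketch follows, again assuming overrides are kept in `/etc/kolla/globals.yml`. It pins the variable to `when-synced`, which the comment in the defaults file identifies as RabbitMQ's own default for HA queues. Whether this reproduces the pre-upgrade definitions.json exactly depends on how the template renders an explicit value versus an unset one, so it is worth checking on a test deployment.

```yaml
# /etc/kolla/globals.yml (assumed location of operator overrides)

# Opt out of the new default: only promote a mirror on controlled
# shutdown once it is fully synchronised (RabbitMQ's own default).
rabbitmq_ha_promote_on_shutdown: "when-synced"
```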