Currently we do not follow the RabbitMQ advice on replicas here:
https://www.rabbitmq.com/ha.html#replication-factor
Here we reduce the number of replicas to n // 2 + 1, as advised
above. The hope is that this helps speed up recovery from RabbitMQ
issues.
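For example, with this formula a three-node cluster mirrors each queue
across 3 // 2 + 1 = 2 nodes and a five-node cluster across
5 // 2 + 1 = 3, i.e. a quorum of nodes, which reduces the amount of
data each node has to resynchronise compared to mirroring across all
nodes.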
Related-Bug: #1954925
Change-Id: Ib6bcb26c499c9884faa4a0cd51abaec00cacb096
Adds the flag `rabbitmq_ha_replica_count` to change how many different
nodes a queue should be mirrored across. If the value is not set, then
it defaults to "ha-mode":"all". This value is unset by default to avoid
any unexpected changes to the RabbitMQ definitions.json file, as that
would trigger a restart of RabbitMQ during the next deploy.
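For example, applying the n // 2 + 1 guidance from the change above to
a three-node cluster could be expressed as an override in
/etc/kolla/globals.yml along these lines (a minimal sketch; choose the
value to match your own cluster size):

    rabbitmq_ha_replica_count: 2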
Change-Id: Iee98cd937197a73a3b04aa8501fa325e8ecfff24
By default ha-promote-on-shutdown=when-synced. However, we are seeing
cases where RabbitMQ does not automatically recover when nodes are
restarted.
https://www.rabbitmq.com/ha.html#cluster-shutdown
Rather than waiting for operator intervention, it is better to allow
recovery to happen, even if that means we may lose some messages.
A few failed and timed-out operations are better than a totally broken
cloud. This is achieved using ha-promote-on-shutdown=always.
Note that, when a node failure is detected, this is already the default
behaviour from RabbitMQ 3.7.5 onwards:
https://www.rabbitmq.com/ha.html#promoting-unsynchronised-mirrors
This patch adds the option to change the ha-promote-on-shutdown
definition, using the flag `rabbitmq_ha_promote_on_shutdown`. This
value is unset by default to avoid any unexpected changes to the
RabbitMQ definitions.json file, as that would trigger an unexpected
restart of RabbitMQ during the next deploy.
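As a sketch, once the flag is available the behaviour described above
can be requested with a single override in /etc/kolla/globals.yml:

    rabbitmq_ha_promote_on_shutdown: "always"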
Related-Bug: #1954925
Change-Id: I2146bda2c72ddac2c9923c6941b0596395fd9ab5
This change serialises the neutron l3 agent restart process and adds a
user-configurable delay between restarts. This can prevent connectivity
loss due to all agents being restarted at the same time.
A large number of routers increases the recovery time, making this
issue more prevalent.
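As an illustration only, such a delay might be expressed in
/etc/kolla/globals.yml as follows; the variable name below is an
assumption rather than one taken from this change, so check the release
notes for the actual flag:

    # Hypothetical variable name: seconds to wait between restarting
    # the L3 agent on successive hosts
    neutron_l3_agent_failover_delay: 40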
Change-Id: I3be0ebfa12965e6ae32d1b5f13f8fd23c3f52b8c
In order to honour the configured maximum number of attempts,
it has to be present in nova.conf inside the
nova_conductor container; otherwise the default value
of 3 will be used.
Closes-Bug: #2003587
Change-Id: I928af332b8658223444594f96417830233057284
This commit adds a SystemdWorker class to the kolla_docker Ansible
module. It is used to manage container state via systemd calls.
Change-Id: I20e65a6771ebeee462a3aaaabaa5f0596bdd0581
Signed-off-by: Ivan Halomi <i.halomi@partner.samsung.com>
Signed-off-by: Martin Hiner <m.hiner@partner.samsung.com>
As RabbitMQ's configuration file is neither an INI nor a YAML file,
there is no option to extend the configuration with new config
options via merge_configs or merge_yaml.
This patch moves the config options to a dictionary
so that they can be overridden in /etc/kolla/globals.yml.
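As a sketch, an override in /etc/kolla/globals.yml might then look like
the following; the dictionary variable name used here is an assumption
for illustration, so consult the rabbitmq role defaults for the real
name:

    # Assumed dictionary name; keys map to RabbitMQ configuration settings
    rabbitmq_extra_config:
      cluster_partition_handling: "pause_minority"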
Change-Id: I5cd772f4fb80a0e200fb24d67be735ca81e3fdeb
Nova changes to RBAC [1] are breaking Kolla Ansible and causing most CI
jobs to fail. Disable these changes until we can adapt.
[1] https://review.opendev.org/c/openstack/nova/+/866218
Change-Id: I506697d2b374e74a6b066c788bd2d61edc8d4876
It has not been supported in ansible-collection-kolla since the Zed
release, and the Kolla-Ansible CI jobs executed from Kolla fail on it,
because they build images.
Change-Id: Ib0358f780a77af152225761a4aa3b6acbea2eeaf
According to the code, docs and oslo-config-validator, this
configuration option is not supported.
Change-Id: I34410e5267d527ec629748f35771f227183810b6
Makes sure the facts required to generate octavia.conf are available
when using genconfig.
This change also ensures that the necessary tasks run when using Ansible
check mode.
Closes-Bug: #1987299
Change-Id: Ib8fbee2d3abdcfd2eae0f9b3e9b69eeb0e3086e0
A combination of durable queues and classic queue mirroring can be used
to provide high availability of RabbitMQ. However, these options should
only be used together; otherwise the system will become unstable. Using
the flag ``om_enable_rabbitmq_high_availability`` will either enable
both options at once, or neither of them.
There are some queues that should not be mirrored:
* ``reply`` queues (these have a single consumer and TTL policy)
* ``fanout`` queues (these have a TTL policy)
* ``amq`` queues (these are auto-delete queues, with a single consumer)
An exclusionary pattern is used in the classic mirroring policy. This
pattern is ``^(?!(amq\\.)|(.*_fanout_)|(reply_)).*``
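As a sketch, the behaviour described above is enabled with a single
override in ``/etc/kolla/globals.yml``, which turns on both durable
queues and the classic mirroring policy (with the exclusion pattern
shown above) at the same time:

    om_enable_rabbitmq_high_availability: true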
Change-Id: I51c8023b260eb40b2eaa91bd276b46890c215c25
We've noticed cases where nodepool.private_ipv4 is empty, probably
caused by [1] or a change in nodepool provider configuration.
[1]: https://review.opendev.org/c/zuul/nodepool/+/862522
Change-Id: Ibeca7d99571d9f6d4d1b90277121d685d73c9a59
When running in check mode, some prechecks previously failed because
they use the command module, which is silently skipped in check mode.
Other prechecks were not running correctly in check mode due to, e.g.,
looking for a string in empty command output or not querying which
containers are running.
This change fixes these issues.
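For illustration only (not necessarily the approach taken by this
change), a read-only command task can be forced to run even in check
mode like this:

    - name: Query which containers are running
      ansible.builtin.command: docker ps -q --filter name=rabbitmq
      register: container_facts
      changed_when: false
      check_mode: false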
Closes-Bug: #2002657
Change-Id: I5219cb42c48d5444943a2d48106dc338aa08fa7c
Prevent the haproxy-config role from attempting to modify firewalld when
running kolla-ansible genconfig.
Closes-Bug: #2002522
Change-Id: Ie8a524cc944aa8cb9cf0999b1b8da79f30b40092
The ``[oslo_messaging_rabbit] heartbeat_in_pthread`` config option
is set to ``true`` for wsgi applications to allow the RabbitMQ
heartbeats to function. For non-wsgi applications it is set to
``false``, as it may otherwise break the service [1].
[1] https://docs.openstack.org/releasenotes/oslo.messaging/zed.html#upgrade-notes
Change-Id: Id89bd6158aff42d59040674308a8672c358ccb3c