We've seen issues in CI when keepalived haproxy check script returns
an error and keepalived is switching to backup and then again to primary
on a single node environment.
Closes-Bug: #2025219
Change-Id: Iba62e76b3cf83f3ade6df81288d2d77129ffc725
This patch fixing issue with octavia security group
rules creation when using IPv6 configuration for octavia
management network.
Closes-Bug: #2023502
Change-Id: I3f8fbb0632ec6ecdc9f3820ebbcf01480de59e1f
Replaces the instance label on prometheus metrics with the inventory
hostname as opposed to the ip address. The ip address is still used as
the target address which means that there is no issue of the hostname
being unresolvable. Can be optionally enabled or set to FQDNs by
changing the prometheus_instance_label variable as mentioned in the
release notes.
Co-Authored-By: Will Szumski <will@stackhpc.com>
Change-Id: I387c9d8f5c01baf6054381834ecf4e554d0fff35
Ansible 2.14.3 introduced a change that broke the method used for
restarting MariaDB and RabbitMQ serially [1][2]. In
I57425680a4cdbf0daeb9b2cc35920f1b933aa4a8 we limited to 2.14.2 to work
around this. Ansible upstream claim this behaviour was unintentional,
and will not fix it.
This change moves to a different approach where we use separate plays
with a 'serial' keyword to execute the restart.
This change also removes the restriction on the maximum supported
version of 2.14.2 on ansible-core - any 2.14 release is now supported.
[1] 65366f663d
[2] https://github.com/ansible/ansible/issues/80848
Depends-On: https://review.opendev.org/c/openstack/kolla/+/884208
Change-Id: I5a12670d07077d24047aaff57ce8d33ccf7156ff
This patch is adding a feature for an option to copy different
ceph configuration files and corresponding keyrings for cinder,
glance, manila, gnocchi and nova services.
This is especially useful when the deployment uses availability
zones as below example.
- Individual compute can read/write to individual ceph
cluster in same AZ.
- Cinder can write to several ceph clusters in several AZs.
- Glance can use multistore and upload images to
several ceph clusters in several AZs at once.
Change-Id: Ie4d8ab5a3df748137835cae1c943b9180cd10eb1
According to the documentation [1] type of the Cyborg service should
be 'accelerator' and description 'Acceleration Service'. Also, this
change fixes incorrect endpoint URLs, and not configures an admin
endpoint [2] because the documentation [1] not updated yet.
1. https://docs.openstack.org/cyborg/latest/install/common.html
2. Icf3bf08deab2c445361f0a0124d87ad8b0e4e9d9
Closes-Bug: #2020080
Change-Id: I002db50cbad5a90e479498e605bdeab343e129c7
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
The kolla-genpwd, kolla-mergepwd, kolla-readpwd and kolla-writepwd
commands now creates or updates passwords.yml with correct
permissions. Also they display warning message about incorrect
permissions.
Closes-Bug: #2018338
Change-Id: I4b50053ced9150499d1d09fd4a0ec2e243cf938b
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
Add file to the reno documentation build to show release notes for
stable/2023.1.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.1.
Sem-Ver: feature
Change-Id: I870c0569a1e175ac5df59fc495812ba81c5147e6
We limit to 2.14.2 due to a regression in ansible-core [1] that breaks
conditional include_task loops in handlers. This is used for controlled
restarts of MariaDB and RabbitMQ.
[1]: 65366f663d
Change-Id: I57425680a4cdbf0daeb9b2cc35920f1b933aa4a8
Co-Authored-By: Michal Nasiadka <michal@stackhpc.com>
As of I3629b84d3255a8fe9d8a7cea8c6131d7c40899e8 nova
now requires the service_user section to be configured
to address CVE-2023-2088. This change adds
the service user section to the nova.conf template in
the nova and nova-cell roles.
Related-Bug: #2004555
Signed-off-by: Sven Kieske <kieske@osism.tech>
Change-Id: I2189dafca070accfd8efcd4b8cc4221c6decdc9f
(cherry picked from commit a77ea13ef1991543df29b7eea14b1f91ef26f858)
(cherry picked from commit 03c12abbcc107bfec451f4558bc97d14facae01c)
(cherry picked from commit cb105dc293ff1cdb11ab63fa3e3bf39fd17e0ee0)
(cherry picked from commit efe6650d09441b02cf93738a94a59723d84c5b19)
The flags ``--db-nb-pid`` and ``--db-sb-pid`` are corected to be
``--db-nb-pidfile`` and ``--db-sb-pidfile`` respectively. See here for
reference:
6c6a7ad1c6/utilities/ovn-ctl (L1045)
Closes-Bug: #2018436
Change-Id: Ic1e8768374566eb2198302807ecc644a19cd3062
When using externally managed certificates, according to [1],
one should set `kolla_externally_managed_cert: yes` and ensure
that the certificates are in the correct place.
However, RabbitMQ precheck still expects the certificates to be
available on the controller node. This is incorrect.
Fix by not running the tasks in question when `kolla_externally_managed_cert: yes`
[1] https://docs.openstack.org/kolla-ansible/latest/admin/tls.html
Closes-Bug: 1999081
Related-Bug: 1940286
Signed-off-by: Magnus Lööf <magnus.loof@basalt.se>
Change-Id: I9f845a7bdf5055165e199ab1887ed3ccbfb9d808
This reverts commit 9867060b6b3bd36aad121b53b9e5dddfca8a8e4c.
Reason for revert: seems this broke some jobs
Change-Id: I1ca81214ece403351c0a522ea05bf07802e4c4c0
This patch introduces distributed lock for masakari-api
service when handle the concurrent notifications for the same
host failure from multiple masakari-hostmonitor services.
Change-Id: I46985202dc8da22601357eefe2727599e7a413e5
With the addition of the variable
`om_enable_rabbitmq_high_availability`, this feature in the upgrade
task should be brought back. It is also now used in the deploy task. The
`ha-all` policy is cleared only when
`om_enable_rabbitmq_high_availability` is set to `false`.
Change-Id: Ia056aa40e996b1f0fed43c0f672466c7e4a2f547
Puts the RabbitMQ node into maintenance mode before restarting the
container. This will make the node shutdown less disruptive. For details
on what maintenance mode does, see:
https://www.rabbitmq.com/upgrade.html#maintenance-mode
Change-Id: Ia61573f3fb95fe8fcde6b789ca77ef5b45fe0a65
Since CVE-2022-29404 is fixed [1,2] the default value for the
LimitRequestBody directive in the Apache HTTP Server has been changed
from 0 (unlimited) to 1 GiB. This limits the size of images (for
example) uploaded in Horizon. This change add the ability to
configure the limit.
1. https://access.redhat.com/articles/6975397
2. https://ubuntu.com/security/CVE-2022-29404
Closes-Bug: #2012588
Change-Id: I4cd9dd088cbcf38ff6f8d188ebcc56be7d9ea1c9
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
When upgrading Nova, we sometimes hit an error where an old hypervisor
that hasn’t been upgraded recently (for example due to broken hardware)
is preventing Nova API from starting properly. This can be detected
using the tool ``nova-status upgrade check`` to make sure that there are
no ``nova-compute`` that are older than N-1 releases. This is already
used in the Kolla Ansible upgrade task for Nova. However, this task uses
the current ``nova-api`` container, so computes which will be too old
after the upgrade are not caught.
This patch changes Kolla Ansible so that the upgraded ``nova-api`` image
is used to run the upgrade checks, allowing computes that will be too
old to be detected before the upgrades are performed.
Depends-On: https://review.opendev.org/c/openstack/kolla/+/878744
Closes-Bug: #1957080
Co-Authored-By: Pierre Riteau <pierre@stackhpc.com>
Change-Id: I3a899411001834a0c88e37f45a756247ee11563d
Following ideas here:
https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
Make sure old messages with no consumer are dropped after the message
TTL of 10 mins, longer than the 1 min RPC timeout.
Also ensure queues expire after an hour of inactivity, so queues from
removed nodes or renamed nodes don't grow over time.
Change-Id: Ifb28ac68b6328adb604a7474d01e5f7a47b2e788
Adds two new flags to alter behaviour in RabbitMQ:
* `rabbitmq_message_ttl_ms`, which lets you set a TTL on messages.
* `rabbitmq_queue_expiry_ms`, which lets you set an expiry time on queues.
See https://www.rabbitmq.com/ttl.html for more information on both.
Change-Id: I51ca37ffbb1bb5c07f2d39873f0f33ca20263f2a
Changes the default value of `rabbitmq-ha-promote-on-shutdown` to
`"always"`.
We are seeing issues with RabbitMQ automatically recovering when nodes
are restarted. https://www.rabbitmq.com/ha.html#cluster-shutdown
Rather than waiting for operator interventions, it is better we allow
recovery to happen, even if that means we may loose some messages.
A few failed and timed out operations is better than a totaly broken
cloud. This is achieved using ha-promote-on-shutdown=always.
Note, when a node failure is detected, this is already the default
behaviour from 3.7.5 onwards:
https://www.rabbitmq.com/ha.html#promoting-unsynchronised-mirrors
Related-Bug: #1954925
Change-Id: I484a81163f703fa27112df22473d657e2a9ab964
With the parameter ``mariadb_datadir_volume`` it is possible
to use a directory as volume for the mariadb service. By default,
a volume named mariadb is used (the previous default).
Change-Id: Ic61fe981825c5fa6f50e53c9555b6a102f42f522