This patch fixes creation of the Open vSwitch bridge. The relevant
Ansible task had been rewritten to use an Ansible module, but
unfortunately its loop was implemented incorrectly.
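As an illustration only (module, variable and bridge names here are
assumptions, not the exact kolla-ansible task), a correctly looped
version would create one bridge per item:

  # Hypothetical sketch: iterate over the bridge names and invoke the
  # module once per bridge, rather than passing the whole list at once.
  - name: Ensure Open vSwitch bridges exist
    openvswitch.openvswitch.openvswitch_bridge:
      bridge: "{{ item }}"
      state: present
    loop: "{{ neutron_bridge_name.split(',') }}"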
Closes-Bug: #2056332
Change-Id: Ia55a36c0f9b122b72d757ca973e7d8f76ae84344
Tooz 6.0.1 includes commit [1], which introduced parsing of the
username from the Redis connection URL. As a result, services started
authenticating as 'admin'. That was incorrect even before this change,
as either a dedicated user or the default one should have been used;
it only worked because the 'admin' username was never actually parsed.
This patch fixes the user being used by switching to the correct
'default' one.
[1] https://review.opendev.org/c/openstack/tooz/+/907656
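As a rough illustration (the variable names below are assumptions, not
the exact kolla-ansible template), the connection URL now needs to
authenticate as the built-in 'default' Redis user:

  # Hypothetical sketch of the coordination URL once tooz parses the
  # username: use 'default' instead of 'admin'.
  redis_connection_string: "redis://default:{{ redis_master_password }}@{{ redis_host }}:6379"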
Closes-Bug: #2056667
Depends-On: https://review.opendev.org/c/openstack/kolla/+/911703
Change-Id: I5568dba15fa98e009ad4a9e41756aba0fa659371
These were omitted from I387c9d8f5c01baf6054381834ecf4e554d0fff35 and
I387c9d8f5c01baf6054381834ecf4e554d0fff35.
Closes-Bug: #2041855
Change-Id: I25e5450d1caeebd9c900c190fc0079988f1ca574
Fixes not being able to add additional plugins at build time due to the
``grafana`` volume being mounted over the existing ``/var/lib/grafana``
directory. This is fixed by copying the dashboards into the container
from an existing bind mount instead of using the ``grafana`` volume.
This, however, leaves behind the old volume, which should be removed
with ``docker volume rm grafana`` or by setting
``grafana_remove_old_volume`` to ``true``.
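For example, the cleanup can be delegated to kolla-ansible by setting
the flag in globals.yml (a minimal sketch):

  # globals.yml: opt in to removing the now-unused 'grafana' volume on
  # the next deploy/reconfigure run.
  grafana_remove_old_volume: true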
Closes-Bug: #2039498
Change-Id: Ibcffa5d8922c470f655f447558d4a9c73b1ba361
This patch adds check_mode: false to tasks in restart_services.yml
which only check the WSREP status and whether the port is up.
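A minimal sketch of the pattern (task name, variables and port are
assumptions): read-only probes should still run when Ansible is
invoked with --check:

  # Hypothetical example: this task only waits for a port, so it is
  # safe (and necessary) to run it even in check mode.
  - name: Wait for MariaDB port to be up
    wait_for:
      host: "{{ api_interface_address }}"
      port: "{{ mariadb_port }}"
      timeout: 60
    check_mode: false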
Closes-Bug: #2052501
Change-Id: I92a591900d85138a87991a18dd4339efd053ef1b
This patch disables periodic compute.instance.exists
notifications when designate is enabled.
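The periodic compute.instance.exists notifications come from nova's
instance usage audit; a hedged sketch of the resulting override
(option taken from nova's documentation, path assumed) is:

  # Hypothetical sketch: disable nova's periodic usage audit, which is
  # what emits compute.instance.exists notifications.
  - name: Disable the periodic instance usage audit
    community.general.ini_file:
      path: /etc/kolla/nova-compute/nova.conf
      section: DEFAULT
      option: instance_usage_audit
      value: "False"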
Related-Bug: #2049503
Change-Id: I39fe2db9182de23c1df814d911eec15e86317702
Service user passwords will now be updated in keystone if services are
reconfigured with new passwords set in config. This behaviour can be
overridden.
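One way to express this with the openstack.cloud collection (the
toggle variable and loop data here are hypothetical, not the exact
kolla-ansible implementation):

  # Hypothetical sketch: update the password on reconfigure unless the
  # operator opts out via a toggle variable.
  - name: Ensure service user exists with the configured password
    openstack.cloud.identity_user:
      name: "{{ item.user }}"
      password: "{{ item.password }}"
      update_password: "{{ 'always' if service_user_update_password | default(true) else 'on_create' }}"
    loop: "{{ service_users }}"
    no_log: true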
Closes-Bug: #2045990
Change-Id: I91671dda2242255e789b521d19348b0cccec266f
Shard allocation is disabled at the start of the OpenSearch upgrade
task. This is set as a transient setting, meaning it will be removed
once the containers are restarted. However, if there is no change to
the OpenSearch container, it will not be restarted, and the cluster is
left in a broken state: unable to allocate shards.
This patch moves the pre-upgrade tasks into the handlers, so shard
allocation and the flush are only performed when the OpenSearch
container is actually going to be restarted.
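For reference, a hedged sketch of the handler-side ordering (task name,
variables and the allocation value are assumptions): the setting is
only applied when the container restart is actually notified:

  # Hypothetical sketch: runs only when the OpenSearch container really
  # needs a restart, so the transient setting cannot be left behind on
  # an untouched cluster.
  - name: Disable shard allocation before restart
    uri:
      url: "http://{{ api_interface_address }}:9200/_cluster/settings"
      method: PUT
      body_format: json
      body:
        transient:
          cluster.routing.allocation.enable: "primaries"
    listen: Restart opensearch container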
Closes-Bug: #2049512
Change-Id: Ia03ba23bfbde7d50a88dc16e4f117dec3c98a448
This change fixes Trove failing to discover the Swift endpoint by
adding a service_credentials section to guest-agent.conf.
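A hedged illustration of the kind of section added (paths, options and
values are placeholders, not the exact ones kolla-ansible renders):

  # Hypothetical sketch: give the guest agent Keystone credentials so
  # it can look up the Swift endpoint from the service catalogue.
  - name: Add service_credentials to guest-agent.conf
    community.general.ini_file:
      path: /etc/trove/guest-agent.conf
      section: service_credentials
      option: "{{ item.option }}"
      value: "{{ item.value }}"
    loop:
      - { option: auth_url, value: "{{ keystone_internal_url }}" }
      - { option: region_name, value: "{{ openstack_region_name }}" }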
Closes-Bug: #2048829
Change-Id: I185484d2a0d0a2d4016df6acf8a6b0a7f934c237
This change fixes the Trove guest instance failing to connect to
RabbitMQ by adding durable queues support to the oslo_messaging_rabbit
section in guest-agent.conf.
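For illustration, the relevant oslo.messaging option is
amqp_durable_queues; a hedged sketch of applying it (path assumed):

  # Hypothetical sketch: durable queues must match the server-side
  # RabbitMQ configuration, otherwise the guest agent cannot declare
  # its queues.
  - name: Enable durable queues for the Trove guest agent
    community.general.ini_file:
      path: /etc/trove/guest-agent.conf
      section: oslo_messaging_rabbit
      option: amqp_durable_queues
      value: "true"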
Partial-Bug: #2048822
Change-Id: I8efc3c92e861816385e6cda3b231a950a06bf57d
The addition of an instance resize operation [1] to CI testing is
triggering a failure in kolla-ansible-debian-ovn jobs, which are using a
nodeset with multiple nodes:
oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: scp -r /var/lib/nova/instances/8ca2c7e8-acae-404c-af7d-6cac38e354b8_resize/disk 192.0.2.2:/var/lib/nova/instances/8ca2c7e8-acae-404c-af7d-6cac38e354b8/disk
Exit code: 255
Stdout: ''
Stderr: "Warning: Permanently added '[192.0.2.2]:8022' (ED25519) to the list of known hosts.\r\nsubsystem request failed on channel 0\r\nscp: Connection closed\r\n"
This is not seen on Ubuntu Jammy, which uses OpenSSH 8.9, while Debian
Bookworm uses OpenSSH 9.2. This is likely related to this change in
OpenSSH 9.0 [2]:
This release switches scp(1) from using the legacy scp/rcp protocol
to using the SFTP protocol by default.
Configure the SFTP subsystem as is done on RHEL9 derivatives. Even
though it is not yet required on Ubuntu, we configure it there too so
that we are ready for the Noble release.
[1] https://review.opendev.org/c/openstack/kolla-ansible/+/904249
[2] https://www.openssh.com/txt/release-9.0
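A hedged sketch of the kind of change (the sshd_config path and the
sftp-server binary location differ between distributions, so treat the
values below as examples only):

  # Hypothetical sketch: make sure sshd used for nova migrations offers
  # an sftp subsystem, since scp now speaks SFTP by default.
  - name: Configure the sftp subsystem in sshd_config
    lineinfile:
      path: /etc/ssh/sshd_config
      regexp: '^Subsystem\s+sftp'
      line: "Subsystem sftp /usr/lib/openssh/sftp-server"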
Closes-Bug: #2048700
Change-Id: I9f1129136d7664d5cc3b57ae5f7e8d05c499a2a5
This patch sets the self-reference URL for each glance worker.
If this is set, other glance workers will know how to contact this one
directly if needed. For image import, a single worker stages the image
and other workers need to be able to proxy the import request to the
right one.
With the current setup, glance image import simply does not work.
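For context, glance exposes this as the worker_self_reference_url
option; a hedged sketch (path, host and port variables are
assumptions):

  # Hypothetical sketch: each glance-api worker advertises a URL that
  # other workers can use to proxy staged-image import requests to it.
  - name: Set the worker self-reference URL in glance-api.conf
    community.general.ini_file:
      path: /etc/kolla/glance-api/glance-api.conf
      section: DEFAULT
      option: worker_self_reference_url
      value: "http://{{ api_interface_address }}:{{ glance_api_port }}"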
Closes-Bug: #2048525
Change-Id: I4246dc8a80038358cd5b6e44e991b3e2ed72be0e
The prometheus_cadvisor container has high CPU usage. On various
production systems I checked it sits around 13-16% on controllers,
averaged over the prometheus 1m scrape interval. When viewed with top,
we can see it is a bit spiky and can jump over 100%.
There are various bugs about this, but I found
https://github.com/google/cadvisor/issues/2523 which suggests reducing
the per-container housekeeping interval. This defaults to 1s, which
provides far greater granularity than we need with the default
prometheus scrape interval of 60s.
Reducing the housekeeping interval to 60s on a production controller
reduced the CPU usage from 13% to 3.5% average. This still seems high,
but is more reasonable.
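cadvisor's housekeeping cadence is controlled by its
--housekeeping_interval flag; a hedged sketch of passing it (the
variable name below is illustrative, not necessarily the one
kolla-ansible uses):

  # Hypothetical sketch: align per-container housekeeping with the
  # default Prometheus scrape interval instead of cadvisor's 1s default.
  prometheus_cadvisor_cmdline_extras: "--housekeeping_interval=60s"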
Change-Id: I89c62a45b1f358aafadcc0317ce882f4609543e7
Closes-Bug: #2048223
HAProxy exposes a Prometheus metrics endpoint; it just needs to be
enabled. Enable it and remove the configuration for
prometheus-haproxy-exporter. Any remaining prometheus-haproxy-exporter
containers will be removed automatically.
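For reference, HAProxy's built-in exporter is served by the
prometheus-exporter HTTP service; a hedged sketch of the frontend
stanza, rendered here via an Ansible copy task (file path, bind
address and port are assumptions):

  # Hypothetical sketch of the haproxy.cfg fragment that enables the
  # built-in Prometheus endpoint on /metrics.
  - name: Enable the HAProxy Prometheus metrics frontend
    copy:
      dest: /etc/kolla/haproxy/services.d/prometheus.cfg
      content: |
        frontend prometheus_metrics_front
          bind {{ api_interface_address }}:8405
          mode http
          http-request use-service prometheus-exporter if { path /metrics }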
Change-Id: If6e75691d2a996b06a9b95cb0aae772db54389fb
Co-Authored-By: Matt Anson <matta@stackhpc.com>
Some containers exit with code 143 (SIGTERM) instead of 0 when they
are stopped, which is still OK. This patch simply allows exit code 143
to be treated as a successful exit. Details are in the bug report.
Services which exited with 143 (SIGTERM):
kolla-cron-container.service
kolla-designate_producer-container.service
kolla-keystone_fernet-container.service
kolla-letsencrypt_lego-container.service
kolla-magnum_api-container.service
kolla-mariadb_clustercheck-container.service
kolla-neutron_l3_agent-container.service
kolla-openvswitch_db-container.service
kolla-openvswitch_vswitchd-container.service
kolla-proxysql-container.service
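A hedged sketch of the corresponding systemd change for one of the
units listed above (the drop-in path and file name are assumptions):

  # Hypothetical sketch: tell systemd that SIGTERM's exit code 143 is a
  # successful exit for the container unit, then daemon-reload.
  - name: Create a drop-in directory for the cron container unit
    file:
      path: /etc/systemd/system/kolla-cron-container.service.d
      state: directory
      mode: "0755"

  - name: Treat exit code 143 (SIGTERM) as success
    community.general.ini_file:
      path: /etc/systemd/system/kolla-cron-container.service.d/success-exit.conf
      section: Service
      option: SuccessExitStatus
      value: "143"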
Partial-Bug: #2048130
Change-Id: Ia8c85d03404cfb368e4013066c67acd2a2f68deb
We previously used ElasticSearch Curator for managing log
retention. Now that we have moved to OpenSearch, we can use
the Index State Management (ISM) plugin which is bundled with
OpenSearch.
This change adds support for automating the configuration of
the ISM plugin via the OpenSearch API. By default, it has
similar behaviour to the previous ElasticSearch Curator
default policy.
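A hedged sketch of what the automation does (policy name, retention
period and endpoint are assumptions; the policy body is trimmed),
using the ISM policies API:

  # Hypothetical sketch: create a retention policy through the ISM
  # plugin API, roughly mirroring the old Curator behaviour of deleting
  # indices once they reach a certain age.
  - name: Apply OpenSearch ISM retention policy
    uri:
      url: "http://{{ api_interface_address }}:9200/_plugins/_ism/policies/retention"
      method: PUT
      body_format: json
      body:
        policy:
          description: "Delete indices older than the retention period"
          default_state: hot
          states:
            - name: hot
              actions: []
              transitions:
                - state_name: delete
                  conditions:
                    min_index_age: "31d"
            - name: delete
              actions:
                - delete: {}
              transitions: []
      status_code: [200, 201]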
Closes-Bug: #2047037
Change-Id: I5c6d938f2bc380f1575ee4f16fe17c6dca37dcba
Adds a precheck to fail if non-quorum queues are found in RabbitMQ.
Currently excludes fanout and reply queues, pending support in
oslo.messaging [1].
[1]: https://review.opendev.org/c/openstack/oslo.messaging/+/888479
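A hedged sketch of the kind of check involved (container engine, names
and filtering are assumptions): list queue types and fail if any
classic queues remain, excluding fanout and reply queues:

  # Hypothetical sketch: ask RabbitMQ for queue names and types, then
  # fail on anything that is not a quorum queue (fanout and reply
  # queues are skipped until oslo.messaging supports quorum for them).
  - name: List RabbitMQ queues
    command: docker exec rabbitmq rabbitmqctl list_queues name type
    register: rabbitmq_queues
    changed_when: false

  - name: Fail if classic (non-quorum) queues are present
    fail:
      msg: "Non-quorum queue found: {{ item }}"
    when:
      - "item.endswith('classic')"
      - "'fanout' not in item"
      - "'reply' not in item"
    loop: "{{ rabbitmq_queues.stdout_lines }}"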
Closes-Bug: #2045887
Change-Id: Ibafdcd58618d97251a3405ef9332022d4d930e2b