During an upgrade, nova pins the version of RPC calls to the minimum
seen across all services. This ensures that old services do not receive
data they cannot handle. After the upgrade is complete, all nova
services are supposed to be reloaded via SIGHUP so that they recheck the
RPC versions of the running services and switch to the latest version,
which should now be supported by all of them.
Due to a bug [1] in oslo.service, sending services SIGHUP is currently
broken. We replaced the SIGHUP with a restart for the nova_compute
container for bug 1821362, but not for the other nova services. It seems
we need to restart all nova services to allow the RPC version pin to be
removed.
Testing in a Queens to Rocky upgrade, we find the following in the logs:

    Automatically selected compute RPC version 5.0 from minimum service
    version 30

However, the service version in Rocky is 35.
There is a second issue in that it takes some time for the upgraded
services to update the nova services database table with their new
version. We need to wait until all nova-compute services have done this
before the restart is performed, otherwise the RPC version cap will
remain in place. There is currently no interface in nova available for
checking these versions [2], so as a workaround we use a configurable
delay with a default duration of 30 seconds. Testing showed it takes
about 10 seconds for the version to be updated, so this gives us some
headroom.
This change restarts all nova services after an upgrade, following a
configurable delay that defaults to 30 seconds.
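
A minimal sketch of the delay-then-restart sequence described above. It is
not the actual change: the variable name nova_services_restart_delay is
hypothetical, and the container list is an illustrative subset.

```yaml
# Sketch only, under the assumptions stated above.
- name: Wait for nova services to record their new service version
  pause:
    # Hypothetical variable name; 30 seconds is the default from this change.
    seconds: "{{ nova_services_restart_delay | default(30) }}"

- name: Restart nova services so they drop the RPC version pin
  become: true
  kolla_docker:
    action: "restart_container"
    common_options: "{{ docker_common_options }}"
    name: "{{ item }}"
  loop:
    # Illustrative subset of nova containers.
    - nova_api
    - nova_conductor
    - nova_scheduler
    - nova_compute
```

A fixed pause is used because, as noted above, nova offers no interface
for checking the recorded versions directly.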
[1] https://bugs.launchpad.net/oslo.service/+bug/1715374
[2] https://bugs.launchpad.net/nova/+bug/1833542
Change-Id: Ia6fc9011ee6f5461f40a1307b72709d769814a79
Closes-Bug: #1833069
Related-Bug: #1833542
Many tasks that use Docker already specify become, but not all. This
change ensures that all tasks using the following modules specify
become:
* kolla_docker
* kolla_ceph_keyring
* kolla_toolbox
* kolla_container_facts
It also adds become for 'command' tasks that use the docker CLI.
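
As an illustration of the pattern, the following sketch shows become on a
module task and on a 'command' task. The task names, the container name
and the registered variables are assumptions made for the example.

```yaml
# Sketch only: tasks are illustrative, not taken from the change.
- name: Gather facts about a container
  become: true
  kolla_container_facts:
    name:
      - nova_compute
  register: container_facts

- name: List running containers using the docker CLI
  become: true
  command: docker ps --quiet
  register: running_containers
  changed_when: false
```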
Change-Id: I4a5ebcedaccb9261dbc958ec67e8077d7980e496
There is no need to touch the sudoers.d file each time.
Creation and mode setting are handled by lineinfile itself.
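
A minimal sketch of a lineinfile task that creates the file and sets its
mode in one step. The path, the sudoers line and the mode value are
hypothetical examples, not the values used by the change.

```yaml
# Sketch only: path, line and mode are hypothetical.
- name: Ensure the sudoers entry exists
  become: true
  lineinfile:
    path: /etc/sudoers.d/example_sudoers
    line: "exampleuser ALL=(ALL) NOPASSWD: ALL"
    create: true      # lineinfile creates the file if it is missing
    mode: "0440"      # and applies the mode itself
```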
Change-Id: Ia36e21b04d3a08fab3c748f6298f142c1d73ee6d
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
The script looks like it is meant to be run, and the docs mention
running it rather than sourcing it, yet the examples sourced it.
Change-Id: Ib4492ae01bee11b562022099cee8b06b4e3ee3c1
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
... or "what I wish existed when I first became PTL"
Some general improvements to the contributor guide, plus new sections
for PTL duties and release management.
Change-Id: If2f3b7c18de2e6c8d9bac131a16c28c2eeb348f2
When bootstrapping, Heat was not setting a region explicitly, so it
could default to a region other than the one being deployed.
Change-Id: I0a0596a020fbff91ccc5b9f44f271eab220c88cd
When registering the Nova aggregate for Blazar, the region always
defaulted to an arbitrary one (usually the first in the Keystone
endpoint list). Add a region override to ensure we always write to the
region being deployed.
Change-Id: I3f921ac51acab1b1020a459c07c755af7023e026
When Ansible runs a loop, by default it prints all the keys for
the item it is looping over. Some roles, when setting up the databases,
iterate over an object that includes the database password.
Override the loop label to hide everything but the database name.
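
A minimal sketch of the label override. The item structure and the
variable example_database_password are hypothetical, and a debug task
stands in for the real database setup.

```yaml
# Sketch only: item keys and variable names are hypothetical.
- name: Create databases
  debug:
    msg: "Would create database {{ item.database_name }}"
  loop:
    - database_name: nova
      database_password: "{{ example_database_password }}"
  loop_control:
    # Show only the database name in the per-item output.
    label: "{{ item.database_name }}"
```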
Change-Id: I336a81a5ecd824ace7d40e9a35942a1c853554cd
In a multi-region environment, each region is deployed separately.
Cell discovery, however, would sometimes fail because it picked a region
different from the one being deployed. Most likely, an internal endpoint
for region A will not be visible from region B. Furthermore, it is not
very useful to discover hosts in a region you're not modifying.
This changes the check to only run against nova compute services located
in the region being deployed.
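
A rough sketch of a region-scoped check, assuming the openstack CLI is
run inside the kolla_toolbox container. Authentication options are left
out for brevity, and the task is illustrative rather than the actual
implementation.

```yaml
# Sketch only: auth options omitted; not the task from the change.
- name: List nova-compute services in the region being deployed
  become: true
  command: >
    docker exec kolla_toolbox openstack
    --os-region-name {{ openstack_region_name }}
    compute service list --service nova-compute -f json
  register: nova_compute_services
  changed_when: false
```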
Change-Id: I21eb1164c2f67098b81edbd5cc106472663b92cb
- Remove trusted_cidrs, which has just been removed from the
  Qinling code.
- Remove use_api_certificate because it is true by default.
- Improve list syntax.
- Add etcd section.
Change-Id: I0426a9d61fbeaa23a1affbc7e981a78283e88263