Currently, when we call the major-upgrade step, we do the following:

```bash
...
if [[ -n $(is_bootstrap_node) ]]; then
    check_clean_cluster
fi
...
if [[ -n $(is_bootstrap_node) ]]; then
    migrate_full_to_ng_ha
fi
...
for service in $(services_to_migrate); do
    manage_systemd_service stop "${service%%-clone}"
    ...
done
```

The problem with the above code is that it is open to the following race condition:

1. The code runs first on a non-bootstrap controller node, which starts stopping a number of services.
2. Pacemaker notices that those services are down and marks them as stopped.
3. The code then runs on the bootstrap node (controller-0), where the check_clean_cluster function fails and exits.
4. Eventually the script on the non-bootstrap controller node also times out and exits, because the cluster never shut down (the shutdown never actually started, since we failed at step 3).

Let's make sure we first call the HA NG migration as a separate heat step, and only afterwards start shutting down the systemd services on all nodes.

We also need to move the STONITH_STATE variable into a file, because it is used across two different scripts (1 and 2) and we need to persist that state.

Co-Authored-By: Athlan-Guyot Sofer <sathlang@redhat.com>
Closes-Bug: #1640407
Change-Id: Ifb9b9e633fcc77604cca2590071656f4b2275c60
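The split described above can be sketched as two functions, each of which would be invoked from its own heat upgrade step so that the migration completes everywhere before any service is stopped. The step function names here are hypothetical; the helpers (is_bootstrap_node, migrate_full_to_ng_ha, services_to_migrate, manage_systemd_service) are the ones used by the original snippet.

```bash
upgrade_step_migrate() {
    # Earlier heat step: only the bootstrap node performs the HA NG migration.
    if [[ -n $(is_bootstrap_node) ]]; then
        migrate_full_to_ng_ha
    fi
}

upgrade_step_stop_services() {
    # Later heat step, run on all nodes only once the migration step has
    # completed everywhere: stop the systemd-managed services.
    local service
    for service in $(services_to_migrate); do
        manage_systemd_service stop "${service%%-clone}"
    done
}
```

Because heat runs one step to completion on all nodes before starting the next, a non-bootstrap node can no longer begin stopping services while the bootstrap node is still validating the cluster.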
37 lines
1.2 KiB
Bash
Executable File
```bash
#!/bin/bash

set -eu

check_cluster
check_pcsd
if [[ -n $(is_bootstrap_node) ]]; then
    check_clean_cluster
fi
check_python_rpm
check_galera_root_password
check_disk_for_mysql_dump

# We want to disable fencing during the cluster --stop as it might fence
# nodes where a service fails to stop, which could be fatal during an upgrade
# procedure. So we remember the stonith state. If it was enabled we reenable it
# at the end of this script
if [[ -n $(is_bootstrap_node) ]]; then
    STONITH_STATE=$(pcs property show stonith-enabled | grep "stonith-enabled" | awk '{ print $2 }')
    # We create this empty file if stonith was set to true so we can reenable stonith in step2
    rm -f /var/tmp/stonith-true
    if [ "$STONITH_STATE" == "true" ]; then
        touch /var/tmp/stonith-true
    fi
    pcs property set stonith-enabled=false
fi

# Migrate to HA NG and fix up rabbitmq queues
# We fix up the rabbitmq ha queues after the migration because it will
# restart the rabbitmq resource. Doing it after the migration means no other
# services will be restarted as there are no other constraints
if [[ -n $(is_bootstrap_node) ]]; then
    migrate_full_to_ng_ha
    rabbitmq_mitaka_newton_upgrade
fi
```
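The comment in the script notes that the empty flag file exists so that stonith can be re-enabled in step 2. A minimal sketch of that step-2 counterpart, assuming the same flag-file convention (the function name is hypothetical; `pcs property set` is the real Pacemaker CLI call):

```bash
# Path written by the step-1 script above when stonith was enabled.
STONITH_FLAG=/var/tmp/stonith-true

reenable_stonith_if_needed() {
    # Re-enable fencing only if step 1 recorded that it was on before,
    # then remove the flag so the operation is not repeated.
    if [ -f "$STONITH_FLAG" ]; then
        pcs property set stonith-enabled=true
        rm -f "$STONITH_FLAG"
    fi
}
```

Keeping the state in a file rather than a shell variable is what allows the two scripts, which run as separate heat steps in separate shell processes, to share it.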