First part of a patchset:
https://review.opendev.org/c/openstack/kolla-ansible/+/799229/
in which it was suggested to split the patch into smaller ones.
This implements the kolla_container_engine variable
in docker command calls, so that later on it can
also be used for podman without further changes.
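A rough sketch of the intended usage (the task shown is illustrative
and not taken verbatim from the repository):

- name: List running containers (illustrative example)
  become: true
  command: "{{ kolla_container_engine }} ps -q"
  register: container_ids
  changed_when: false

With kolla_container_engine left at its default of "docker" the task
runs docker; switching it to "podman" later should need no further
change to the task itself.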
Signed-off-by: Ivan Halomi <i.halomi@partner.samsung.com>
Change-Id: Ic30b67daa2e215524096ad1f4385c569e3d41b95
There seems to be a bug in Galera that causes
TASK [mariadb : Check MariaDB service WSREP sync status]
to fail.
One (in the case of a 3-node cluster) or more (possible with
clusters of more than 3 nodes) nodes may "lose the race" and get stuck
in the "initialized" WSREP state.
This is entirely random, as is the case with most race issues.
Restarting the MariaDB service on the affected node fixes the
situation, but that is unwieldy.
The above may happen because Kolla Ansible starts and waits for
all new nodes at once.
This did not bother the old Galera (Galera 3), which figured out
the ordering for itself and let each node join the cluster properly.
The proposed workaround is to start and wait for nodes serially.
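A sketch of what "serially" could look like in Ansible (not
necessarily the exact mechanism used in this change; the container
name, credentials and timings are illustrative):

- name: Check MariaDB service WSREP sync status
  become: true
  # throttle: 1 lets only one host run this task at a time, so each
  # new node joins and syncs before the next one starts
  throttle: 1
  command: >-
    docker exec mariadb
    mysql -u root -p{{ database_password }}
    -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"
  register: wsrep_status
  until: "'Synced' in wsrep_status.stdout"
  retries: 10
  delay: 6
  changed_when: false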
Change-Id: I449d4c2073d4e3953e9f09725577d2e1c9d563c9
Closes-Bug: #1947485
- Replace the hardcoded haproxy monitor user with a variable
(see the sketch after this list).
- Rename mariadb_backup variable to mariadb_backup_possible.
- Drop creation of monitor user in handlers as this is
now handled in register.yml for good reason.
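A sketch of the kind of change involved, assuming the new variable is
named mariadb_monitor_user (the name and default are assumptions, not
quoted from the patch):

# role defaults (YAML)
mariadb_monitor_user: "haproxy"

The haproxy template would then refer to {{ mariadb_monitor_user }}
instead of the literal user name.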
Change-Id: I255a79d36ae18ca42d0befd00b235ca509197db3
Kolla-ansible currently installs the MariaDB cluster on the hosts
defined in the 'mariadb' group and renders the haproxy
configuration for these hosts.
This is not enough if a user wants to have several
service databases in several MariaDB clusters (shards).
Spreading service databases across multiple clusters (shards)
is useful especially for databases with high load
(neutron, nova).
How does it work?
It works exactly the same as now, but the group 'mariadb'
is now used as the group where all MariaDB clusters (shards)
are located, and the clusters are installed into
dynamic groups created by group_by and the host variable
'mariadb_shard_id'.
It also adds a special user 'shard_X' which will be used
for creating users and databases, but only if haproxy
is not used as the load-balancing solution.
This patch will not affect users who have all databases
on the same DB cluster on hosts in the 'mariadb' group; the host
variable 'mariadb_shard_id' is set to 0 if not defined.
The MariaDB task in loadbalancer.yml (haproxy) configures
the default shard hosts as haproxy backends. If the mariadb
role is used to install several clusters (shards), only the
default one is load-balanced via haproxy.
MariaDB backup works only for the default shard (cluster)
when haproxy is used as the MariaDB load balancer; if proxysql
is used, all shards are backed up.
Once this patch is merged, it will open the way for proxysql
patches which will implement L7 SQL balancing based on
users and schemas.
Example of inventory:
[mariadb]
server1
server2
server3 mariadb_shard_id=1
server4 mariadb_shard_id=1
server5 mariadb_shard_id=2
server6 mariadb_shard_id=3
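Given the inventory above, the dynamic shard groups can be created
with group_by; a sketch (the group name prefix is an assumption):

- name: Divide hosts by their mariadb_shard_id
  group_by:
    key: "mariadb_shard_{{ mariadb_shard_id | default(0) }}"

This would put server1 and server2 into mariadb_shard_0 (the default
shard), server3 and server4 into mariadb_shard_1, and so on.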
Extra:
wait_for_loadbalancer is removed rather than modified, as its role
is already served by the check. The relevant refactor is applied as
well.
Change-Id: I933067f22ecabc03247ea42baf04f19100dffd08
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
The handler was firing even when we were only generating config.
This is an issue because the services may not have been deployed.
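A sketch of the kind of guard this implies (assuming the current
action is exposed as kolla_action, and with the handler body trimmed
down to the relevant parts):

- name: Restart mariadb container
  become: true
  kolla_docker:
    action: "recreate_or_restart_container"
    name: "mariadb"
  # skip the restart when we are only generating configuration
  when: kolla_action != "config"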
TrivialFix
Change-Id: I2f832d73138b4c9f29e3c71e2463293eab71483a
This fixes issues reported by Mark:
- possible failure with 4-node cluster (however unlikely)
- failure to stop all nodes from progressing when conditions are
not valid (due to: "any_errors_fatal: False")
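A sketch of the second point (the check itself is hypothetical; the
relevant part is any_errors_fatal):

- name: Validate cluster preconditions
  # with any_errors_fatal a failure on one host ends the play on all
  # hosts, instead of letting the remaining nodes progress
  any_errors_fatal: true
  assert:
    that:
      - cluster_conditions_ok | default(false)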
Change-Id: Ib6995bf4c99202c9813859b3d9e2f420448f0445
These affected both deploy (and reconfigure) and upgrade,
resulting in WSREP issues, failed deploys or the need to
recover the cluster.
This patch makes sure k-a does not abruptly terminate
nodes and break the cluster.
This is achieved by a cleaner separation between stages
(bootstrap, restart current, deploy new) and 3 phases
for restarts (to keep the quorum).
Upgrade actions, which operate on a healthy cluster,
went to their own section.
Service restart was refactored.
We no longer rely on the master/slave distinction as
all nodes are masters in Galera.
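As a sketch only of the quorum-keeping idea (this is not how the role
wires it internally): restarting at most one third of the cluster at
a time means a majority of nodes stays up throughout.

- hosts: mariadb
  # 3 phases: with serial at one third, a 3-node cluster restarts
  # one node per phase and never loses quorum
  serial: "33%"
  tasks:
    - name: Restart mariadb container
      become: true
      kolla_docker:
        action: "restart_container"
        name: "mariadb"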
Closes-bug: #1857908
Closes-bug: #1859145
Change-Id: I83600c69141714fc412df0976f49019a857655f5
Docker has no restart policy named 'never'. It has 'no'.
This has bitten us already (see [1]) and might bite us again whenever
we want to change the restart policy to 'no'.
This patch makes our docker integration honor all valid restart policies
and only valid restart policies.
All relevant docker restart policy usages are patched as well.
I added some FIXMEs which are relevant to the kolla-ansible docker
integration. They are not fixed here so as not to alter behavior.
[1] https://review.opendev.org/667363
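A sketch of what a caller looks like after this change (only
restart_policy is the point here; the other options are
illustrative). Note that 'no' has to be quoted in YAML, otherwise it
is parsed as a boolean:

- name: Run the bootstrap container without auto-restart
  become: true
  kolla_docker:
    action: "start_container"
    name: "bootstrap_mariadb"
    image: "{{ mariadb_image_full }}"
    restart_policy: "no"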
Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
* Fix wsrep sequence number detection. The log message format is
'WSREP: Recovered position: <UUID>:<seqno>' but we were picking out
the UUID rather than the sequence number, which is as good as
random (see the sketch after this list).
* Add become: true to log file reading and removal since
I4a5ebcedaccb9261dbc958ec67e8077d7980e496 added become: true to the
'docker cp' command which creates it.
* Don't run handlers during recovery. If the config files change we
would end up restarting the cluster twice.
* Wait for wsrep recovery container completion (don't detach). This
avoids a potential race between wsrep recovery and the subsequent
'stop_container'.
* Finally, we now wait for the bootstrap host to report that it is in
an OPERATIONAL state. Without this we can see errors where the
MariaDB cluster is not ready when used by other services.
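A sketch of the first point, i.e. picking the sequence number (not
the UUID) out of a line such as
'WSREP: Recovered position: <UUID>:<seqno>' (the log path and exact
command are illustrative, not necessarily what the patch uses):

- name: Extract recovered wsrep sequence number
  become: true
  # the last field of the matching line is "<UUID>:<seqno>";
  # split it on ":" and print the second part, the seqno
  shell: >-
    awk '/WSREP: Recovered position/ { split($NF, p, ":"); print p[2] }'
    /var/log/kolla/mariadb/mariadb.log
  register: wsrep_recovery_seqno
  changed_when: false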
Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583
Closes-Bug: #1834467
Since we are now in the Train cycle, we can be sure that any running
MariaDB containers can be safely stopped, and we do not need to perform
an explicit shutdown prior to restarting them.
Change-Id: I5450690f1cbe0c995e8e4b01a76e90dac2574d61
Related-Bug: #1820325
Upgrading MariaDB from Rocky to Stein currently fails, with the new
container left continually restarting. The problem is that the Rocky
container does not shutdown cleanly, leaving behind state that the new
container cannot recover. The container does not shutdown cleanly
because we run dumb-init with a --single-child argument, causing it to
forward signals to only the process executed by dumb-init. In our case
this is mysqld_safe, which ignores various signals, including SIGTERM.
After a (default 10 second) timeout, Docker then kills the container.
A Kolla change [1] removes the --single-child argument from dumb-init
for the MariaDB container, however we still need to support upgrading
from Rocky images that don't have this change. To do that, we add new
handlers to execute 'mysqladmin shutdown' to cleanly shut down the
service.
A second issue with the current upgrade approach is that we don't
execute mysql_upgrade after starting the new service. This can leave the
database state using the format of the previous release. This patch also
adds handlers to execute mysql_upgrade.
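A rough sketch of the two kinds of handlers described above (the
container name, credentials and options are illustrative):

- name: Shut down mariadb cleanly
  become: true
  command: >-
    docker exec mariadb
    mysqladmin --user=root --password={{ database_password }} shutdown

- name: Run upgrade in mariadb container
  become: true
  command: >-
    docker exec mariadb
    mysql_upgrade --user=root --password={{ database_password }}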
[1] https://review.openstack.org/644244
Depends-On: https://review.openstack.org/644244
Depends-On: https://review.openstack.org/645990
Change-Id: I08a655a359ff9cfa79043f2166dca59199c7d67f
Closes-Bug: #1820325
With more recent versions of Ansible, we should now use
"is" instead of the "|" filter syntax for tests.
This patch updates that.
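For example (illustrative):

# old, filter-style syntax
when: result | succeeded

# new, test-style syntax
when: result is succeeded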
Change-Id: I6fba56fca182349972e8b0ee5452b37aa4090e0c
This commit applies resource constraints to a few more OpenStack
services. A commit applying constraints to the last set of services
will be made in an upcoming commit.
Depends-on: Icafa54baca24d2de64238222a5677b9d8b90e2aa
Change-Id: I39004f54281f97d53dfa4b1dbcf248650ad6f186
Add become to all tasks that use the module "kolla_docker"
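For example (a sketch; the task shown is illustrative):

- name: Stop the mariadb container
  become: true
  kolla_docker:
    action: "stop_container"
    name: "mariadb"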
Change-Id: I4309c4011687b88ec31d739fd8f834fe2326ff10
Partial-Implements: blueprint ansible-specific-task-become
- rename action and serial to kolla_ansible and kolla_serial
- use become instead of "sudo <command>" in shell
(see the sketch after this list)
- remove quotes from failed_when and changed_when in rabbitmq tasks
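A sketch of the second point (the command is illustrative):

# before
- name: List containers
  shell: sudo docker ps

# after
- name: List containers
  shell: docker ps
  become: true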
Change-Id: I78cb60168aaa40bb6439198283546b7faf33917c
Implements: blueprint migrate-to-ansible-2-2-0
For the genconfig command, master_host will not be defined as it is
defined dynamically in bootstrap.yml.
Co-Authored-By: Stig Telfer <stig@stackhpc.com>
Change-Id: Ib988c8e2de475e9b973fed2f7f752cb2500953c3
Closes-Bug: #1707856