Second part of patchset:
https://review.opendev.org/c/openstack/kolla-ansible/+/799229/
in which it was suggested to split the patch into smaller ones.
This change adds a container_engine variable to the kolla_container_facts
module, preparing the module to be used with both docker and podman
without further changes in roles.
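A minimal sketch of a role calling the module with the new parameter
(defaulting to 'docker' here is illustrative):

- name: Get container facts
  become: true
  kolla_container_facts:
    container_engine: "{{ kolla_container_engine | default('docker') }}"
    name:
      - mariadb
  register: container_facts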
Signed-off-by: Ivan Halomi <i.halomi@partner.samsung.com>
Co-authored-by: Martin Hiner <m.hiner@partner.samsung.com>
Change-Id: I9e8fa30646844ab4a288555f3aafdda345b3a118
First part of patchset:
https://review.opendev.org/c/openstack/kolla-ansible/+/799229/
in which it was suggested to split the patch into smaller ones.
This implements the kolla_container_engine variable
in docker command calls, so that later on it can
also be used for podman without further changes.
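For example, a command task that previously hardcoded the docker CLI
becomes (a sketch; the exact command is illustrative):

# before
- name: List MariaDB data files
  become: true
  command: docker exec mariadb ls /var/lib/mysql

# after
- name: List MariaDB data files
  become: true
  command: "{{ kolla_container_engine }} exec mariadb ls /var/lib/mysql"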
Signed-off-by: Ivan Halomi <i.halomi@partner.samsung.com>
Change-Id: Ic30b67daa2e215524096ad1f4385c569e3d41b95
This patch adds the loadbalancer-config role,
which is a "wrapper" around the haproxy-config role
and the proxysql-config role that will be added
in follow-up patches.
Change-Id: I64d41507317081e1860a94b9481a85c8d400797d
The Kolla environment currently uses haproxy
to provide HA for mariadb. This patch
switches haproxy to proxysql if enabled.
This patch also replaces mariadb's user
'haproxy' with user 'monitor'. This replacement
has two reasons:
- Use a better name to "monitor" the galera cluster,
as there are two services using this user
(HAProxy, ProxySQL)
- Set a password for the monitor user, as it is
always better to use a password than none.
The previous haproxy user did not use a password,
as that was historically not possible with
haproxy, and mariadb-clustercheck was not
implemented.
Depends-On: https://review.opendev.org/c/openstack/kolla/+/769385
Depends-On: https://review.opendev.org/c/openstack/kolla/+/765781
Depends-On: https://review.opendev.org/c/openstack/kolla/+/850656
Change-Id: I0edae33d982c2e3f3b5f34b3d5ad07a431162844
When running the mariadb role with --limit, the group_by module only
includes the limited hosts; hosts excluded by the ansible limit are not
included in the generated groups.
Using add_host instead adds all hosts in the mariadb group to their
shard groups.
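A minimal sketch of the approach (the shard group naming is illustrative):

- name: Divide all mariadb hosts into their shard groups
  add_host:
    name: "{{ item }}"
    groups: "mariadb_shard_{{ hostvars[item]['mariadb_shard_id'] | default(0) }}"
  loop: "{{ groups['mariadb'] }}"
  changed_when: false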
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
Change-Id: I1331698e313bd714a16fc35f38fb579d75b56370
Closes-Bug: #1947589
"BINLOG MONITOR" and "SLAVE MONITOR" replace
"REPLICATION CLIENT" (which is now an alias for "BINLOG MONITOR").
The validation in the Ansible MySQL collection is too simple to
understand aliases, and it breaks. Hence, let's use the canonical
names and adapt per service according to its needs.
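A sketch of a grant using the canonical names (the user, scope and exact
privilege set per service are illustrative):

- name: Create monitor user with canonical privilege names
  mysql_user:
    name: monitor
    host: '%'
    password: "{{ mariadb_monitor_password }}"
    priv: "*.*:BINLOG MONITOR,SLAVE MONITOR"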
Change-Id: I1175e4846384accd19942620dc155d0c5728e64b
We get a nice optimisation by using a filtered loop instead
of task skipping per service with 'when'.
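A sketch of the pattern (the service map and filter are illustrative):

# before: one task per service, each skipped with 'when'
- name: Create nova database user
  mysql_user:
    name: nova
    password: "{{ nova_database_password }}"
  when: enable_nova | bool

# after: a single task looping only over enabled services
- name: Create database users
  mysql_user:
    name: "{{ item.key }}"
    password: "{{ item.value.password }}"
  loop: "{{ database_users | dict2items | selectattr('value.enabled') | list }}"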
Partially-Implements: blueprint performance-improvements
Change-Id: I8f68100870ab90cb2d6b68a66a4c97df9ea4ff52
- Replace hardcoded haproxy monitor user with variable.
- Rename mariadb_backup variable to mariadb_backup_possible.
- Drop creation of monitor user in handlers as this is
now handled in register.yml for good reason.
Change-Id: I255a79d36ae18ca42d0befd00b235ca509197db3
Kolla-ansible currently installs the mariadb
cluster on hosts defined in group['mariadb']
and renders the haproxy configuration for these hosts.
This is not enough if the user wants to have several
service databases in several mariadb clusters (shards).
Spreading service databases over multiple clusters (shards)
is useful especially for databases with high load
(neutron, nova).
How does it work?
It works exactly the same as now, but the group 'mariadb'
is now used as the group where all mariadb clusters (shards)
are located, and the mariadb clusters are installed into
dynamic groups created by group_by from the host variable
'mariadb_shard_id'.
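A minimal sketch of the dynamic grouping (the group name prefix is
illustrative):

- name: Divide hosts into groups based on their MariaDB shard
  group_by:
    key: "mariadb_shard_{{ mariadb_shard_id | default(0) }}"
  changed_when: false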
It also adds a special user 'shard_X' which is used
for creating users and databases, but only if haproxy
is not used as the load-balancing solution.
This patch does not affect users who have all databases
on the same db cluster on hosts in group 'mariadb'; the host
variable 'mariadb_shard_id' is set to 0 if not defined.
Mariadb's task in loadbalancer.yml (haproxy) configures the
default shard hosts as haproxy backends. If the mariadb
role is used to install several clusters (shards), only the
default one is load-balanced via haproxy.
Mariadb's backup works only for the default shard (cluster)
when using haproxy as the mariadb load balancer; if proxysql
is used, all shards are backed up.
After this patch is merged, the way will be open for proxysql
patches which will implement L7 SQL balancing based on
users and schemas.
Example of inventory:
[mariadb]
server1
server2
server3 mariadb_shard_id=1
server4 mariadb_shard_id=1
server5 mariadb_shard_id=2
server6 mariadb_shard_id=3
Extra:
wait_for_loadbalancer is removed rather than modified, as its role
is already served by the check. The relevant refactor is applied as
well.
Change-Id: I933067f22ecabc03247ea42baf04f19100dffd08
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
This trivial patch just turns off the ansible
changed report for group_by tasks, as it could
be confusing for the user.
Change-Id: I7512af573782359a6f01290a55291ac7eb0de867
A negative seqno needs to be considered in the comparison in some cases,
but the task does not support that, so we need to make it work.
1. We use mariabackup to restore data on control1, delete the
mariadb data on control2 and control3, and then use cluster recovery;
as a result, the seqno of the other two nodes will be '-1'.
2. Add one more control node into our existing mariadb cluster,
and then use cluster recovery; the seqno of the new node will be '-1'.
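A minimal sketch of making the comparison work, assuming each host has
registered its recovered seqno in the fact 'seqno':

- name: Find the highest recovered seqno, treating '-1' as the lowest
  set_fact:
    max_seqno: "{{ groups['mariadb']
                   | map('extract', hostvars, 'seqno')
                   | map('int')
                   | max }}"
  run_once: true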
Change-Id: Ic1ac8656f28c3835e091637014f075ac5479d390
This reverts commit 9cae59be51e8d2d798830042a5fd448a4aa5e7dc.
Reason for revert: This patch was found to introduce issues with fluentd customisation. The underlying issue is not currently fully understood, but could be a sign of other obscure issues.
Change-Id: Ia4859c23d85699621a3b734d6cedb70225576dfc
Closes-Bug: #1906288
Mariadb recovery fails if a cluster has previously been deployed, but any of
the mariadb containers do not exist.
Steps to reproduce
==================
* Deploy a mariadb galera cluster
* Remove the mariadb container from at least one host (docker rm -f mariadb)
* Run kolla-ansible mariadb_recovery
Expected results
================
The cluster is recovered, and a new container deployed where necessary.
Actual results
==============
The task 'Stop MariaDB containers' fails on any host where the container does
not exist.
Solution
========
This change fixes the issue by using the 'ignore_missing' flag for kolla_docker
with the stop_container action. This means the task does not fail when the
container does not exist. It is also necessary to swap some 'docker cp'
commands for 'cp' on the host, using the path to the volume.
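The shape of the fix for the stop task (a sketch; the surrounding options
are simplified):

- name: Stop MariaDB containers
  become: true
  kolla_docker:
    action: "stop_container"
    common_options: "{{ docker_common_options }}"
    name: "mariadb"
    ignore_missing: true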
Closes-Bug: #1907658
Change-Id: Ibd4a6adeb8443e12c45cbab65f501392ffb16fc7
Main plays are action-redirect stubs, ideal for import_tasks.
This avoids the 'include' penalty and makes logs/ARA look nicer.
Also fixes haproxy and rabbitmq so that they do not check the host group.
Change-Id: I46136fc40b815e341befff80b54a91ef431eabc0
Partially-Implements: blueprint performance-improvements
Config plays do not need to check containers. This avoids skipping
tasks during the genconfig action.
Ironic and Glance rolling upgrades are handled specially.
Swift and Bifrost do not use the handlers at all.
Partially-Implements: blueprint performance-improvements
Change-Id: I140bf71d62e8f0932c96270d1f08940a5ba4542a
Including tasks has a performance penalty when compared with importing
tasks. If the include has a condition associated with it, then the
overhead of the include may be lower than the overhead of skipping all
imported tasks. For unconditionally included tasks, switching to
import_tasks provides a clear benefit.
Benchmarking of include vs. import is available at [1].
This change switches from include_tasks to import_tasks where there is
no condition applied to the include.
[1] https://github.com/stackhpc/ansible-scaling/blob/master/doc/include-and-import.md#task-include-and-import
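The change is mechanical where no condition is attached (file name
illustrative):

# before
- include_tasks: deploy.yml

# after
- import_tasks: deploy.yml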
Partially-Implements: blueprint performance-improvements
Change-Id: Ia45af4a198e422773d9f009c7f7b2e32ce9e3b97
Including tasks has a performance penalty when compared with importing
tasks. If the include has a condition associated with it, then the
overhead of the include may be lower than the overhead of skipping all
imported tasks. In the case of the check-containers.yml include, the
included file only has a single task, so the overhead of skipping this
task will not be greater than the overhead of the task import. It
therefore makes sense to switch to use import_tasks there.
Partially-Implements: blueprint performance-improvements
Change-Id: I65d911670649960708b9f6a4c110d1a7df1ad8f7
The mariadb container name is hardcoded in some places,
but in the defaults directory the container_name is
defined as a variable. If the mariadb container_name variable
is changed during deployment, those places will not use the
configured name but the fixed name 'mariadb'.
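A sketch of the fix, assuming the role's service dict exposes
container_name (names are illustrative):

- name: Get container facts
  become: true
  kolla_container_facts:
    name:
      - "{{ mariadb_services.mariadb.container_name }}"
  register: container_facts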
Change-Id: Ie8efa509953d5efa5c3073c9b550be051a7f4f9b
Both include_role and import_role expect the role's name to be given
via the "name" param instead of "role".
Using "role" worked but caused errors with ansible-lint.
See: https://review.opendev.org/694779
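For example (the role name is illustrative):

# before
- include_role:
    role: haproxy-config

# after
- include_role:
    name: haproxy-config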
Change-Id: I388d4ae27111e430d38df1abcb6c6127d90a06e0
We assume that all groups are present in the inventory, and quite obtuse
errors can result if any are not.
This change adds a precheck that checks for the presence of all expected
groups in the inventory for each service. It also introduces a common
service-precheck role that we can use for other common prechecks.
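A minimal sketch of such a precheck (the variable holding the expected
groups is illustrative):

- name: Fail if a service group is missing from the inventory
  fail:
    msg: "Group '{{ item }}' is not defined in the inventory"
  when: item not in groups
  loop: "{{ service_groups }}"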
Change-Id: Ia0af1e7df4fff7f07cd6530e5b017db8fba530b3
Partially-Implements: blueprint improve-prechecks
This fixes issues reported by Mark:
- possible failure with a 4-node cluster (however unlikely)
- failure to stop all nodes from progressing when conditions are
not valid (due to "any_errors_fatal: False")
Change-Id: Ib6995bf4c99202c9813859b3d9e2f420448f0445
These affected both deploy (and reconfigure) and upgrade,
resulting in WSREP issues, failed deploys, or the need to
recover the cluster.
This patch makes sure kolla-ansible does not abruptly terminate
nodes and break the cluster.
This is achieved by a cleaner separation between stages
(bootstrap, restart current, deploy new) and 3 phases
for restarts (to keep the quorum).
Upgrade actions, which operate on a healthy cluster,
went to their own section.
Service restart was refactored.
We no longer rely on the master/slave distinction, as
all nodes are masters in Galera.
Closes-bug: #1857908
Closes-bug: #1859145
Change-Id: I83600c69141714fc412df0976f49019a857655f5
As part of the effort to implement Ansible code linting in CI
(using ansible-lint) - we need to implement recommendations from
ansible-lint output [1].
One of them is to stop using local_action in favor of delegate_to -
to increase readability and match the style of typical ansible
tasks.
[1]: https://review.opendev.org/694779/
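For example (the module and arguments are illustrative):

# before
- name: Check that the passwords file exists on the deployment host
  local_action:
    module: stat
    path: /etc/kolla/passwords.yml
  register: passwords_file

# after
- name: Check that the passwords file exists on the deployment host
  stat:
    path: /etc/kolla/passwords.yml
  delegate_to: localhost
  register: passwords_file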
Partially implements: blueprint ansible-lint
Change-Id: I46c259ddad5a6aaf9c7301e6c44cd8a1d5c457d3
After performing a recovery of MariaDB, the mariadb containers are left
without a restart policy. This leaves them unable to recover from the
crash of a single galera node. There is another issue, in that the
'master' node is left in a bootstrap configuration, with the
--wsrep-new-cluster argument configured as BOOTSTRAP_ARGS.
This change fixes these issues by removing the restart policy of 'no'
from the 'slave' containers, and recreating the master container without
the restart policy or bootstrap arguments.
Change-Id: I36c875611931163ca2c29ae93b71d3af64cb197c
Closes-Bug: #1851594
Sometimes, as cloud admins, we want to only update the code that is
running in a cloud, but we don't need to do anything else. Make an
action in kolla-ansible that allows us to do that.
Change-Id: I904f595c69f7276e71692696471e32fd1f88e6e8
Implements: blueprint deploy-containers-action
Explicitly wait for the database to be accessible via the load balancer.
Sometimes it can reject connections even when all database services are up,
possibly due to the health check polling in HAProxy.
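A sketch of the kind of check involved (the credentials and retry counts
are illustrative):

- name: Wait for MariaDB to be accessible through the load balancer
  become: true
  command: >
    docker exec mariadb
    mysql -h {{ database_address }} -P {{ database_port }}
    -u haproxy -e 'SHOW DATABASES;'
  register: check_mariadb
  until: check_mariadb is success
  retries: 10
  delay: 6
  changed_when: false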
Closes-Bug: #1840145
Change-Id: I7601bb710097a78f6b29bc4018c71f2c6283eef2
Docker has no restart policy named 'never'. It has 'no'.
This has bitten us already (see [1]) and might bite us again whenever
we want to change the restart policy to 'no'.
This patch makes our docker integration honor all valid restart policies
and only valid restart policies.
All relevant docker restart policy usages are patched as well.
I added some FIXMEs in places relevant to the kolla-ansible docker
integration. They are not fixed here so as not to alter behavior.
[1] https://review.opendev.org/667363
Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
* Fix wsrep sequence number detection. Log message format is
'WSREP: Recovered position: <UUID>:<seqno>' but we were picking out
the UUID rather than the sequence number. This is as good as random
(a sketch of the fixed extraction follows this list).
* Add become: true to log file reading and removal since
I4a5ebcedaccb9261dbc958ec67e8077d7980e496 added become: true to the
'docker cp' command which creates it.
* Don't run handlers during recovery. If the config files change we
would end up restarting the cluster twice.
* Wait for wsrep recovery container completion (don't detach). This
avoids a potential race between wsrep recovery and the subsequent
'stop_container'.
* Finally, we now wait for the bootstrap host to report that it is in
an OPERATIONAL state. Without this we can see errors where the
MariaDB cluster is not ready when used by other services.
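Regarding the first point, a sketch of extracting the seqno rather than
the UUID (the registered variable name is illustrative):

- name: Extract the recovered seqno (not the UUID) from the wsrep log
  set_fact:
    seqno: "{{ recovery_log.stdout
               | regex_search('WSREP: Recovered position:\\s*[0-9a-f-]+:(-?\\d+)', '\\1')
               | first }}"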
Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583
Closes-Bug: #1834467
Many tasks that use Docker already have 'become' specified, but
not all. This change ensures all tasks that use the following
modules have 'become':
* kolla_docker
* kolla_ceph_keyring
* kolla_toolbox
* kolla_container_facts
It also adds 'become' for 'command' tasks that use the docker CLI.
Change-Id: I4a5ebcedaccb9261dbc958ec67e8077d7980e496