129 Commits

Author SHA1 Message Date
Zuul
980dd33721 Merge "mariadb: Deprecate wsrep-notify.sh" 2021-04-21 09:50:44 +00:00
Michał Nasiadka
451844ac67 mariadb: Deprecate wsrep-notify.sh
Change-Id: I14376dac46809f8bb466ec41f279be8d323d459d
2021-04-15 08:12:31 +00:00
Michal Arbet
5d17100118 Additional small changes in role/mariadb
- Replace hardcoded haproxy monitor user with variable.
 - Rename mariadb_backup variable to mariadb_backup_possible.
 - Drop creation of monitor user in handlers as this is
   now handled in register.yml for good reason.

Change-Id: I255a79d36ae18ca42d0befd00b235ca509197db3
2021-04-14 16:10:30 +02:00
Michał Nasiadka
d7a9be84d4 mariadb: Disable wsrep-notify script if clustercheck enabled
Change-Id: Id16ec7d7b57630ae20430675c4a196e63ca8d4a5
2021-04-14 09:46:20 +00:00
Michal Arbet
09b3c6ca07 Refactor mariadb to support shards
Kolla-ansible is currently installing mariadb
cluster on hosts defined in group['mariadb']
and render haproxy configuration for this hosts.

This is not enough if user want to have several
service databases in several mariadb clusters (shards).

Spread service databases to multiple clusters (shards)
is usefull especially for databases with high load
(neutron,nova).

How it works ?

It works exactly same as now, but group reference 'mariadb'
is now used as group where all mariadb clusters (shards)
are located, and mariadb clusters are installed to
dynamic groups created by group_by and host variable
'mariadb_shard_id'.

It also adding special user 'shard_X' which will be used
for creating users and databases, but only if haproxy
is not used as load-balance solution.

This patch will not affect user which has all databases
on same db cluster on hosts in group 'mariadb', host
variable 'mariadb_shard_id' is set to 0 if not defined.

Mariadb's task in loadbalancer.yml (haproxy) is configuring
mariadb default shard hosts as haproxy backends. If mariadb
role is used to install several clusters (shards), only
default one is loadbalanced via haproxy.

Mariadb's backup is working only for default shard (cluster)
when using haproxy as mariadb loadbalancer, if proxysql
is used, all shards are backuped.

After this patch will be merged, there will be way for proxysql
patches which will implement L7 SQL balancing based on
users and schemas.

Example of inventory:

[mariadb]
server1
server2
server3 mariadb_shard_id=1
server4 mariadb_shard_id=1
server5 mariadb_shard_id=2
server6 mariadb_shard_id=3

Extra:
wait_for_loadbalancer is removed instead of modified as its role
is served by check already. The relevant refactor is applied as
well.

Change-Id: I933067f22ecabc03247ea42baf04f19100dffd08
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2021-04-07 23:19:42 +02:00
Michal Arbet
209dc1e9dc Set changed_when to false for group_by tasks
This trivial patch is just turning off ansible
changed report for group_by tasks as it could
be confusing for user.

Change-Id: I7512af573782359a6f01290a55291ac7eb0de867
2021-03-13 13:59:23 +00:00
fudunwei
068f3fea50 Negative seqno need to be considered when comparing seqno
Need to consider Negative seqno to compare in some cases,
but the task does not support to do that, we need to make it work.

1.we use mariabackup to restore datas on control1, delete the
mariadb data on control2 and control3, and then use cluster recovery,
 as a result that the seqno of the other two nodes will be '-1'.

2. add one more control node into our existing mariadb cluster,
and then use cluster recovery, the seqno of the new node will be '-1'.

Change-Id: Ic1ac8656f28c3835e091637014f075ac5479d390
2021-01-29 13:46:37 +08:00
fudunwei
aec42f0f5f Correct spell error
correct spell 'Cheching' to 'Checking'

Change-Id: I3ceb6960c3b38f371d0d4163ee37d4b34e61f401
2021-01-25 09:58:12 +08:00
Zuul
cb6ffa25e8 Merge "Fix mariadb_recovery when mariadb container is missing" 2020-12-16 09:36:54 +00:00
Mark Goddard
db4fc85c33 Revert "Performance: Use import_tasks in the main plays"
This reverts commit 9cae59be51e8d2d798830042a5fd448a4aa5e7dc.

Reason for revert: This patch was found to introduce issues with fluentd customisation. The underlying issue is not currently fully understood, but could be a sign of other obscure issues.

Change-Id: Ia4859c23d85699621a3b734d6cedb70225576dfc
Closes-Bug: #1906288
2020-12-14 10:36:55 +00:00
Mark Goddard
f903d774af Fix mariadb_recovery when mariadb container is missing
Mariadb recovery fails if a cluster has previously been deployed, but any of
the mariadb containers do not exist.

Steps to reproduce
==================

* Deploy a mariadb galera cluster
* Remove the mariadb container from at least one host (docker rm -f mariadb)
* Run kolla-ansible mariadb_recovery

Expected results
================

The cluster is recovered, and a new container deployed where necessary.

Actual results
==============

The task 'Stop MariaDB containers' fails on any host where the container does
not exist.

Solution
========

This change fixes the issue by using the 'ignore_missing' flag for kolla_docker
with the stop_container action. This means the task does not fail when the
container does not exist. It is also necessary to swap some 'docker cp'
commands for 'cp' on the host, using the path to the volume.

Closes-Bug: #1907658

Change-Id: Ibd4a6adeb8443e12c45cbab65f501392ffb16fc7
2020-12-10 12:27:25 +00:00
Radosław Piliszek
9cae59be51 Performance: Use import_tasks in the main plays
Main plays are action-redirect-stubs, ideal for import_tasks.

This avoids 'include' penalty and makes logs/ara look nicer.

Fixes haproxy and rabbitmq not to check the host group as well.

Change-Id: I46136fc40b815e341befff80b54a91ef431eabc0
Partially-Implements: blueprint performance-improvements
2020-10-27 19:09:32 +01:00
Radosław Piliszek
3411b9e420 Performance: optimize genconfig
Config plays do not need to check containers. This avoids skipping
tasks during the genconfig action.

Ironic and Glance rolling upgrades are handled specially.

Swift and Bifrost do not use the handlers at all.

Partially-Implements: blueprint performance-improvements
Change-Id: I140bf71d62e8f0932c96270d1f08940a5ba4542a
2020-10-12 19:30:06 +02:00
Mark Goddard
b685ac44e0 Performance: replace unconditional include_tasks with import_tasks
Including tasks has a performance penalty when compared with importing
tasks. If the include has a condition associated with it, then the
overhead of the include may be lower than the overhead of skipping all
imported tasks. For unconditionally included tasks, switching to
import_tasks provides a clear benefit.

Benchmarking of include vs. import is available at [1].

This change switches from include_tasks to import_tasks where there is
no condition applied to the include.

[1] https://github.com/stackhpc/ansible-scaling/blob/master/doc/include-and-import.md#task-include-and-import

Partially-Implements: blueprint performance-improvements

Change-Id: Ia45af4a198e422773d9f009c7f7b2e32ce9e3b97
2020-08-28 16:12:03 +00:00
Mark Goddard
9702d4c3c3 Performance: use import_tasks for check-containers.yml
Including tasks has a performance penalty when compared with importing
tasks. If the include has a condition associated with it, then the
overhead of the include may be lower than the overhead of skipping all
imported tasks. In the case of the check-containers.yml include, the
included file only has a single task, so the overhead of skipping this
task will not be greater than the overhead of the task import. It
therefore makes sense to switch to use import_tasks there.

Partially-Implements: blueprint performance-improvements

Change-Id: I65d911670649960708b9f6a4c110d1a7df1ad8f7
2020-07-28 12:10:59 +01:00
Mark Goddard
b84d2f8b77 Fix handler notification for mariadb-clustercheck
This was missed in the original patch.

Change-Id: I991b0563560cf4a0b1feb718951ffdf21ab81856
2020-06-08 14:43:34 +01:00
Zuul
6f829575c9 Merge "Custom haproxy script for monitoring galera" 2020-06-02 15:01:55 +00:00
Michal Nasiadka
026f5cc48a Custom haproxy script for monitoring galera
Depends-On: https://review.opendev.org/710217/

Change-Id: I85652f23e487c40192106d23f2cdd45a3077deca
2020-05-20 13:02:44 +02:00
Zuul
87984f5425 Merge "Add Ansible group check to prechecks" 2020-04-16 15:33:46 +00:00
LinPeiWen
8a206699d4 mariadb container name variable
mariadb container name variable is fixed in some places,
but in the defaults directory, mariadb container_name variable
is variable. If the mariadb container_name variable is changed
during deployment, it will not be assigned to container_name,
but a fixed 'mariadb' name.

Change-Id: Ie8efa509953d5efa5c3073c9b550be051a7f4f9b
2020-03-25 01:17:29 -04:00
Radosław Piliszek
266fd61ad7 Use "name:" instead of "role:" for *_role modules
Both include_role and import_role expect role's name to be given
via "name" param instead of "role".
This worked but caused errors with ansible-lint.
See: https://review.opendev.org/694779

Change-Id: I388d4ae27111e430d38df1abcb6c6127d90a06e0
2020-03-02 10:01:17 +01:00
Mark Goddard
49fb55f182 Add Ansible group check to prechecks
We assume that all groups are present in the inventory, and quite obtuse
errors can result if any are not.

This change adds a precheck that checks for the presence of all expected
groups in the inventory for each service. It also introduces a common
service-precheck role that we can use for other common prechecks.

Change-Id: Ia0af1e7df4fff7f07cd6530e5b017db8fba530b3
Partially-Implements: blueprint improve-prechecks
2020-02-28 16:23:14 +00:00
Radosław Piliszek
1ea029a91d Followup on MariaDB handling fixes
This fixes issues reported by Mark:
- possible failure with 4-node cluster (however unlikely)
- failure to stop all nodes from progressing when conditions are
  not valid (due to: "any_errors_fatal: False")

Change-Id: Ib6995bf4c99202c9813859b3d9e2f420448f0445
2020-02-02 16:39:29 +01:00
Zuul
67a9d289b4 Merge "Fix multiple issues with MariaDB handling" 2020-01-21 09:29:59 +00:00
Radosław Piliszek
9f14ad651a Fix multiple issues with MariaDB handling
These affected both deploy (and reconfigure) and upgrade
resulting in WSREP issues, failed deploys or need to
recover the cluster.

This patch makes sure k-a does not abruptly terminate
nodes to break cluster.
This is achieved by cleaner separation between stages
(bootstrap, restart current, deploy new) and 3 phases
for restarts (to keep the quorum).

Upgrade actions, which operate on a healthy cluster,
went to its section.

Service restart was refactored.

We no longer rely on the master/slave distinction as
all nodes are masters in Galera.

Closes-bug: #1857908
Closes-bug: #1859145
Change-Id: I83600c69141714fc412df0976f49019a857655f5
2020-01-15 20:15:09 +01:00
Mark Goddard
5fb10e08fe Ansible lint: use command module instead of shell
Change-Id: Ibf40216b847f103e383f19fe1ef608a75fcfd452
Co-Authored-By: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
2020-01-13 10:45:10 +00:00
Mark Goddard
a6cb008c54 Ansible lint: task names
Change-Id: Iecbc2fe5fa3391dca5a3cc7e575314b95942114b
Co-Authored-By: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
2020-01-13 10:38:12 +00:00
Michal Nasiadka
1009931162 Change local_action to delegate_to: localhost
As part of the effort to implement Ansible code linting in CI
(using ansible-lint) - we need to implement recommendations from
ansible-lint output [1].

One of them is to stop using local_action in favor of delegate_to -
to increase readability and and match the style of typical ansible
tasks.

[1]: https://review.opendev.org/694779/

Partially implements: blueprint ansible-lint

Change-Id: I46c259ddad5a6aaf9c7301e6c44cd8a1d5c457d3
2019-11-22 15:04:44 +00:00
Mark Goddard
f979ae1f8e Fix restart policy after MariaDB recovery
After performing a recovery of MariaDB, the mariadb containers are left
without a restart policy. This leaves them unable to recover from the
crash of a single galera node. There is another issue, in that the
'master' node is left in a bootstrap configuration, with the
--wsrep-new-cluster argument configured as BOOTSTRAP_ARGS.

This change fixes these issues by removing the restart policy of 'no'
from the 'slave' containers, and recreating the master container without
the restart policy or bootstrap arguments.

Change-Id: I36c875611931163ca2c29ae93b71d3af64cb197c
Closes-Bug: #1851594
2019-11-07 10:08:57 +00:00
Zuul
6160cdc576 Merge "Use mariabackup for database backups" 2019-11-04 12:10:42 +00:00
Mark Goddard
7f47ddf7f4 Use mariabackup for database backups
Currently, Xtrabackup is used for database backups. However, Xtrabackup
is not compatible with MariaDB 10.3. This change switches to use
mariabackup [1], which is available in the mariadb image.

The documented full and incremental restore procedures have been
modified to use mariabackup, following [2] and [3].

[1] https://mariadb.com/kb/en/library/mariabackup-overview/
[2] https://mariadb.com/kb/en/library/full-backup-and-restore-with-mariabackup/
[3] https://mariadb.com/kb/en/library/incremental-backup-and-restore-with-mariabackup/

Change-Id: Id52b9b1f7b013277e401b1f6b8aed34473d2b2c4
Closes-Bug: #1843043
Depends-On: https://review.opendev.org/691290
2019-11-01 18:44:10 +00:00
Mark Goddard
a6372a66f2 Fix deploy-containers command for mariadb
The MariaDB handlers require master_host to be set.

TrivialFix

Change-Id: I162efbd9e615b86dcdc6e8a4af081cda2f8b0b2b
2019-10-25 17:20:52 +00:00
Kris Lindgren
2fe0d98ebb Add a job that *only* deploys updated containers
Sometimes as cloud admins, we want to only update code that is running
in a cloud.  But we dont need to do anything else.  Make an action in
kolla-ansible that allows us to do that.

Change-Id: I904f595c69f7276e71692696471e32fd1f88e6e8
Implements: blueprint deploy-containers-action
2019-09-26 17:51:14 +01:00
Zuul
d8e961eeaa Merge "Wait for MariaDB to be accessible via HAProxy" 2019-08-27 12:58:06 +00:00
Scott Solkhon
03cd7eb356 Wait for MariaDB to be accessible via HAProxy
Explicitly wait for the database to be accessible via the load balancer.
Sometimes it can reject connections even when all database services are up,
possibly due to the health check polling in HAProxy.

Closes-Bug: #1840145
Change-Id: I7601bb710097a78f6b29bc4018c71f2c6283eef2
2019-08-15 10:00:36 +00:00
Radosław Piliszek
6a737b1968 Fix handling of docker restart policy
Docker has no restart policy named 'never'. It has 'no'.
This has bitten us already (see [1]) and might bite us again whenever
we want to change the restart policy to 'no'.

This patch makes our docker integration honor all valid restart policies
and only valid restart policies.
All relevant docker restart policy usages are patched as well.

I added some FIXMEs around which are relevant to kolla-ansible docker
integration. They are not fixed in here to not alter behavior.

[1] https://review.opendev.org/667363

Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2019-07-18 13:39:06 +00:00
Mark Goddard
86f373a198 Fixes for MariaDB bootstrap and recovery
* Fix wsrep sequence number detection. Log message format is
  'WSREP: Recovered position: <UUID>:<seqno>' but we were picking out
  the UUID rather than the sequence number. This is as good as random.

* Add become: true to log file reading and removal since
  I4a5ebcedaccb9261dbc958ec67e8077d7980e496 added become: true to the
  'docker cp' command which creates it.

* Don't run handlers during recovery. If the config files change we
  would end up restarting the cluster twice.

* Wait for wsrep recovery container completion (don't detach). This
  avoids a potential race between wsrep recovery and the subsequent
  'stop_container'.

* Finally, we now wait for the bootstrap host to report that it is in
  an OPERATIONAL state. Without this we can see errors where the
  MariaDB cluster is not ready when used by other services.

Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583
Closes-Bug: #1834467
2019-07-05 09:20:34 +00:00
Mark Goddard
b123bf6621 Use become for all docker tasks
Many tasks that use Docker have become specified already, but
not all. This change ensures all tasks that use the following
modules have become:

* kolla_docker
* kolla_ceph_keyring
* kolla_toolbox
* kolla_container_facts

It also adds become for 'command' tasks that use docker CLI.

Change-Id: I4a5ebcedaccb9261dbc958ec67e8077d7980e496
2019-06-06 19:04:58 +01:00
Raimund Hook
84ea42bd7c Updating Jinja filters to conform to Ansible 2.5+
Since Ansible 2.5, the use of jinja tests as filters has been
deprecated.

I've run the script provided by the ansible team to 'fix' the
jinja filters to conform to the newer syntax.

This fixes the deprecation warnings.

Change-Id: I844ecb7bec94e561afb09580f58b1bf83a6d00bd
Closes-bug: #1827370
2019-05-02 14:58:09 +01:00
Zuul
d9a0734fc5 Merge "Use database_address and database_port var for mariadb check" 2019-04-04 07:35:08 +00:00
Jim Rollenhagen
524f969bfc Use database_address and database_port var for mariadb check
This is how services reach mariadb; verify it that way.

Closes-Bug: #1823005
Change-Id: I9924ad050118b8a853e2309654a089f65178cd77
2019-04-03 12:54:21 +00:00
Mark Goddard
a4bb8567da Fix up config file permissions on the host
Several config file permissions are incorrect on the host. In general,
files should be 0660, and directories and executables 0770.

Change-Id: Id276ac1864f280554e98b937f2845bb424d521de
Closes-Bug: #1821579
2019-04-02 17:23:31 +01:00
Mark Goddard
b25c0ee477 Fix MariaDB 10.3 upgrade
Upgrading MariaDB from Rocky to Stein currently fails, with the new
container left continually restarting. The problem is that the Rocky
container does not shutdown cleanly, leaving behind state that the new
container cannot recover. The container does not shutdown cleanly
because we run dumb-init with a --single-child argument, causing it to
forward signals to only the process executed by dumb-init. In our case
this is mysqld_safe, which ignores various signals, including SIGTERM.
After a (default 10 second) timeout, Docker then kills the container.

A Kolla change [1] removes the --single-child argument from dumb-init
for the MariaDB container, however we still need to support upgrading
from Rocky images that don't have this change. To do that, we add new
handlers to execute 'mysqladmin shutdown' to cleanly shutdown the
service.

A second issue with the current upgrade approach is that we don't
execute mysql_upgrade after starting the new service. This can leave the
database state using the format of the previous release. This patch also
adds handlers to execute mysql_upgrade.

[1] https://review.openstack.org/644244

Depends-On: https://review.openstack.org/644244
Depends-On: https://review.openstack.org/645990
Change-Id: I08a655a359ff9cfa79043f2166dca59199c7d67f
Closes-Bug: #1820325
2019-03-23 10:21:37 +00:00
Eduardo Gonzalez
1a682fab28 Support stop specific containers
With this change, an operator may be able to stop a
service container without stopping all services in a host.
This change is the starting point to start
fast-forward upgrades support.
In next changes new flags will be introducced to disable
stop dataplane services during upgrades.

Change-Id: Ifde7a39d7d8596ef0d7405ecf1ac1d49a459d9ef
Implements: blueprint support-stop-containers
2018-11-26 08:07:01 +00:00
Nick Jones
f704a78029 Add new option to perform an on-demand backup of MariaDB
blueprint database-backup-recovery

Introduce a new option, mariadb_backup, which takes a backup of all
databases hosted in MariaDB.

Backups are performed using XtraBackup, the output of which is saved to
a dedicated Docker volume on the target host (which defaults to the
first node in the MariaDB cluster).

It supports either full (the default) or incremental backups.

Change-Id: Ied224c0d19b8734aa72092aaddd530155999dbc3
2018-11-22 09:20:59 +00:00
Adam Harwell
f1c8136556 Refactor haproxy config (split by service) V2.0
Having all services in one giant haproxy file makes altering
configuration for a service both painful and dangerous. Each service
should be configured with a simple set of variables and rendered with a
single unified template.

Available are two new templates:

* haproxy_single_service_listen.cfg.j2: close to the original style, but
only one service per file
* haproxy_single_service_split.cfg.j2: using the newer haproxy syntax
for separated frontend and backend

For now the default will be the single listen block, for ease of
transition.

Change-Id: I6e237438fbc0aa3c89a3c8bd706a53b74e71904b
2018-09-26 03:30:38 -07:00
caoyuan
471985dc2c Update usage of "|" to "is"
With the more recent versions of ansible, we should now use
"is" instead of the "|"

This should update it.

Change-Id: I6fba56fca182349972e8b0ee5452b37aa4090e0c
2018-08-13 12:40:10 +05:30
Zuul
3e45b2cbec Merge "Use include_tasks instead of include" 2018-07-27 08:16:08 +00:00
Lakshmi Prasanna Goutham Pratapa
14bf524756 Apply Resource Constraints to Services.
This commit is to apply resource-constraints to a few more OpenStack services.
Commit to  apply constraints to the last set of services will be made in
the upcoming commit.

Depends-on: Icafa54baca24d2de64238222a5677b9d8b90e2aa
Change-Id: I39004f54281f97d53dfa4b1dbcf248650ad6f186
2018-07-26 11:35:28 +00:00
Jeffrey Zhang
b51eeed89e Use include_tasks instead of include
include is marked as deprecated since ansible 2.4[0]

[0] https://docs.ansible.com/ansible/2.4/include_module.html#deprecated

Co-Authored-By: confi-surya <singh.surya64mnnit@gmail.com>
Change-Id: Ic9d71e1865d1c728890625aeddf424a5734c0a8a
2018-07-25 23:57:22 +08:00