We use the wsrep_notify.sh script to notify haproxy of changes in Galera
cluster membership. When xtrabackup was used for the state transfer,
nodes in the Donor state would be included in the backend pool. However,
since the switch to mariabackup in the Stein cycle, we now remove nodes
in the Donor state from the backend pool.
This change ensures that nodes in the Donor state are included in the
backend pool when the SST method is either xtrabackup or mariabackup.
https://galeracluster.com/library/documentation/mysql-wsrep-options.html#wsrep-notify-cmd
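The intended pool membership can be sketched as follows. This is only an
illustration: the function name is hypothetical, and the handling of every
state other than Synced and Donor is an assumption, not the logic of the
actual wsrep_notify.sh script.

```python
# SST methods named in this change; a donor using one of them stays in the pool.
DONOR_OK_SST_METHODS = {"xtrabackup", "mariabackup"}


def in_backend_pool(status, sst_method):
    """Return True if a node with this wsrep status should serve traffic."""
    if status == "Synced":
        return True
    if status == "Donor":
        return sst_method in DONOR_OK_SST_METHODS
    # Assumption: all other states (Joiner, Joined, ...) are kept out.
    return False
```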
Change-Id: Ide4301779a0d221ae5d4dbdd4873fb8a40eb7297
Co-authored-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Closes-Bug: #1850945
Introduce kolla_address filter.
Introduce put_address_in_context filter.
Add AF config to vars.
Address contexts:
- raw (default): <ADDR>
- memcache: inet6:[<ADDR>]
- url: [<ADDR>]
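A minimal sketch of the context handling listed above, assuming that only
IPv6 literals need wrapping and that IPv4 addresses and hostnames pass
through unchanged (the filter's real behaviour for those inputs is not
described here):

```python
def put_address_in_context(address, context="raw"):
    """Format an address for the given context: raw, memcache or url."""
    if context not in ("raw", "memcache", "url"):
        raise ValueError("unknown context: %s" % context)
    # Assumption: anything without ':' is not an IPv6 literal and is left alone.
    if context == "raw" or ":" not in address:
        return address
    if context == "memcache":
        return "inet6:[%s]" % address
    return "[%s]" % address
```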
Other changes:
globals.yml - mention just IP in comment
prechecks/port_checks (api_intf) - kolla_address handles validation
3x interface conditional (swift configs: replication/storage)
2x interface variable definition with hostname
(haproxy listens; api intf)
1x interface variable definition with hostname with bifrost exclusion
(baremetal pre-install /etc/hosts; api intf)
neutron's ml2 'overlay_ip_version' set to 6 for IPv6 on tunnel network
basic multinode source CI job for IPv6
prechecks for rabbitmq and qdrouterd use proper NSS database now
MariaDB Galera Cluster WSREP SST mariabackup workaround
(socat and IPv6)
Ceph naming workaround in CI
TODO: probably needs documenting
RabbitMQ IPv6-only proto_dist
Ceph ms switch to IPv6 mode
Remove neutron-server ml2_type_vxlan/vxlan_group setting
as it is not used (let's avoid any confusion)
and could break setups without proper multicast routing
if it started working (also IPv4-only)
haproxy upgrade checks for slaves based on ipv6 addresses
TODO:
ovs-dpdk grabs ipv4 network address (w/ prefix len / submask)
not supported, invalid by default because neutron_external has no address
No idea whether ovs-dpdk works at all atm.
ml2 for xenapi
Xen is not supported too well.
This would require working with XenAPI facts.
rp_filter setting
This would require meddling with ip6tables (there is no sysctl param).
By default nothing is dropped.
Unlikely we really need it.
ironic dnsmasq is configured IPv4-only
dnsmasq needs DHCPv6 options and testing in vivo.
KNOWN ISSUES (beyond us):
One cannot use IPv6 address to reference the image for docker like we
currently do, see: https://github.com/moby/moby/issues/39033
(docker_registry; docker API 400 - invalid reference format)
workaround: use hostname/FQDN
RabbitMQ may fail to bind to IPv6 if hostname resolves also to IPv4.
This is due to old RabbitMQ versions available in images.
IPv4 is preferred by default and may fail in the IPv6-only scenario.
This should be no problem in real life as IPv6-only is indeed IPv6-only.
Also, when new RabbitMQ (3.7.16/3.8+) makes it into images, this will
no longer be relevant as we supply all the necessary config.
See: https://github.com/rabbitmq/rabbitmq-server/pull/1982
For reliable runs, at least Ansible 2.8 is required (2.8.5 confirmed
to work well). Older Ansible versions are known to miss IPv6 addresses
in interface facts. This may affect redeploys, reconfigures and
upgrades which run after VIP address is assigned.
See: https://github.com/ansible/ansible/issues/63227
Bifrost Train does not support IPv6 deployments.
See: https://storyboard.openstack.org/#!/story/2006689
Change-Id: Ia34e6916ea4f99e9522cd2ddde03a0a4776f7e2c
Implements: blueprint ipv6-control-plane
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Upgrading MariaDB from Rocky to Stein currently fails, with the new
container left continually restarting. The problem is that the Rocky
container does not shut down cleanly, leaving behind state that the new
container cannot recover. The container does not shut down cleanly
because we run dumb-init with a --single-child argument, causing it to
forward signals to only the process executed by dumb-init. In our case
this is mysqld_safe, which ignores various signals, including SIGTERM.
After a (default 10 second) timeout, Docker then kills the container.
A Kolla change [1] removes the --single-child argument from dumb-init
for the MariaDB container, however we still need to support upgrading
from Rocky images that don't have this change. To do that, we add new
handlers to execute 'mysqladmin shutdown' to cleanly shut down the
service.
A second issue with the current upgrade approach is that we don't
execute mysql_upgrade after starting the new service. This can leave the
database state using the format of the previous release. This patch also
adds handlers to execute mysql_upgrade.
[1] https://review.openstack.org/644244
Depends-On: https://review.openstack.org/644244
Depends-On: https://review.openstack.org/645990
Change-Id: I08a655a359ff9cfa79043f2166dca59199c7d67f
Closes-Bug: #1820325
These issues show up intermittently in various branches.
In all cases the cause is a wrong path used to find the resolveip binary.
This is similar to the recent kolla-ansible-ubuntu-source job failures.
Change-Id: I8cce42b60897e4ceb8d3b0bd5181fda88b10c2b8
- py35/py36 jobs are failing
Python 3.6 pycache also includes links, so those also
need to be removed by the tox testenv
- kolla-ansible-ubuntu-source job is failing
Without basedir set in galera.cnf, mysql_install_db looks for resolveip
in /usr/sbin instead of /usr/bin, and therefore complains that it can
resolve neither $HOSTNAME nor localhost.
Change-Id: I40514c0a7c43ae01c7680aac81123942be1cdef9
xtrabackup doesn't work with MariaDB 10.3 and
needs to be replaced with the mariadb-backup tool.
For now, only migrate Galera, not the kolla-backup tool,
to fix the CI.
https://jira.mariadb.org/browse/MDEV-15774
Change-Id: Ie77ae41e419873feed4b036a307887b22455183b
Depends-On: Icefe3a77fb12d57c869521000d458e3f58435374
blueprint database-backup-recovery
Introduce a new option, mariadb_backup, which takes a backup of all
databases hosted in MariaDB.
Backups are performed using XtraBackup, the output of which is saved to
a dedicated Docker volume on the target host (which defaults to the
first node in the MariaDB cluster).
It supports either full (the default) or incremental backups.
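The difference between the two backup types can be sketched as a command
builder. This is an assumption-laden illustration, not the command the role
actually runs: the function name and directory paths are hypothetical, and
only the xtrabackup options --backup, --target-dir and
--incremental-basedir are used.

```python
def backup_command(target_dir, incremental_base=None):
    """Build an xtrabackup argument list for a full or incremental backup."""
    cmd = ["xtrabackup", "--backup", "--target-dir=" + target_dir]
    if incremental_base is not None:
        # An incremental backup records only changes since the base backup.
        cmd.append("--incremental-basedir=" + incremental_base)
    return cmd
```

Calling backup_command("/backup/full") yields a full backup command, while
passing incremental_base="/backup/full" as well yields an incremental one.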
Change-Id: Ied224c0d19b8734aa72092aaddd530155999dbc3
As reported in the bug, these can grow to 10s to 100s of GB
in a month. To reduce the chance of filling the disk and
bringing down the control plane, this change defines
an expiry time.
Closes-Bug: 1720113
Change-Id: I508aad1f515d5108a3d08c90318b70d0a918908c
Current Debian stretch uses MariaDB 10.1.26, which integrates
a backup tool called 'MariaDB Backup' [1]. It is based on
Percona XtraBackup and supports full backup capability for
MariaDB Server, including encrypted and compressed data.
This patch also fixes the multi-node deployment failure on Debian
aarch64. Percona's repo has no XtraBackup package for Debian
aarch64. In such a case we can use the MariaDB built-in backup tool
'MariaDB Backup'.
[1] https://mariadb.com/kb/en/library/mariadb-backup-overview/
Change-Id: I7271d3f93b41d4839670a2c4a358744333411cd7
Kolla assumes MariaDB 10.0, where Galera support was always enabled.
This works for CentOS and Ubuntu, as they use 10.0 on x86-64.
Debian (as well as CentOS on non-x86-64 and Ubuntu on aarch64) uses
MariaDB 10.1 by default, where Galera support is disabled by default.
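The version dependence can be sketched as below. The helper name is
hypothetical, and the assumption is that enabling Galera on 10.1 comes down
to setting the wsrep_on server variable explicitly; the actual template
change may simply set it unconditionally.

```python
def galera_options(mariadb_version):
    """Return extra galera.cnf options needed for this MariaDB version."""
    major_minor = tuple(int(part) for part in mariadb_version.split(".")[:2])
    options = {}
    if major_minor >= (10, 1):
        # From 10.1 on, wsrep replication is off unless enabled explicitly.
        options["wsrep_on"] = "ON"
    return options
```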
Closes-Bug: 1740060
Co-Authored-By: Xinliang Liu <xinliang.liu@linaro.org>
Change-Id: I8374ac2219ad7880970cd789727d01af7cac1077
kolla-kubernetes is using its own configuration generation [0], so it is
time for kolla-ansible to remove the related code to simplify the
logic.
[0] https://github.com/openstack/kolla-kubernetes/tree/master/ansible
Change-Id: I7bb0b7fe3b8eea906613e936d5e9d19f4f2e80bb
Implements: blueprint clean-k8s-config
MariaDB uses utf8_general_ci as its default collation:
- https://mariadb.com/kb/en/mariadb/supported-character-sets-and-collations/
This means all Devstack databases are created with the utf8_general_ci
collation, so a service/project may deploy successfully via Devstack
but fail with a Kolla deployment.
Therefore, we should use the above default collation for Kolla-ansible.
Change-Id: Icbb6c15f536fc6986816c58f4fd68bfb95813e46
Closes-Bug: 1680783
The binary log contains a record of all changes to the databases, both
data and structure. It consists of a set of binary log files and an
index.
Activating it in MariaDB fixes the Telegraf error on MySQL/MariaDB input
retrieval.
Change-Id: I040ed75ffbf1afded87ba0f8f63a3e384707d1fb
Closes-Bug: #1673969
It's good if k8s reuses ansible templates, but we need to abstract all
ansible specific variables to achieve that.
- Implements ansible override variable api_interface_address.
- Adds api_interface_address setting and comments to globals.yml
- Makes changes to mariadb templates to accept this new setting.
- Disables Galera when api_interface_address==0.0.0.0 in the
case of Kubernetes. Otherwise, mariadb fails to start.
- Tested with and without setting to ensure kolla genconfig output
does not change when setting is disabled or undefined.
Change-Id: Ia0e4951c327be01b717aebb86ef4c3a4e7ed170e
Partially-implements: blueprint api-interface-bind-address-override
Co-authored-by: David Wang <dcwangmit01@gmail.com>
Co-authored-by: Ryan Hallisey <rhallise@redhat.com>
Co-authored-by: Kevin Fox <kevin@efox.cc>
I just thought I'd dash off a quick patch and change the underscore
to a dash.
Change-Id: Ib34cfc8039de01be7e37176648482f9815ac3848
Closes-Bug: #1589734
This patch enables wsrep_notify_cmd to rename the haproxy user to
haproxy_blocked when the node is not ready to serve and to restore it
when ready.
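As a rough sketch of the decision such a notify script makes: Galera calls
the command with options such as --status, --uuid, --primary, --members and
--index. The function name below is hypothetical, and treating only the
Synced status as "ready to serve" is an assumption.

```python
import argparse


def haproxy_user_for(argv):
    """Pick the database user name haproxy should find on this node."""
    parser = argparse.ArgumentParser()
    for option in ("--status", "--uuid", "--primary", "--members", "--index"):
        parser.add_argument(option)
    # Ignore any option this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    # Assumption: only a Synced node is ready to serve.
    return "haproxy" if args.status == "Synced" else "haproxy_blocked"
```

While the user is renamed, the haproxy health check login fails and the node
drops out of rotation; restoring the name brings it back.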
Change-Id: I4f49960d7ff2fa689d6ea730b2574f16f083edc1
Closes-Bug: 1578752
Closes-Bug: 1587752
Added general_log to ansible/roles/mariadb/templates/galera.cnf.j2
to improve mariadb logging.
This will be helpful to debug mariadb issues especially when
mariadb is scaled.
Test results of this patch set are at:
http://paste.openstack.org/show/492852/
Change-Id: I80438d1bbdd1ed2a1f47489c6f9c45b8107340a0
Closes-Bug: #1563668
Scale limit testing on a 64 node cluster with 13 TB RAM and 2600
cores showed that at least 1800 database connections are required
to appropriately start 2000 virtual machines simultaneously. Other
documentation on the internet recommends larger values such as 8000,
so we set a larger value that should be able to handle a maxed out
512GB RAM per compute node cluster with all services enabled.
Change-Id: I8767cf3fb04e066cc22e796c647e944b4e4a1742
Closes-Bug: #1564275
The lightsout recovery patch broke multinode MySQL. Also, the lightsout
recovery did not properly pass the --wsrep-new-cluster flag. This
updates the mariadb bootstrap to work with multinode again.
Closes-Bug: #1559480
Related-Id: I903c3bcd069af39814bcabcef37684b1f043391f
Change-Id: I1ec91a8b2144930ea8f04cc1c201b53712352e4e
There is no reason to have a hostname-unique pidfile in the container
as we currently have. This posed problems with kolla-mesos reusing
the same script. Since there is no reason for this pidfile to be
configurable in path _at_ _all_, we hardcode the path.
Additionally, we adjust the file perm change to only update the perms
on the folder if it is not already properly set.
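The "only update if needed" behaviour can be illustrated like this. The
helper name is hypothetical and the sketch is in Python rather than the
container's own startup script:

```python
import os
import stat


def ensure_mode(path, mode):
    """Set permissions on path only if they differ; return True if changed."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == mode:
        return False  # already correct, nothing to do
    os.chmod(path, mode)
    return True
```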
This also incorporates a kolla-ansible file in the bootstrap process
which follows our other container techniques of using the idempotent
creation of a volume in the bootstrap process (see nova).
TrivialFix
Related-Bug: #1538136
Change-Id: I2380529fc7146a9603145cdc31e649cb8841f7dd
Currently, there is an arbitrary wait for mariadb service startup.
However, this leads to nondeterministic results in the current
workflow. This patch tries to make the workflow more deterministic.
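One common way to replace a fixed sleep is to poll until the service accepts
connections, bounded by a timeout. The sketch below shows that general
technique only; the function name and default values are assumptions, and it
is not necessarily how this patch implements the wait.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until host:port accepts a TCP connection or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet; retry after a short pause.
            time.sleep(interval)
    return False
```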
Change-Id: I3c6245cce93c7ff0d3d57cb2ae065a1ed1487769
Closes-Bug: #1491782
In a heterogeneous environment, the api_interface may differ from host
to host, so we should look it up from each host's hostvars.
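The per-host lookup can be sketched with plain dictionaries standing in for
Ansible's hostvars. The function name and the sample data in the usage note
are hypothetical; the fact layout (ansible_<interface>.ipv4.address) follows
the usual Ansible interface facts.

```python
def api_address(hostvars, host):
    """Return the IPv4 address of the api_interface defined for this host."""
    facts = hostvars[host]
    # Ansible replaces '-' with '_' in interface fact names.
    interface = facts["api_interface"].replace("-", "_")
    return facts["ansible_" + interface]["ipv4"]["address"]
```

With two hosts that use eth0 and br-api respectively, each lookup resolves
through that host's own interface name rather than a single global one.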
Implements: bp configure-network-interface
Change-Id: Id15d70bfb9ebb62a64a3847a6b77407efb171dbe
A previous commit [1] broke Galera on Ubuntu while trying to fix
CentOS. This fixes the underlying difference between the two distros
and should help prevent that kind of mistake in the future.
[1] I523d1989575dbe24a891fcae3b6bf56d83e69615
Change-Id: Ie3e47f10cb669f36f8d2f166c88555931a54e3ec
Backport: Liberty
Closes-Bug: #1509281
Xtrabackup previously used a hardcoded datadir. In the latest
update to xtrabackup, the my.cnf config option datadir is now
parsed. This variable was unset, causing galera to implode.
backport: liberty
Change-Id: I523d1989575dbe24a891fcae3b6bf56d83e69615
Closes-Bug: #1509281
This brings Kolla images in line with FHS and should make finding
locations of things more consistent and reliable with the Linux world
at large.
Change-Id: Iece5b4da4bace0fb8b1f41a65ab2c852ec73e6f8
Closes-Bug: #1485742
The default incoming database connection count is limited to 151
connections. This is not sufficient in a 100 node 3 controller
deployment to launch several heat stacks simultaneously.
Data measured on bare metal shows that 250 connections are needed
to launch one 25 VM heat stack with 3 controller nodes with 209
tasks (default enablement).
Change-Id: If60b540010d6d173a393fa91fa30cb3ba572cfc0
Closes-Bug: #1492719
In MariaDB we adjust the dependencies of what we install there as well,
adding only what is appropriate for the install.
In Ansible we adjust some templates to work around differences
between the different Linux families.
Change-Id: Ibc26e2f4d4a732630632d3ed27fb595b6fe019d2
Partially-Implements: blueprint install-from-ubuntu
Clean up all options in galera.cnf. Bind to all interfaces and ports
appropriately.
Change-Id: I516613d09673ba61aadda2c7bbb4abbbe4ea47ac
Partially-Implements: blueprint update-configs
Closes-Bug: #1478330
The original purpose for having an abstraction like 'database' rather than
the service name 'mariadb' has changed. Our direction is different,
and this patch reflects consistent naming throughout.
Change-Id: I704896191cc5243f9dab2a4cca9120e9dc2ceb2c
Closes-Bug: #1478328