This patch fixes 2 problems with MariaDB backup:
1) If a user with grants to a database has a hyphenated name, the backup
script errors out and the grants for this user won't be saved in the backup.
2) While restoring databases from a backup, if connections are allowed
during the restore operation, there is potential for deadlock. Table
level locks are added to the backup sql file in order to try to prevent
these deadlock situations.
Change-Id: If612e7b9f3f4d75fc67018eea17609f07a0c0b0f
In the scenario where grastate values cannot be found, we will set the
configmap to 'None' and log a warning. This should also prevent a possible
type incompatibility issue in the error scenario.
Change-Id: I0fb08b329a3fb05c65bead5781c84a592ae4c263
Signed-off-by: Tin Lam <tin@irrational.io>
subprocess.Popen() returns bytes objects by default, which causes issues
with operations that treat the output as str. This ensures Popen()
decodes the output as utf-8 before we do anything with it.
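A minimal sketch of the bytes-versus-str behaviour, not the patch
itself. The helper name run_and_capture is hypothetical, and passing
encoding="utf-8" to Popen is one way to get str output; the actual
change may decode the bytes explicitly instead.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return its stdout as str rather than bytes."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",  # without this, communicate() returns bytes
    )
    stdout, _ = proc.communicate()
    return stdout


# Use the current interpreter as a portable stand-in for a real command.
out = run_and_capture([sys.executable, "-c", "print('seqno: 42')"])
```

With the encoding set, str operations such as out.split(":") work
without a TypeError.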
Change-Id: I321771f69cfcb492be1308c61313a0598b1e766a
Signed-off-by: Tin Lam <tin@irrational.io>
This patch set handles an unexpected IndexError stacktrace when the
galera cluster's data file does not return with an expected key with a
colon (:) in the string.
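A minimal sketch of colon-safe parsing, assuming the data file holds
"key: value" lines. The function name parse_grastate and the sample
content are illustrative and not taken from the patch.

```python
def parse_grastate(text):
    """Parse 'key: value' lines, skipping any line without a colon."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            # No colon on this line: indexing line.split(":")[1] here
            # would raise IndexError, so skip the line instead.
            continue
        values[key.strip()] = value.strip()
    return values


sample = "# GALERA saved state\nversion: 2.1\nseqno:   -1\n"
state = parse_grastate(sample)
```

Using str.partition avoids the index lookup entirely, so a malformed
line is skipped rather than producing a stacktrace.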
Change-Id: I4f58e97753a0f68468a02b98676e031176145e44
Signed-off-by: Tin Lam <tin@irrational.io>
There are scenarios where the wsrep_rec_pos variable is being returned
without it being first initialized when the .communicate() method
returns a blank. This patchset sets up a default initialization, so
the readiness check does not error out with an exception.
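A minimal sketch of the default initialization, not the actual code.
The helper name parse_recovered_position and the "Recovered position:"
marker in the sample output are assumptions made for illustration.

```python
def parse_recovered_position(output):
    """Return the recovered position found in output, or None if absent."""
    wsrep_rec_pos = None  # default, so the name is bound even for blank output
    for line in output.splitlines():
        if "Recovered position:" in line:
            wsrep_rec_pos = line.split("Recovered position:", 1)[1].strip()
    return wsrep_rec_pos
```

Without the default, a blank result from .communicate() would skip the
loop body and the return statement would fail on an unbound name.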
Change-Id: Ifea922f446bf3cbc9220f39a41dffc2763e6a5f3
Signed-off-by: Tin Lam <tin@irrational.io>
Move to updated MariaDB version 10.2.31. Tweak start.py to use python3,
as /usr/bin/python doesn't exist and relying on it isn't robust.
Change-Id: Ib64ed5de34e3ff87c634d09f98aaddeb374d2bd6
Each MariaDB instance updates the grastate configmap on a periodic
basis, every 10s by default. Collisions can occur when multiple
instances try to write their state at the same time (within a few
milliseconds). One instance will write successfully, and the other will
get a 409 error. There is nothing to break the synchronization, so the
failures tend to be persistent.
This change adds a small sleep after a collision is encountered,
creating an offset between the cycles.
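A minimal sketch of the idea, assuming the write is wrapped in a
callable. ConflictError is a hypothetical stand-in for the 409
response, and the randomized sleep length is an assumption; the patch
only says the sleep is small.

```python
import random
import time


class ConflictError(Exception):
    """Hypothetical stand-in for an HTTP 409 from the API server."""


def update_once(write_state, sleep=time.sleep, max_offset=1.0):
    """Attempt one write; on a conflict, sleep briefly to shift the cycle.

    Returns True if the write succeeded, False if it collided.
    """
    try:
        write_state()
        return True
    except ConflictError:
        # Shift this instance's cycle away from the one it collided with.
        sleep(random.uniform(0.1, max_offset))
        return False
```

Because the sleep happens only on the losing instance, its next
periodic write lands slightly later than the winner's.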
Change-Id: Ib8a64f8f7ee15a6579e901d80ae759c38e0e901e
The mariadb container launches two threads in addition to the mysql
daemon, one to maintain a configmap containing the Galera Cluster state,
and the other to handle leader elections. These threads die if they
suffer any exceptions talking to the kubernetes apiserver. This can
happen sometimes, e.g. when a k8s control node reboots.
This change logs and ignores the kubernetes.client.rest.ApiException,
allowing the threads to retry and hopefully succeed once the k8s api
becomes available.
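A minimal sketch of a loop that survives API errors, not the actual
thread code. The function name run_loop and its parameters are
hypothetical; the fallback class only exists so the sketch runs
without the kubernetes package installed.

```python
import logging
import time

try:
    from kubernetes.client.rest import ApiException
except ImportError:  # fallback so the sketch is runnable on its own
    class ApiException(Exception):
        pass

logger = logging.getLogger(__name__)


def run_loop(task, interval=10, iterations=None, sleep=time.sleep):
    """Call task repeatedly; log and ignore ApiException so the loop lives on."""
    count = 0
    while iterations is None or count < iterations:
        try:
            task()
        except ApiException as exc:
            logger.warning("Kubernetes API call failed, will retry: %s", exc)
        sleep(interval)
        count += 1
```

Any other exception type still propagates, so only API failures are
treated as retryable here.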
Change-Id: I5745a763bb07f719d83a41c1f27be2b76ce998e9
This patchset fixes a serious database restoration problem where the
user is trying to restore a single database, but in the process of
restoring the database, the script inadvertently also removes all
tables from the other databases.
The root cause was that the mysql "--one-database" restore option
achieves the single database restoration, but somehow corrupts the
other databases. The new approach taken in this patchset is to
create a temporary database user which only has permission to
restore the chosen database, and that will leave the other databases
unharmed. This approach, which can be applied for restoring
individual databases and even database tables, was recommended in (1).
After the database is restored, the temporary user is deleted.
(1) https://mariadb.com/kb/en/restoring-data-from-dump-files/
Also improved some of the error handling.
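A rough sketch of the temporary-user approach described above,
expressed as the SQL statements a script might issue. The function
name, user name, host and password are all hypothetical, and the real
script may grant a narrower set of privileges.

```python
def restore_user_statements(database, user="restore_user",
                            host="localhost", password="changeme"):
    """Return (setup, teardown) SQL for a user limited to one database."""
    account = f"'{user}'@'{host}'"
    setup = [
        f"CREATE USER {account} IDENTIFIED BY '{password}';",
        # Backticks keep database names with hyphens valid as identifiers.
        f"GRANT ALL PRIVILEGES ON `{database}`.* TO {account};",
    ]
    teardown = [f"DROP USER {account};"]
    return setup, teardown


setup, teardown = restore_user_statements("keystone")
```

Restoring the dump while connected as this user means statements that
touch other databases are rejected instead of being applied.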
Change-Id: I805c605ed2b424640ad6a0a379b1c0b9c0004e94
This change addresses the results that were found when running
bandit against the templated python files in the various charts.
This also makes the bandit gate run only when python template
files are changed, and makes the job voting.
Change-Id: Ia158f5f9d6d791872568dafe8bce69575fece5aa
This patch introduces a new cluster status, "reboot", which is set
by the leader node so that the other nodes start mysql without
the "--wsrep-new-cluster" option.
Before this change, the following situation took place:
1) All pods go down one by one with some offset;
2) First and second nodes have max seqno;
3) The script on the first node detects there are no active
backends and starts timeout loop;
4) The script on the second node detects there are no active
backends and starts timeout loop (with approx. 20 sec offset
from first node);
5) Timeout loop finishes on first node, it checks highest seqno
and lowest hostname and wins the ability to start cluster;
6) Mysql is started with the "--wsrep-new-cluster" parameter;
7) Seqno is set to "-1" for this node after mysql startup;
8) Periodic job syncs values from grastate file to configmap;
9) Timeout loop finishes on second node. It checks the node with
highest seqno and lowest hostname, and since seqno is already
"-1" for the first node, the second node decides that it should
lead the cluster startup and executes mysql with the
"--wsrep-new-cluster" option as well, which leads to split brain.
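The race can be illustrated with a small sketch of the selection rule
(highest seqno, then lowest hostname). The function name pick_leader,
the hostnames and the seqno values are made up for illustration.

```python
def pick_leader(states):
    """Pick the host with the highest seqno; ties go to the lowest hostname."""
    best = max(states.values())
    return min(host for host, seqno in states.items() if seqno == best)


# State as seen by the first node when its timeout loop finishes.
before = {"mariadb-0": 100, "mariadb-1": 100, "mariadb-2": 90}
first_choice = pick_leader(before)

# After the first node starts mysql, its seqno is synced as -1.
after = dict(before)
after["mariadb-0"] = -1
second_choice = pick_leader(after)
```

The two evaluations disagree on the leader, which is how both nodes
end up bootstrapping a new cluster without a shared "reboot" status.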
Change-Id: Ic63fd916289cb05411544cb33d5fdeed1352b380
An audit user is added to MariaDB with only the SELECT permission
on the user table of the mysql database, for database user audit
purposes.
Change-Id: I5d046dd263e0994fea66e69359931b7dba4a766c
This change adds a means of introducing new storage classes
and local persistent volumes.
Change-Id: I340c75f3d0a1678f3149f3cf62e4ab104823cc49
Co-Authored-By: Steven Fitzpatrick <steven.fitzpatrick@att.com>
This patch set updates and tests the apiVersion for rbac.authorization.k8s.io
from v1beta1 to v1 in preparation for its removal in k8s 1.20.
Change-Id: I4e68db1f75ff72eee55ecec93bd59c68c179c627
Signed-off-by: Tin Lam <tin@irrational.io>
Currently, using envsubst to perform substitution of value overrides in
the feature gate causes conflicts, as gotpl gets templated into those
overrides. This adds '%%%REPLACE_${var}%%%' placeholders and uses sed
to perform the substitution instead to address the issue.
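A minimal sketch of the placeholder substitution, using Python's re
module as a stand-in for the sed call. The function name substitute
and the sample override text are assumptions.

```python
import re

PLACEHOLDER = re.compile(r"%%%REPLACE_(\w+)%%%")


def substitute(text, values):
    """Replace %%%REPLACE_name%%% markers; leave unknown markers untouched."""
    return PLACEHOLDER.sub(
        lambda m: values.get(m.group(1), m.group(0)), text
    )


override = "image: %%%REPLACE_IMAGE%%%\nname: {{ $name }}\n"
rendered = substitute(override, {"IMAGE": "mariadb"})
```

Because only the explicit markers are matched, gotpl expressions that
contain a dollar sign pass through unchanged.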
Change-Id: I9d3d630b53a2f3d828866229a5072bb04440ae15
Signed-off-by: Tin Lam <tin@irrational.io>
This patch set adds logic to generate kubernetes egress network policy
rules based on the dependencies specified in values.yaml. This also sets
up the necessary default network policy for the OSH gate.
Change-Id: I1ac649cc9debb5d1f4ea0a32f506dcda4d8b8536
Signed-off-by: Tin Lam <tin@irrational.io>
This updates charts that consume images built from osh-images to
use tags other than the :latest tags. This will be followed up
with the definition of jobs to allow for vetting of updated
images, as reliance on :latest tags assumes any change merged into
osh-images will result in functionally correct behavior (which has
been shown not to be the case historically).
Change-Id: I181aa56ed187604dc7583d8081e53cc69eb27310
Signed-off-by: Steve Wilkerson <sw5822@att.com>
Currently, when the configuration for mariadb is updated, the ingress
pods are also restarted, although there is no reason for this.
Change-Id: I398e20541a0e2337e9a5d100f3ef6ce4ad7d0284
It was observed that sometimes during a
galera cluster restart the node with the highest
seqno is determined incorrectly. After investigation
it was found that the max function is invoked on a
list of string values, which can lead to incorrect results.
This patch casts each value to an integer before building the
list of seqnos, so the max function returns the correct result.
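A short illustration of the difference, with made-up seqno values:

```python
seqnos = ["9", "10", "-1"]

# Strings compare character by character, so "9" sorts above "10".
highest_as_str = max(seqnos)

# Casting first gives the numeric maximum.
highest_as_int = max(int(s) for s in seqnos)
```

Here highest_as_str is "9" while highest_as_int is 10, so the string
comparison would pick the wrong node.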
Change-Id: I604ec837f3f2d157c829ab43a44e561879775c77
This updates the kubernetes-entrypoint image reference to consume
the publicly available kubernetes-entrypoint image that is built
and maintained under the airshipit namespace, as the stackanetes
image is no longer actively maintained.
Change-Id: I5bfdc156ae228ab16da57569ac6b05a9a125cb6a
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This patch set fixes the calculation of how long ago a database backup
was taken. In the existing code, the time difference is rounded down
to whole days, so even a second less than 4 days is rounded to 3 days.
This effectively allows archives to be kept for one additional day.
The new calculation and comparison are based on seconds.
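The rounding effect can be shown with a small Python illustration.
The function names and the 3-day retention period are assumptions,
and the actual backup script may not be written in Python.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 24 * 60 * 60


def expired_by_days(taken, now, keep_days):
    """Old behaviour: compare whole days, which truncates the age."""
    return (now - taken).days > keep_days


def expired_by_seconds(taken, now, keep_days):
    """New behaviour: compare the exact age in seconds."""
    return (now - taken).total_seconds() > keep_days * SECONDS_PER_DAY


now = datetime(2020, 1, 10, 12, 0, 0)
taken = now - timedelta(days=4) + timedelta(seconds=1)  # 1s short of 4 days
kept_by_old_rule = not expired_by_days(taken, now, keep_days=3)
removed_by_new_rule = expired_by_seconds(taken, now, keep_days=3)
```

With a 3-day retention, the day-based check still keeps an archive
that is almost 4 days old, while the second-based check removes it.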
Change-Id: I5547e923538ddb83f409b1e7df936baf664e717a
This updates the kubeadm and minikube Kubernetes deployments to
deploy version 1.16.2.
Change-Id: I324f9665a24c9383c59376fb77cdb853facd0f18
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This patch set is one of many to migrate existing code/script to be
python-3 compatible as python-2 is sunsetting in January 2020.
Change-Id: I4a8fa4c07fd36583716b5ccfdcb0bcdc008db3e7
Signed-off-by: Tin Lam <tin@irrational.io>
In Python 3, sys.maxint is removed per [0]. This patch set replaces
sys.maxint with sys.maxsize.
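A small illustration of the replacement, using sys.maxsize as a
"larger than anything" starting value. The variable names and sample
values are made up.

```python
import sys

# sys.maxint is gone in Python 3; sys.maxsize serves as the large sentinel.
lowest = sys.maxsize
for value in [7, 3, 9]:
    lowest = min(lowest, value)
```

Python 3 integers are unbounded, so sys.maxsize is a convenient large
value rather than a true upper limit.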
[0] https://docs.python.org/3.1/whatsnew/3.0.html#integers
Change-Id: I267fa6700558b69d3e646838b933e3289067a621
Signed-off-by: Tin Lam <tin@irrational.io>
This PS exposes the ability to adjust the readiness check
probe params.
Change-Id: Ic4730ef1d07f5cdf4b6fae5bb1331d788ea84e2e
Signed-off-by: Pete Birley <pete@port.direct>
This patch adds functionality to check the
current version of the mysql_exporter binary and to modify
the configuration flags depending on the version.
Change-Id: Ic1f42fbf5c99203d6e2fca4fc345632b64e5dc0a
This PS updates the mariadb ingress error page server to run as the
nobody user.
Change-Id: I13756ba79e8c7b857e0807447192e06b11762abf
Signed-off-by: Pete Birley <pete@port.direct>
This change adds network policy overrides for multiple infra
services for the openstack-helm network policy gate.
Change-Id: If051ec1749cb9ed1e289f0cf82a8876371e36531
This PS moves to drive all mariadb config via the values fed
to the chart.
Change-Id: I4ed3624737af4d5c90b1b5de451a0a0b75a5eda1
Signed-off-by: Pete Birley <pete@port.direct>
This PS updates the wsrep_provider_options to define the timeouts
explicitly for evs.suspect_timeout and gmcast.peer_timeout. Their
defaults are PT5S and PT3S respectively, which are increased by
a factor of approx 5, to accommodate network instability that may
occur during node outage events.
Change-Id: Ie5cdd06d91299e5e2632b70cb9b50a7ad14f62b1
Signed-off-by: Pete Birley <pete@port.direct>
This PS cleans up the container dir entirely on container restart,
as sometimes remnants of previous runs can cause issues.
Change-Id: I873667a8a57bca6096cbe777ee83ef8648a368d4
Signed-off-by: Pete Birley <pete@port.direct>
This PS sets `--enable-ssl-chain-completion=false` for the MariaDB
ingress controller. This is the default for current versions of
the nginx-ingress-controller, but for 0.9.0 needs to be set.
If enableSSLChainCompletion is left on, nginx will attempt to
autocomplete SSL certificate chains with missing intermediate CA
certificates, causing unnecessary network traffic and errors in pod logs.
Change-Id: I088b33fe994281dca6997baa87a6b599c3f10c14
Closes-Bug: #1835364