This change enables postgresql-exporter to export additional metrics
(such as replication_lag) that are derived from SQL queries against the
Postgres DB.
(Co-Author: Steven Fitzpatrick)
Change-Id: I78dc433a3782b48155ab293cb5afe90b3bc0ef1f
The mariadb container launches two threads in addition to the mysql
daemon, one to maintain a configmap containing the Galera Cluster state,
and the other to handle leader elections. These threads die if they
suffer any exceptions talking to the kubernetes apiserver. This can
happen sometimes, e.g. when a k8s control node reboots.
This change logs and ignores the kubernetes.client.rest.ApiException,
allowing the threads to retry and hopefully succeed once the k8s api
becomes available.
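The log-and-retry behaviour described above can be sketched roughly as
follows. This is a minimal illustration and not the chart's actual code:
run_with_retry, step, retriable and iterations are hypothetical names, and
the exception class passed in is assumed to be
kubernetes.client.rest.ApiException in the real threads.

```python
import logging
import time

logger = logging.getLogger(__name__)


def run_with_retry(step, retriable, iterations, interval=0.0):
    """Call step() repeatedly, logging and ignoring retriable errors.

    step       -- hypothetical callable doing one round of work
    retriable  -- exception class (or tuple of classes) to log and ignore;
                  assumed to be kubernetes.client.rest.ApiException in the
                  real threads
    iterations -- number of rounds to run (a real thread would loop forever)
    interval   -- seconds to sleep between rounds
    Returns the number of rounds that completed without an error.
    """
    succeeded = 0
    for _ in range(iterations):
        try:
            step()
            succeeded += 1
        except retriable as exc:
            # Log and carry on so the thread survives until the API is back.
            logger.warning("Kubernetes API call failed, will retry: %s", exc)
        time.sleep(interval)
    return succeeded
```

Any exception that is not an instance of retriable still propagates, so
unrelated bugs are not silently hidden by the retry loop.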
Change-Id: I5745a763bb07f719d83a41c1f27be2b76ce998e9
This change updates the script used by zuul to check the elasticsearch
deployment so that the curator is run during the timeframe of the
check, verifying the compatibility of the ES and Curator versions being
used.
Change-Id: I309530d71061fbb42c80e133948a0e0c3cf1927e
This change converts alert expressions which relied on instant vectors
to use range aggregate functions instead. Only the 'basic_linux' rules
are affected.
Change-Id: I30d6ab71d747b297f522bbeb12b8f4dbfce1eefe
Co-Authored-By: Meghan Heisler <mkheisler93@gmail.com>
This change updates the prometheus alerting rules to use range vectors
in their expressions, to avoid situations where missed scrapes would
cause scalar metrics to "go stale" - resetting the alert timer.
Only the ceph alerts are affected by this change.
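The difference between the two kinds of expression can be illustrated with
a deliberately simplified model. This is a sketch only: instant_value,
max_over_window and the staleness and window parameters are hypothetical
names, and Prometheus's real staleness handling is more involved than this.

```python
def instant_value(samples, now, staleness):
    """Return the newest sample value no older than `staleness` seconds.

    samples is a list of (timestamp, value) pairs. None models a series
    that has gone stale, so an alert on it would stop firing.
    """
    recent = [s for s in samples if now - staleness <= s[0] <= now]
    if not recent:
        return None
    return max(recent, key=lambda s: s[0])[1]


def max_over_window(samples, now, window):
    """Return the maximum value seen in the last `window` seconds.

    This models a range aggregate: one missed scrape does not empty the
    window, so the expression keeps returning a value.
    """
    values = [v for ts, v in samples if now - window <= ts <= now]
    return max(values) if values else None


# Scrapes expected every 30s; the scrapes at t=60 and t=90 were missed.
samples = [(0, 1), (30, 1), (120, 1)]
stale = instant_value(samples, now=90, staleness=45)
ranged = max_over_window(samples, now=90, window=300)
```

In this model the instant lookup has nothing to return at t=90, while the
range aggregate still sees the earlier samples.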
Change-Id: Ib47866d12616aaa808e6a09c58aa4352e338a152
Co-Authored-By: Meghan Heisler <mkheisler93@gmail.com>
This change converts alert expressions which relied on instant vectors
to use range aggregate functions instead.
Change-Id: I4df757f961524bed23b6a6ad361779c1749ca2c5
Co-Authored-By: Meghan Heisler <mkheisler93@gmail.com>
This adds a new check to make sure msgr2 is enabled if it is
supported by all of the mons. When mon quorum is lost the
mons revert to the v1 protocol, which results in a Ceph
warning state if v2 is supported by all of the available
mons.
Change-Id: Ib85243d38f122c1993aba945b7ae943eed262dbf
Currently the postgresql database backup job fails because it does not
have the correct permissions on the mounted PVC. This patchset corrects
the permissions on the PVC mount so that the backup pods can write to the
/var/backup directory structure.
Another problem was that pg_dumpall was not able to get the correct
password from the admin_user.conf. This may be due to the extra lines
in the file, so this patchset reads it differently in order to find
the password. This was a change to the backup and restore scripts.
Also there are a number of small corrections made to the error handling
for both backup and restore scripts, to be consistent with the MariaDB
backup/restore scripts.
Change-Id: Ica361764c591099e16d03a0988f73c6976583ceb
This patch set adds the override needed to support the OpenStack Train
release by moving the libvirt version to > 3.0.0.
Change-Id: I36097544024df5c6dfc87a032bd8383be98f1a3a
Signed-off-by: Tin Lam <tin@irrational.io>
This patchset fixes a serious database restoration problem where the
user is trying to restore a single database, but in the process of
restoring the database, the script inadvertently also removes all
tables from the other databases.
The root cause was that the mysql "--one-database" restore option
achieves the single database restoration, but somehow corrupts the
other databases. The new approach taken in this patchset is to
create a temporary database user which only has permission to
restore the chosen database, and that will leave the other databases
unharmed. This approach, which can be applied for restoring
individual databases and even database tables, was recommended in (1).
After the database is restored, the temporary user is deleted.
(1) https://mariadb.com/kb/en/restoring-data-from-dump-files/
This patchset also improves some of the error handling.
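The temporary-user approach can be sketched as a pair of SQL statement
lists, one to run before the restore and one after it. This is an
illustration under assumptions and not the actual restore script:
restore_user_statements and its parameters are hypothetical names, and the
validation here is deliberately strict rather than complete.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def restore_user_statements(database, user, password, host="localhost"):
    """Build SQL for a temporary user limited to one database.

    Returns (setup, teardown): setup creates the user and grants it
    privileges on `database` only; teardown drops the user afterwards.
    All names are hypothetical and chosen for illustration.
    """
    for name in (database, user, host):
        if not _IDENTIFIER.match(name):
            raise ValueError("unsupported characters in %r" % name)
    if "'" in password or "\\" in password:
        raise ValueError("password contains characters this sketch rejects")

    account = "'{}'@'{}'".format(user, host)
    setup = [
        "CREATE USER {} IDENTIFIED BY '{}';".format(account, password),
        # Privileges are scoped to the chosen database, so statements in the
        # dump that touch other databases are refused by the server.
        "GRANT ALL PRIVILEGES ON `{}`.* TO {};".format(database, account),
    ]
    teardown = ["DROP USER {};".format(account)]
    return setup, teardown
```

The restore itself would then be run as the temporary user between the
setup and teardown statements.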
Change-Id: I805c605ed2b424640ad6a0a379b1c0b9c0004e94
This change addresses the results that were found when running
bandit against the templated python files in the various charts.
This also makes the bandit gate run only when python template
files are changed, and makes the job voting.
Change-Id: Ia158f5f9d6d791872568dafe8bce69575fece5aa
The output of 'ceph pg ls-by-pool' changed format in Nautilus,
which caused the checkPGs.py script to fail in some scenarios.
This change addresses that format change and fixes Nautilus
compatibility in the script. Mimic compatibility is maintained.
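One way to stay compatible with both releases is to normalise the JSON
output before using it. The sketch below is not the code in checkPGs.py;
extract_pg_stats is a hypothetical helper, and it assumes that Mimic
returns a bare JSON list of PG entries while Nautilus wraps that list in
an object under a "pg_stats" key.

```python
import json


def extract_pg_stats(raw_json):
    """Return the list of PG entries from 'ceph pg ls-by-pool -f json'.

    Assumption: Nautilus output is an object carrying the list under
    "pg_stats", and Mimic output is the list itself.
    """
    data = json.loads(raw_json)
    if isinstance(data, dict):
        # Assumed Nautilus-style output.
        return data.get("pg_stats", [])
    # Assumed Mimic-style output.
    return data
```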
Change-Id: I11d8337b548f959d0a4b58b7e8f76720a0371e73
This patch set provides a way to specify clean up scripts for rally tests
to clean up orphaned resources in the event of rally test failures.
Change-Id: Ifc988002711d34186975988abb33ecd8a9a2fba4
Signed-off-by: Tin Lam <tin@irrational.io>
Sometimes jobs fail, and the default of 6 retries is far too brief to
get logs (which are purged after the final failure). As we need the jobs
to always succeed, having a much higher default here seems prudent.
Change-Id: I7f20a3eb9a98669ae4af657d36a776830b82dfca
This fixes the logic that finds the osd id for the wal lvm and the
logic that finds the correct lvm device for the osd disk.
Change-Id: Id4ee1dbd5c82dcbe9893f81c3ad3b9e18d1f9509
This fixes the logic to use the osd device name instead of the whole
disk path while initializing the osd.
It also corrects the ceph osd ls command to use the correct keyring.
Change-Id: I90f0c3fd5d1e1b835326b1c690582990f7ca15cb
This waits for all the osd devices before initializing and also
adds a few more checks to determine whether a disk is already in use.
Change-Id: I68e1d4c8c1ade39f856c69333585dfcba3ea35ab
This commit adds an audit user to the postgresql database which
will have only SELECT privileges on the postgresql database tables.
This is accomplished by setting up audit user creation parameters
in the Patroni bootstrap environment settings, according to (1).
(1) https://patroni.readthedocs.io/en/latest/ENVIRONMENT.html
Change-Id: Idf1cd90b5d093f12fa4a3c5c794d4b5bbc6c8831
In this PS we explicitly define the admin user rather than letting
patroni use the default username and password.
Change-Id: I9885314902c3a60e709f96e2850a719ff9586b3d
This patch introduces a new cluster status, "reboot", which is set by
the leader node so that the other nodes start mysql without the
"--wsrep-new-cluster" option.
Before this change, the following situation could take place:
All pods go down one by one with some offset;
The first and second nodes have the max seqno;
The script on the first node detects that there are no active
backends and starts its timeout loop;
The script on the second node detects that there are no active
backends and starts its timeout loop (with an approx. 20 sec offset
from the first node);
The timeout loop finishes on the first node, which checks for the
highest seqno and lowest hostname and wins the ability to start the
cluster. Mysql is started with the "--wsrep-new-cluster" parameter.
The seqno is set to "-1" for this node after mysql startup;
A periodic job syncs values from the grastate file to the configmap;
The timeout loop finishes on the second node. It checks for the node
with the highest seqno and lowest hostname and, since the seqno is
already "-1" for the first node, the second node decides that it should
lead the cluster startup and executes mysql with the
"--wsrep-new-cluster" option as well, which leads to split brain.
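The startup decision described above can be modelled in a few lines. This
is a sketch and not the chart's start script: should_bootstrap, the nodes
mapping and the "down" status value are hypothetical, and only the
"reboot" status comes from this patch.

```python
def should_bootstrap(me, nodes, cluster_status):
    """Decide whether node `me` may start mysql with --wsrep-new-cluster.

    nodes          -- hypothetical mapping of hostname -> seqno, as synced
                      from the grastate files to the configmap
    cluster_status -- "reboot" means a leader is already bootstrapping
    """
    if cluster_status == "reboot":
        # Another node already leads the startup; join it instead.
        return False
    # Highest seqno wins; ties are broken by the lowest hostname.
    leader = min(nodes, key=lambda host: (-nodes[host], host))
    return leader == me


before = {"mariadb-0": 10, "mariadb-1": 10, "mariadb-2": 8}
# After mariadb-0 has bootstrapped, its seqno reads -1 in the configmap.
after = {"mariadb-0": -1, "mariadb-1": 10, "mariadb-2": 8}

first_leads = should_bootstrap("mariadb-0", before, "down")
split_brain = should_bootstrap("mariadb-1", after, "down")
fixed = should_bootstrap("mariadb-1", after, "reboot")
```

In this model, without the "reboot" status the second node also elects
itself once the first node's seqno has dropped to -1; with the status set
it does not.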
Change-Id: Ic63fd916289cb05411544cb33d5fdeed1352b380