2330 Commits

Author SHA1 Message Date
Zuul
01aa16620b Merge "Prometheus: Status Alerts Scalar/Vector Conversion" 2020-02-18 17:35:43 +00:00
Zuul
57ad8ad603 Merge "Prometheus: Ceph Alerts Scalar/Vector Conversion" 2020-02-18 17:35:42 +00:00
Zuul
3c7a9de243 Merge "Prometheus: Node Alerts Scalar/Vector Conversion" 2020-02-18 17:29:48 +00:00
Zuul
3781948505 Merge "Check Elasticsearch and Curator Compatibility" 2020-02-18 17:27:48 +00:00
Evgeny L
749e2be9f5 Add liveness and readiness probes for RabbitMQ exporter
Allow liveness and readiness probes to be configured for
the RabbitMQ exporter.

Change-Id: I80748276d20f688659c4ea2752c1941f9cfcaac4
2020-02-18 16:33:28 +00:00
diwakar thyagaraj
17592f54ae Enable Docker default Apparmor for all Prometheus Containers
Change-Id: I97fc39e52b36fc0be84abd049fdbce1e7026107d
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
2020-02-18 14:46:09 +00:00
KHIYANI, RAHUL (rk0850)
2712f54117 Add Docker default AppArmor profile to mariadb exporter chart
Change-Id: I6d5fcbb511f4f9cdb31727421fe320beeff1a882
2020-02-18 04:49:44 +00:00
Radhika Pai
c884ec439b Postgresql_exporter: Adding queries.yaml file
This change enables postgresql-exporter to push additional metrics
(like replication_lag) which are derived using a SQL query against the Postgres DB.

(Co-Author: Steven Fitzpatrick)

Change-Id: I78dc433a3782b48155ab293cb5afe90b3bc0ef1f
2020-02-17 19:26:29 -06:00
dt241s@att.com
f633555f16 Enable Docker default Apparmor for Postgresql and prometheus-postgresql.
Change-Id: I013ca5f99e5032c44f0d679e467da9e928c02a6b
2020-02-17 23:01:06 +00:00
Zuul
88b419d133 Merge "[FIX] Add apparmor to prometheus." 2020-02-17 17:53:53 +00:00
Phil Sphicas
b482b57e6e mariadb: avoid state management thread death
The mariadb container launches two threads in addition to the mysql
daemon, one to maintain a configmap containing the Galera Cluster state,
and the other to handle leader elections. These threads die if they
suffer any exceptions talking to the kubernetes apiserver. This can
happen sometimes, e.g. when a k8s control node reboots.

This change logs and ignores the kubernetes.client.rest.ApiException,
allowing the threads to retry and hopefully succeed once the k8s api
becomes available.

Change-Id: I5745a763bb07f719d83a41c1f27be2b76ce998e9
2020-02-17 01:13:37 -08:00
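The log-and-retry behavior described in b482b57e6e can be sketched roughly as follows. This is a minimal illustration, not the chart's actual script: `run_forever`, `step`, and the `max_iterations` parameter are hypothetical names, and the local `ApiException` class is a stand-in for `kubernetes.client.rest.ApiException` so the sketch runs without the kubernetes package.

```python
import logging
import time

logger = logging.getLogger("state-thread")


class ApiException(Exception):
    """Stand-in for kubernetes.client.rest.ApiException (assumption)."""


def run_forever(step, interval=0.0, max_iterations=None):
    """Call step() repeatedly; log and ignore ApiException so the loop survives.

    max_iterations exists only to make the sketch testable; a real
    background thread would loop until the process exits.
    Returns the number of failed calls that were logged and skipped.
    """
    failures = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        try:
            step()
        except ApiException as exc:
            # Previously an exception here would have ended the thread.
            failures += 1
            logger.warning("apiserver call failed, will retry: %s", exc)
        time.sleep(interval)
    return failures
```

Any exception other than `ApiException` still propagates in this sketch, which keeps unrelated bugs visible instead of silently retrying them.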
dt241s@att.com
8bd4a2624a [FIX] Add apparmor to prometheus.
This also fixes Elasticsearch apparmor Jobs.

Change-Id: I8f2a9aa12beffe3ca394a2e9dd00aba7e5292f29
2020-02-14 23:13:38 +00:00
Zuul
26982ca705 Merge "Add Docker default AppArmor profile to memcached chart" 2020-02-14 01:13:53 +00:00
KHIYANI, RAHUL (rk0850)
ae41873341 Add Docker default AppArmor profile to ingress chart
Change-Id: Id4fee2008fd7544ccbf865084949c767013ca3fa
2020-02-13 22:41:37 +00:00
KHIYANI, RAHUL (rk0850)
cce2e61c16 Add Docker default AppArmor profile to memcached chart
Adding apparmor profile to memcached and memcached-exporter charts

Change-Id: I40ece825d75b6884714b9121d8d501efcbce2f53
2020-02-13 10:51:15 -06:00
Zuul
f17b6de1a2 Merge "Add Docker default AppArmor profile to mariadb" 2020-02-12 20:15:18 +00:00
KHIYANI, RAHUL (rk0850)
483d6f0047 Add Docker default AppArmor profile to mariadb
Change-Id: I256f169d6ff2de71b7218ab522bac9975d971c41
2020-02-12 10:32:22 -06:00
Zuul
b21fdfabad Merge "Fix MariaDB Single Database Restore" 2020-02-11 22:22:06 +00:00
Zuul
69fabcc1e4 Merge "[Ceph-Mon] Check for ceph-mon messenger V2" 2020-02-11 22:21:55 +00:00
Zuul
7fa99287a1 Merge "[ceph-client] Enable Nautilus PG autoscaler for all ceph pools" 2020-02-11 22:18:47 +00:00
Zuul
4a770bb908 Merge "Fix postgresql database backup issue" 2020-02-11 21:11:45 +00:00
Steven Fitzpatrick
31d0161a39 Check Elasticsearch and Curator Compatibility
This change updates the script used by zuul to check the elasticsearch
deployment so that the curator runs during the timeframe of the
check, verifying the compatibility of the ES and Curator versions being
used.

Change-Id: I309530d71061fbb42c80e133948a0e0c3cf1927e
2020-02-11 15:17:06 +00:00
Steven Fitzpatrick
a41262e459 Prometheus: Node Alerts Scalar/Vector Conversion
This change converts alert expressions which relied on instant vectors
to use range aggregate functions instead, for just the 'basic_linux'
rules.

Change-Id: I30d6ab71d747b297f522bbeb12b8f4dbfce1eefe
Co-Authored-By: Meghan Heisler <mkheisler93@gmail.com>
2020-02-11 15:14:40 +00:00
Steven Fitzpatrick
f37865d6a0 Prometheus: Ceph Alerts Scalar/Vector Conversion
This change updates the prometheus alerting rules to use ranged vectors
in their expressions, to avoid situations where missed scrapes would
cause scalar metrics to "go stale" - resetting the alert timer.

Only the ceph alerts are affected by this change.

Change-Id: Ib47866d12616aaa808e6a09c58aa4352e338a152
Co-Authored-By: Meghan Heisler <mkheisler93@gmail.com>
2020-02-11 15:14:35 +00:00
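The effect described in f37865d6a0 can be illustrated with a toy model of alert evaluation. This is a simplified sketch, not Prometheus itself: `pending_steps` is a hypothetical helper, a missed scrape is modelled as `None`, and the range aggregate is assumed to behave like `max_over_time` over the last `window` evaluation steps.

```python
def pending_steps(samples, threshold, window=1):
    """Return the longest run of consecutive evaluations where the condition held.

    samples: one value per evaluation step, None meaning a missed scrape.
    window=1 models an instant-vector expression; window>1 models a
    max_over_time-style range aggregate over the last `window` steps.
    """
    longest = current = 0
    for i in range(len(samples)):
        # Collect the samples that actually exist inside the window.
        recent = [s for s in samples[max(0, i - window + 1): i + 1] if s is not None]
        firing = bool(recent) and max(recent) > threshold
        # A step without data resets the pending timer, as a stale series would.
        current = current + 1 if firing else 0
        longest = max(longest, current)
    return longest


samples = [5, 5, None, 5, 5]  # one missed scrape in the middle
instant = pending_steps(samples, threshold=1, window=1)
ranged = pending_steps(samples, threshold=1, window=2)
```

In this model the instant form never holds for more than two consecutive steps, while the ranged form holds for all five, so an alert that must stay pending for four steps would only fire with the ranged expression.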
Steven Fitzpatrick
d408bed90d Prometheus: Status Alerts Scalar/Vector Conversion
This change converts alert expressions which relied on instant vectors
to use range aggregate functions instead.

Change-Id: I4df757f961524bed23b6a6ad361779c1749ca2c5
Co-Authored-By: Meghan Heisler <mkheisler93@gmail.com>
2020-02-11 15:14:27 +00:00
Chinasubbareddy Mallavarapu
622f604cbe [Ceph-Mon] Check for ceph-mon messenger V2
This adds a new check to make sure msgr2 is enabled if it is
supported by all of the mons. When mon quorum is lost the
mons revert to the v1 protocol, which results in a Ceph
warning state if v2 is supported by all of the available
mons.

Change-Id: Ib85243d38f122c1993aba945b7ae943eed262dbf
2020-02-10 16:43:19 -06:00
Cliff Parsons
c18ee59aff Fix postgresql database backup issue
Currently postgresql database backup job will fail due to not having
correct permissions on the mounted PVC. This patchset corrects the
permissions on the PVC mount so that the backup pods can write to the
/var/backup directory structure.

Another problem was that pg_dumpall was not able to get the correct
password from the admin_user.conf. This may be due to the extra lines
in the file, so this patchset reads it differently in order to find
the password. This was a change to the backup and restore scripts.

Also there are a number of small corrections made to the error handling
for both backup and restore scripts, to be consistent with the MariaDB
backup/restore scripts.

Change-Id: Ica361764c591099e16d03a0988f73c6976583ceb
2020-02-10 17:38:10 +00:00
Brian Wickersham
41924e1618 [ceph-client] Enable Nautilus PG autoscaler for all ceph pools
Enabling the PG autoscaler across all pools will ensure pg_num is
automatically adjusted.

https://ceph.io/rados/new-in-nautilus-pg-merging-and-autotuning/

Change-Id: Ic2f635700a32c0b7e8c67ed9571efa520638474c
2020-02-07 21:38:24 +00:00
Tin Lam
aa48b16896 Add train release support
This patch set adds the override needed to support the OpenStack Train
release by moving the libvirt version to > 3.0.0.

Change-Id: I36097544024df5c6dfc87a032bd8383be98f1a3a
Signed-off-by: Tin Lam <tin@irrational.io>
2020-02-07 08:56:21 -06:00
Parsons, Cliff (cp769u)
ef9d8392f2 Fix MariaDB Single Database Restore
This patchset fixes a serious database restoration problem where the
user is trying to restore a single database, but in the process of
restoring the database, the script inadvertently also removes all
tables from the other databases.

The root cause was that the mysql "--one-database" restore option
achieves the single database restoration, but somehow corrupts the
other databases. The new approach taken in this patchset is to
create a temporary database user which only has permission to
restore the chosen database, and that will leave the other databases
unharmed. This approach, which can be applied for restoring
individual databases and even database tables, was recommended in (1).
After the database is restored, the temporary user is deleted.

(1) https://mariadb.com/kb/en/restoring-data-from-dump-files/

Also improved some of the error handling.

Change-Id: I805c605ed2b424640ad6a0a379b1c0b9c0004e94
2020-02-06 16:17:28 +00:00
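The temporary-user approach described in ef9d8392f2 can be sketched as the SQL a restore script might issue around the restore. This is an assumption-laden illustration, not the chart's script: the function name, the `restore_tmp` user, the placeholder password, and the exact grants are all hypothetical.

```python
import re


def restricted_restore_statements(database, user="restore_tmp", password="change-me"):
    """Build SQL to create and later drop a user limited to one database.

    The user name, password, and grant list are illustrative placeholders.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", database):
        raise ValueError(f"unexpected database name: {database!r}")
    return {
        "setup": [
            f"CREATE USER '{user}'@'%' IDENTIFIED BY '{password}';",
            # Privileges cover only the chosen database, nothing else.
            f"GRANT ALL PRIVILEGES ON `{database}`.* TO '{user}'@'%';",
            "FLUSH PRIVILEGES;",
        ],
        "teardown": [f"DROP USER '{user}'@'%';"],
    }


statements = restricted_restore_statements("keystone")
```

The dump would then be replayed as the restricted user (for example with the mysql client's `--force` option so it continues past permission errors), so statements that touch other databases are rejected by the server rather than applied.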
Gage Hugo
86e56b2aee Address bandit gate failures
This change addresses the results that were found when running
bandit against the templated python files in the various charts.

This also makes the bandit gate run only when python template
files are changed, and makes the job voting.

Change-Id: Ia158f5f9d6d791872568dafe8bce69575fece5aa
2020-02-04 15:33:17 -06:00
Taylor, Stephen (st053q)
92dfac645a [Ceph Nautilus] Fix _checkPGs.py.tpl for Nautilus compatibility
The output of 'ceph pg ls-by-pool' changed format in Nautilus,
which caused the checkPGs.py script to fail in some scenarios.
This change addresses that format change and fixes Nautilus
compatibility in the script. Mimic compatibility is maintained.

Change-Id: I11d8337b548f959d0a4b58b7e8f76720a0371e73
2020-02-04 08:23:17 -07:00
Zuul
3dd0eb0cdf Merge "Fluentd: Update kernel and auth inputs to use systemd" 2020-01-31 22:20:22 +00:00
Zuul
26208fa3a7 Merge "Add ability to add rally cleanup script" 2020-01-31 03:29:01 +00:00
Zuul
210a5187af Merge "Prevent splitbrain during full Galera restart" 2020-01-31 03:29:00 +00:00
Tin Lam
a6b1bd293d Add ability to add rally cleanup script
This patch set provides a way to specify clean up scripts for rally tests
to clean up orphaned resources in the event of rally test failures.

Change-Id: Ifc988002711d34186975988abb33ecd8a9a2fba4
Signed-off-by: Tin Lam <tin@irrational.io>
2020-01-30 21:44:49 +00:00
Chris Wedgwood
578511cd39 [htk] Increase job default backoffLimit to 1000
Sometimes jobs fail; the default of 6 retries is far too brief to get
logs (which are purged after the final failure). As we need the jobs
to succeed always, having a much higher default here seems prudent.

Change-Id: I7f20a3eb9a98669ae4af657d36a776830b82dfca
2020-01-30 19:52:54 +00:00
Chinasubbareddy Mallavarapu
eacf937221 [ceph-osd] Fix issues with ceph osd init script
This fixes the logic to find the osd id for the wal lvm and also
to find the correct lvm device for the osd disk.

Change-Id: Id4ee1dbd5c82dcbe9893f81c3ad3b9e18d1f9509
2020-01-30 09:35:41 +00:00
Chinasubbareddy Mallavarapu
63e43d98b7 [ceph-osd] Fix to check osd disk name instead of disk path
This fixes the logic to use the osd device name instead of the whole disk path
while initializing the osd.
It also corrects the ceph osd ls command to use the correct keyring.

Change-Id: I90f0c3fd5d1e1b835326b1c690582990f7ca15cb
2020-01-29 21:31:22 -06:00
Zuul
792b016677 Merge "[ceph-osd] Wait for devices to initialize the osd" 2020-01-29 23:06:17 +00:00
Chinasubbareddy Mallavarapu
9a18198fca [ceph-osd] Wait for devices to initialize the osd
This waits for all the osd devices before initializing and also
adds a few more checks to determine whether a disk is already in use.

Change-Id: I68e1d4c8c1ade39f856c69333585dfcba3ea35ab
2020-01-29 14:33:51 -06:00
Huang, Sophie (sh879n)
d135e2c964 Update audit user access for Mariadb
The audit user is granted SELECT permission
for all Mariadb databases and tables.

Change-Id: I621325e4a9d27d3ab0d0bc30b4926ea0fa3fd17e
2020-01-29 18:11:45 +00:00
Zuul
376bd5c066 Merge "Add audit database user for audit purposes" 2020-01-28 23:03:45 +00:00
Koffi Nogbe
914ea2bd60 Add audit database user for audit purposes
This commit adds an audit user to the postgresql database which
will have only SELECT privileges on the postgresql database tables.
This is accomplished by setting up audit user creation parameters
in the Patroni bootstrap environment settings, according to (1).

(1) https://patroni.readthedocs.io/en/latest/ENVIRONMENT.html

Change-Id: Idf1cd90b5d093f12fa4a3c5c794d4b5bbc6c8831
2020-01-28 16:48:29 +00:00
Kabanov, Dmitrii
844d2cd16d [Ceph-rgw] Add bootstrap job
The PS adds a bootstrap job for the ceph-rgw chart.

Change-Id: I3055e1afe8072277166b8a659c940320720a0588
2020-01-28 01:49:57 +00:00
Zuul
de5dd82ff8 Merge "Update overrides used in apparmor nonvoting check" 2020-01-27 21:32:55 +00:00
Zuul
4572110bc3 Merge "[Ceph] Fix values.yaml" 2020-01-27 21:29:57 +00:00
Zuul
c228b0c454 Merge "[LDAP] Remove duplicate manifests: keys" 2020-01-27 17:58:20 +00:00
Doug Aaser
cf7b8dbb3d Add explicit admin user to Patroni
In this PS we explicitly define the admin user rather than letting
patroni use the default username and password.

Change-Id: I9885314902c3a60e709f96e2850a719ff9586b3d
2020-01-24 21:14:32 +00:00
Oleksii Grudev
b0bb8dfa7a Prevent splitbrain during full Galera restart
This patch introduces a new cluster status "reboot",
which is set by the leader node so that the other nodes
start mysql without the "--wsrep-new-cluster" option.
Before this change, the following situation could take place:

1. All pods go down one by one with some offset;
2. The first and second nodes have the max seqno;
3. The script on the first node detects there are no active
   backends and starts the timeout loop;
4. The script on the second node detects there are no active
   backends and starts the timeout loop (with approx. 20 sec offset
   from the first node);
5. The timeout loop finishes on the first node; it checks for the highest
   seqno and lowest hostname and wins the ability to start the cluster.
   Mysql is started with the "--wsrep-new-cluster" parameter.
   Seqno is set to "-1" for this node after mysql startup;
6. A periodic job syncs values from the grastate file to the configmap;
7. The timeout loop finishes on the second node. It checks for the node with
   the highest seqno and lowest hostname and, since seqno is already
   "-1" for the first node, the second node decides that it should
   lead the cluster startup and executes mysql with the "--wsrep-new-cluster"
   option as well, which leads to split brain.

Change-Id: Ic63fd916289cb05411544cb33d5fdeed1352b380
2020-01-23 18:45:18 +02:00
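The race described in b0bb8dfa7a can be reproduced with a small model of the election rule (highest seqno, then lowest hostname). This is a sketch under stated assumptions: `pick_leader`, `should_bootstrap`, and the `mariadb-N` hostnames are hypothetical, and the real script reads these values from the configmap rather than a dict.

```python
def pick_leader(nodes):
    """nodes: dict hostname -> seqno. Highest seqno wins; ties go to lowest hostname."""
    return min(nodes, key=lambda host: (-nodes[host], host))


def should_bootstrap(me, nodes, cluster_status=None):
    """Return True if `me` should start mysql with --wsrep-new-cluster."""
    if cluster_status == "reboot":
        # A leader already claimed the restart; join instead of bootstrapping.
        return False
    return pick_leader(nodes) == me


# State seen by the first node when its timeout loop finishes.
before = {"mariadb-0": 100, "mariadb-1": 100, "mariadb-2": 90}
# State seen by the second node after the first node's seqno was synced as -1.
after = {"mariadb-0": -1, "mariadb-1": 100, "mariadb-2": 90}

first_bootstraps = should_bootstrap("mariadb-0", before)
second_bootstraps = should_bootstrap("mariadb-1", after)
second_with_reboot = should_bootstrap("mariadb-1", after, cluster_status="reboot")
```

Without the status flag both `first_bootstraps` and `second_bootstraps` are true, which is the split brain; with the "reboot" status set by the leader, the second node joins the existing cluster instead.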