2651 Commits

Author SHA1 Message Date
diwakar thyagaraj
a8d9477a56 [FIX] Fix Prometheus Job
Change-Id: Icc3eafccfd2f919858d35f5e1ebbc768705c3139
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
2020-07-09 16:02:59 +00:00
Kabanov, Dmitrii
eecf56b8a9 [Ceph-client, ceph-osd] Update helm test
The PS updates the helm test and replaces the "expected_osds" variable
with the number of OSDs available in the cluster (ceph-client).
The PS also updates the logic for calculating the minimum number of OSDs.

Change-Id: Ic8402d668d672f454f062bed369cac516ed1573e
2020-07-09 15:53:49 +00:00
diwakar thyagaraj
5f59695ad4 Enable apparmor to Ceph post-apply pods
Logs : https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d16/739849/5/experimental/openstack-helm-infra-apparmor/d167181/primary/objects/namespaced/ceph/pods/ceph-osd-post-apply-zr55t.yaml

Change-Id: Ic5d4fe83ad16a7fc551162275ee3aa34c543ec18
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
2020-07-09 15:11:48 +00:00
Zuul
0d5aeaacbb Merge "[Ceph-osd] Avoid using lsblk/blkid." 2020-07-09 07:30:37 +00:00
Zuul
781cfcd195 Merge "[Ceph-OSD] Wait for pods before running "post-apply" job." 2020-07-08 23:44:41 +00:00
Zuul
664575e703 Merge "Add missing security context to elasticsearch pods/containers" 2020-07-08 23:20:09 +00:00
Zuul
28eb431c32 Merge "Fix ALLOW_UNAUTHENTICATED for bionic kubeadm-AIO" 2020-07-08 21:41:50 +00:00
Zuul
26c4c01ae5 Merge "allocate_data_node function improvement" 2020-07-08 21:41:48 +00:00
Zuul
b15d9a103b Merge "Add openstack-helm-single-16GB-node nodeset" 2020-07-08 21:11:08 +00:00
Andrii Ostapenko
a0ca4a3bb9 Fix ALLOW_UNAUTHENTICATED for bionic kubeadm-AIO
Change-Id: I6bf1f483999a10322362aa18bd43bc09cef7ffe9
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-08 14:12:34 -05:00
Kabanov, Dmitrii
4841f53ca6 [Ceph-osd] Avoid using lsblk/blkid.
The PS improves performance by replacing lsblk/blkid (in some cases blkid can be quite slow).
It also avoids deadlocks when there are RBDs mapped on the host.

Change-Id: If607e168515f55478e9e55e421738d2d00269d3f
2020-07-08 19:02:54 +00:00
Kabanov, Dmitrii
ea038c5c85 [Ceph-OSD] Wait for pods before running "post-apply" job.
The PS updates "post-apply" job and adds execution of "wait_for_pods"
function as the first step of the job.

Change-Id: I98644981094cb4fb7cc348b80628006ab59cb77f
2020-07-08 19:02:38 +00:00
Zuul
c61fc590fb Merge "Node Exporter: Add rootfs mount argument" 2020-07-08 18:59:11 +00:00
Zuul
9e8d998500 Merge "Add missing security context to ceph-rgw and ceph-osd pods/containers" 2020-07-08 18:53:53 +00:00
Zuul
7dafa84dfe Merge "Add missing security context to promethues and postgresql pods/containers" 2020-07-08 18:14:35 +00:00
Zuul
5cabafbc74 Merge "Fix application name for grafana session sync" 2020-07-08 18:14:33 +00:00
Zuul
1a70211147 Merge "Add Apparmor for prometheus os exporter ks-user Job" 2020-07-08 14:18:26 +00:00
Andrii Ostapenko
0242d97437 Add openstack-helm-single-16GB-node nodeset
Based on 8 CPU 16GB memory ubuntu-bionic-expanded label

Change-Id: I1ef27858b5b02d367eea1c24447aefa2b6712458
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-08 07:18:20 -05:00
Zuul
b8ec250d01 Merge "MariaDB backup and restore with grants of all users" 2020-07-07 23:46:48 +00:00
KHIYANI, RAHUL (rk0850)
00a64aa807 Add missing security context to ceph-rgw and ceph-osd pods/containers
This updates the ceph-rgw and ceph-osd chart to include the pod
security context on the pod template.

This also adds the container security context to set
readOnlyRootFilesystem flag to true

Change-Id: I1b78b7a0fc413acdb5ea2dc295a0026616d7cac1
2020-07-07 18:08:58 -05:00
KHIYANI, RAHUL (rk0850)
a43f479e6c Fix application name for grafana session sync
Implement helm-toolkit snippet for grafana add-home-dashboard
which adds security context template at pod/container

Change-Id: I12a5fd6c5043079f830eb36043f5b0ca495a3e93
2020-07-07 16:50:41 -05:00
Huang, Sophie (sh879n)
a23a60921a MariaDB backup and restore with grants of all users
This patchset captures the grants of all the MariaDB users
in the backup tarball and restores the grants during the
restore of all databases.
The Percona tool pt-show-grants, used to accomplish this,
is installed in the image by this PS:
https://review.opendev.org/#/c/739149/

Change-Id: I26882956f96c961b6202b1004b8cf0faee6e73eb
2020-07-07 21:22:03 +00:00
KHIYANI, RAHUL (rk0850)
b400a6c41d Add missing security context to promethues and postgresql pods/containers
This updates the chart to include the pod security context
on the pod template.

This also adds the container security context to set
readOnlyRootFilesystem flag to true

Change-Id: Icb7a9de4d98bac1f0bcf6181b6e88695f4b09709
2020-07-07 21:20:36 +00:00
Andrii Ostapenko
41f02d3c98 Fix service account name for ceph-mon keyring generator
Fix issues introduced by https://review.opendev.org/#/c/735648
with extra 'ceph-' in service_account and security context not
rendered for keyring generator containers.

Change-Id: Ie53b3407dbd7345d37c92c60a04f3badf735f6a6
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-07 15:06:23 -05:00
willxz
e89c1c3c06 allocate_data_node function improvement
- Remove "if" condition of allocate_data_node
- Delay 5 seconds for wait_to_join initial check to start
- Set 60 minutes timeout for wait_to_join function

Change-Id: Ie42af89551bd8804b87fe936c676e85130564187
2020-07-07 12:28:21 -04:00
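
The wait_to_join timing described in the commit above (a 5-second initial delay and a 60-minute timeout) could look roughly like the following Python sketch. It is not the chart's actual shell code: the `has_joined` callback, the 10-second polling interval, and the injectable `sleep`/`clock` parameters are assumptions made for illustration.

```python
import time


def wait_to_join(has_joined, initial_delay=5, timeout=60 * 60, interval=10,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll has_joined() until it returns True or the timeout expires.

    has_joined, interval, sleep and clock are hypothetical names; only the
    5-second delay and 60-minute timeout come from the commit message.
    """
    # Give the node a moment before the first check.
    sleep(initial_delay)
    deadline = clock() + timeout
    while clock() < deadline:
        if has_joined():
            return True
        sleep(interval)
    # Timed out without the node joining.
    return False
```

Passing `sleep` and `clock` as parameters keeps the sketch testable without real waiting; a caller would normally rely on the defaults.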
Andrii Ostapenko
824f168efc Undo octal-values restriction together with corresponding code
Lift the octal values rule restriction, since the benefit of readable
file modes outweighs possible issues with yaml 1.2 adoption in future
k8s versions. These issues will be addressed when/if they occur.

Also ensure osh-infra is a required project for the lint job, which
matters when running the job against another project.

Change-Id: Ic5e327cf40c4b09c90738baff56419a6cef132da
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-07 15:42:53 +00:00
Zuul
88b79920db Merge "[rabbitmq] Upgrade to 3.7.26" 2020-07-07 02:57:14 +00:00
KHIYANI, RAHUL (rk0850)
eec5635f43 Add missing security context to elasticsearch pods/containers
This updates the elasticsearch chart to include the pod
security context on the pod template.

This also adds the container security context to set
readOnlyRootFilesystem flag to true

Change-Id: I8d1057f242b741fd297eca7475eb3bfb5e383f1c
2020-07-07 01:09:23 +00:00
diwakar thyagaraj
cc020bdfca Add Apparmor for prometheus os exporter ks-user Job
1) Updated docker image for heat to point to Stein and Bionic
2) Enabled Apparmor Job for prometheus-openstack exporter.

Change-Id: I1ee8acb848ece3c334b087309d452d5137ea0798
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
2020-07-07 00:23:18 +00:00
Steven Fitzpatrick
ddc35266c9 Node Exporter: Add rootfs mount argument
Change-Id: I0a144e2a05b9617d2cb46bcb56c746ca05743c1b
2020-07-06 12:43:59 -05:00
Andrii Ostapenko
2b4cf6a2d9 Completely switch to python3 for developers installation
This addresses an issue with using py2 as interpreter while
installing required dependencies with py3.

Also switch kubeadm-aio image to bionic.

Change-Id: I5a9e6678c45fad8288aa6971f57988b46001c665
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-03 09:47:43 +00:00
Zuul
03273bd61d Merge "Fix ara installation" 2020-07-02 19:25:03 +00:00
Zuul
5b9c79604a Merge "Fix developers kubeadm installation" 2020-07-02 19:17:55 +00:00
Chinasubbareddy Mallavarapu
bfe7a99a61 [CEPH] Make ceph-volume as default deployment tool
Make ceph-volume the default deployment tool, since support for
ceph-disk was deprecated as of the Nautilus version of ceph.

Change-Id: I10f42fd0cb43a951f480594d269fd998de5678bf
2020-07-02 15:05:03 +00:00
Andrii Ostapenko
ecb58b85be Fix ara installation
Using the latest ara supporting ansible 2.5.5

Change-Id: Id44948986609093b709e23e0d9f9eddd690fa2b8
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-01 23:56:03 -05:00
Andrii Ostapenko
b49541f300 Fix developers kubeadm installation
Waiting for kube-apiserver fails because the python executable
is not found.

Change-Id: Ib0ff95088c658fec3180f071269041faa7da2ecf
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-01 23:51:45 -05:00
Zuul
7cf8c6f893 Merge "Fix drop databases issue in Postgresql restore" 2020-07-01 23:06:25 +00:00
Zuul
af5a742a17 Merge "Add generate openAPIV3Schema schema for DaemonJob CRD." 2020-07-01 21:56:31 +00:00
Zuul
fc7bb216d2 Merge "Fix return code when backup to remote rgw fails" 2020-07-01 21:46:25 +00:00
Zuul
646ce9bd4c Merge "Make mariadb chart compatible with mariadb 10.4.13" 2020-07-01 05:35:56 +00:00
Zuul
06a0244ccc Merge "Updating nagios cluster role for rbd monitoring" 2020-06-30 21:02:53 +00:00
Cliff Parsons
4964ea2a76 Fix drop databases issue in Postgresql restore
Recently, the Postgresql backups were modified to generate drop database
commands (--clean pg_dumpall option). Also, for single database restore,
a DROP DATABASE command was added before the restore so that the
database could be restored without duplicate rows. However, if there are
existing database connections (by the applications or other users), then
the drop database commands will fail. So for the duration of the restore
operation, the databases being restored need to have their existing
connections dropped and new connections prevented until the database(s)
are restored; then connections should be re-allowed.

Also found a problem with psql returning 0 (success code) even though
there were errors during its execution. The solution is to check the
output for errors and if there are any, dump out the log file for the
user to see and let the user know there are errors.

Lastly, a problem was found with the single database restoration, where
the database dump for a single database was being incorrectly extracted
from the psql dump file, resulting in the database not being restored
correctly (most of the db being wiped out). This patchset fixes that
issue as well.

Change-Id: I4db3f6ac7e9fe7cce6a432dfba056e17ad1e3f06
2020-06-30 19:39:00 +00:00
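
The commit above notes that psql can return 0 even when errors occurred, so the restore output has to be inspected. A minimal Python sketch of that kind of check follows. The function name `find_restore_errors` and the assumption that failures appear on lines containing "ERROR:" or "FATAL:" are illustrative, not taken from the patch itself.

```python
import re

# Assumption: failed statements show up in the psql log on lines that
# contain "ERROR:" or "FATAL:". The real script may match differently.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL):")


def find_restore_errors(returncode, log_text):
    """Return the log lines that indicate a failed restore.

    An empty list means the restore looks clean. The exit code alone is
    not trusted, because psql may exit 0 despite errors in the output.
    """
    errors = [line for line in log_text.splitlines()
              if ERROR_PATTERN.search(line)]
    if returncode != 0 and not errors:
        # Non-zero exit without a matching line is still a failure.
        errors.append(f"psql exited with code {returncode}")
    return errors
```

A caller could print the returned lines (or the whole log file) for the user and fail the restore whenever the list is non-empty.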
Cliff Parsons
1508324ce7 Fix return code when backup to remote rgw fails
In the database backup framework (_backup_main.sh.tpl), the
backup_databases function exits with code 1 if the store_backup_remotely
function fails to send the backup to the remote RGW. This causes the pod
to fail and be restarted by the cronjob, over and over, until the backoff
retries limit (6 by default) is reached, which creates many copies of
the same backup on the file system. The default k8s behavior is to
delete the job/pods once the backoff limit has been exceeded, which
makes troubleshooting more difficult (although we may have logs in
elasticsearch). This patch changes the return code to 0 so that the pod
will not fail in that scenario. The error logs generated should be
enough to flag the failure (via Nagios or whatever alerting system is
being used).

Change-Id: Ie1c3a7aef290bf6de4752798821d96451c1f2fa5
2020-06-30 16:29:38 +00:00
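
The behavior change in the commit above (log the remote-store failure but exit 0 so the pod is not retried) can be sketched in Python as follows. The real framework is a shell template; here `create_local_backup` is a hypothetical callback, and treating a remote failure as an `OSError` is an assumption for illustration.

```python
import logging

log = logging.getLogger("backup")


def backup_databases(create_local_backup, store_backup_remotely):
    """Run one backup and return the exit code the pod should use."""
    archive = create_local_backup()
    try:
        store_backup_remotely(archive)
    except OSError as exc:
        # The local backup already exists. Report the remote failure in
        # the logs so alerting can flag it, but return 0 so the job is
        # not retried and no duplicate backups are created.
        log.error("Failed to send %s to remote RGW: %s", archive, exc)
        return 0
    log.info("Backup %s stored remotely", archive)
    return 0
```

The trade-off is that the failure is visible only through the error log, which is why the commit relies on log-based alerting.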
Luna Das
64c744d756 Add generate openAPIV3Schema schema for DaemonJob CRD.
change docker image to point to the latest metacontroller image.
change python image to point to version 3.7
add updateStrategy to CompositeController.
add replicas config to DaemonJobController via zuul gate.

Change-Id: I2a48bc6472017802267980fe474d81886113fcda
2020-06-30 01:13:41 +05:30
Chinasubbareddy Mallavarapu
3bde9f5b90 [CEPH] OSH-INFRA: use loopback devices for ceph osds
- Use loopback devices for ceph osds, since support for
  directory-backed osds is going to be deprecated.
- Move to bluestore from filestore for ceph-osds.
- Separate DB and WAL partitions from data so that gates will validate
  the scenario where we will have fast storage disk for DB and WAL.

Change-Id: Ief6de17c53d6cb57ef604895fdc66dc6c604fd89
2020-06-29 14:09:32 +00:00
Zuul
b1e66fd308 Merge "Add more fields with verbose description to CompositeController." 2020-06-27 05:29:44 +00:00
Luna Das
594645ce39 Add more fields with verbose description to CompositeController.
Change-Id: Ib6d9db5a8b1be9c3fa6b4cb988c576a71599a274
2020-06-27 00:15:09 +05:30
Taylor, Stephen (st053q)
153c9ec6f0 [ceph-osd] Liveness probe success in preboot state with noup flag
OSDs fail the liveness probe if they can't make it to the 'active'
state. The noup flag keeps OSDs in the 'preboot' state, which
prevents the liveness probe from succeeding. This change adds an
additional check in the liveness probe to allow it to succeed if
the noup flag is set and OSDs are in the 'preboot' state.

Change-Id: I8df5954f7bc4ef4374e19344b6e0a9130764d60c
2020-06-26 11:37:18 -05:00
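
The extra liveness condition described in the commit above reduces to a small predicate. The Python sketch below only illustrates the logic. The function name and the way state and flags are passed in are assumptions; the actual probe is a shell script that queries the OSD and the cluster.

```python
def liveness_ok(osd_state, cluster_flags):
    """Decide whether the liveness probe should succeed.

    osd_state is the OSD's reported state (e.g. "active", "preboot");
    cluster_flags is a collection of flags set on the cluster.
    Both parameter shapes are hypothetical.
    """
    if osd_state == "active":
        return True
    # With noup set, OSDs are held in preboot on purpose, so that state
    # should not be treated as a failure.
    return osd_state == "preboot" and "noup" in cluster_flags
```

For example, `liveness_ok("preboot", {"noup"})` succeeds, while the same state without the flag still fails the probe.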
Mykyta Karpin
1482193fd4 Make mariadb chart compatible with mariadb 10.4.13
Since mariadb 10.4.13, the definer of the mysql.user view is not root
but the mariadb.sys user. Removing that user breaks mysql_upgrade,
which then fails to fix views. It is safe not to remove it because
the account is locked by default and cannot log in.

Change-Id: I5183d7cbb09e18d0e87e0aef8c59bb71ec2f1cb5
Related-Bug: https://jira.mariadb.org/browse/MDEV-22542
2020-06-26 05:11:55 +00:00
DeJaeger, Darren (dd118r)
64cd0faf6a Adjust rabbitmq probes to better reflect its actual state
This PS makes a few small tweaks to the rabbitmq probes so
that its health and readiness are more reflective of what is actually
happening inside the container. We were previously seeing instances
of the pod marked as ready before it actually was.

Change-Id: If48ec02d4050f7385e71c2e6fe0fff8f59667af4
2020-06-26 05:10:04 +00:00