This PS updates the helm test, replacing the "expected_osds" variable
with the number of OSDs available in the cluster (ceph-client).
It also updates the logic for calculating the minimum number of OSDs.
Change-Id: Ic8402d668d672f454f062bed369cac516ed1573e
This PS improves performance by replacing lsblk/blkid (in some cases blkid can be quite slow).
It also avoids deadlocks when there are RBDs mapped on the host.
Change-Id: If607e168515f55478e9e55e421738d2d00269d3f
This PS updates the "post-apply" job, adding execution of the "wait_for_pods"
function as the first step of the job.
Change-Id: I98644981094cb4fb7cc348b80628006ab59cb77f
Based on the 8-CPU, 16 GB memory ubuntu-bionic-expanded label
Change-Id: I1ef27858b5b02d367eea1c24447aefa2b6712458
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This updates the ceph-rgw and ceph-osd charts to include the pod
security context on the pod template.
It also adds the container security context to set the
readOnlyRootFilesystem flag to true.
Change-Id: I1b78b7a0fc413acdb5ea2dc295a0026616d7cac1
Implement a helm-toolkit snippet for grafana add-home-dashboard
which adds the security context template at the pod/container level.
Change-Id: I12a5fd6c5043079f830eb36043f5b0ca495a3e93
This patchset captures the grants of all the MariaDB users
in the backup tarball and restores those grants during the
all-databases restore.
The Percona tool pt-show-grants is installed in the image to
accomplish the task in this PS:
https://review.opendev.org/#/c/739149/
Change-Id: I26882956f96c961b6202b1004b8cf0faee6e73eb
This updates the chart to include the pod security context
on the pod template.
It also adds the container security context to set the
readOnlyRootFilesystem flag to true.
Change-Id: Icb7a9de4d98bac1f0bcf6181b6e88695f4b09709
Fix issues introduced by https://review.opendev.org/#/c/735648:
an extra 'ceph-' in service_account, and the security context not
being rendered for the keyring generator containers.
Change-Id: Ie53b3407dbd7345d37c92c60a04f3badf735f6a6
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
- Remove the "if" condition around allocate_data_node
- Delay 5 seconds before the initial wait_to_join check starts
- Set a 60-minute timeout for the wait_to_join function
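The delayed, time-bounded wait described above can be sketched roughly as
follows; this is illustrative only, and node_joined, INITIAL_DELAY,
RETRY_INTERVAL and TIMEOUT are hypothetical stand-ins for the chart's
actual helpers and values:

```shell
# Illustrative sketch of a delayed, time-bounded wait loop.
wait_to_join() {
  sleep "${INITIAL_DELAY:-5}"              # head start before the first check
  start=$(date +%s)
  until node_joined; do                    # node_joined would query cluster state
    if [ $(( $(date +%s) - start )) -ge "${TIMEOUT:-3600}" ]; then
      echo "timed out waiting for node to join" >&2
      return 1
    fi
    sleep "${RETRY_INTERVAL:-5}"           # back off between checks
  done
}
```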
Change-Id: Ie42af89551bd8804b87fe936c676e85130564187
Unrestrict the octal-values rule, since the readability benefits for
file modes outweigh possible issues with YAML 1.2 adoption in future
k8s versions. These issues will be addressed if and when they occur.
Also ensure osh-infra is a required project for the lint job, which
matters when running the job against another project.
Change-Id: Ic5e327cf40c4b09c90738baff56419a6cef132da
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This updates the elasticsearch chart to include the pod
security context on the pod template.
It also adds the container security context to set the
readOnlyRootFilesystem flag to true.
Change-Id: I8d1057f242b741fd297eca7475eb3bfb5e383f1c
1) Updated the docker image for heat to point to Stein and Bionic
2) Enabled the AppArmor job for the prometheus-openstack exporter
Change-Id: I1ee8acb848ece3c334b087309d452d5137ea0798
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
This addresses an issue where py2 was used as the interpreter while
installing required dependencies with py3.
Also switch the kubeadm-aio image to Bionic.
Change-Id: I5a9e6678c45fad8288aa6971f57988b46001c665
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This makes ceph-volume the default deployment tool, since support
for ceph-disk was deprecated in the Nautilus release of Ceph.
Change-Id: I10f42fd0cb43a951f480594d269fd998de5678bf
Use the latest ara release that supports ansible 2.5.5.
Change-Id: Id44948986609093b709e23e0d9f9eddd690fa2b8
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
Waiting for kube-apiserver fails because the python executable
cannot be found.
Change-Id: Ib0ff95088c658fec3180f071269041faa7da2ecf
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
Recently, the PostgreSQL backups were modified to generate DROP
DATABASE commands (the --clean pg_dumpall option). Also, for single
database restore, a DROP DATABASE command was added before the restore
so that the database could be restored without duplicate rows. However,
if there are existing database connections (from applications or other
users), the DROP DATABASE commands will fail. So for the duration of
the restore operation, the databases being restored need their existing
connections dropped and new connections prevented until the database(s)
are restored; connections should then be re-allowed.
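A hedged sketch of the connection fencing described above; the helper name
is hypothetical, but REVOKE CONNECT plus pg_terminate_backend over
pg_stat_activity is the standard PostgreSQL idiom for blocking new
connections and dropping existing ones:

```shell
# Hypothetical helper: emits SQL that prevents new connections to a
# database and terminates existing ones. The real template may differ.
build_connection_fence_sql() {
  db="$1"
  cat <<EOF
REVOKE CONNECT ON DATABASE "$db" FROM PUBLIC;
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE datname = '$db' AND pid <> pg_backend_pid();
EOF
}
# Re-allowing connections afterwards would be the inverse:
#   GRANT CONNECT ON DATABASE "$db" TO PUBLIC;
```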
A problem was also found where psql returns 0 (success) even though
errors occurred during its execution. The solution is to check the
output for errors and, if there are any, dump out the log file so the
user can see them and knows that errors occurred.
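The error check could look roughly like this; the function name and log
format are illustrative assumptions, not the actual template code:

```shell
# Hypothetical sketch: scan a psql log for error lines, since psql may
# exit 0 even when individual statements failed.
check_psql_errors() {
  log_file="$1"
  if grep -qE '(^|[[:space:]])(ERROR|FATAL):' "$log_file"; then
    echo "Errors found during restore; dumping log for review:" >&2
    cat "$log_file" >&2
    return 1
  fi
  return 0
}
```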
Lastly, a problem was found with single database restoration: the dump
for a single database was being incorrectly extracted from the full
dump file, resulting in the database not being restored correctly
(most of the db being wiped out). This patchset fixes that issue as
well.
Change-Id: I4db3f6ac7e9fe7cce6a432dfba056e17ad1e3f06
In the database backup framework (_backup_main.sh.tpl), the
backup_databases function exits with code 1 if the store_backup_remotely
function fails to send the backup to the remote RGW. This causes the pod
to fail and be restarted by the cronjob, over and over until the backoff
retry limit (6 by default) is reached, creating many copies of the same
backup on the file system. The default k8s behavior is then to delete
the job/pods once the backoff limit has been exceeded, which makes
troubleshooting more difficult (although logs may still be available in
elasticsearch). This patch changes the return code to 0 so that the pod
does not fail in that scenario. The error logs generated should be
enough to flag the failure (via Nagios or whatever alerting system is
being used).
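A minimal sketch of the new behavior; store_backup_remotely follows the
commit text, while the surrounding structure and messages are assumptions:

```shell
# Illustrative sketch: log the remote-store failure loudly but return 0,
# so the cronjob does not retry and pile up duplicate local backups.
backup_databases() {
  tarball="$1"
  if ! store_backup_remotely "$tarball"; then
    echo "ERROR: failed to send $tarball to remote RGW; local copy retained" >&2
  fi
  return 0
}
```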
Change-Id: Ie1c3a7aef290bf6de4752798821d96451c1f2fa5
Change the docker image to point to the latest metacontroller image.
Change the python image to point to version 3.7.
Add an updateStrategy to CompositeController.
Add a replicas config to DaemonJobController via the zuul gate.
Change-Id: I2a48bc6472017802267980fe474d81886113fcda
- Use loopback devices for the ceph osds, since support for
  directory-backed osds is going to be deprecated.
- Move from filestore to bluestore for the ceph-osds.
- Separate the DB and WAL partitions from data so that the gates
  validate the scenario with a fast storage disk for DB and WAL.
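Creating a loopback-backed OSD device amounts to backing a loop device
with a sparse file; a rough sketch (the helper name is hypothetical, and
the losetup step is commented out because it requires root):

```shell
# Hypothetical sketch: create a sparse backing file for a loop device.
create_osd_backing_file() {
  path="$1"; size="$2"
  truncate -s "$size" "$path"   # sparse: allocates no blocks up front
  # losetup -f --show "$path"   # attaching would print e.g. /dev/loopN
}
```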
Change-Id: Ief6de17c53d6cb57ef604895fdc66dc6c604fd89
OSDs fail the liveness probe if they can't make it to the 'active'
state. The noup flag keeps OSDs in the 'preboot' state, which
prevents the liveness probe from succeeding. This change adds an
additional check in the liveness probe to allow it to succeed if
the noup flag is set and OSDs are in the 'preboot' state.
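The extra probe check reduces to a small predicate; this sketch is
illustrative only, with the OSD state and cluster flag list passed in as
strings (the real probe would parse them from ceph command output):

```shell
# Hypothetical sketch: succeed if the OSD is active, or if it is held
# in 'preboot' only because the cluster-wide noup flag is set.
osd_liveness_ok() {
  state="$1"; flags="$2"
  [ "$state" = "active" ] && return 0
  case ",$flags," in
    *,noup,*) [ "$state" = "preboot" ] && return 0 ;;
  esac
  return 1
}
```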
Change-Id: I8df5954f7bc4ef4374e19344b6e0a9130764d60c
Since mariadb 10.4.13, the definer of the view mysql.user is not root
but the mariadb.sys user. So when we remove that user we break
mysql_upgrade, which then fails to fix views. It is safe not to remove
the account because it is locked by default and cannot log in.
Change-Id: I5183d7cbb09e18d0e87e0aef8c59bb71ec2f1cb5
Related-Bug: https://jira.mariadb.org/browse/MDEV-22542
This PS makes a few small tweaks to the rabbitmq probes so that the
container's health and readiness are more reflective of what is
actually happening inside it. We previously saw instances of the pod
being marked as ready before it actually was.
Change-Id: If48ec02d4050f7385e71c2e6fe0fff8f59667af4