Fix issues introduced by https://review.opendev.org/#/c/735648:
an extra 'ceph-' prefix in service_account and the security context
not being rendered for keyring generator containers.
Change-Id: Ie53b3407dbd7345d37c92c60a04f3badf735f6a6
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
Unrestrict the octal values rule, since the readability benefit of
literal file modes outweighs the possible issues with yaml 1.2 adoption
in future k8s versions. Those issues will be addressed if and when they
occur.
Also ensure osh-infra is a required project for the lint job, which
matters when the job is run against another project.
Change-Id: Ic5e327cf40c4b09c90738baff56419a6cef132da
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This addresses an issue where py2 was used as the interpreter while
the required dependencies were installed with py3.
Also switch the kubeadm-aio image to bionic.
Change-Id: I5a9e6678c45fad8288aa6971f57988b46001c665
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This makes ceph-volume the default deployment tool, since support for
ceph-disk was deprecated in the Nautilus release of Ceph.
Change-Id: I10f42fd0cb43a951f480594d269fd998de5678bf
Use the latest ara release that supports ansible 2.5.5.
Change-Id: Id44948986609093b709e23e0d9f9eddd690fa2b8
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
Waiting for kube-apiserver fails because the python executable cannot
be found.
Change-Id: Ib0ff95088c658fec3180f071269041faa7da2ecf
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
Recently, the PostgreSQL backups were modified to generate drop database
commands (the --clean pg_dumpall option). Also, for a single database
restore, a DROP DATABASE command was added before the restore so that the
database could be restored without duplicate rows. However, if there are
existing database connections (from the applications or other users),
the drop database commands will fail. So, for the duration of the restore
operation, the databases being restored need to have their existing
connections dropped and new connections prevented until the database(s)
are restored; connections should then be re-allowed.
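A minimal sketch of the idea, assuming a database named 'mydb' and a
psql session with superuser privileges (not the chart's exact code):
  # Block new connections to the database being restored
  psql -c "UPDATE pg_database SET datallowconn = false WHERE datname = 'mydb';"
  # Terminate existing connections so DROP DATABASE can proceed
  psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'mydb';"
  # ... restore the database ...
  # Re-allow connections once the restore is complete
  psql -c "UPDATE pg_database SET datallowconn = true WHERE datname = 'mydb';"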
A problem was also found where psql returns 0 (a success code) even
though there were errors during its execution. The solution is to check
the output for errors and, if there are any, dump the log file so the
user can see them and knows the operation had errors.
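As a rough illustration (the file names are placeholders, not the
script's actual variables):
  # psql can exit 0 even when individual statements fail, so scan its output
  psql -f "${RESTORE_FILE}" > "${LOG_FILE}" 2>&1
  if grep -iq "error" "${LOG_FILE}"; then
    cat "${LOG_FILE}"
    echo "Errors occurred during the restore; see the log output above."
  fi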
Lastly, a problem was found with single database restoration, where
the database dump for a single database was being incorrectly extracted
from the psql dump file, resulting in the database not being restored
correctly (most of the db being wiped out). This patch set fixes that
issue as well.
Change-Id: I4db3f6ac7e9fe7cce6a432dfba056e17ad1e3f06
In the database backup framework (_backup_main.sh.tpl), the
backup_databases function exits with code 1 if the store_backup_remotely
function fails to send the backup to the remote RGW. This causes the pod
to fail and be restarted by the cronjob over and over until the backoff
retries limit (6 by default) is reached, which creates many copies of
the same backup on the file system. The default k8s behavior is then to
delete the job/pods once the backoff limit has been exceeded, which makes
troubleshooting more difficult (although we may still have logs in
elasticsearch). This patch changes the return code to 0 so that the pod
will not fail in that scenario. The error logs generated should be
enough to flag the failure (via Nagios or whatever alerting system is
being used).
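A hedged sketch of the new behavior (only store_backup_remotely is a
real function name from the framework; the rest is illustrative):
  if ! store_backup_remotely "${TARBALL}"; then
    # Log the failure for alerting, but return success so the cronjob
    # does not retry and pile up local copies of the same backup.
    echo "Error: could not send the backup to the remote RGW."
  fi
  return 0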
Change-Id: Ie1c3a7aef290bf6de4752798821d96451c1f2fa5
- Change the docker image to point to the latest metacontroller image.
- Change the python image to point to version 3.7.
- Add updateStrategy to CompositeController.
- Add replicas config to DaemonJobController via the zuul gate.
Change-Id: I2a48bc6472017802267980fe474d81886113fcda
- This is to make use of loopback devices for ceph osds, since
support for directory-backed osds is going to be deprecated.
- Move to bluestore from filestore for ceph-osds.
- Separate DB and WAL partitions from data so that the gates validate
the scenario where a fast storage disk is used for DB and WAL (as
sketched below).
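As an illustration only (the device paths are examples), ceph-volume
can create a bluestore OSD with separate DB and WAL devices:
  ceph-volume lvm create --bluestore \
    --data /dev/loop0 \
    --block.db /dev/loop1 \
    --block.wal /dev/loop2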
Change-Id: Ief6de17c53d6cb57ef604895fdc66dc6c604fd89
OSDs fail the liveness probe if they can't make it to the 'active'
state. The noup flag keeps OSDs in the 'preboot' state, which
prevents the liveness probe from succeeding. This change adds an
additional check in the liveness probe to allow it to succeed if
the noup flag is set and OSDs are in the 'preboot' state.
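A hedged sketch of the probe logic (the socket path is an example, not
the chart's exact script):
  SOCK=/var/run/ceph/ceph-osd.0.asok
  if ceph --admin-daemon "${SOCK}" status | grep -q '"state": "active"'; then
    exit 0
  fi
  # If the noup flag is set, an OSD held in 'preboot' is still healthy
  if ceph osd dump | grep -q noup && \
     ceph --admin-daemon "${SOCK}" status | grep -q '"state": "preboot"'; then
    exit 0
  fi
  exit 1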
Change-Id: I8df5954f7bc4ef4374e19344b6e0a9130764d60c
Since mariadb 10.4.13 the definer of the mysql.user view is not root
but the mariadb.sys user, so when we remove that user we break
mysql_upgrade: it fails to fix the views. It is safe not to remove it
because the account is locked by default and cannot log in.
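This can be checked directly (a sketch, assuming a local root session):
  # The mysql.user view is now defined by mariadb.sys, not root
  mysql -e "SELECT DEFINER FROM information_schema.VIEWS WHERE TABLE_SCHEMA='mysql' AND TABLE_NAME='user';"
  # The mariadb.sys account is created locked and cannot log in
  mysql -e "SHOW CREATE USER 'mariadb.sys'@'localhost';"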
Change-Id: I5183d7cbb09e18d0e87e0aef8c59bb71ec2f1cb5
Related-Bug: https://jira.mariadb.org/browse/MDEV-22542
This patch set:
- allows options in the bootstrap job to load the TLS secret into the
appropriate envvar so the openstack client can connect to perform
bootstrap (sketched after this list);
- adds in certificates to make rally work properly with TLS endpoints;
- adds methods to handle TLS secret volume and volumeMount;
- updates ingress to handle secure backends.
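As a hedged illustration (the secret name and mount path are
hypothetical), the bootstrap idea is to mount the CA from the TLS
secret and point the openstack client at it:
  # CA bundle mounted from the TLS secret (path is illustrative)
  export OS_CACERT=/etc/ssl/certs/openstack-ca.crt
  openstack endpoint list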
Change-Id: I322cda393f18bfeed0b9f8b1827d101f60d6bdeb
Signed-off-by: Tin Lam <tin@irrational.io>
Some updates to the rgw config, such as zone or zonegroup changes made
during the bootstrap process, require an rgw restart.
Add a restart job which, when enabled, uses
'kubectl rollout restart deployment'
to restart rgw.
This is most useful in greenfield scenarios where we need to set up
zones/zonegroups right after the rgw service comes up, which requires
restarting the rgw service.
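For example (the namespace and deployment name are illustrative):
  kubectl -n ceph rollout restart deployment ceph-rgw
  kubectl -n ceph rollout status deployment ceph-rgw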
Change-Id: I6667237e92a8b87a06d2a59c65210c482f3b7302
The following enhancements are made to the Mariadb backup:
1) Used a new helm-toolkit function to send/retrieve Mariadb
backups to/from RGW via the OpenStack Swift API (sketched below).
2) Modified the backup script such that the database backup
tarball can be sent to RGW.
3) Added a keystone user for RGW access.
4) Added a secret for OpenStack Swift API access.
5) Changed the cronjob image and runAsUser.
6) Modified the restore script so that archives stored remotely
on RGW can be used for the restore data source.
7) Added functions to the restore script to retrieve data from an
archive for tables, table rows and the table schema of a database.
8) Added a secret containing all the backup/restore related
configuration needed for invoking the backup/restore operation
from a different application or namespace.
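A rough sketch of the send/retrieve flow over the Swift API (the
container and file names are illustrative; credentials come from the
new secret):
  cd /tmp
  openstack container create mariadb-backups
  openstack object create mariadb-backups mariadb.backup.tar.gz
  # Retrieval for a restore
  openstack object save mariadb-backups mariadb.backup.tar.gz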
Change-Id: Iadb9438fe419cded374897b43337039609077e61
This PS fixes:
1) Removes printing of the word "Done" after the restore/list command
executes, which is not needed and clutters the output.
2) Fixes a problem with list_tables related to its command output.
3) Fixes a parameter ordering problem with list_rows and list_schema.
4) Adds the missing menu/parameter parsing code for list_schema.
5) Fixes the backup-restore secret and the handling of PD_DUMPALL_OPTIONS.
6) Fixes single db restore, which wasn't dropping the database and
ended up adding duplicate rows.
7) Fixes cronjob deficiencies - added a security context and init
containers, and fixed backup-related service account typos.
8) Fixes get_schema so that it only finds the table requested, rather
than other tables that also start with the same substring.
9) Fixes a swift endpoint issue where the wrong endpoint was sometimes
returned, due to a bad grep command (see the sketch below).
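A hedged illustration of selecting the endpoint explicitly rather than
grepping raw output:
  openstack endpoint list --service swift --interface public -f value -c URL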
Change-Id: I0e3ab81732db031cb6e162b622efaf77bbc7ec25
This patchset is required for https://review.opendev.org/#/c/737629.
The Kubernetes python API requires these permissions for that script to
work properly.
Change-Id: I69f2ca40ab6068295a4cb2d85073183ca348af1e
This adds a chart for the node problem detector. This chart
will help provide additional insight into the status of the
underlying infrastructure of a deployment.
Updated the chart to conform to the new yamllint checks.
Change-Id: I21a24b67b121388107b20ab38ac7703c7a33f1c1
Signed-off-by: Steve Wilkerson <sw5822@att.com>
osh-infra currently has a duplicate linter playbook that is not
being used, since the other is used for both osh and osh-infra.
This change removes the duplicate entry and playbook.
Change-Id: If7040243a45f2166973dc5f0c8cd793431916942
Reverting this ps: it tried to solve the problem for old clients prior
to nautilus, but nautilus clients think it is a v2 port, try to
communicate with the server, and emit warnings as shown below.
Let's make the v2 port the default and override the mon_host config for
old clients prior to nautilus as we did in this ps
(https://review.opendev.org/#/c/711648/).
A better solution will be to move away from the old ceph clients by
changing the images wherever old ceph clients are installed.
log:
+ ceph auth get-or-create client.cinder mon 'profile rbd' osd
'profile rbd' -o /tmp/tmp.k9PBzKOyCq.keyring
2020-06-19 15:56:13.100 7febee088700 -1 --2-
172.29.0.139:0/2835096817 >> v2:172.29.0.141:6790/0 conn(0x7febe816b4d0
0x7febe816b990 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0
rx=0 tx=0)._handle_peer_banner peer v2:172.29.0.141:6790/0 is using msgr V1 protocol
This reverts commit acde91c87d5e233d1180544df919cb6603e306a9.
Change-Id: I08ef968b3e80c80b973ae4ec1f80ba1618f0e0a5
With the latest infra update, the images used no longer contain
python by default and projects are expected to use the new
ensure roles to install packages as needed.
This change adds some of the ensure roles to a few playbooks;
additional cleanup can be done using these roles in future changes.
Change-Id: Ie14ab297e71195d4fee070af253edf4d25ee5d27
Currently there are conditions that can prevent Bluestore OSDs
from deploying correctly if the disk used was previously deployed
as an OSD in another Ceph cluster. This change fixes the
ceph-volume OSD init script so it can handle these situations
correctly if OSD_FORCE_REPAIR is set.
Additionally, there is a race condition that may occur which
causes logical volumes to not get tagged with all of the
necessary metadata for OSDs to function. This change fixes
that issue as well.
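A hedged sketch of the kinds of commands involved (the device path is
an example, not the chart's exact repair logic):
  # With OSD_FORCE_REPAIR, a disk left over from a previous cluster can be wiped
  ceph-volume lvm zap --destroy /dev/sdb
  # Inspect the OSD metadata tags on the logical volumes
  lvs -o lv_name,vg_name,lv_tags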
Change-Id: I869ba97d2224081c99ed1728b1aaa1b893d47c87
This change modifies the linting job to not run when a patchset
only modifies openstack-helm documentation.
Change-Id: I0ed0fd5fff10d81dd34351b7da930d1a340b10d8
Currently OSDs are added by the ceph-osd chart with zero weight
and they get reweighted to proper weights in the ceph-client chart
after all OSDs have been deployed. This causes a problem when a
deployment is partially completed and additional OSDs are added
later. In this case the ceph-client chart has already run and the
new OSDs don't ever get weighted correctly. This change weights
OSDs properly as they are deployed instead. As noted in the
script, the noin flag may be set during the deployment to prevent
rebalancing as OSDs are added if necessary.
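For illustration (the OSD id and weight are examples):
  # Optionally prevent rebalancing while OSDs are being added
  ceph osd set noin
  # Weight the OSD as it is deployed (weight typically reflects device size in TiB)
  ceph osd crush reweight osd.7 1.8
  # Re-enable automatic 'in' marking afterwards
  ceph osd unset noin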
Added the ability to set and unset Ceph cluster flags in the
ceph-client chart.
Change-Id: Ic9a3d8d5625af49b093976a855dd66e5705d2c29