2523 Commits

Author SHA1 Message Date
Andrii Ostapenko
41f02d3c98 Fix service account name for ceph-mon keyring generator
Fix issues introduced by https://review.opendev.org/#/c/735648:
an extra 'ceph-' in service_account, and the security context not
being rendered for keyring generator containers.

Change-Id: Ie53b3407dbd7345d37c92c60a04f3badf735f6a6
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-07 15:06:23 -05:00
Andrii Ostapenko
824f168efc Undo octal-values restriction together with corresponding code
Lift the octal-values rule restriction, since the readability benefit
for file modes outweighs possible issues with YAML 1.2 adoption in
future k8s versions. These issues will be addressed when/if they occur.

Also ensure osh-infra is a required project for the lint job, which
matters when running the job against another project.

Change-Id: Ic5e327cf40c4b09c90738baff56419a6cef132da
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-07 15:42:53 +00:00
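The YAML 1.2 concern mentioned in 824f168efc can be illustrated with a small sketch. It is not taken from the repository: the two helper functions are hypothetical and only mimic how an unquoted scalar such as 0644 is read under YAML 1.1 (leading zero means octal) versus the YAML 1.2 core schema (octal needs a 0o prefix, so plain digits are decimal).

```python
# Illustration only: hypothetical helpers, not a real YAML parser.

def yaml11_int(scalar: str) -> int:
    """YAML 1.1 style: a leading 0 followed by octal digits is an octal integer."""
    if len(scalar) > 1 and scalar[0] == "0" and all(c in "01234567" for c in scalar):
        return int(scalar, 8)
    return int(scalar)


def yaml12_int(scalar: str) -> int:
    """YAML 1.2 core schema style: octal requires the 0o prefix."""
    if scalar.startswith("0o"):
        return int(scalar[2:], 8)
    return int(scalar)


mode = "0644"  # a typical file mode as written in a values file
print(oct(yaml11_int(mode)))  # read as octal: the intended rw-r--r-- mode
print(yaml12_int(mode))       # read as decimal 644, a different permission set
```

Under these assumptions the same text yields two different integers, which is the class of issue the commit defers until it actually occurs.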
Zuul
88b79920db Merge "[rabbitmq] Upgrade to 3.7.26" 2020-07-07 02:57:14 +00:00
Andrii Ostapenko
2b4cf6a2d9 Completely switch to python3 for developers installation
This addresses an issue with using py2 as interpreter while
installing required dependencies with py3.

Also switch kubeadm-aio image to bionic.

Change-Id: I5a9e6678c45fad8288aa6971f57988b46001c665
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-03 09:47:43 +00:00
Zuul
03273bd61d Merge "Fix ara installation" 2020-07-02 19:25:03 +00:00
Zuul
5b9c79604a Merge "Fix developers kubeadm installation" 2020-07-02 19:17:55 +00:00
Chinasubbareddy Mallavarapu
bfe7a99a61 [CEPH] Make ceph-volume as default deployment tool
Make ceph-volume the default deployment tool, since support for
ceph-disk was deprecated as of the Nautilus version of ceph.

Change-Id: I10f42fd0cb43a951f480594d269fd998de5678bf
2020-07-02 15:05:03 +00:00
Andrii Ostapenko
ecb58b85be Fix ara installation
Use the latest ara version that supports ansible 2.5.5.

Change-Id: Id44948986609093b709e23e0d9f9eddd690fa2b8
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-01 23:56:03 -05:00
Andrii Ostapenko
b49541f300 Fix developers kubeadm installation
Waiting for kube-apiserver fails because the python executable
cannot be found.

Change-Id: Ib0ff95088c658fec3180f071269041faa7da2ecf
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-01 23:51:45 -05:00
Zuul
7cf8c6f893 Merge "Fix drop databases issue in Postgresql restore" 2020-07-01 23:06:25 +00:00
Zuul
af5a742a17 Merge "Add generate openAPIV3Schema schema for DaemonJob CRD." 2020-07-01 21:56:31 +00:00
Zuul
fc7bb216d2 Merge "Fix return code when backup to remote rgw fails" 2020-07-01 21:46:25 +00:00
Zuul
646ce9bd4c Merge "Make mariadb chart compatible with mariadb 10.4.13" 2020-07-01 05:35:56 +00:00
Zuul
06a0244ccc Merge "Updating nagios cluster role for rbd monitoring" 2020-06-30 21:02:53 +00:00
Cliff Parsons
4964ea2a76 Fix drop databases issue in Postgresql restore
Recently, the Postgresql backups were modified to generate drop database
commands (the --clean pg_dumpall option). Also, for single database
restore, a DROP DATABASE command was added before the restore so that the
database could be restored without duplicate rows. However, if there are
existing database connections (from applications or other users), the
drop database commands will fail. So for the duration of the restore
operation, the databases being restored need to have their existing
connections dropped and new connections prevented until the database(s)
are restored; connections should then be re-allowed.

Also found a problem with psql returning 0 (success code) even though
there were errors during its execution. The solution is to check the
output for errors and if there are any, dump out the log file for the
user to see and let the user know there are errors.

Lastly, a problem was found with the single database restoration, where
the database dump for a single database was being incorrectly extracted
from the psql dump file, resulting in the database not being restored
correctly (most of the db being wiped out). This patchset fixes that
issue as well.

Change-Id: I4db3f6ac7e9fe7cce6a432dfba056e17ad1e3f06
2020-06-30 19:39:00 +00:00
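The second point of 4964ea2a76 (psql exiting with 0 despite errors) can be sketched as follows. This is a minimal illustration, not the chart's actual shell code: the function names are hypothetical, and it assumes that failures show up in the captured output as lines containing "ERROR:" or "FATAL:". The sample log text is made up for the example.

```python
import re
from typing import List

# Assumption: failed statements are reported on lines containing these markers.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL):")


def find_restore_errors(log_text: str) -> List[str]:
    """Return every line of the captured psql output that reports an error."""
    return [line for line in log_text.splitlines() if ERROR_PATTERN.search(line)]


def restore_succeeded(exit_code: int, log_text: str) -> bool:
    """Treat the restore as failed if psql exited non-zero OR logged errors."""
    errors = find_restore_errors(log_text)
    if errors:
        # Dump the whole log so the user can see what went wrong.
        print("Restore finished with errors; log follows:")
        print(log_text)
    return exit_code == 0 and not errors


# Made-up sample output for illustration.
sample_log = (
    "SET\n"
    'psql:dump.sql:12: ERROR:  database "keystone" is being accessed by other users\n'
    "CREATE DATABASE\n"
)
print(restore_succeeded(0, sample_log))
```

The key design point is that the exit code alone is not trusted; the output is inspected as well.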
Cliff Parsons
1508324ce7 Fix return code when backup to remote rgw fails
In the database backup framework (_backup_main.sh.tpl), the
backup_databases function exits with code 1 if the store_backup_remotely
function fails to send the backup to the remote RGW. This causes the pod
to fail and be restarted by the cronjob, over and over, until the backoff
retry limit (6 by default) is reached, creating many copies of the same
backup on the file system. The default k8s behavior is to delete the
job/pods once the backoff limit has been exceeded, which makes
troubleshooting more difficult (although we may have logs in
elasticsearch). This patch changes the return code to 0 so that the pod
will not fail in that scenario. The error logs generated should be
enough to flag the failure (via Nagios or whatever alerting system is
being used).

Change-Id: Ie1c3a7aef290bf6de4752798821d96451c1f2fa5
2020-06-30 16:29:38 +00:00
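The exit-code decision described in 1508324ce7 might look roughly like the sketch below. It is an illustration in Python rather than the framework's shell code; the function name and parameters are hypothetical, and the behavior when the local backup itself fails is an assumption, since the commit message only covers the remote-upload case.

```python
def backup_exit_code(local_ok: bool, remote_ok: bool, log=print) -> int:
    """Pick the process exit code for a backup run.

    Hypothetical helper. Assumption: a failed local backup still exits 1.
    """
    if not local_ok:
        log("ERROR: local backup failed")
        return 1
    if not remote_ok:
        # Log loudly so alerting can pick it up, but exit 0 so the job
        # is not retried and no duplicate local backups are created.
        log("ERROR: backup could not be sent to the remote RGW")
        return 0
    return 0


print(backup_exit_code(local_ok=True, remote_ok=False))
```

With this shape, a remote upload failure is visible only in the logs, which is why the commit relies on log-based alerting.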
Luna Das
64c744d756 Add generate openAPIV3Schema schema for DaemonJob CRD.
Change docker image to point to the latest metacontroller image.
Change python image to point to version 3.7.
Add updateStrategy to CompositeController.
Add replicas config to DaemonJobController via zuul gate.

Change-Id: I2a48bc6472017802267980fe474d81886113fcda
2020-06-30 01:13:41 +05:30
Chinasubbareddy Mallavarapu
3bde9f5b90 [CEPH] OSH-INFRA: use loopback devices for ceph osds
- Use loopback devices for ceph osds, since support for
  directory-backed osds is going to be deprecated.
- Move to bluestore from filestore for ceph-osds.
- Separate DB and WAL partitions from data so that gates will validate
  the scenario where we will have a fast storage disk for DB and WAL.

Change-Id: Ief6de17c53d6cb57ef604895fdc66dc6c604fd89
2020-06-29 14:09:32 +00:00
Zuul
b1e66fd308 Merge "Add more fields with verbose description to CompositeController." 2020-06-27 05:29:44 +00:00
Luna Das
594645ce39 Add more fields with verbose description to CompositeController.
Change-Id: Ib6d9db5a8b1be9c3fa6b4cb988c576a71599a274
2020-06-27 00:15:09 +05:30
Taylor, Stephen (st053q)
153c9ec6f0 [ceph-osd] Liveness probe success in preboot state with noup flag
OSDs fail the liveness probe if they can't make it to the 'active'
state. The noup flag keeps OSDs in the 'preboot' state, which
prevents the liveness probe from succeeding. This change adds an
additional check in the liveness probe to allow it to succeed if
the noup flag is set and OSDs are in the 'preboot' state.

Change-Id: I8df5954f7bc4ef4374e19344b6e0a9130764d60c
2020-06-26 11:37:18 -05:00
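The probe logic described in 153c9ec6f0 reduces to a small decision rule, sketched here in Python. The function name and the way state and flags are passed in are hypothetical; the real probe is a script in the ceph-osd chart and may obtain these values differently.

```python
def liveness_ok(osd_state: str, cluster_flags: set) -> bool:
    """Return True if the OSD should pass the liveness probe.

    Hypothetical helper: osd_state is the OSD's reported state and
    cluster_flags is the set of flags currently set on the cluster.
    """
    if osd_state == "active":
        return True
    # Extra check added by the change: with noup set, OSDs are held in
    # 'preboot' on purpose, so that state should not fail the probe.
    return osd_state == "preboot" and "noup" in cluster_flags


print(liveness_ok("preboot", {"noup"}))
print(liveness_ok("preboot", set()))
```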
Mykyta Karpin
1482193fd4 Make mariadb chart compatible with mariadb 10.4.13
Since mariadb 10.4.13, the definer of the mysql.user view is not root
but the mariadb.sys user. Removing that user breaks mysql_upgrade,
which then fails to fix views. It is safe not to remove it because
the account is locked by default and cannot log in.

Change-Id: I5183d7cbb09e18d0e87e0aef8c59bb71ec2f1cb5
Related-Bug: https://jira.mariadb.org/browse/MDEV-22542
2020-06-26 05:11:55 +00:00
Tin Lam
7cb3ef69ae feat(tls): add tls support to helm-toolkit
This patch set:

- allows options in the bootstrap job to load the proper TLS secret into
  the proper envvar so the openstack client can connect properly to
  perform bootstrap;
- adds in certificates to make rally work properly with TLS endpoints;
- adds methods to handle TLS secret volume and volumeMount;
- updates ingress to handle secure backends.

Change-Id: I322cda393f18bfeed0b9f8b1827d101f60d6bdeb
Signed-off-by: Tin Lam <tin@irrational.io>
2020-06-26 00:32:57 +00:00
Chris Wedgwood
6d032c3971 [rabbitmq] Upgrade to 3.7.26
Staying current.  Many bugfixes.

Change-Id: Ib95c30380d89c336774d5c74e02ce5cbd9efb5d7
2020-06-25 23:32:50 +00:00
Zuul
5e316a9ba0 Merge "Mariadb backup/restore enhancements" 2020-06-25 18:48:08 +00:00
Zuul
e48feaefb2 Merge "[ceph-rgw] Add rwg restart job" 2020-06-25 17:17:26 +00:00
Zuul
b4c66cea6a Merge "Fix problems with DB utilities in HTK and Postgresql" 2020-06-25 16:17:17 +00:00
Alexander Vlasov
70b0b9b266 [ceph-rgw] Add rwg restart job
Some updates to the rgw config, such as zone or zonegroup changes made
during the bootstrap process, require an rgw restart.
Add a restart job which, when enabled, uses
'kubectl rollout restart deployment'
to restart rgw.

This is most useful in greenfield scenarios, where zones/zonegroups
need to be set up right after the rgw svc comes up, which requires
restarting the rgw svc.

Change-Id: I6667237e92a8b87a06d2a59c65210c482f3b7302
2020-06-25 13:15:56 +00:00
Zuul
9655817eae Merge "Remove duplicate lint job entry and script" 2020-06-25 04:11:51 +00:00
Huang, Sophie (sh879n)
573ac49939 Mariadb backup/restore enhancements
The following enhancements are made to Mariadb backup:
1) Used new helm-toolkit function to send/retrieve Mariadb
   backups to/from RGW via OpenStack Swift API.
2) Modified the backup script such that the database backup
   tarball can be sent to RGW.
3) Added a keystone user for RGW access.
4) Added a secret for OpenStack Swift API access.
5) Changed the cronjob image and runAsUser.
6) Modified the restore script so that archives stored remotely
   on RGW can be used for the restore data source.
7) Added functions to the restore script to retrieve data
   from an archive for tables, table rows and table schema of a database.
8) Added a secret containing all the backup/restore related
   configuration needed for invoking the backup/restore operation
   from a different application or namespace.

Change-Id: Iadb9438fe419cded374897b43337039609077e61
2020-06-24 21:13:21 +00:00
Cliff Parsons
1da7a5b0f8 Fix problems with DB utilities in HTK and Postgresql
This PS makes the following changes:
1) Removes printing of the word "Done" after the restore/list command
   executes, which is not needed and clutters the output.
2) Fixes problem with list_tables related to command output.
3) Fixes parameter ordering problem with list_rows and list_schema.
4) Adds the missing menu/parameter parsing code for list_schema.
5) Fixes backup-restore secret and handling of PD_DUMPALL_OPTIONS.
6) Fixes single db restore, which wasn't dropping the database, and
   ended up adding duplicate rows.
7) Fixes cronjob deficiencies - added security context and init containers,
   and fixed typos related to the backup service account.
8) Fixes get_schema so that it only finds the table requested, rather
   than other tables that also start with the same substring.
9) Fixes swift endpoint issue where it sometimes returns the wrong
   endpoint, due to bad grep command.

Change-Id: I0e3ab81732db031cb6e162b622efaf77bbc7ec25
2020-06-24 19:16:04 +00:00
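Item 8 of 1da7a5b0f8 (get_schema matching only the requested table) is a prefix-versus-exact match problem. The sketch below illustrates it in Python under stated assumptions: the function names are hypothetical, the dump text is made up, and it assumes table definitions appear as "CREATE TABLE schema.name (" lines. The actual fix lives in the chart's shell scripts and may use a different mechanism.

```python
import re
from typing import List

# Made-up dump fragment with two tables sharing a prefix.
DUMP = """\
CREATE TABLE public.user (
    id integer
);
CREATE TABLE public.user_roles (
    user_id integer
);
"""


def find_tables_by_prefix(dump_text: str, table: str) -> List[str]:
    """Naive match: also returns tables that merely start with the name."""
    return re.findall(
        r"^CREATE TABLE (?:\w+\.)?(" + re.escape(table) + r"\w*)", dump_text, re.M
    )


def find_table_exact(dump_text: str, table: str) -> List[str]:
    """Exact match: the name must be followed by whitespace and '('."""
    return re.findall(
        r"^CREATE TABLE (?:\w+\.)?(" + re.escape(table) + r")\s*\(", dump_text, re.M
    )


print(find_tables_by_prefix(DUMP, "user"))
print(find_table_exact(DUMP, "user"))
```

Anchoring the pattern on what follows the table name is what keeps similarly named tables out of the result.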
Singh, Jasvinder (js581j)
fd8cdb66af Updating nagios cluster role for rbd monitoring
This patchset is required for the patch set https://review.opendev.org/#/c/737629.
The Kubernetes Python API requires these permissions for this script to work properly.

Change-Id: I69f2ca40ab6068295a4cb2d85073183ca348af1e
2020-06-23 17:59:17 -04:00
Zuul
401d4e70ce Merge "Add node-problem-detector chart" 2020-06-22 23:11:27 +00:00
Steve Wilkerson
a31bb2b049 Add node-problem-detector chart
This adds a chart for the node problem detector. This chart
will help provide additional insight into the status of the
underlying infrastructure of a deployment.

Updated the chart with new yamllint checks.

Change-Id: I21a24b67b121388107b20ab38ac7703c7a33f1c1
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2020-06-22 13:00:55 -05:00
Gage Hugo
16676b5b63 Remove duplicate lint job entry and script
osh-infra currently has a duplicate linter playbook that is not
being used, since the other is used for both osh and osh-infra.
This change removes the duplicate entry and playbook.

Change-Id: If7040243a45f2166973dc5f0c8cd793431916942
2020-06-22 12:31:25 -05:00
chinasubbareddy mallavarapu
91f60d2884 Revert "[ceph-client] Update ceph-mon port."
Reverting this ps. It tried to solve the problem for old clients
prior to nautilus, but nautilus clients assume the port is a v2 port,
try to communicate with the server over it, and get warnings
like the one in the log below.

Let's make the v2 port the default and override the mon_host config for
old clients prior to nautilus, as we did in this ps
(https://review.opendev.org/#/c/711648/).

A better solution will be moving away from old ceph clients by changing
the images wherever old ceph clients are installed.

log:

+ ceph auth get-or-create client.cinder mon 'profile rbd' osd
'profile rbd' -o /tmp/tmp.k9PBzKOyCq.keyring
2020-06-19 15:56:13.100 7febee088700 -1 --2-
172.29.0.139:0/2835096817 >> v2:172.29.0.141:6790/0 conn(0x7febe816b4d0
0x7febe816b990 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0
rx=0 tx=0)._handle_peer_banner peer v2:172.29.0.141:6790/0 is using msgr V1 protocol

This reverts commit acde91c87d5e233d1180544df919cb6603e306a9.

Change-Id: I08ef968b3e80c80b973ae4ec1f80ba1618f0e0a5
2020-06-19 22:16:16 +00:00
Gage Hugo
26350f37aa Add new python roles to playbooks
With the latest infra update, the images used no longer contain
python by default and projects are expected to use the new
ensure roles to use packages as needed.

This change adds some of the ensure roles to a few playbooks;
additional cleanup can be done using these in future changes.

Change-Id: Ie14ab297e71195d4fee070af253edf4d25ee5d27
2020-06-19 18:07:13 +00:00
Zuul
e4167fd248 Merge "[ceph-osd] Allow ceph-volume to deploy OSDs on dirty disks" 2020-06-18 22:12:26 +00:00
Tin Lam
587182c779 fix(ovs): add capability to openvswitch
While OpenVSwitch works in the gate using kubernetes 1.16, running this
in kubernetes 1.18 causes a permission denied error while executing
chroot in an init container script [0]. This adds the SYS_CHROOT
capability to address the error.

[0] https://opendev.org/openstack/openstack-helm-infra/src/branch/master/openvswitch/templates/bin/_openvswitch-vswitchd-init-modules.sh.tpl#L18-L20

Change-Id: I62c01678cce6cd4e98418ed5518613ccd5eecbf9
Signed-off-by: Tin Lam <tin@irrational.io>
2020-06-18 17:07:40 +00:00
Zuul
eaaf0062e4 Merge "(fix) Changed pip to pip3" 2020-06-18 15:47:03 +00:00
Zuul
ee12b4c5db Merge "Don't run linter on docs changes" 2020-06-18 15:47:01 +00:00
Brian Wickersham
567a7c6c1e [ceph-osd] Allow ceph-volume to deploy OSDs on dirty disks
Currently there are conditions that can prevent Bluestore OSDs
from deploying correctly if the disk used was previously deployed
as an OSD in another Ceph cluster. This change fixes the
ceph-volume OSD init script so it can handle these situations
correctly if OSD_FORCE_REPAIR is set.

Additionally, there is a race condition that may occur which
causes logical volumes to not get tagged with all of the
necessary metadata for OSDs to function. This change fixes
that issue as well.

Change-Id: I869ba97d2224081c99ed1728b1aaa1b893d47c87
2020-06-18 14:04:02 +00:00
Zuul
0a35fd827e Merge "Enable key-duplicates and octal-values yamllint checks" 2020-06-18 04:49:03 +00:00
Zuul
017f16274d Merge "ceph-osd: Log the script name, lineno and funcname" 2020-06-18 04:01:58 +00:00
Zuul
7935018d8f Merge "Don't rely on pip and tox installed on zuul node" 2020-06-18 03:31:44 +00:00
Zuul
6217a5eda3 Merge "[ceph-osd, ceph-client] Weight OSDs as they are added" 2020-06-18 02:22:53 +00:00
Gage Hugo
16ff2531e4 Don't rely on pip and tox installed on zuul node
Change-Id: I3b715a4cc5ae064b458694ab98feb2b6cc226e65
2020-06-18 01:00:31 +00:00
Zuul
16414767e0 Merge "Enable yamllint rules for templates" 2020-06-18 00:09:28 +00:00
Gage Hugo
6b5d1a1d4a Don't run linter on docs changes
This change modifies the linting job to not run when a patchset
only modifies openstack-helm documentation.

Change-Id: I0ed0fd5fff10d81dd34351b7da930d1a340b10d8
2020-06-17 18:06:34 -05:00
Stephen Taylor
59b825ae48 [ceph-osd, ceph-client] Weight OSDs as they are added
Currently OSDs are added by the ceph-osd chart with zero weight
and they get reweighted to proper weights in the ceph-client chart
after all OSDs have been deployed. This causes a problem when a
deployment is partially completed and additional OSDs are added
later. In this case the ceph-client chart has already run and the
new OSDs don't ever get weighted correctly. This change weights
OSDs properly as they are deployed instead. As noted in the
script, the noin flag may be set during the deployment to prevent
rebalancing as OSDs are added if necessary.

Added the ability to set and unset Ceph cluster flags in the
ceph-client chart.

Change-Id: Ic9a3d8d5625af49b093976a855dd66e5705d2c29
2020-06-17 21:49:39 +00:00