3380 Commits

Author SHA1 Message Date
Gage Hugo
477eed26bf Fix indentation
The prometheus-blackbox-exporter chart currently fails to install
with helm v3 due to an invalid indentation with metadata labels.

This change fixes the indentation to the correct amount in order
to successfully build and install when using helm v3.

Change-Id: I95942fe49b39a052dd83060b597807f6a52627e4
2022-03-22 15:35:17 -05:00
Ritchie, Frank (fr801x)
ec69dd0ef9 Exec libvirt even when creating secrets
With "hostPid: true" we want the entrypoint process to be libvirtd, not a wrapper, so that process lifecycle management works as expected.

The fix for now is

  * start libvirtd
  * create secrets (libvirtd needs to be running for this)
  * kill it

then start it again using exec so libvirtd is the entrypoint pid
and container lifecycle should work as expected.
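
The start, configure, stop, exec sequence above can be sketched in Python. This is only an illustration under stated assumptions: the chart's real entrypoint is not shown in this log, and `setup_then_exec`, `daemon_cmd`, `setup` and `exec_fn` are hypothetical names.

```python
import os
import subprocess
import sys


def setup_then_exec(daemon_cmd, setup, exec_fn=os.execvp):
    """Run the daemon temporarily for setup, stop it, then exec it."""
    # 1. Start a temporary copy of the daemon as a child process.
    temp = subprocess.Popen(daemon_cmd)
    try:
        # 2. Do the work that needs the daemon running (e.g. create secrets).
        setup()
    finally:
        # 3. Stop the temporary daemon and wait for it to exit.
        temp.terminate()
        temp.wait()
    # 4. Replace this process with the daemon, so the daemon takes over
    #    the PID of the wrapper and becomes the entrypoint process.
    exec_fn(daemon_cmd[0], daemon_cmd)
```

With the default `exec_fn`, the final call does not return: the wrapper process becomes the daemon, which is what lets signals from the container runtime reach the daemon directly.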

Change-Id: I9ef8a66da0fba70e8db4be3301833263de0617e8
2022-03-22 14:08:26 +00:00
Zuul
b34409b7c3 Merge "Fix elasticsearch-data shutdown" 2022-03-21 18:16:13 +00:00
Zuul
b2254e3eb0 Merge "Fix elasticsearch cronjob rendering" 2022-03-21 18:16:10 +00:00
Gage Hugo
a1bd832b0f Fix comparison error with mariadb and helm v3
The mariadb chart currently fails to deploy due to
differences in handling comparison between helm v2
and v3. This change updates the comparison to work
in both versions.

Change-Id: I9143a16f3011c0c0ae5420e6ec41ad7745a28cab
2022-03-19 01:21:26 +00:00
Markin, Sergiy (sm515x)
848f392b3a [DATABASE] MariaDB de-clustering
Adjust chart behavior in case only one mariadb instance is present and replication is disabled.

Change-Id: Ifa540580cf9d5755b83dbb949555ec814dda2744
2022-03-17 17:34:42 +00:00
Phil Sphicas
03e7fedb2b Fix elasticsearch-data shutdown
The shutdown script for the elasticsearch-data container uses a trap
handler to run the steps outlined in the rolling restart procedure [0].
However, when trying to kill the elasticsearch process (step 3), the
script sends the TERM signal to itself.

The traps are handled recursively, causing the entire termination grace
period to be exhausted before the pod is finally removed.

This change updates the trap handler to terminate the child process(es)
instead, and wait for their completion.

0: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/restart-cluster.html
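
A minimal Python sketch of the corrected handler pattern, assuming the child processes are tracked as `subprocess.Popen` objects. The actual shutdown script is a shell trap handler that is not reproduced here; `stop_children` and `install_term_handler` are hypothetical names.

```python
import signal
import subprocess
import sys


def stop_children(children):
    """Send TERM to each running child, then wait for all of them to exit."""
    for child in children:
        if child.poll() is None:   # still running
            child.terminate()      # SIGTERM goes to the child, not to us
    for child in children:
        child.wait()               # block until the child has exited


def install_term_handler(children):
    """Register a TERM handler that stops the children and then exits."""
    def handler(signum, frame):
        stop_children(children)
        sys.exit(0)
    signal.signal(signal.SIGTERM, handler)
```

Because the handler signals the children rather than its own process, it cannot re-trigger itself, and the wait returns as soon as the children are gone instead of running out the grace period.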

Change-Id: I0c92ea5cce345cff951f044026a2179dcbd5a3e2
2022-03-16 16:04:15 -07:00
Phil Sphicas
c3da3a6f79 Fix elasticsearch cronjob rendering
The pod security context for the elasticsearch cron jobs is in the wrong
location, causing an error when installing or upgrading the chart.

    ValidationError(CronJob.spec.jobTemplate.spec):
        unknown field "securityContext" in io.k8s.api.batch.v1.JobSpec

This change fixes the rendering.

Change-Id: I0e04b1ba27113d4b7aeefa2035b2b29c45be455a
2022-03-16 15:58:31 -07:00
Sigunov, Vladimir (vs422h)
81179cb2e3 [ceph-mgr] Prevents repeated creation of ceph-mgr service account
Under some circumstances, armada job attempts to recreate an existing
Service Account for ceph-mgr. This patchset aims to remediate the issue.

Change-Id: I69bb9045c0e2f24dc2fa9e94ab6a09a58221e1f5
2022-03-16 13:32:50 -04:00
Phil Sphicas
3a10c5ba95 ingress: Add option to assign VIP as externalIP
Some CNIs support the advertisement of service IPs into BGP, which may
provide an alternative to managing the VIP as an interface on the host.

This change adds an option to assign the ingress VIP as an externalIP to
the ingress service. For example:

    network:
      vip:
        manage: false
        addr: 172.18.0.1/32           # (with or without subnet mask)
        assign_as_external_ip: true
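
One way to accept the address "with or without subnet mask" is to normalize it to a bare IP before using it as an externalIP. The sketch below uses Python's standard `ipaddress` module; `vip_address` is a hypothetical helper, and how the chart template itself strips the mask is not shown in this log.

```python
import ipaddress


def vip_address(addr):
    """Return the bare IP from an address given with or without a mask."""
    # ip_interface accepts both "172.18.0.1" and "172.18.0.1/32".
    return str(ipaddress.ip_interface(addr).ip)
```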

Change-Id: I1eeb07a1f94ef8efcb21f3373e0d5f86be725b33
2022-03-11 11:48:09 -08:00
Stephen Taylor
3b9aa44ac5 [ceph-client] More robust naming of clusterrole-checkdns
Currently if multiple instances of the ceph-client chart are
deployed in the same Kubernetes cluster, the releases will
conflict because the clusterrole-checkdns ClusterRole is a global
resource and has a hard-coded name. This change scopes the
ClusterRole name by release name to address this.

Change-Id: I17d04720ca301f643f6fb9cf5a9b2eec965ef537
2022-03-10 07:21:54 -07:00
Stephen Taylor
77a94d4630 [ceph-mon] Release-specific ceph-templates configmap name
This change corrects the ceph-templates configmap name to be
release-specific like the other configmaps in the chart. This
allows for more robustness in downstream implementations.

Change-Id: I1d09d14f9ba94dbbe11d8a80776f57b9cdf41210
2022-03-08 07:57:08 -07:00
Sigunov, Vladimir (vs422h)
80fe5d81cc [CEPH] Less aggressive checks in mgr deployment
A Ceph cluster needs only one active manager to function properly.
This PS converts the ceph-client-tests rules related to the ceph-mgr deployment
from errors into warnings if the number of standby mgrs is less
than expected.

Change-Id: I53c83c872b95da645da69eabf0864daff842bbd1
2022-03-04 16:39:52 -05:00
Zuul
ebfd04448e Merge "Add force_boot command to rabbit start template" 2022-03-03 14:59:47 +00:00
Zuul
9b9863abc3 Merge "Fix field validation error" 2022-03-01 19:52:09 +00:00
Stephen Taylor
37c237fb78 [ceph-mon] Correct configmap names for all resources
The recent name changes to the ceph-mon configmaps did not get
propagated to all resources in the chart. The hard-coded names in
the unchanged cases were correct and resources deployed
successfully, but this change corrects those configmap names across
all resources for the sake of robustness.

Change-Id: I3195e5ba2726892a7b6e0c31c0fac43bae4aa399
2022-03-01 07:33:31 -07:00
Sigunov, Vladimir (vs422h)
1da245f608 [DATABASE] Maintain minimum given number of backups
Modifies the backup script so that there will always be
a minimum given number of days of backups in both local and remote
(if applicable) locations, regardless of the date that the backups
are taken.
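
The retention rule can be sketched as follows. This is an interpretation, not the script's actual logic: it assumes the newest `min_keep` backups are always kept and that older ones are deleted only once they fall outside a `days_to_keep` window. All names are hypothetical.

```python
from datetime import date, timedelta


def backups_to_delete(backup_dates, today, days_to_keep, min_keep):
    """Return the backup dates that may be removed, newest first."""
    newest_first = sorted(backup_dates, reverse=True)
    cutoff = today - timedelta(days=days_to_keep)
    delete = []
    for index, taken in enumerate(newest_first):
        if index < min_keep:
            continue               # always keep the newest min_keep backups
        if taken < cutoff:
            delete.append(taken)   # old enough and not protected
    return delete
```

The same function would be applied to the local and the remote listing separately, so each location keeps its own minimum.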

Change-Id: I19d5e592905ce83acdba043f68ca4d0b042de065
2022-02-28 14:46:04 -05:00
Stephen Taylor
ad09539f71 [ceph-mon] Change configmap names to be based on release names
This change makes the ceph-mon configmap names dynamic based on
release name to match how the ceph-osd chart is naming configmaps.
The new ceph-mon post-apply job needs this in some cases in order
not to have conflicting configmap names in separate releases.

Change-Id: Id26d0a8310ccff80a608e25d2b0a74a41f9e6a55
2022-02-24 15:24:10 -07:00
Lo, Chi (cl566n)
2fc1ce4a14 Removing -x from database backup script
The set -x has produced 6 identical log strings every time the
log_backup_error_exit function is called. Prometheus uses
the occurrence and number of certain logs over a period of time to
determine whether a database backup has failed. Only one log should be
generated when a particular database backup scenario fails.

After discussion with the database backup and restore SME, it is
recommended to remove the set -x once and for all.

Change-Id: I846b5c16908f04ac40ee8f4d87d3b7df86036512
2022-02-23 16:42:29 -08:00
Zuul
110575049b Merge "Add DNS sanity checks to k8s deploy script" 2022-02-23 21:18:08 +00:00
Gage Hugo
f01f35a524 Fix field validation error
The metacontroller chart currently has the field
terminationGracePeriodSeconds in an invalid spot in the template
which causes a chart building error when using helm v3. This
change moves the field to the correct position in the template.

Change-Id: Ief454115f67af35f8dfb570d8315de82d97b536d
2022-02-21 09:58:14 -06:00
Anderson, Craig (ca846m)
feeab3291c Add DNS sanity checks to k8s deploy script
Check that k8s DNS is working, and terminate at the beginning if this is
not the case.
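
A fail-fast DNS check of this kind might look like the Python sketch below. The deploy script's real check is not shown in this log; `dns_works` is a hypothetical helper, and the `resolver` parameter exists only so the lookup can be replaced in tests.

```python
import socket


def dns_works(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves to at least one address."""
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # Name resolution failed: treat DNS as not working.
        return False
```

A caller would run this once at the start, for example against the cluster's API service name, and exit with an error if it returns False.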

Change-Id: I30867671f39dd9d80f46f5a4381adc9d34df7ab7
2022-02-18 00:15:59 -08:00
Sigunov, Vladimir (vs422h)
728c340dc0 [CEPH] Discovering ceph-mon endpoints
This is a code improvement to reuse the ceph monitor discovery function
in different templates. Calling that function from
a single place (helm-infra snippets) reduces code maintenance
and simplifies further development.

Rev. 0.1 Charts version bump for ceph-client, ceph-mon, ceph-osd,
ceph-provisioners and helm-toolkit
Rev. 0.2 Mon endpoint discovery functionality added for
the rados gateway. ClusterRole and ClusterRoleBinding added.
Rev. 0.3 checkdns is allowed to correct ceph.conf for RGW deployment.
Rev. 0.4 Added RoleBinding to the deployment-rgw.
Rev. 0.5 Remove _namespace-client-ceph-config-manager.sh.tpl and
         the appropriate job, because of duplicated functionality.
         Related configuration has been removed.
Rev. 0.6 RoleBinding logic has been changed to meet rules:
    checkdns namespace - HAS ACCESS -> RGW namespace(s)

Change-Id: Ie0af212bdcbbc3aa53335689deed9b226e5d4d89
2022-02-11 14:30:43 -07:00
Zuul
6063c8f14f Merge "[ceph-mon] Add a post-apply job to restart mons after mgrs" 2022-02-11 21:25:41 +00:00
Stephen Taylor
ae17a61836 [ceph-mon] Add a post-apply job to restart mons after mgrs
If the OnDelete pod restart strategy is used for the ceph-mon
daemonset, run a post-apply job to restart the ceph-mon pods one
at a time. Otherwise the mons could restart before the mgrs, which
can be problematic in some upgrade scenarios.

Change-Id: I57f87130e95088217c3cfe73512caaae41d3ef22
2022-02-10 12:43:23 -07:00
Ritchie, Frank (fr801x)
c0282d430c Rename prometheus metric
The metric ceph_pool_bytes_used has changed to ceph_pool_stored.

https://tracker.ceph.com/issues/39932
Change-Id: Iab5cf2b318ce538e72b4592dedd8f0e489741797
2022-02-08 11:24:32 -06:00
Stephen Taylor
ea2c0115c4 Move ceph-mgr deployment to the ceph-mon chart
This change moves the ceph-mgr deployment from the ceph-client
chart to the ceph-mon chart. Its purpose is to facilitate the
proper Ceph upgrade procedure, which prescribes restarting mgr
daemons before mon daemons.

There will be additional work required to implement the correct
daemon restart procedure for upgrades. This change only addresses
the move of the ceph-mgr deployment.

Change-Id: I3ac4a75f776760425c88a0ba1edae5fb339f128d
2022-02-05 05:02:18 +00:00
Zuul
f69ea0ea86 Merge "Use bandit 1.7.1 to avoid Python version issues" 2022-02-04 23:20:14 +00:00
Zuul
37a9ff01a1 Merge "memcached: switch to sidecar" 2022-02-04 22:23:05 +00:00
Stephen Taylor
4296b7d486 Use bandit 1.7.1 to avoid Python version issues
The following error is appearing when the bandit playbook is used:
bandit requires Python '>=3.7' but the running Python is 3.6.9

This change specifies bandit 1.7.1 in the playbook, which is
compatible with Python 3.5+

Change-Id: I3b43ed6de3a90af49cfc7124fdee542831f73f40
2022-02-04 11:57:04 -07:00
Maik Catrinque
a0206d9626 Add force_boot command to rabbit start template
Currently, if a multi-node cluster is shut down unexpectedly,
RabbitMQ is not able to boot and sync with the other nodes.

The purpose of this change is to add the possibility to use the
rabbitmqctl force_boot command to recover a RabbitMQ cluster from
an unexpected shutdown.

Test plan:
PASS: Shutdown and start a multi-node RabbitMQ cluster

Regression:
PASS: OpenStack can be applied successfully
PASS: RabbitMQ nodes can join the RabbitMQ cluster

Story: 2009784
Task: 44290

Ref:
[0] https://www.rabbitmq.com/rabbitmqctl.8.html#force_boot

Signed-off-by: Maik Catrinque <maik.wandercatrinqueandrade@windriver.com>
Co-authored-by: Andrew Martins Carletti <Andrew.MartinsCarletti@windriver.com>
Change-Id: I56e966ea64e8881ba436213f0c9e1cbe547098e3
2022-02-04 10:38:54 -03:00
Mohammed Naser
696e37e3f7 memcached: switch to sidecar
Instead of running the exporter as a separate deployment that talks
to the service, which will NOT report reliable information if
you have more than 1 replica of memcached, this patch moves
things into a sidecar model that runs in the same pod and exposes
the service.

Change-Id: Ia4801b47f44df91db10886f7cb4e8e174557aded
2022-01-28 03:07:05 -05:00
Sophie Huang
25d1eedc59 Postgresql: Enhance postgresql backup
Pick up the helm-toolkit DB backup enhancement in postgresql
to add the capability to retry uploading a backup to the remote server.

Change-Id: I041d83211f08a8d0c9c22a66e16e6b7652bfc7d9
2022-01-25 20:58:27 +00:00
Sophie Huang
11ac37056b [helm-toolkit] add log strings for alert generation
Log string prefixes are added to different error logs
for alert generation.

Change-Id: I483cf08e09b2b56a68414f4cc3ade4c3e3cdd9aa
2022-01-08 00:00:16 +00:00
Marlin Cremers
9d7baa9aa8 feat(helm-toolkit): add support for image pull secrets
At the moment it is very difficult to pull images from a private
registry that hasn't been configured on Kubernetes nodes as there
is no way to specify imagePullSecrets on pods.

This change introduces a snippet that can return a set of image
pull secrets using either a default or a per pod value. It also
adds this new snippet to the manifests for standard job types.

Change-Id: I710e1feffdf837627b80bc14320751f743e048cb
2021-12-21 09:03:08 +01:00
Zuul
336766d262 Merge "Mariadb: Enhance mariadb backup" 2021-11-29 22:06:32 +00:00
Stephen Taylor
cb73c61b4e [ceph-osd] Remove wait for misplaced objects during OSD restarts
The wait for misplaced objects during the ceph-osd post-apply job
was added to prevent I/O disruption in the case where misplaced
objects cause multiple replicas in common failure domains. This
concern is only valid before OSD restarts begin because OSD
failures during the restart process won't cause replicas that
violate replication rules to appear elsewhere.

This change keeps the wait for misplaced objects prior to beginning
OSD restarts and removes it during those restarts. The wait during
OSD restarts now only waits for degraded objects to be recovered
before proceeding to the next failure domain.

Change-Id: Ic82c67b43089c7a2b45995d1fd9c285d5c0e7cbc
2021-11-23 12:46:49 -07:00
Gupta, Sangeet (sg774j)
47795919cb Mariadb: Enhance mariadb backup
* Add capability to retry uploading a backup to the remote server a configured
  number of times, delaying the retries randomly between configured
  minimum/maximum seconds.
* Enhance error checking, logging and retrying logic.
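
The retry behaviour described in the first bullet can be sketched in Python as below. The backup script itself is a shell script not shown here; `upload_with_retries` and its parameters are hypothetical, and `sleep` is injectable only to make the sketch testable.

```python
import random
import time


def upload_with_retries(upload, max_retries, min_delay, max_delay,
                        sleep=time.sleep):
    """Call upload() until it succeeds or max_retries retries are used up."""
    for attempt in range(max_retries + 1):   # first try plus the retries
        if upload():
            return True
        if attempt < max_retries:
            # Wait a random time between the configured bounds.
            sleep(random.uniform(min_delay, max_delay))
    return False
```

Randomizing the delay spreads out retries from several backup jobs so they are less likely to hit the remote server at the same moment.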

Change-Id: Ida3649420bdd6d39ac6ba7412c8c7078a75e0a10
2021-11-20 02:06:28 +00:00
Zuul
4665ebd35f Merge "Added Grafana iDRAC dashboard" 2021-11-10 23:00:44 +00:00
Lo, Chi (cl566n)
92818273e3 Added Grafana iDRAC dashboard
This patchset also refactors the handling of dashboard yaml
files so that multiple configmaps, grouped by functionality,
will be created.

Change-Id: I9849e2a2744e1d2ae895d3e18647b9b3a1c38b12
2021-11-10 21:04:23 +00:00
PRIYA, FNU (fp048v)
fddbb0a059 Set Security Context to ks-user job
We need flexibility to add securityContext to the ks-user job at the pod and container level,
so that it can be executed without elevated privileges.

Change-Id: Ibd8abdc10906ca4648bfcaa91d0f122e56690606
2021-11-08 09:45:11 -06:00
Zuul
15e3d30ba2 Merge "Correct private key size input for Certificates and remove minor version support" 2021-11-05 22:54:31 +00:00
Zuul
12c5f029be Merge "Fix Python exceptions" 2021-11-03 22:33:24 +00:00
Gupta, Sangeet (sg774j)
186155c296 Correct private key size input for Certificates and remove minor version support
In the cert-manager v1 API, the private key size field "keySize" was replaced by "size"
under "privateKey".
Support for minor (less than v1) API versions is also removed for certificates.
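
The field move can be illustrated on a Certificate spec held as a plain Python dict. This is a sketch of the mapping only; the chart applies the change in its Helm templates, which are not shown here, and `to_v1_private_key` is a hypothetical name.

```python
def to_v1_private_key(spec):
    """Return a copy of spec with keySize moved to privateKey.size."""
    new_spec = dict(spec)
    if "keySize" in new_spec:
        # Preserve any existing privateKey settings, then add the size.
        private_key = dict(new_spec.get("privateKey", {}))
        private_key["size"] = new_spec.pop("keySize")
        new_spec["privateKey"] = private_key
    return new_spec
```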

Change-Id: If3fa0e296b8a1c2ab473e67b24d4465fe42a5268
2021-11-03 14:27:23 +00:00
Gage Hugo
ddb377df6d Test linting osh on helm-toolkit changes
Since most of the charts in both openstack-helm and
this repo use helm-toolkit, changes in helm-toolkit
have the possibility of impacting charts in the
openstack-helm repo and will not be caught in testing
here.

This change adds a conditional linter to lint the
charts in the openstack-helm repo if any changes
to helm-toolkit are made.

Change-Id: I0f6a935eca53d966c01e0902e546ea132a636a9d
2021-11-02 22:46:43 +00:00
Zuul
b2dd2f77e9 Merge "Revert "Set Security Context to ks-user job"" 2021-11-02 14:59:09 +00:00
Zuul
cb1974adf6 Merge "[ceph-osd] Update log-runner container for MAC" 2021-11-01 23:24:50 +00:00
Gage Hugo
55e7706f7e Revert "Set Security Context to ks-user job"
This reverts commit 5407b547bbb08397e41cceec4cf88d7ae9cbf9fc.

Reason for revert: This outputs duplicate securityContext entries,
breaking the yamllinter in osh. This needs a slight rework.

Change-Id: I0c892be5aba7ccd6e3c378e4e45a79d2df03c06a
2021-11-01 22:35:00 +00:00
Zuul
fd0372ef53 Merge "Set Security Context to ks-user job" 2021-11-01 17:40:57 +00:00
Zuul
59bf12d3e2 Merge "[ceph-client] Consolidate mon_host discovery" 2021-11-01 17:17:05 +00:00