The prometheus-blackbox-exporter chart currently fails to install
with helm v3 due to invalid indentation of the metadata labels.
This change corrects the indentation so that the chart builds and
installs successfully when using helm v3.
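A sketch of the failure mode (label content assumed): rendered one
level too shallow, the labels parse as unknown metadata fields,
which helm v3 rejects at install time.

  metadata:
    labels:
    app: prometheus-blackbox-exporter    # wrong: sibling of labels

  metadata:
    labels:
      app: prometheus-blackbox-exporter  # fixed: nested under labels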
Change-Id: I95942fe49b39a052dd83060b597807f6a52627e4
With "hostPid: true" we want the entrypoint process to be libvirtd not a wrapper so that process lifecycle management works as expected.
The fix for now is to:
* start libvirtd
* create secrets (libvirtd needs to be running for this)
* kill it
then start it again using exec so that libvirtd is the entrypoint
pid and container lifecycle works as expected (sketched below).
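A minimal sketch of the entrypoint logic (the secret-creation helper
is hypothetical):

  #!/bin/bash
  # Start libvirtd in the background so secrets can be created.
  libvirtd --listen &
  LIBVIRTD_PID=$!
  create_libvirt_secrets          # hypothetical; needs a live libvirtd
  # Stop the background instance, then re-launch with exec so libvirtd
  # inherits the container's entrypoint pid and receives lifecycle
  # signals directly.
  kill "${LIBVIRTD_PID}"
  wait "${LIBVIRTD_PID}"
  exec libvirtd --listen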
Change-Id: I9ef8a66da0fba70e8db4be3301833263de0617e8
The mariadb chart currently fails to deploy due to differences in
how helm v2 and v3 handle template comparisons. This change updates
the comparison to work in both versions.
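A hypothetical illustration (values path assumed): the two helm
versions do not type YAML numbers identically, so casting explicitly
keeps the comparison version-agnostic.

  {{- if eq (.Values.pod.replicas.server | int) 1 }}
  ...
  {{- end }}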
Change-Id: I9143a16f3011c0c0ae5420e6ec41ad7745a28cab
The shutdown script for the elasticsearch-data container uses a trap
handler to run the steps outlined in the rolling restart procedure [0].
However, when trying to kill the elasticsearch process (step 3), the
script sends the TERM signal to itself.
The traps are handled recursively, causing the entire termination grace
period to be exhausted before the pod is finally removed.
This change updates the trap handler to terminate the child process(es)
instead, and wait for their completion.
0: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/restart-cluster.html
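A sketch of the corrected handler (command and variable names
assumed):

  # Run elasticsearch in the background and remember its pid.
  /usr/local/bin/docker-entrypoint.sh elasticsearch &
  child=$!

  shutdown () {
    # ... rolling restart steps from [0] ...
    # Signal the child instead of re-signalling ourselves, then wait
    # for it to exit so the pod terminates promptly.
    kill -TERM "${child}"
    wait "${child}"
  }

  trap shutdown TERM
  wait "${child}"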
Change-Id: I0c92ea5cce345cff951f044026a2179dcbd5a3e2
The pod security context for the elasticsearch cron jobs is in the wrong
location, causing an error when installing or upgrading the chart.
ValidationError(CronJob.spec.jobTemplate.spec):
unknown field "securityContext" in io.k8s.api.batch.v1.JobSpec
This change fixes the rendering.
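For reference, batch/v1 JobSpec has no securityContext field; the
pod security context belongs in the pod template (values assumed):

  spec:
    jobTemplate:
      spec:
        template:
          spec:
            securityContext:
              runAsUser: 65534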
Change-Id: I0e04b1ba27113d4b7aeefa2035b2b29c45be455a
Under some circumstances, the armada job attempts to recreate an
existing Service Account for ceph-mgr. This patchset remediates the
issue.
Change-Id: I69bb9045c0e2f24dc2fa9e94ab6a09a58221e1f5
Some CNIs support the advertisement of service IPs into BGP, which may
provide an alternative to managing the VIP as an interface on the host.
This change adds an option to assign the ingress VIP as an externalIP to
the ingress service. For example:
  network:
    vip:
      manage: false
      addr: 172.18.0.1/32 # (with or without subnet mask)
      assign_as_external_ip: true
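With manage: false and assign_as_external_ip: true, the VIP would be
rendered onto the ingress Service roughly as follows (sketch, field
names from core/v1):

  apiVersion: v1
  kind: Service
  metadata:
    name: ingress
  spec:
    externalIPs:
      - 172.18.0.1   # subnet mask stripped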
Change-Id: I1eeb07a1f94ef8efcb21f3373e0d5f86be725b33
Currently if multiple instances of the ceph-client chart are
deployed in the same Kubernetes cluster, the releases will
conflict because the clusterrole-checkdns ClusterRole is a global
resource and has a hard-coded name. This change scopes the
ClusterRole name by release name to address this.
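A minimal sketch of the release-scoped naming (exact template
assumed):

  kind: ClusterRole
  metadata:
    name: {{ printf "%s-checkdns" .Release.Name }}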
Change-Id: I17d04720ca301f643f6fb9cf5a9b2eec965ef537
This change corrects the ceph-templates configmap name to be
release-specific like the other configmaps in the chart. This
allows for more robustness in downstream implementations.
Change-Id: I1d09d14f9ba94dbbe11d8a80776f57b9cdf41210
Ceph cluster needs only one active manager to function properly.
This PS converts the ceph-client-tests rules related to the ceph-mgr
deployment from an error into a warning if the number of standby
mgrs is less than expected.
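A sketch of the relaxed check (exact test plumbing assumed):

  expected_standbys=1
  standbys=$(ceph mgr dump | jq '.standbys | length')
  if [ "${standbys}" -lt "${expected_standbys}" ]; then
    # Warn instead of failing the test outright.
    echo "WARNING: only ${standbys} standby mgr(s) found," \
         "expected ${expected_standbys}"
  fi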
Change-Id: I53c83c872b95da645da69eabf0864daff842bbd1
The recent name changes to the ceph-mon configmaps did not get
propagated to all resources in the chart. The hard-coded names in
the unchanged cases were correct and resources deployed
successfully, but this change corrects those configmap names across
all resources for the sake of robustness.
Change-Id: I3195e5ba2726892a7b6e0c31c0fac43bae4aa399
Modifies the backup script so that there will always be a given
minimum number of days of backups in both the local and remote
(if applicable) locations, regardless of the dates on which the
backups are taken.
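A sketch of the retention rule (archive naming assumed to embed the
backup date as <db>.<YYYY-MM-DD>.tar.gz):

  DAYS_TO_KEEP=3
  cd "${BACKUP_DIR}" || exit 1
  # Keep every archive belonging to the newest DAYS_TO_KEEP distinct
  # backup days; delete only what is older than all of those.
  ls -1 *.tar.gz | sort -r | awk -F. -v keep="${DAYS_TO_KEEP}" '
    !seen[$2]++ { days++ }    # $2 is the date portion of the name
    days > keep { print }
  ' | xargs -r rm -f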
Change-Id: I19d5e592905ce83acdba043f68ca4d0b042de065
This change makes the ceph-mon configmap names dynamic based on
release name to match how the ceph-osd chart is naming configmaps.
The new ceph-mon post-apply job needs this in some cases in order
not to have conflicting configmap names in separate releases.
Change-Id: Id26d0a8310ccff80a608e25d2b0a74a41f9e6a55
The set -x has produced 6 identical log strings every time the
log_backup_error_exit function is called. Prometheus uses the
occurrence and count of certain logs over a period of time to
determine whether a database backup has failed. Only one log should
be generated when a particular database backup scenario fails.
Upon discussion with the database backup and restore SME, it is
recommended to remove the set -x once and for all.
Change-Id: I846b5c16908f04ac40ee8f4d87d3b7df86036512
The metacontroller chart currently has the field
terminationGracePeriodSeconds in an invalid spot in the template
which causes a chart building error when using helm v3. This
change moves the field to the correct position in the template.
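For reference, terminationGracePeriodSeconds is a field of the pod
spec (core/v1 PodSpec), so it must render under the pod template
rather than elsewhere in the workload template (value assumed):

  spec:
    template:
      spec:
        terminationGracePeriodSeconds: 30
        containers:
          - name: metacontroller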
Change-Id: Ief454115f67af35f8dfb570d8315de82d97b536d
This is a code improvement to reuse the ceph monitor discovery
function in different templates. Calling the above-mentioned
function from a single place (helm-infra snippets) reduces code
maintenance and simplifies further development.
Rev. 0.1 Charts version bump for ceph-client, ceph-mon, ceph-osd,
ceph-provisioners and helm-toolkit
Rev. 0.2 Mon endpoint discovery functionality added for
the rados gateway. ClusterRole and ClusterRoleBinding added.
Rev. 0.3 checkdns is allowed to correct ceph.conf for RGW deployment.
Rev. 0.4 Added RoleBinding to the deployment-rgw.
Rev. 0.5 Remove _namespace-client-ceph-config-manager.sh.tpl and
the appropriate job, because of duplicated functionality.
Related configuration has been removed.
Rev. 0.6 RoleBinding logic has been changed to meet rules:
checkdns namespace - HAS ACCESS -> RGW namespace(s)
Change-Id: Ie0af212bdcbbc3aa53335689deed9b226e5d4d89
If the OnDelete pod restart strategy is used for the ceph-mon
daemonset, run a post-apply job to restart the ceph-mon pods one
at a time. Otherwise the mons could restart before the mgrs, which
can be problematic in some upgrade scenarios.
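A sketch of the one-at-a-time restart (label selectors assumed):

  for pod in $(kubectl -n ceph get pods -l component=mon -o name); do
    kubectl -n ceph delete "${pod}" --wait=true
    # With OnDelete, the daemonset recreates the pod; wait for all
    # mons to report Ready before restarting the next one.
    kubectl -n ceph wait --for=condition=Ready pods \
      -l component=mon --timeout=300s
  done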
Change-Id: I57f87130e95088217c3cfe73512caaae41d3ef22
The metric ceph_pool_bytes_used has changed to ceph_pool_stored.
https://tracker.ceph.com/issues/39932
Change-Id: Iab5cf2b318ce538e72b4592dedd8f0e489741797
This change moves the ceph-mgr deployment from the ceph-client
chart to the ceph-mon chart. Its purpose is to facilitate the
proper Ceph upgrade procedure, which prescribes restarting mgr
daemons before mon daemons.
There will be additional work required to implement the correct
daemon restart procedure for upgrades. This change only addresses
the move of the ceph-mgr deployment.
Change-Id: I3ac4a75f776760425c88a0ba1edae5fb339f128d
The following error is appearing when the bandit playbook is used:
bandit requires Python '>=3.7' but the running Python is 3.6.9
This change specifies bandit 1.7.1 in the playbook, which is
compatible with Python 3.5+.
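A sketch of the pin (task layout assumed):

  - name: Install bandit
    pip:
      name: bandit==1.7.1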
Change-Id: I3b43ed6de3a90af49cfc7124fdee542831f73f40
Currently, if a multi-node cluster is shut down unexpectedly,
RabbitMQ is not able to boot and sync with the other nodes.
The purpose of this change is to add the possibility to use the
rabbitmqctl force_boot command to recover a RabbitMQ cluster from
an unexpected shutdown (sketched below).
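A sketch of the recovery path (the values/env wiring is assumed):

  if [ "${FORCE_BOOT:-false}" = "true" ]; then
    # Boot even when no previously-seen peer is reachable [0].
    rabbitmqctl force_boot
  fi
  exec rabbitmq-server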
Test plan:
PASS: Shutdown and start a multi-node RabbitMQ cluster
Regression:
PASS: OpenStack can be applied successfully
PASS: RabbitMQ nodes can join the RabbitMQ cluster
Story: 2009784
Task: 44290
Ref:
[0] https://www.rabbitmq.com/rabbitmqctl.8.html#force_boot
Signed-off-by: Maik Catrinque <maik.wandercatrinqueandrade@windriver.com>
Co-authored-by: Andrew Martins Carletti <Andrew.MartinsCarletti@windriver.com>
Change-Id: I56e966ea64e8881ba436213f0c9e1cbe547098e3
Running the exporter as a separate deployment that talks to the
service will NOT report reliable information if you have more than
1 replica of memcached. This patch instead moves the exporter into
a sidecar model that runs in the same pod and exposes the service.
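A sketch of the sidecar layout (image tags assumed):

  spec:
    containers:
      - name: memcached
        image: docker.io/library/memcached:1.5.5
      - name: exporter
        image: docker.io/prom/memcached-exporter:v0.4.1
        args:
          - --memcached.address=localhost:11211  # same pod, loopback
        ports:
          - name: metrics
            containerPort: 9150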
Change-Id: Ia4801b47f44df91db10886f7cb4e8e174557aded
Pick up the helm-toolkit DB backup enhancement in postgresql
to add capability to retry uploading backup to remote server.
Change-Id: I041d83211f08a8d0c9c22a66e16e6b7652bfc7d9
At the moment it is very difficult to pull images from a private
registry that hasn't been configured on Kubernetes nodes as there
is no way to specify imagePullSecrets on pods.
This change introduces a snippet that can return a set of image
pull secrets using either a default or a per pod value. It also
adds this new snippet to the manifests for standard job types.
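A sketch of the intended usage (key names assumed):

  pod:
    image_pull_secrets:
      default:
        - name: private-registry-key

which the snippet renders into each pod spec as:

  spec:
    imagePullSecrets:
      - name: private-registry-key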
Change-Id: I710e1feffdf837627b80bc14320751f743e048cb
The wait for misplaced objects during the ceph-osd post-apply job
was added to prevent I/O disruption in the case where misplaced
objects cause multiple replicas in common failure domains. This
concern is only valid before OSD restarts begin because OSD
failures during the restart process won't cause replicas that
violate replication rules to appear elsewhere.
This change keeps the wait for misplaced objects prior to beginning
OSD restarts and removes it during those restarts. The wait during
OSD restarts now only waits for degraded objects to be recovered
before proceeding to the next failure domain.
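A sketch of the narrower wait used during restarts (helper name
hypothetical):

  wait_for_degraded_objects () {
    # Loop until 'ceph status' stops reporting degraded objects.
    while ceph status | grep -q 'objects degraded'; do
      sleep 3
    done
  }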
Change-Id: Ic82c67b43089c7a2b45995d1fd9c285d5c0e7cbc
* Add capability to retry uploading backup to remote server a
  configured number of times and delay the retries randomly between
  configured minimum/maximum seconds (sketched below).
* Enhanced error checking, logging and retrying logic.
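A sketch of the retry loop (variable names assumed; upload_to_remote
is a hypothetical stand-in for the actual send):

  RETRIES=3 MIN_DELAY=30 MAX_DELAY=60
  attempt=1
  until upload_to_remote "${BACKUP_FILE}"; do
    if [ "${attempt}" -ge "${RETRIES}" ]; then
      echo "Upload failed after ${attempt} attempts"
      exit 1
    fi
    # Back off for a random interval within the configured bounds.
    sleep $(( MIN_DELAY + RANDOM % (MAX_DELAY - MIN_DELAY + 1) ))
    attempt=$(( attempt + 1 ))
  done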
Change-Id: Ida3649420bdd6d39ac6ba7412c8c7078a75e0a10
This patchset also refactors the handling of the dashboard yaml
files so that multiple configmaps, grouped by functionality, will
be created.
Change-Id: I9849e2a2744e1d2ae895d3e18647b9b3a1c38b12
We need flexibility to add securityContext to the ks-user job at the
pod and container level, so that it can be executed without elevated
privileges.
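A sketch of the resulting values knobs (key paths assumed):

  pod:
    security_context:
      ks_user:
        pod:
          runAsUser: 65534
        container:
          ks-user:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true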
Change-Id: Ibd8abdc10906ca4648bfcaa91d0f122e56690606
In the cert-manager v1 API, the private key size field "keySize"
was renamed to "size" and moved under "privateKey".
Support for certificate API versions earlier than v1 is also
removed.
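For example (sketch):

  # pre-v1 API:
  spec:
    keySize: 2048

  # cert-manager v1 API:
  spec:
    privateKey:
      size: 2048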
Change-Id: If3fa0e296b8a1c2ab473e67b24d4465fe42a5268
Since most of the charts in both openstack-helm and
this repo use helm-toolkit, changes in helm-toolkit
have the possibility of impacting charts in the
openstack-helm repo and will not be caught in testing
here.
This change adds a conditional linter to lint the
charts in the openstack-helm repo if any changes
to helm-toolkit are made.
Change-Id: I0f6a935eca53d966c01e0902e546ea132a636a9d
This reverts commit 5407b547bbb08397e41cceec4cf88d7ae9cbf9fc.
Reason for revert: This outputs duplicate securityContext entries,
breaking the yamllinter in osh. This needs a slight rework.
Change-Id: I0c892be5aba7ccd6e3c378e4e45a79d2df03c06a