If grep does not find a match, it returns 1, which fails the shell
script. It now returns true if no match is found.
Also removed the returning of errors from the script, because any failure
causes the job to re-run, which may renew the certificates and
restart the pods again. This can repeat for as long as the error persists.
Chaange-Id: I2a38b59789fd522e8163ff9b12ff847eb1fe2f3a
Change-Id: Ica456ef6c5bec2bd29f51aaeef7b5ce5e8681beb
Elasticsearch is TLS enabled. Prometheus-elasticsearch-exporter
needs to be configured to use cacert when communicating with Elasticsearch.
Change-Id: I4a87226fed541777df78733f3650363859ff01b8
The version of helm 2 that OSH has been using was old and seems
to have been removed from the googleapi repo that the jobs are
set up to use, which was causing job failures.
This change updates the version to the latest v2 release.
Change-Id: I675f539b24ea9c2355ac9eacc7dd8122c5236e5f
This chart creates a cronjob which monitors the expiry of the
certificates created by jetstack cert-manager. It rotates the
certificates and restarts the pods that mount the certificate
secrets so that the new certificate can take effect.
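The expiry check described above can be sketched roughly as follows. This is only an illustration: the function name, the threshold parameter, and its value are assumptions and are not taken from the chart.

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(not_after, now, max_days_to_expiry):
    """Return True when a certificate expires within the given window.

    not_after: certificate expiry time (timezone-aware datetime).
    max_days_to_expiry: hypothetical threshold, not a value from the chart.
    """
    return not_after - now <= timedelta(days=max_days_to_expiry)


# Example: a certificate expiring in 10 days, checked with a 30-day window.
now = datetime(2021, 1, 1, tzinfo=timezone.utc)
print(needs_rotation(now + timedelta(days=10), now, 30))
```

A certificate flagged this way would then be rotated, and the pods mounting its secret restarted.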
Change-Id: I492b5f319cf0f2e7ccbbcf516953e17aafc1c59f
- As it is a security violation to mount anything under the /var
partition into pods, change the mount propagation to HostToContainer.
Change-Id: If7a27304507a9d1bcb9efcef4fc1146f77080a4f
This patch set adds a new Alertmanager dashboard to Grafana. Note
that a new configmap is created for this instead of using the
same configmap which includes all the dashboards. Using the same
configmap would eventually run into the configmap size limit.
Change-Id: I10561c0b0b464c3b67d4a738f9f2cb70ef601b3d
This PS updates the mon-check reap-zombies python script to consider
the more recent Ceph changes, including the fact that there is now
a v1 and v2 backend. In addition, it executes the reap-zombies script
with the python3 binary, as the basic 'python' binary does not exist
in the container.
Change-Id: Id079671f03cc5ddbe694f2aa8c9d2480dc573983
This change updates the namespace-config chart to (optionally) create
RBAC rules allowing service accounts in the namespace 'use' access to an
existing Pod Security Policy in the cluster. The policy is specified as:
    podSecurityPolicy:
      existingPsp: name-of-existing-psp
This aligns with the PSP deprecation guidance provided to date [0],
which suggests easing the transition to the "PSP Replacement Policy" by
establishing the standard PSPs (Restricted, Baseline, and Privileged),
assigning a cluster-wide default, and binding more-permissive policies
as needed in certain namespaces.
[0] https://kubernetes.io/blog/2021/04/06/podsecuritypolicy-deprecation-past-present-and-future/
Change-Id: I46da230abf822e0cc3553561fd779444439c34a7
Wherever possible, the ceph-provisioner containers need to run
with the least amount of privilege required. In some cases
privileges are granted that are not needed. This patchset modifies
those containers' security contexts to reduce them to only what
is needed.
Change-Id: I74bd31df4af5cacc26834e645b0816bf285e8428
Wherever possible, the ceph-osd containers need to run with the
least amount of privilege required. In some cases privileges
are granted that are not needed. This patchset modifies
those containers' security contexts to reduce them to only what
is needed.
Change-Id: I0d6633efae7452fee4ce98d3e7088a55123f0a78
This adds a check for an empty ceph mon endpoint when
generating the ceph etc configmap for clients.
Change-Id: I6579a268c5f4bc458120dda66667988e5a529ee9
This change adds /var/crash as a host-path volume mount for
ceph-osd pods in order to facilitate core dump capture when
ceph-osd daemons crash.
Change-Id: Ie517c64e08b11504f71d7d570394fbdb2ac8e54e
There is an additional error status 'Service Unavailable' which can
indicate the service is temporarily unavailable. This change adds that
error status to the retry list in case the issue is resolved during the
backup timeframe.
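As a rough sketch of the retry-list idea, assuming a hypothetical send() callable that returns a status string (the names, attempt count, and delay below are illustrative and not taken from the backup script):

```python
import time

# Hypothetical retry list; only 'Service Unavailable' is named in this change.
RETRYABLE_STATUSES = {"Service Unavailable"}


def send_with_retry(send, max_attempts=5, delay_seconds=0.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable that returns a status string.
    The last status seen is returned either way.
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status not in RETRYABLE_STATUSES:
            return status
        if attempt < max_attempts:
            sleep(delay_seconds)  # wait before retrying a temporary failure
    return status
```

A status outside the retry list is returned immediately, so only temporary conditions are retried.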
Change-Id: I9e2fc1a9b33dea3858de06b10d512da98a635015
This PS enables overriding the liveness/readiness probe configurations
for libvirt pods via values.yaml. In addition, it updates the values
of some of the probe fields, as the default values seem to
be too aggressive.
Change-Id: I64033a1d67461851d8f2d86905ef7068c2ec43b6
Co-authored-by: Huy Tran <ht095u@att.com>
Change-Id: Ib10379829e2989d3de385ad6d1944565b2f9953f
This ps updates the following:
- Add a preStop action to give the rabbitmq node a chance to shut
  down more gracefully
- Add support for RABBITMQ_FEATURE_FLAG in preparation for
  a future 3.8.x upgrade.
Change-Id: I25d1e4fdb9dee370382e97a5a97b2b098f5ef11f
The wait cluster job was failing due to field immutability, which
meant the job had to be deleted manually for every helm upgrade to
succeed. The reason is that the job was being upgraded before the other
manifests it requires had been updated. This can be avoided by using the
helm hooks post-install and post-upgrade, which force the job manifest to
be applied only after all other manifests are applied. The hook annotation
is given the value "5" so that, in case hooks are added to the other jobs
in the chart, the exporter job will be the last to be created.
The helm3_hook value is used as the condition.
Change-Id: Ib83f1d4bef6300c2b76aa54f08927b74346184c7
exporter-job-create-user was failing due to field immutability,
which meant the job had to be deleted manually for every helm
upgrade to succeed. The reason is that the job was being upgraded before
the other manifests it requires had been updated. This can be avoided by
using the helm hooks post-install and post-upgrade, which force the
job manifest to be applied only after all other manifests are applied.
The hook annotation is given the value "5" so that, if other jobs are
annotated, the exporter job will be the last to be created.
The helm3_hook value is used as the condition that enables or disables
the hook.
Change-Id: I2039abb5bad07a19fd09fc5e245485c3c772beca
This change configures Ceph daemon pods so that
/var/lib/ceph/crash maps to a hostPath location that persists
when the pod restarts. This will allow for post-mortem examination
of crash dumps to attempt to understand why daemons have crashed.
Change-Id: I53277848f79a405b0809e0e3f19d90bbb80f3df8
This patch removes the dependency on cfssl to generate certificates and
removes unused constructs in the script.
Change-Id: Ia933420157f456bf99a6ec5416e6dbb63bfa5258
Signed-off-by: Tin Lam <t@lam.wtf>
The broker attribute should use a comma-separated list with the port
definition included.
Example: kafka1:9092,kafka2:9092,kafka:9092
The kafka client will connect to the first available host, which
provides resiliency if a host is not available.
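A minimal sketch of how such a broker string could be parsed and the first available host chosen. The function names and the is_reachable callable are hypothetical and stand in for whatever the real client does internally.

```python
def parse_brokers(broker_attr):
    """Split 'host:port,host:port' into a list of (host, port) tuples."""
    brokers = []
    for entry in broker_attr.split(","):
        host, _, port = entry.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"broker entry must be host:port, got {entry!r}")
        brokers.append((host, int(port)))
    return brokers


def first_available(brokers, is_reachable):
    """Return the first broker for which is_reachable(broker) is true.

    is_reachable is a hypothetical callable; returns None if no broker responds.
    """
    for broker in brokers:
        if is_reachable(broker):
            return broker
    return None
```

With the example string above, a client would fall through to kafka2:9092 if kafka1:9092 could not be reached.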
Change-Id: I5f82e96f2aa274379b6d808291d4b5109709bf72
Some minor improvements are made in this patchset:
1) Move osd_disk_prechecks to the very beginning to make sure the
required variables are set before running the bulk of the script.
2) Specify variables in a more consistent manner for readability.
3) Remove variables from CLI commands that are not used/set.
Change-Id: I6167b277e111ed59ccf4415e7f7d178fe4338cbd
A deployment that specifies a placement target with "delete: true"
should delete that placement target if it exists. For a clean
deployment the expectation is that the placement target should be
created and immediately deleted; however, the check for existence
happens before its creation and the delete doesn't execute as a
result. This change adds a recheck for existence immediately after
creation to remedy that.
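The ordering problem and its fix can be sketched as below. The exists, create, and remove callables are hypothetical stand-ins for the real placement target commands, which are not shown in this message.

```python
def ensure_placement_target(name, delete, exists, create, remove):
    """Create the placement target if missing, then delete it when requested.

    exists, create and remove are hypothetical callables taking the target name.
    """
    present = exists(name)
    if not present:
        create(name)
        # Recheck after creation so the delete step below sees the new
        # target on a clean deployment.
        present = exists(name)
    if delete and present:
        remove(name)


# Example with an in-memory set standing in for the real backend.
targets = set()
ensure_placement_target("example", True, targets.__contains__, targets.add, targets.discard)
print("example" in targets)
```

Without the recheck, the stale result of the first existence check would skip the delete on a clean deployment.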
Change-Id: I26f7fa79c5c851070e94af758d0a0438aa7efa52