This updates the Elasticsearch entrypoint override for the data
nodes to send a kill signal to the Elasticsearch process once the
trap that drains each data node completes
Change-Id: Iccd4342fe16d06787cb24342d9a57e4de12e6980
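A minimal sketch of the shape of that override, assuming a hypothetical
drain_data_node helper (the real script lives in the chart's templates;
paths and names here are illustrative):
```yaml
containers:
  - name: elasticsearch-data
    command:
      - /bin/bash
      - -c
      - |
        /usr/local/bin/docker-entrypoint.sh elasticsearch &
        pid=$!
        # Once the drain trap finishes, signal the Elasticsearch
        # process so the container actually exits
        trap "drain_data_node; kill -TERM $pid" TERM
        wait $pid
```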
This updates the Elasticsearch client and data pod dependencies
to allow for sequential bring-up of the cluster components. As
we want the order to be master->client->data, we add the discovery
service endpoint as a dependency for the client pods and add both
the discovery and client service endpoints as dependencies for
the data pods
Change-Id: Iec6d6f259dc8b7b4f2309b492409cc0e5feab669
Signed-off-by: Steve Wilkerson <sw5822@att.com>
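A sketch of the corresponding values, following openstack-helm's static
dependency convention (the service names here are assumptions):
```yaml
dependencies:
  static:
    elasticsearch_client:
      services:
        # client pods wait for the master discovery service
        - endpoint: internal
          service: elasticsearch_discovery
    elasticsearch_data:
      services:
        # data pods wait for both discovery and client services
        - endpoint: internal
          service: elasticsearch_discovery
        - endpoint: internal
          service: elasticsearch
```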
This updates the default fluentd image to use the fluentd image
built with the systemd input plugin from the openstack-helm-images
repository
Change-Id: I7c75cd19d62f3dbc3fa4708642119f1781e58677
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the Elastic Curator cron job to include configuration
for successful and failed job history limits, similar to the other
cron jobs we deploy. This also moves the key for configuring the
cron schedule from under .Values.conf.curator to a new top-level
jobs key to maintain consistency.
This also fixes an indentation issue with the deployment overrides
for Curator and adds the overrides for the Armada job
Change-Id: I9c720df9677215bdd2bf18be77959bd5f671c0ca
Signed-off-by: Steve Wilkerson <sw5822@att.com>
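The resulting values layout might look like this (the schedule and
limit values are illustrative):
```yaml
jobs:
  curator:
    # moved here from under .Values.conf.curator
    cron: "0 */6 * * *"
    history:
      # map to the cron job's successfulJobsHistoryLimit and
      # failedJobsHistoryLimit fields
      success: 3
      failed: 1
```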
This updates the Elasticsearch chart to include a dedicated start
script for the Elasticsearch data nodes that traps termination
signals and removes the data node from the set of allocation-eligible
nodes before shutting down. As a result, all shards are moved off a
node on shutdown, alleviating issues with planned node outages,
such as during upgrade scenarios
Change-Id: I22f4957f90e4113831a8ddf48691cb14f811c1e5
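A sketch of how such a trap can drain a node, using Elasticsearch's
shard allocation exclude API; the endpoint and node-name variables are
assumptions, not the chart's actual names:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch-bin
data:
  elasticsearch-data.sh: |
    #!/bin/bash
    function drain_data_node () {
      # Exclude this node from allocation so its shards move elsewhere
      curl -s -XPUT "${ELASTICSEARCH_ENDPOINT}/_cluster/settings" \
        -H 'Content-Type: application/json' \
        -d "{\"transient\": {\"cluster.routing.allocation.exclude._name\": \"${NODE_NAME}\"}}"
      # Block until no shards remain on this node
      while curl -s "${ELASTICSEARCH_ENDPOINT}/_cat/shards" | grep -q "${NODE_NAME}"; do
        sleep 5
      done
    }
    trap drain_data_node TERM INT
```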
This fixes typos in the cluster wait script to ensure the messages
reflect the types of nodes being checked
Change-Id: I5964b5517b3099fbfe8d574b2ca869d366c9bb17
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This patchset adds HA clustering support for Postgres. HA clustering
provides automatic failover in the event of the database going down,
and keeps replicas of the database for rebuilding in the event of a
node going down. To achieve this clustering we use
[Patroni](https://github.com/zalando/patroni), which offers HA
clustering support for Postgres.
Patroni is a daemon that runs in the background, keeps track of which
node in your cluster is currently the leader node, and routes all traffic
on the PostgreSQL endpoint to that node. If the leader node goes down,
Patroni holds an election to choose a new leader and updates the endpoint
to route traffic accordingly. All communication between nodes is done
through a Patroni-created endpoint, separate from the externally facing
Postgres endpoint.
Note that, although the postgresql helm chart can be upgraded from
non-patroni to patroni clustering, the previous `postgresql`
endpoints object (which is not directly managed by helm) must be
deleted via an out-of-band mechanism so that it may be replaced by the
patroni-managed endpoints. If Postgres itself is leveraged for the
deployment process, this must be done with careful timing. Note that
the old endpoints object had a port named "db", while the new one has
a port named "postgresql".
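For illustration, the patroni-managed endpoints object takes roughly
this shape (the address is hypothetical), with the port renamed as
noted above:
```yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: postgresql
subsets:
  - addresses:
      # Patroni keeps this pointed at the current leader
      - ip: 10.0.0.5
    ports:
      - name: postgresql   # previously named "db"
        port: 5432
```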
- Picking up patchset: https://review.openstack.org/#/c/591663
Co-authored-by: Tony Sorrentino <as1413@att.com>
Co-authored-by: Randeep Jalli <rj2083@att.com>
Co-authored-by: Pete Birley <pete@port.direct>
Co-authored-by: Matt McEuen <mm9745@att.com>
Change-Id: I721b745017dc1ea7ae05dfd9f8d5dd08d0965985
This adds required changes to the Fluentd chart to allow for
deploying Fluentd as either a deployment or a daemonset. This
follows the pattern laid out by the ingress chart. This also
updates the single and multinode jobs to deploy fluentd as both
a daemonset and a deployment for validation
Change-Id: I84353a2daa2ce56ff59882a8d33203286ed27e06
Signed-off-by: Steve Wilkerson <sw5822@att.com>
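Following the ingress chart's pattern, the toggle presumably looks
something like this in values.yaml (the exact key names are an
assumption):
```yaml
deployment:
  # Deployment or DaemonSet
  type: DaemonSet
pod:
  replicas:
    # only meaningful when type is Deployment
    fluentd: 3
```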
The wait-for-pods step is not used consistently in the
openstack-support scenario.
This is a problem, as some helm chart deployments are
basically masking issues that can arise.
This change fixes that.
Change-Id: Ib3e8f16bea701bf20375d4deec7c7869e7bf85c2
This patch set removes an unused import that is not python3 compatible.
Change-Id: I360989c8eb23065d8e655d4583eb97338244412d
Signed-off-by: Tin Lam <tin@irrational.io>
This patch set is based on [0] and also fixes up a handful of Bionic-
specific changes required for the gates to pass.
[0] https://review.openstack.org/#/c/649698/
Co-Authored-By: ghanshyam <gmann@ghanshyammann.com>
Change-Id: I217a27c53eec2a51ddbea7226a23042558c5946b
This begins to split the fluent-logging chart into two separate
charts, one for fluentbit and one for fluentd. This is to help
isolate each chart and its dependencies better, and to treat each
service as its own entity.
This also moves the job for creating Elasticsearch templates to
the Elasticsearch chart, as the elasticsearch chart should have
ownership of creating the templates for its indices.
This also performs some general cleanup of values keys that are
not currently used
Change-Id: I827277d5faa62b8b59c5960330703d23c297ca47
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This adds the helm-toolkit function for defining the update
strategy for the elasticsearch-data statefulset and sets the chart
default to RollingUpdate
Change-Id: Ia10ea7bf000474e597bdb36778118a96d85b93c1
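With the chart default, the rendered statefulset would carry the
standard Kubernetes stanza:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-data
spec:
  updateStrategy:
    type: RollingUpdate
```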
This updates the ClusterRole object for fluentd by removing a
duplicate rules: key and also adds 'get' to the list of verbs for
the "" apiGroups (as it's required for the kubernetes metadata
plugin)
Change-Id: Ia901d9fe9a0784038f0f882297c64afcce58ca7e
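The corrected ClusterRole then has a single rules: key, roughly as
below (the resource list is illustrative):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
  - apiGroups:
      - ""
    resources:
      - namespaces
      - pods
    verbs:
      - get   # required by the kubernetes metadata plugin
      - list
      - watch
```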
This updates the security context for the cephfs provisioner pod to
use a correct user ID, as no user with UID 99 exists in the
container.
Change-Id: I1bbe46df555b35b8afe636327fa83015fd784db0
This removes the utilities for generating the fluentd, fluentbit,
and parser configuration files from yaml, and instead consumes the
configuration files as strings from values.yaml.
This allows for easier creation and maintenance of configuration
files for fluentd and fluentbit, as the utilities became unwieldy
with complex configuration files.
This necessitated removing the core test executed by the charts'
helm tests, but that would be required regardless as we move to
split the charts
Change-Id: Ied4a76bbdf58b54a6d702db04a7120b64f54dcac
Signed-off-by: Steve Wilkerson <sw5822@att.com>
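A sketch of the values-as-strings approach, with an assumed key layout
and a trivial fluentd pipeline for illustration:
```yaml
conf:
  fluentd:
    # consumed verbatim as fluent.conf rather than generated from YAML
    conf: |
      <source>
        @type forward
        port 24224
      </source>
      <match **>
        @type stdout
      </match>
```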
This updates the Elasticsearch chart to use the elasticsearch-s3
image built from the openstack-helm-images Dockerfile instead of
using the previous image from a personal repository
Change-Id: I4d6b18aea11920de33ce1f4b63d39c18cd2b98d3
This removes the invalid "userSecretName" key from the cephfs
storageclass, as provisioning a PVC with a cephfs storageclass that
includes the "userSecretName" key fails with:
Failed to provision volume with StorageClass
"cephfs": invalid option "userSecretName"
Change-Id: Ide52987c9f8ef8fc2327bf30747395e70dc05f99
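A sketch of a valid cephfs storageclass after the removal; the
parameter names follow the external-storage cephfs provisioner and
the values are illustrative:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: ceph.com/cephfs
parameters:
  monitors: "ceph-mon.ceph.svc.cluster.local:6789"
  adminId: admin
  adminSecretName: pvc-ceph-cephfs-client-key
  adminSecretNamespace: ceph
  # userSecretName removed: not a valid option for this provisioner
```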
In an edge environment without distributed storage, it should be
possible to store RabbitMQ data on a local path as well.
This patch adds an option for that, allowing use in more diverse
environments.
Change-Id: Ia3c0dfaa58c237e424197f1406bd66fb991bea18
Story: 2005753
Task: 33455
This renames use_local_path_for_single_pod to
use_local_path_for_single_pod_cluster in values.yaml, fixing a bug.
Change-Id: I88c3fe6c2bbab87baec3ec7d1d94501d6fd741eb
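The option, with the corrected key name, would be enabled along these
lines (the host path value is illustrative):
```yaml
volume:
  use_local_path_for_single_pod_cluster:
    enabled: true
    host_path: /var/lib/rabbitmq
```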
This fixes the elasticsearch-logging service by removing the
LoadBalancer type configuration from the service template. This
was mistakenly added in a previous change
Change-Id: Id2f866147c2dcccc10c83bd54094d54cf3bd227b
There is currently an issue with deploying single
pod mysql clusters in which restarting or killing
the pod results in a CrashLoopBackOff.
The mysql data is indeed lost, and the start script
(thinking the cluster was previously alive
due to the grastate configmap) tries to restore
the cluster instead of bootstrapping it.
Because of this, if the mysql pod is killed or restarted
in the CI, we lose all the mysql data, the cluster does
not recover, and the environment is left broken.
When the volume.use_local_path_for_single_pod.enabled value
is set to true, which we will apply on single node/single
pod testing, this patch deploys a local volume
for mysql at the location specified under
volume.use_local_path_for_single_pod.host_path.
The data is kept intact across pod restarts,
as the pod can read it again
and recover itself.
When it is false, which is the default for non-CI use,
nothing changes, and an emptyDir is used. This
data WILL be lost upon restart, so it is advised
to use volumes instead for production purposes,
by setting Values.volume.enabled to true.
task: 28729
Change-Id: I6ec0bd1087eb06b92ced7dc56ff5b6a156aad433
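A sketch of the override described above for single node/single pod
testing (the host path is illustrative):
```yaml
volume:
  # when enabled is false (the default), an emptyDir is used instead,
  # unless volume.enabled selects PVC-backed volumes
  use_local_path_for_single_pod:
    enabled: true
    host_path: /tmp/mysql-data
  # set to true for production, PVC-backed storage
  enabled: false
```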