This updates the Nagios chart to use the helm-toolkit template
renderer snippet for generating the Nagios configuration files.
This was done to make the configuration files easier to work with
for those who are more familiar with traditional Nagios
configuration files, as well as to allow values overrides for
adding custom host names or custom object definitions to Nagios
objects (as Nagios doesn't easily allow for this via
environment-accessible macros).
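As an illustrative sketch only (the values keys below are
assumptions, not taken from the chart), such an override might
look like:

```yaml
conf:
  nagios:
    objects:
      # hypothetical custom object rendered into the Nagios
      # configuration via the helm-toolkit template renderer
      custom_host:
        template: |
          define host {
            use        linux-server
            host_name  custom-host.example.com
            address    10.0.0.10
          }
```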
Change-Id: I84d5c83d84d6438af5f3ab57997e80e8b1fc8312
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the logic for the pool min_size parameter, as it was
not being updated when the replication level changes after
initialization.
Change-Id: I30f99aaf92c3dc83afce10534b1d2ac9402b7fa7
* Update the version of nfs-provisioner to the latest image.
* Allow the nfs-provisioner user to manage endpoints. This is
  required because the newest version uses the `leaderelection`
  package from k8s, which leverages labels on endpoints to track
  leader election information; see the RBAC sketch below.
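A minimal sketch of the kind of Role this implies (object names
and namespace are illustrative, not the chart's actual manifest):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  # illustrative; the chart generates its own RBAC objects
  name: nfs-provisioner
  namespace: nfs
rules:
  # leaderelection records leader metadata on endpoints objects,
  # so the provisioner needs write access to them
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
```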
Change-Id: Ie2727bd6bcc26e57875bea38f0f665d4a0e85bd7
This adds selenium tests for the grafana chart to the helm test
pod to help ensure the Grafana deployment is functional and
accessible
Change-Id: Idc8d97e5111628d1ed4f25145086d54c5e0136e7
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This changes the user from root to the nobody user in the
ceph-osd chart wherever needed.
This also permits read-only filesystems to back the containers by
defaulting readOnlyRootFilesystem to true.
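A sketch of the corresponding values overrides, assuming the
helm-toolkit security context snippet layout (the component and
container keys are hypothetical):

```yaml
pod:
  security_context:
    osd:
      pod:
        # 65534 is the conventional uid for the nobody user
        runAsUser: 65534
      container:
        osd:
          readOnlyRootFilesystem: true
```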
Change-Id: Ia777bf212e0e3414909c70a4bd839e12d4919bb2
This patch simplifies the resource snippet in helm-toolkit to allow for
specifying hugepage limits. Specifically, this patch replaces the
individual checks for specific system resources (e.g., cpu, memory) by
just copying over the entire resources component as defined in
values.yaml or a corresponding override.
This change is a prerequisite for enabling hugepage handling in
other charts such as openvswitch or postgresql.
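For example, with the whole block copied verbatim, a values
override can now carry hugepage limits untouched (the component
key below is illustrative; hugepages-2Mi is the standard
kubernetes extended resource name):

```yaml
pod:
  resources:
    enabled: true
    server:
      requests:
        memory: "128Mi"
        cpu: "500m"
      limits:
        memory: "1024Mi"
        cpu: "2000m"
        # passes through the snippet unmodified now
        hugepages-2Mi: "512Mi"
```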
Change-Id: I786ff6c7aa5fb6b08b54d2e21878551e5e1e3818
This updates the etcd chart to include the pod
security context on the pod template.
This also adds the container security context to set
readOnlyRootFilesystem to true
Change-Id: I9bf05ab5c21f9afbe269e1566cfecd20b3c086c0
This updates the apparmor job to account for the splitting of the
fluent-logging chart, as it was missed during that change. Now,
the apparmor job will deploy fluentbit as well as fluentd, with
fluentd deployed as a daemonset running as a collecting agent
Change-Id: Iefa50f474b57a10c5e7e5a9032c7b23d26d97640
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the cluster-wait job script to include a sleep for
when no nodes of a given type are detected. This check was
previously executed only when a node count of (0 < x < expected)
was detected. This update reduces the number of queries executed
against the Elasticsearch http endpoint
Change-Id: I15cb39250a5ab9a7f6df0d62c35289a55e109dbd
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the default fluentd configuration to include
recommended settings for preventing the elasticsearch plugin from
reloading the connection after 10000 requests (default for the
ruby gem). This also updates the configuration overrides for the
fluentd-daemonset deployment to provide input parity with the
default fluentbit configuration by adding inputs for the docker
and kubelet systemd units, inputs for ceph, libvirt, kernel logs,
and auth logs on the host. Finally, this updates the fluentd
template to include environment variables for the host name and
the fluentd pod name so they can be added to logged events through
fluentd filter plugins
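A sketch of the relevant output plugin settings (the option names
come from the fluent-plugin-elasticsearch gem; the surrounding
values layout is an assumption):

```yaml
conf:
  fluentd:
    template: |
      <match **>
        @type elasticsearch
        # avoid re-resolving the connection after 10000 requests
        reload_connections false
        reconnect_on_error true
        reload_on_failure true
      </match>
```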
Change-Id: I21f7a89a325c44f8b058ff01a20191bea1a210b4
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This removes ReadOnlyRootFilesystem from the Elasticsearch data
pods, as a writable root filesystem is required for the data pods
to recover from outages
Change-Id: I603d3a25b6580eab20e2b20e1b1cd0cf740c7ab2
This updates the Elasticsearch cluster wait and snapshot repo jobs
to include values overrides for the job backoff limits and the
active deadline seconds field. This allows for tweaking beyond the
standard defaults for kubernetes jobs
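A sketch of the resulting overrides (the job key names are
assumptions based on this description):

```yaml
jobs:
  cluster_wait:
    backoffLimit: 6
    activeDeadlineSeconds: 600
  snapshot_repository:
    backoffLimit: 6
    activeDeadlineSeconds: 600
```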
Change-Id: I1f95a635ab4dfdb3718d5d4fa668c64a9095e899
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This moves Fluentd to use the helm-toolkit endpoint lookup so that
the FQDN is used for the Elasticsearch hostname instead of the
standard short host name
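A minimal sketch of such a lookup in a template, assuming the
helm-toolkit hostname_fqdn_endpoint_lookup function and an
illustrative environment variable name:

```yaml
env:
  # resolves to e.g. elasticsearch.<namespace>.svc.cluster.local
  - name: ELASTICSEARCH_HOST
    value: {{ tuple "elasticsearch" "internal" . | include "helm-toolkit.endpoints.hostname_fqdn_endpoint_lookup" | quote }}
```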
Change-Id: Ibe640979002331693f0a9b6155c9014572294664
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the Elasticsearch entrypoint override for the data
nodes to include a signal that kills the process after the trap
draining each data node completes
Change-Id: Iccd4342fe16d06787cb24342d9a57e4de12e6980
This updates the Elasticsearch client and data pod dependencies
to allow for sequential bring up of the cluster components. As
we want the order to be master->client->data, we add the discovery
service endpoint as a dependency for the client pods and add both
the discovery and client service endpoints as dependencies for
the data pods
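A sketch of the dependency layout this describes (the endpoint
and service names are illustrative):

```yaml
dependencies:
  static:
    client:
      services:
        - endpoint: discovery
          service: elasticsearch
    data:
      services:
        - endpoint: discovery
          service: elasticsearch
        - endpoint: internal
          service: elasticsearch
```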
Change-Id: Iec6d6f259dc8b7b4f2309b492409cc0e5feab669
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the default fluentd image to use the fluentd image
built with the systemd input plugin from the openstack-helm-images
repository
Change-Id: I7c75cd19d62f3dbc3fa4708642119f1781e58677
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the Elastic Curator cron job to include configuration
for successful and failed job history limits, similar to the other
cron jobs we deploy. This also moves the key for configuring the
cron schedule from under .Values.conf.curator to a new top level
jobs key to maintain consistency
This also fixes an indentation issue with the deployment
overrides for Curator and adds the overrides for the Armada job
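A sketch of the new top-level jobs key (the schedule and history
values are illustrative):

```yaml
jobs:
  curator:
    # moved here from under .Values.conf.curator
    cron: "0 */6 * * *"
    history:
      success: 3
      failed: 1
```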
Change-Id: I9c720df9677215bdd2bf18be77959bd5f671c0ca
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the Elasticsearch chart to include a specific start
script for the Elasticsearch data nodes that includes a trap on
signals that removes a data node from the set of
allocation-eligible nodes before shutting down. This results in
all shards being moved off a node on shutdown, alleviating issues
with planned node outages, such as during upgrade scenarios
Change-Id: I22f4957f90e4113831a8ddf48691cb14f811c1e5
This fixes typos in the cluster wait script to ensure the messages
reflect the types of nodes being checked
Change-Id: I5964b5517b3099fbfe8d574b2ca869d366c9bb17
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This patchset aims to add HA clustering support for Postgres. HA
clustering provides automatic failover if the database goes down,
and keeps replicas of the database for rebuilding in the event of
a node failure. To achieve this clustering we use
[Patroni](https://github.com/zalando/patroni), which offers HA
clustering support for Postgres.
Patroni is a daemon that runs in the background, keeps track of
which node in your cluster is currently the leader node, and
routes all traffic on the Postgresql endpoint to that node. If the
leader node goes down, Patroni holds an election to choose a new
leader and updates the endpoint to route traffic accordingly. All
communication between nodes is done through a Patroni-created
endpoint, separate from the externally facing Postgres endpoint.
Note that, although the postgresql helm chart can be upgraded from
non-patroni to patroni clustering, the previous `postgresql`
endpoints object (which is not directly managed by helm) must be
deleted via an out-of-band mechanism so that it may be replaced by the
patroni-managed endpoints. If Postgres itself is leveraged for the
deployment process, this must be done with careful timing. Note
that the old endpoints had a port named "db", while the new
endpoints have a port named "postgresql", as sketched below.
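Illustratively (addresses and other fields elided):

```yaml
# old endpoints object, not managed by helm
subsets:
  - ports:
      - name: db
        port: 5432
---
# new patroni-managed endpoints object
subsets:
  - ports:
      - name: postgresql
        port: 5432
```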
- Picking up patchset: https://review.openstack.org/#/c/591663
Co-authored-by: Tony Sorrentino <as1413@att.com>
Co-authored-by: Randeep Jalli <rj2083@att.com>
Co-authored-by: Pete Birley <pete@port.direct>
Co-authored-by: Matt McEuen <mm9745@att.com>
Change-Id: I721b745017dc1ea7ae05dfd9f8d5dd08d0965985
This adds required changes to the Fluentd chart to allow for
deploying Fluentd as either a deployment or a daemonset. This
follows the pattern laid out by the ingress chart. This also
updates the single and multinode jobs to deploy fluentd as both
a daemonset and a deployment for validation
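A sketch of the toggle, following the ingress chart's pattern
(the exact values key is an assumption):

```yaml
deployment:
  # one of: Deployment, DaemonSet
  type: DaemonSet
```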
Change-Id: I84353a2daa2ce56ff59882a8d33203286ed27e06
Signed-off-by: Steve Wilkerson <sw5822@att.com>
The wait for pods is not consistently used in the
openstack-support scenario.
This is a problem, as some helm chart deploys can mask issues
that arise.
This should fix it.
Change-Id: Ib3e8f16bea701bf20375d4deec7c7869e7bf85c2
This patch set removes an unused import that is not python3 compatible.
Change-Id: I360989c8eb23065d8e655d4583eb97338244412d
Signed-off-by: Tin Lam <tin@irrational.io>
This patch set is based on [0] and also fixes up a handful of
Bionic-specific changes required for the gates to pass.
[0] https://review.openstack.org/#/c/649698/
Co-Authored-By: ghanshyam <gmann@ghanshyammann.com>
Change-Id: I217a27c53eec2a51ddbea7226a23042558c5946b
This begins to split the fluent-logging chart into two separate
charts, one for fluentbit and one for fluentd. This is to help
isolate each chart and its dependencies better, and to treat each
service as its own entity.
This also moves the job for creating Elasticsearch templates to
the Elasticsearch chart, as the elasticsearch chart should have
ownership of creating the templates for its indices.
This also performs some general cleanup of values keys that are
not currently used
Change-Id: I827277d5faa62b8b59c5960330703d23c297ca47
Signed-off-by: Steve Wilkerson <sw5822@att.com>