This updates the Grafana chart to support the definition of
multiple datasources. This moves to defining a template in the
chart's values.yaml file that allows for inline gotpl for
defining an arbitrary number of datasources. This also updates the
Grafana dashboards to include a drop-down selector for choosing the
Prometheus datasource to use. This is vetted in the federated
monitoring job.
Change-Id: I55171fed5c2b343130d135d0b42bc96ff11c4712
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This makes the keystone-auth job nonvoting, until adequate work
can be done to help make the job more reliable. At the moment,
this job seems to be responsible for the majority of the gate job
failures due to what seems to be limitations with the single node
nodesets available
Change-Id: I08f1f10b79e9a5fd82ef7c6d887a03ccb55cceed
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This patch set adds a command to clean up the rally environment after a
helm test's execution is completed.
Change-Id: I652ee4930e7afb8b278250a0432086a2963a528c
Signed-off-by: Tin Lam <tin@irrational.io>
This patch set adds logic to generate Kubernetes egress network policy
rules based on the dependencies specified in values.yaml. This also sets
up the necessary default network policy for the OSH gate.
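
For illustration only, the idea of deriving egress rules from declared
dependencies can be sketched in Python. This is not the chart's gotpl
implementation; the dependency entry shape (`application`, `port`) and the
function names are hypothetical, and only the NetworkPolicy field names come
from the Kubernetes API.

```python
def egress_rules(dependencies):
    """Build one egress rule per dependency entry.

    Each entry is a dict with the hypothetical keys 'application'
    (a pod label value) and 'port' (a TCP port number).
    """
    rules = []
    for dep in dependencies:
        rules.append({
            # Allow traffic only to pods carrying the dependency's label
            "to": [{"podSelector": {"matchLabels": {"application": dep["application"]}}}],
            "ports": [{"protocol": "TCP", "port": dep["port"]}],
        })
    return rules


def egress_policy(name, app_label, dependencies):
    """Wrap the generated rules in a NetworkPolicy manifest (as a dict)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            "podSelector": {"matchLabels": {"application": app_label}},
            "policyTypes": ["Egress"],
            "egress": egress_rules(dependencies),
        },
    }


# Hypothetical example: grafana depends on prometheus on port 9090
policy = egress_policy(
    "grafana-egress", "grafana", [{"application": "prometheus", "port": 9090}]
)
```

An empty dependency list yields an empty `egress` list, which for a policy
of type Egress means no outbound traffic is allowed.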
Change-Id: I1ac649cc9debb5d1f4ea0a32f506dcda4d8b8536
Signed-off-by: Tin Lam <tin@irrational.io>
This updates charts that consume images built from osh-images to
use tags other than the :latest tags. This will be followed up
with the definition of jobs to allow for vetting out of updated
images, as reliance on :latest tags assumes any change merged into
osh-images will result in functionally correct behavior (which has
historically not been the case)
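
As a rough sketch of how one might audit for remaining :latest references,
the Python helper below flags image strings that resolve to the latest tag.
The function name and the example image names are hypothetical, and the
parsing is deliberately simple (it assumes a plain `[registry/]name[:tag]`
or `name@digest` reference).

```python
def uses_latest_tag(image):
    """Return True if an image reference resolves to the :latest tag."""
    # Digest-pinned references never float with :latest
    if "@" in image:
        return False
    # Only inspect the final path segment so a registry port
    # (e.g. "registry:5000/...") is not mistaken for a tag
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # an omitted tag defaults to :latest
    return last_segment.rsplit(":", 1)[1] == "latest"


# Hypothetical image references
images = [
    "docker.io/example/ceph-daemon:latest",
    "registry:5000/example/fluentd",
    "docker.io/example/ceph-daemon:1.0",
]
floating = [image for image in images if uses_latest_tag(image)]
```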
Change-Id: I181aa56ed187604dc7583d8081e53cc69eb27310
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This adds the experimental jobs back to osh-infra, as they were
erroneously disabled via comments in a previously merged change
Change-Id: Id92c24223f8c22f1a0ff82b62c222b2920ecd929
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This change replaces direct references to the exporter port
in values.yaml with calls to helm-toolkit lookup functions.
The referenced port number under the network key is removed,
as the helm-toolkit function will return the port number under
the endpoints key.
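
The intent can be sketched in Python as a single lookup against the
endpoints key instead of a second, duplicated port value. This is not the
helm-toolkit function itself; the function name, the `scope` parameter, and
the layout of the `endpoints` dict are assumptions made for the example.

```python
def lookup_port(endpoints, endpoint_type, port_name, scope="default"):
    """Return the port for an endpoint, falling back to its default value."""
    ports = endpoints[endpoint_type]["port"][port_name]
    return ports.get(scope, ports["default"])


# Hypothetical values layout: the port is defined once, under endpoints
values = {
    "endpoints": {
        "exporter": {
            "port": {"metrics": {"default": 9100}},
        },
    },
}

port = lookup_port(values["endpoints"], "exporter", "metrics")
```

Keeping the port in one place means an override of the endpoints key is
picked up everywhere the lookup is used.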
Change-Id: Ib6f533c49af5a88fca377920d28d5468d7387892
This updates the Grafana chart to support the definition of
arbitrary environment variables to support scenarios where
additional information may be required at runtime for things like
datasource and dashboard provisioning
Change-Id: I95e4abe9030116a440c6d78a1d14dbcaaf743b40
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the Prometheus chart to support federation. This
moves to defining the Prometheus configuration file via a template
in the values.yaml file instead of through raw yaml. This allows
for overriding the chart's default configuration wholesale, as
this would be required for a hierarchical federated setup. This
also strips out all of the default rules defined in the chart for
the same reason. There are example rules defined for the various
aspects of OSH's infrastructure in the prometheus/values_overrides
directory that are executed as part of the normal CI jobs. This
also adds a nonvoting federated-monitoring job that vets out the
ability to federate prometheus in a hierarchical fashion with
extremely basic overrides
Change-Id: I0f121ad5e4f80be4c790dc869955c6b299ca9f26
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the gather-prom-metrics role to include gathering
metrics from the active ceph-mgr endpoint
Change-Id: Icb5d27b6a070e9065f6276725bf06dec7d2cbc0d
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the podManagementPolicy to 'Parallel' for Prometheus
and Alertmanager, as there's no need to handle deploying these
two services in a sequential manner
Change-Id: I2f33b9651bed20c4cb2e0c477ae2227cbf9310cf
Signed-off-by: Steve Wilkerson <sw5822@att.com>
When the ingress pod (in routed mode, using a managed vip) moves from
one host to another, it is sometimes observed that: 1. the vip interface
is not removed on the original host, and 2. in some network topologies,
the switch fabric is unable to find the new pod.
This change updates the ingress deployment as follows:
- Adds a 5s sleep before the shutdown of the ingress container in order to
allow the preStop action of the ingress-vip container to run completely.
- Updates the start action of the ingress-vip-init container to check if
the vip is part of an existing connected subnet, and if so, sends a few
gratuitous ARP messages to let the switch fabric build its ARP cache.
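
The subnet check described above can be sketched in Python with the
standard ipaddress module. This only illustrates the decision logic, not
the container's actual start script; the function name and the example
addresses are hypothetical, and sending the gratuitous ARP itself is left
out.

```python
import ipaddress


def vip_in_connected_subnet(vip, interface_cidrs):
    """Return True if the vip falls inside any directly connected subnet.

    interface_cidrs holds host addresses with prefix lengths, as reported
    for local interfaces (for example "10.0.0.5/24").
    """
    vip_addr = ipaddress.ip_address(vip)
    for cidr in interface_cidrs:
        network = ipaddress.ip_interface(cidr).network
        if vip_addr in network:
            return True
    return False


# Hypothetical addresses: the vip shares the 10.0.0.0/24 subnet with eth0
connected = ["10.0.0.5/24", "192.168.1.2/24"]
send_garp = vip_in_connected_subnet("10.0.0.100", connected)
```

Only when the check succeeds would gratuitous ARP messages be worth
sending, since a vip outside every connected subnet is reached by routing
rather than by ARP on the local segment.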
Change-Id: I784906865358566f42157dc2133569e4cb270cfa
ceph-disk has been deprecated, and ceph-volume is available as of
the Luminous release. This uplifts the ceph-osd charts to use
ceph-volume, with support for all of the combinations below:
Filestore:
- ceph-disk to ceph-volume
- ceph-volume to ceph-volume
Bluestore (including db and wal combinations):
- ceph-disk to ceph-volume
- ceph-volume to ceph-volume
This also supports different OSDs running different stores, and
upgrades with db and wal combinations. Cross upgrades from one
store to another are not supported.
Story: ceph-volume-support
Signed-off-by: Kranthi Guttikonda <kranthi.guttikonda@att.com>
Co-Authored-By: Chinasubbareddy Mallavarapu <cr3938@att.com>
Change-Id: Id8b2e1bda0d35fef2cffed6a5ca5876f3888a1c7
Currently, updating the configuration for mariadb also restarts the
ingress pods, although there is no reason for this.
Change-Id: I398e20541a0e2337e9a5d100f3ef6ce4ad7d0284
This removes the elasticsearch-ldap.sh script from the single node
osh-infra-logging job, as this step does not provide any real
value and is tightly coupled to the elasticsearch version used.
This sort of validation should be reserved for smoke tests in
future helm tests for charts
Change-Id: I7ca4805a8809568cb09c8bab6c239c008528fd6a
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This creates a separate folder for ceph-disk based deployments so that
they are easier to maintain once ceph-volume is introduced. Keeping a
separate folder for each tool gives us the flexibility to develop or fix
issues and commit the code to the respective folder without breaking
deployments based on the other tool.
Change-Id: Ib0099d292a8692dc6676eb5ed624d5d1ef677cfe
This updates the Prometheus version deployed by default from
2.3.2 to 2.12.0
Change-Id: Ic10e02a6b136a7f65fb686f5ef1adf1bcf6a9a9d
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the Grafana version deployed by default from 5.0.0 to
6.2.0
Change-Id: I39b5405cc3f3fe7754ed6544a8388ff912a4ef58
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This proposes adding a zookeeper chart to osh-infra that aligns
with the design patterns laid out by the other charts in osh-infra
and osh.
Change-Id: I25edc58fc951e7f81f7275ade6cf9c97e0afae02
Signed-off-by: Steve Wilkerson <sw5822@att.com>
Co-Authored-By: Steven Fitzpatrick <steven.fitzpatrick@att.com>
This updates the ceph health check command in Nagios to use the
updated plugin that determines the active ceph-mgr instance
endpoint to use before querying for ceph's health. This results in
more robust and reliable reporting of ceph's overall health
Depends-On: https://review.opendev.org/#/c/693900/
Change-Id: I5eeb076e5af3c820dbdcc3cc321cefcb5f85ef8d
Signed-off-by: Steve Wilkerson <sw5822@att.com>
Trivial change. This patch set cleans up a python script.
- Move the comment to a helm-template comment so the python comments do
not get rendered by helm.
- Remove an unused python module.
Change-Id: Id287ddae8904d2cfa88725277bb97cf027a942c3
Signed-off-by: Tin Lam <tin@irrational.io>
This patch set implements the Grafana metrics-related changes required
for the Kubernetes upgrade to version 1.16. The updates are mostly
specific to cadvisor metric labels, and ensure that all existing
metrics are still scraped and available in Prometheus so that they
can be consumed by Grafana and Nagios.
Change-Id: I74369ac49dd3f7d9f3682dd5318a3818a4d3f178