This patch set adds logic to generate Kubernetes egress network policy
rules based on the dependencies specified in values.yaml. It also sets
up the necessary default network policy for the OSH gate.
Change-Id: I1ac649cc9debb5d1f4ea0a32f506dcda4d8b8536
Signed-off-by: Tin Lam <tin@irrational.io>
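As an illustration of the idea in the change above, the following is a minimal
Python sketch of deriving egress rules from declared dependencies. It is not the
chart's actual template logic: the function name `egress_rules`, the simplified
`dependencies` / `endpoints` layout, and the `application` label selector are
all assumptions made for this example.

```python
def egress_rules(dependencies, endpoints):
    """Build one egress rule per declared service dependency.

    Assumed (hypothetical) input shapes:
      dependencies = {"services": [{"service": "oslo_db"}, ...]}
      endpoints    = {"oslo_db": {"port": {"mysql": {"default": 3306}}}}
    """
    rules = []
    for dep in dependencies.get("services", []):
        service = dep["service"]
        ports = endpoints[service]["port"]
        rules.append({
            # Assumption: destination pods are selected by an "application" label.
            "to": [{"podSelector": {"matchLabels": {"application": service}}}],
            # One TCP port entry per named port, in a stable (sorted) order.
            "ports": [
                {"protocol": "TCP", "port": spec["default"]}
                for _, spec in sorted(ports.items())
            ],
        })
    return rules
```

A dependency with no entry under `endpoints` raises `KeyError` in this sketch,
which makes a missing endpoint definition visible instead of silently
producing a policy without ports.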
This updates charts that consume images built from osh-images to
use tags other than :latest. A follow-up change will define jobs
for vetting updated images, as reliance on :latest tags assumes
any change merged into osh-images will result in functionally
correct behavior (which has historically been shown not to be
the case).
Change-Id: I181aa56ed187604dc7583d8081e53cc69eb27310
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This adds the experimental jobs back to osh-infra, as they were
erroneously disabled via comments in a previously merged change
Change-Id: Id92c24223f8c22f1a0ff82b62c222b2920ecd929
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This change replaces direct references to the exporter port
in values.yaml with calls to helm-toolkit lookup functions.
The referenced port number under the network key is removed,
as the helm-toolkit function will return the port number under
the endpoints key.
Change-Id: Ib6f533c49af5a88fca377920d28d5468d7387892
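The lookup described in the change above can be pictured with a small Python
sketch. This is not the helm-toolkit implementation: the function name
`endpoint_port`, the `scope` parameter, and the example keys are hypothetical,
and the nested layout under `endpoints` is a simplified assumption.

```python
def endpoint_port(values, endpoint_type, port_name, scope="default"):
    """Resolve a port number from the endpoints key of a values mapping.

    Falls back to the "default" entry when the requested scope is absent,
    so the port only has to be declared once, under endpoints.
    """
    ports = values["endpoints"][endpoint_type]["port"][port_name]
    return ports.get(scope, ports["default"])
```

With a single source of truth under `endpoints`, a duplicate port entry under
a separate network key has nothing left to provide, which is why it can be
removed.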
This updates the Grafana chart to support the definition of
arbitrary environment variables, for scenarios where additional
information may be required at runtime for things like
datasource and dashboard provisioning
Change-Id: I95e4abe9030116a440c6d78a1d14dbcaaf743b40
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the Prometheus chart to support federation. This
moves to defining the Prometheus configuration file via a template
in the values.yaml file instead of through raw yaml. This allows
for overriding the chart's default configuration wholesale, as
this would be required for a hierarchical federated setup. This
also strips out all of the default rules defined in the chart for
the same reason. There are example rules defined for the various
aspects of OSH's infrastructure in the prometheus/values_overrides
directory that are executed as part of the normal CI jobs. This
also adds a nonvoting federated-monitoring job that verifies the
ability to federate Prometheus in a hierarchical fashion with
extremely basic overrides
Change-Id: I0f121ad5e4f80be4c790dc869955c6b299ca9f26
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the gather-prom-metrics role to include gathering
metrics from the active ceph-mgr endpoint
Change-Id: Icb5d27b6a070e9065f6276725bf06dec7d2cbc0d
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the podManagementPolicy to 'Parallel' for Prometheus
and Alertmanager, as there's no need to handle deploying these
two services in a sequential manner
Change-Id: I2f33b9651bed20c4cb2e0c477ae2227cbf9310cf
Signed-off-by: Steve Wilkerson <sw5822@att.com>
When the ingress pod (in routed mode, using a managed vip) moves from
one host to another, it is sometimes observed that: 1. the vip interface
is not removed on the original host, and 2. in some network topologies,
the switch fabric is unable to find the new pod.
This change updates the ingress deployment as follows:
- Adds a 5s sleep before the shutdown of the ingress container in order to
  allow the preStop action of the ingress-vip container to run completely.
- Updates the start action of the ingress-vip-init container to check if
  the vip is part of an existing connected subnet, and if so, sends a few
  gratuitous ARP messages to let the switch fabric build its ARP cache.
Change-Id: I784906865358566f42157dc2133569e4cb270cfa
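The subnet check described in the change above can be sketched in Python with
the standard `ipaddress` module. This is only an illustration of the decision,
not the container's start script: the function name `vip_in_connected_subnet`
and the idea of passing the connected subnets in as a list are assumptions,
and actually sending the gratuitous ARP messages is left out.

```python
import ipaddress


def vip_in_connected_subnet(vip, connected_subnets):
    """Return True if the vip address lies inside any connected subnet.

    vip may carry a prefix length (e.g. "10.0.0.5/32"); only the address
    part is compared. Subnets may be given as interface addresses such as
    "10.0.0.1/24", hence strict=False.
    """
    address = ipaddress.ip_interface(vip).ip
    return any(
        address in ipaddress.ip_network(subnet, strict=False)
        for subnet in connected_subnets
    )
```

Only when this returns True would the init step go on to announce the vip,
since gratuitous ARP is only meaningful on a directly connected segment.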
ceph-disk has been deprecated, and ceph-volume is available
starting with the Luminous release. This uplifts the ceph-osd
charts to use ceph-volume, with support for all of the
combinations below:
Filestore:
- ceph-disk to ceph-volume
- ceph-volume to ceph-volume
Bluestore (including db and wal combinations):
- ceph-disk to ceph-volume
- ceph-volume to ceph-volume
Different OSDs can run different stores, and upgrades with
db and wal combinations are supported.
Cross-store upgrades are not supported.
Story: ceph-volume-support
Signed-off-by: Kranthi Guttikonda <kranthi.guttikonda@att.com>
Co-Authored-By: Chinasubbareddy Mallavarapu <cr3938@att.com>
Change-Id: Id8b2e1bda0d35fef2cffed6a5ca5876f3888a1c7
Currently, updating the configuration for mariadb also restarts the
ingress pods, although there is no reason for them to be restarted.
Change-Id: I398e20541a0e2337e9a5d100f3ef6ce4ad7d0284
This removes the elasticsearch-ldap.sh script from the single node
osh-infra-logging job, as this step does not provide any real
value and is tightly coupled to the elasticsearch version used.
This sort of validation should be reserved for smoke tests in
future helm tests for charts
Change-Id: I7ca4805a8809568cb09c8bab6c239c008528fd6a
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This creates a separate folder for ceph-disk based deployments
so that they are easy to maintain once ceph-volume is introduced.
A separate folder for each tool gives us the flexibility to develop or
fix issues and commit code to the respective folder without breaking
deployments based on the other tool.
Change-Id: Ib0099d292a8692dc6676eb5ed624d5d1ef677cfe
This updates the Prometheus version deployed by default from
2.3.2 to 2.12.0
Change-Id: Ic10e02a6b136a7f65fb686f5ef1adf1bcf6a9a9d
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the Grafana version deployed by default from 5.0.0 to
6.2.0
Change-Id: I39b5405cc3f3fe7754ed6544a8388ff912a4ef58
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This proposes adding a zookeeper chart to osh-infra that aligns
with the design patterns laid out by the other charts in osh-infra
and osh.
Change-Id: I25edc58fc951e7f81f7275ade6cf9c97e0afae02
Signed-off-by: Steve Wilkerson <sw5822@att.com>
Co-Authored-By: Steven Fitzpatrick <steven.fitzpatrick@att.com>
This updates the ceph health check command in Nagios to use the
updated plugin that determines the active ceph-mgr instance
endpoint to use before querying for ceph's health. This results in
more robust and reliable reporting of ceph's overall health
Depends-On: https://review.opendev.org/#/c/693900/
Change-Id: I5eeb076e5af3c820dbdcc3cc321cefcb5f85ef8d
Signed-off-by: Steve Wilkerson <sw5822@att.com>
Trivial change. This patch set cleans up a python script.
- Move the comment to a helm-template comment so the python comments do
not get rendered by helm.
- Remove an unused python module.
Change-Id: Id287ddae8904d2cfa88725277bb97cf027a942c3
Signed-off-by: Tin Lam <tin@irrational.io>
This patch set implements the Grafana metrics-related changes
required for the Kubernetes version upgrade to 1.16. The updates are
mostly specific to cadvisor metric labels, and ensure that all
existing metrics are still scraped and available in Prometheus so that
they can be consumed by Grafana and Nagios.
Change-Id: I74369ac49dd3f7d9f3682dd5318a3818a4d3f178
AppArmor annotations require the container name to be applied properly.
Before this change, when overrides are not used, the container name is
ceph-osd-default. When overrides are used, the container name is of the
form ceph-osd-HOSTNAME-SHA, but with an identical HOSTNAME and SHA for
all the daemonsets. However, it is not possible to predict this value,
and as a result, the AppArmor profiles are not applied.
This change removes the customization of the container name, and sets
it to ceph-osd-default, allowing AppArmor annotations to be consistently
applied using:
    pod:
      mandatory_access_control:
        type: apparmor
        ceph-osd-default:
          ceph-osd-default: localhost/profilename
Change-Id: I8b6eda00f77ec7393a4311309f3ff76908d06ae6
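To show why a predictable container name matters in the change above, here is
a minimal Python sketch that turns the values fragment into pod annotations.
The annotation key prefix is the one Kubernetes uses for per-container
AppArmor profiles; the function name `apparmor_annotations` and the way values
are passed in are assumptions for illustration, not the chart's template code.

```python
# Kubernetes keys AppArmor annotations by container name under this prefix.
APPARMOR_PREFIX = "container.apparmor.security.beta.kubernetes.io/"


def apparmor_annotations(values, pod_name):
    """Map container names to AppArmor profiles for one pod entry."""
    mac = values["pod"]["mandatory_access_control"]
    if mac.get("type") != "apparmor":
        return {}
    return {
        APPARMOR_PREFIX + container: profile
        for container, profile in mac.get(pod_name, {}).items()
    }
```

Because the annotation key ends with the container name, a name of the form
ceph-osd-HOSTNAME-SHA would never match a key written for ceph-osd-default,
and the profile would not be applied.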
The patch adds Network Policy ingress rules for RabbitMQ
and Prometheus RabbitMQ exporter.
It also fixes name generation for network policies,
to make sure they do not contain a prohibited '_' symbol,
which may appear in some label names.
Change-Id: I9821983b61d90e73e62c5ac669eefeb4ba9999d2
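The name fix described in the change above amounts to replacing underscores
before the name reaches Kubernetes. A minimal Python sketch follows; the
function name `policy_name` and the choice to join parts with '-' are
assumptions for illustration, not the template's exact naming scheme.

```python
def policy_name(*parts):
    """Join name parts with '-' and replace the prohibited '_' symbol."""
    return "-".join(parts).replace("_", "-")
```

For example, a label such as `prometheus_rabbitmq_exporter` contributes
`prometheus-rabbitmq-exporter` to the generated policy name.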
It was observed that in some charts' values.yaml files, the values
defining lifecycle upgrade parameters were incorrectly placed.
This change corrects these instances by adding a deployment-type
subkey corresponding to the deployment types identified in the
chart's templates directory, and indenting the values appropriately.
Change-Id: Id5437b1eeaf6e71472520f1fee91028c9b6bfdd3
This updates the ingress objects to move them back to the
extensions API. While Kubernetes 1.16 moves them under the networking
API, they're still rendered and deployed as extensions/ objects.
This move prevents issues from arising where older versions of
kubernetes might still be deployed during an upgrade, as the
move to the networking API is nonfunctional at this time
Change-Id: I814bbc833b5b9f79f34aefc60b9c1f9890bca826
Signed-off-by: Steve Wilkerson <sw5822@att.com>
Pods for some of the CronJobs do not have the correct
application and component labels applied, so they are
unable to start if Network Policies are enabled.
Related-Change: Ie4eed0e9829419b4b2e40e9b712b73a86d6fc3d2
Change-Id: Ieee874bf837c7947e3681e0447d150174c99d880
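The effect of the missing labels in the change above can be illustrated with a
small Python sketch of matchLabels-style selection. The function name
`selector_matches` and the example label values are hypothetical; the sketch
only mirrors the equality-based matching of a label selector, not a full
Network Policy evaluation.

```python
def selector_matches(match_labels, pod_labels):
    """Return True if every selector label is present with the same value."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())
```

A CronJob pod that lacks the component label is not selected by a policy that
requires both labels, so the rules written for it never apply to it.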