50 Commits

Author SHA1 Message Date
Pete Birley
0bf3674539 Revert "Add Egress Helm-toolkit function & enforce the nework policy at OSH-INFRA"
This reverts commit 8d33a2911cda0c9e88406b9eeacbd8dfa70286f2.

Change-Id: Ic861b9bf9b337449b47a3558da8355e7a5bcacee
2018-12-16 04:21:46 +00:00
Mike Pham
8d33a2911c Add Egress Helm-toolkit function & enforce the nework policy at OSH-INFRA
This PS implements the helm-toolkit function that generates the
egress section of the Kubernetes network policy manifest from overridable values.
It also enables the K8s network policy in the OSH-infra gate.

Change-Id: Icbe2a18c98dba795d15398dcdcac64228f6a7b4c
2018-12-14 16:32:40 -05:00
Zuul
b591e0754a Merge "Add Nagios Elasticsearch Query Command" 2018-12-06 20:50:28 +00:00
Huang, Scott (sh2725)
bd05126309 Add Nagios Elasticsearch Query Command
Change-Id: I74a965a5397101793cae71228a6a5bd442bf9f5a
2018-12-03 09:09:03 -05:00
Steve Wilkerson
439079693d Nagios: Update image tag
This updates the Nagios image tag to include the updated plugin
for querying Elasticsearch for alerting on logged events

Change-Id: Idd61d82463b79baab0e94c20b32da1dc6a8b3634
2018-11-26 08:29:22 -06:00
Steve Wilkerson
dfb4654fba Nagios: Configuration updates
This updates the host used for the ceph health checks, as
we should be checking the ceph-mgr service directly for ceph
metrics instead of trying to curl the host directly.

This also changes the ceph_health_check to use the base-os
hostgroup instead of the placeholder ceph-mgr host group, as we're
just executing a simple check against the ceph-mgr service.

This also adds default configuration values for the
max_concurrent_checks (60) and check_workers (4) values instead
of leaving them at the defaults Nagios uses (0 and the number of
cores, respectively)

Change-Id: Ib4072fcd545d8c05d5e9e4a93085a8330be6dfe0
2018-11-09 13:28:50 -06:00
Steve Wilkerson
325b3cea4d Nagios: Update host check mechanism
This updates the Nagios image to use a tag that includes a fix for
the service discovery mechanism used for updating host checks.
After moving the Nagios chart to either run in shared or host PID
namespaces, the service discovery mechanism no longer worked due
to the plugin attempting to restart PID 1 instead of determining
the appropriate PID to restart.

For reference, see:
https://review.gerrithub.io/#/c/att-comdev/nagios/+/432205/

Change-Id: Ie01c3a93dd109a9dc99cfac5d27991583546605a
2018-11-09 09:12:16 -06:00
Zuul
b55e9b10a7 Merge "Nagios: Add session affinity to ingress" 2018-11-09 04:45:36 +00:00
Steve Wilkerson
2c6aa8ad1b Nagios: Add session affinity to ingress
This adds session affinity to Nagios's ingress. This allows for
the use of cookies for Nagios's session affinity

Change-Id: I6054a92f644dc533dd06d35a2541fb44d46cba88
2018-11-09 02:07:39 +00:00
Steve Wilkerson
ba22b0e726 Nagios: Update ceph_health check
The ceph_health check in Nagios incorrectly sets the warning and
error level to 0. The ceph_health_status metric's value of 0
indicates the cluster is healthy, while 1 indicates a warning and
2 indicates an error state. The Nagios check for ceph_health is
updated to reflect these values

Change-Id: Iffe80f1c34f6edee6370dd7e707e5f55f83f1ec1
2018-11-06 14:51:40 -06:00
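
The value mapping described in ba22b0e726 can be sketched as follows. This is only an illustration, assuming the check translates the ceph_health_status metric straight into the conventional Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function name is hypothetical and is not taken from the plugin in the Nagios image.

```python
# Hypothetical helper for illustration; not the code shipped in the Nagios image.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3


def ceph_health_to_nagios(metric_value):
    """Map a ceph_health_status sample to a Nagios plugin exit code."""
    mapping = {
        0: NAGIOS_OK,        # cluster healthy
        1: NAGIOS_WARNING,   # cluster in a warning state
        2: NAGIOS_CRITICAL,  # cluster in an error state
    }
    # Any other value is reported as UNKNOWN instead of being guessed at.
    return mapping.get(metric_value, NAGIOS_UNKNOWN)
```

With thresholds of 0 for both warning and error, as in the earlier configuration, a healthy cluster could not be told apart from a degraded one.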
Steve Wilkerson
69196031cd Nagios: Ensure processes are reaped
This runs the Nagios processes as children of either
the pause container or the host's init system (for k8s <1.10)
to prevent defunct process sprawl

Change-Id: I6a93d446577674b0b012f9567d5e6a5794ebc44b
2018-11-02 08:12:24 -05:00
Steve Wilkerson
8d6cfd72d0 Nagios: Remove Nagios log monitors
This removes the checks for Nagios to query Elasticsearch for
logged events. The current plugin in the image is resulting in
unstable behavior, and should be removed until the plugin has been
improved

Change-Id: If1bdd954956f063ac1eebbb94d1128df8b8d2695
2018-10-25 05:21:22 +00:00
Huang, Scott (sh2725)
b99d39dd95 [467551] Mount Nagios Logfile
Mount Nagios logfile to host to enable log streaming to elasticsearch

Change-Id: I297f61067c0ff3e870e14b124a5c6fdd49e12b01
2018-10-21 15:37:40 +00:00
Zuul
1f9c8d7f42 Merge "Nagios: Update image with Elasticsearch plugin headers" 2018-10-15 17:58:17 +00:00
Steve Wilkerson
19248c11e9 Nagios: Update image with Elasticsearch plugin headers
This updates the Nagios image to include an update to the
Elasticsearch plugin that adds the appropriate headers to the
request sent to Elasticsearch. As Elasticsearch >=6.0 no longer
tries to determine the request data type, we need to explicitly
tell Elasticsearch the request body is JSON. Since we use
Elasticsearch 5.6.4 as default, this change will make the
deprecation warnings for the 6.0 breaking change go away.

Change-Id: I0dbd8859ca8d0bd0893832b4edd92742e575598b
2018-10-15 14:20:22 +00:00
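
The header change described in 19248c11e9 amounts to declaring the request body as JSON. A minimal sketch, assuming a plain HTTP search request built with the Python standard library; the host name, index pattern and function name are hypothetical, and this is not the plugin's actual code. The request is only constructed here, never sent.

```python
import json
import urllib.request


def build_search_request(base_url, index, query):
    """Build (but do not send) a search request with an explicit JSON content type."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        # Elasticsearch >= 6.0 no longer guesses the body type, so state it.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and index, for illustration only.
req = build_search_request(
    "http://elasticsearch.example:9200",
    "logstash-*",
    {"query": {"match_all": {}}},
)
```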
Tin Lam
92e68d33ea Add network policy toolkit function
This patch set implements the helm toolkit function to generate a
kubernetes network policy manifest based on overrideable values.
This also adds a chart that shuts down all ingress and egress
traffic in the namespace. This can be used to ensure the
whitelisted network policy works as intended.

Additionally, implementation is done for some infrastructure charts.

Change-Id: I78e87ef3276e948ae4dd2eb462b4b8012251c8c8
Co-Authored-By: Mike Pham <tp6510@att.com>
Signed-off-by: Tin Lam <tin@irrational.io>
2018-10-15 13:50:50 +00:00
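
For the lockdown chart mentioned in 92e68d33ea, a default-deny policy can be sketched as below. This is an assumption about what such a manifest looks like, based on standard Kubernetes NetworkPolicy semantics (an empty podSelector selects every pod in the namespace, and listing both policy types without any rules blocks all traffic); it is not the chart's actual template, and the function and policy names are hypothetical.

```python
import json


def deny_all_network_policy(namespace, name="deny-all"):
    """Return a NetworkPolicy manifest that blocks all ingress and egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # An empty selector matches every pod in the namespace.
            "podSelector": {},
            # Both types are listed and no rules are given, so nothing is allowed.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


print(json.dumps(deny_all_network_policy("osh-infra"), indent=2))
```

Applying a policy like this first, then layering the whitelisting policies on top, is one way to confirm that only the intended traffic gets through.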
rakesh-patnaik
db0d653b4d Monitor postgresql, Openstack virt resources, api, logs, pod and nodes status
Fixing openstack API monitors

Adding additional neutron services monitors
Adding new Pod CrashLoopBackOff status check
Adding new Host readiness check

Updated the nagios image reference (https://review.gerrithub.io/c/att-comdev/nagios/+/420590 - Pending)

This updated image provides a mechanism for querying Elasticsearch
with the goal of triggering alerts based on specified applications
and log levels.

Finally, this moves the endpoints resulting from the authenticated
endpoint lookups required for Nagios to the nagios secret instead
of handling them via plain text environment variables

Change-Id: I517d8e6e6e8fa1d359382be8a131a8e45bf243e2
2018-09-21 08:22:13 +00:00
Pete Birley
bb3ff98d53 Add release uuid to pods and rc objects
This PS adds the ability to attach a release uuid to pods and rc
objects as desired. A follow up PS will add the ability to add arbitrary
annotations to the same objects.

Change-Id: Iceedba457a03387f6fc44eb763a00fd57f9d84a5
Signed-off-by: Pete Birley <pete@port.direct>
2018-09-13 05:35:35 +00:00
Scott Huang
bc54e72fd3 Monitor Cinder API and Scheduler
Change-Id: I159facb491d9a722d8c067ead25c470f00b83939
2018-09-07 15:12:32 +00:00
Zuul
86d67cede3 Merge "Nagios: Use public endpoint for ldap" 2018-08-31 22:00:37 +00:00
Pete Birley
1d8caeede6 Nagios: Use public endpoint for ldap
This PS updates the nagios chart to use the public endpoint for
ldap in the apache config.

Change-Id: Ia7c1881a15fda3100fb006e9cf1d06d22dcd6a8d
Signed-off-by: Pete Birley <pete@port.direct>
2018-08-30 13:47:37 -05:00
Steve Wilkerson
9a311475ba Charts: Use secrets for configs in chart
This updates the osh-infra charts to use a secret for their
configuration files instead of a configmap, allowing for the
storage of sensitive information

Change-Id: Ia32587162288df0b297c45fd43b55cef381cb064
2018-08-24 15:56:53 -05:00
Steve Wilkerson
8652e14acb Add auth for prometheus
This adds authentication to Prometheus with an apache reverse
proxy, similar to elasticsearch, kibana and nagios. This adds an
admin user and password via htpasswd along with adding ldap
support.

This required modifying the grafana chart to configure the
prometheus datasource's basic auth credentials in the data sources
provisioning configuration file by checking whether basic auth is
enabled and injecting the username/password defined in the
corresponding endpoint definition.

This also modifies the nagios chart to use the authenticated
endpoint for prometheus, which is required for nagios to
successfully query the prometheus endpoint for its service
checking mechanism

Change-Id: Ia4ccc3c44a89b2c56594be1f4cc28ac07169bf8c
2018-08-08 18:49:45 +00:00
Seungkyu Ahn
a430533e6a Quoting node_select_value in Ingress Controller
In most cases, the ingress controller's nodeSelector key and value
are "node-role.kubernetes.io/ingress" and "true".
Use quote so that the nodeSelector value is treated as a string.

Change-Id: Ie1745629b90795e4d888d85f35565e6d6350e09b
2018-08-01 02:39:05 +00:00
Steve Wilkerson
6f6c6b8b99 Nagios/Kibana: Update configmap annotations
This changes the ordering of the configmap annotations for kibana,
as older versions of helm require the configmap with the values
template definition for the apache proxy to be listed last. This
was addressed in the elasticsearch-client template but missed in
kibana.

This also adds the configmap hash annotations to the nagios chart
as they were previously missing. It also places them in the
correct order as above

Change-Id: I13befe8684d975f310f2723c5172b8a0f9f365d6
2018-07-30 12:33:17 -05:00
Steve Wilkerson
4f78e1f6fc Drive apache proxy configuration via values templates
This proposes defining the apache proxy hosts entirely via values
templates. While complicated on its face, this gives flexibility
by allowing the ability to define the desired authentication
mechanism via values templates. These options can range from
using http basic auth for development purposes to defining more
complex ldap configurations without a need to modify the chart
directly

Change-Id: Ief1b6890444ff90cc9c0ca872087af74836c0771
Signed-off-by: Pete Birley <pete@port.direct>
2018-07-30 07:52:26 -05:00
Steve Wilkerson
7ea9a075ba Nagios: Update image reference to include discovery fix
This updates the Nagios image tag to include a version that fixes
the service discovery bug that resulted in duplicate host group
entries. The duplicate host group entries would prevent Nagios
from restarting, resulting in the service never coming back up
when duplicate host groups were identified and added

Change-Id: I555c525e47deffd95eeb5a7276c00cf044e61e3a
2018-07-10 14:40:55 -05:00
Steve Wilkerson
c26a1b53f6 Update TLS secret templates, remove nagios readiness probe
This updates the TLS secret templates to include the backend
service in the dict supplied to the manifest template, as it is
required for the TLS secret to render correctly.

This also removes the readiness probe from the nagios container in
the deployment for the nagios chart, as it wasn't functioning as
intended due to the port not being available for the probe

Change-Id: Iabcfd40c74938e0497d08ffeeebc98ab722fa660
2018-06-27 18:56:45 -05:00
Steve Wilkerson
b823954787 Ingress: Add initial TLS Support for osh-infra public endpoints
Adds support for TLS on overridden fqdns for public endpoints for
the services that have them in openstack-helm-infra. Currently this
implementation is limited, in that it does not provide support for
dynamically loading CAs into the containers, or specifying them manually
via configuration. As a result, only well-known CAs, or CAs added manually
to the containers, will be recognised.

Change-Id: I4ab4bbe24b6544b64cd365467e8efb2a421ac3f4
2018-06-26 14:47:19 -05:00
Steve Wilkerson
cb7bf2c0b3 Add missing readiness probes to openstack-helm-infra charts
This adds missing readiness probes to the following charts in
openstack-helm-infra: elasticsearch, fluent-logging, kibana,
nagios, prometheus-kube-state-metrics, prometheus-node-exporter,
and prometheus-openstack-exporter

Change-Id: I6a2635b08667c31eadb1b05ba848c658935a17e5
2018-06-26 12:25:36 +00:00
Steve Wilkerson
2dd5bf0594 Update ordering of auth providers in apache reverse proxy
This updates the ordering of the basic auth providers in the
elasticsearch and nagios chart to check the file provider first
before going out to check the configured ldap server.

Change-Id: I47ff8a1c7b2cefa8425914c5d4d7a76aa8d43216
Signed-off-by: Steve Wilkerson <wilkers.steve@gmail.com>
2018-06-25 12:43:06 -05:00
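
The effect of the ordering in 2dd5bf0594 can be sketched as a provider chain that stops at the first match. Apache's AuthBasicProvider directive queries providers in the order listed; the code below is only a model of that behaviour, and the user table, credentials and function names are hypothetical stand-ins rather than anything from the chart.

```python
# Hypothetical stand-ins for the two providers; neither is real Apache or LDAP code.
LOCAL_USERS = {"admin": "changeme"}
ldap_calls = []


def file_provider(username, password):
    """Check credentials against a local, htpasswd-like table."""
    return LOCAL_USERS.get(username) == password


def ldap_provider(username, password):
    """Pretend LDAP lookup; records each call so the ordering is observable."""
    ldap_calls.append(username)
    return (username, password) == ("alice", "s3cret")


def authenticate(username, password, providers):
    """Return the name of the first provider that accepts the credentials."""
    for name, check in providers:
        if check(username, password):
            return name
    return None


# File provider first, LDAP second, matching the ordering in the commit.
PROVIDERS = [("file", file_provider), ("ldap", ldap_provider)]
```

With this order, a login by the local admin user is resolved without contacting the LDAP server at all, so that account keeps working even if LDAP is unreachable.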
Zuul
1051065c2c Merge "Daemonsets: Use current kubernetes daemonset api version" 2018-06-14 16:24:33 +00:00
Zuul
0c9eae2d84 Merge "Nagios: update functions to live in correct locations" 2018-06-14 00:55:48 +00:00
Pete Birley
fa629cdbbd Daemonsets: Use current kubernetes daemonset api version
This PS moves to the current GA version for kubernetes daemonsets.
Additionally, any remaining deployments that were using
`extensions/v1beta1` have been updated to `apps/v1`.

Story: 2002205
Task: 21735

Change-Id: If9703162dc472af1e6096bf2b9062802fd5ce8ab
Signed-off-by: Pete Birley <pete@port.direct>
2018-06-13 21:53:18 +00:00
Steve Wilkerson
561780f347 PVC monitoring: Add alerting rules and service check for PVCs
This adds a basic check for capacity utilization for persistent
volume claims. To accomplish this, it adds a basic alerting rule
to prometheus that fires once a persistent volume's usage has
exceeded 80% for 5 minutes. In addition, there is a service check
added to the nagios chart that will query Prometheus to check if
the alarm for that threshold is firing for any of the volume claims.

Change-Id: I862c860ac479a715733202f679bb151885d7aa7c
2018-06-12 14:28:24 +00:00
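
The alerting condition in 561780f347 (usage above 80%, held for 5 minutes) can be modelled as below. This is a simplified sketch of the "condition must hold for a duration" behaviour, not the Prometheus rule itself; the function name and the sample format are assumptions made for illustration.

```python
def alert_firing(samples, threshold=80.0, hold_seconds=300):
    """Decide whether the alert would be firing at the time of the last sample.

    samples is a list of (timestamp_seconds, usage_percent) tuples sorted by time.
    The alert fires only if usage has stayed above the threshold, without a dip,
    for at least hold_seconds up to the most recent sample.
    """
    breach_start = None
    for timestamp, usage in samples:
        if usage > threshold:
            if breach_start is None:
                breach_start = timestamp  # first sample of the current breach
        else:
            breach_start = None  # any dip below the threshold resets the timer
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= hold_seconds
```

The hold period keeps a brief spike in volume usage from paging anyone; the Nagios service check then only has to ask whether the alert is currently firing.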
Pete Birley
c48e47b47a Nagios: update functions to live in correct locations
This PS simply moves functions within the chart to their correct location.

Change-Id: Ia3d693713903d226a864dcdcf9884dee67f07d2b
Signed-off-by: Pete Birley <pete@port.direct>
2018-06-11 22:14:44 -05:00
Steve Wilkerson
c7d0317768 Add nagios cgi.cfg file control to values.yaml
This adds the ability to drive the CGI configuration for
nagios via values, similar to the other nagios configuration
entities

Change-Id: I8e9de21d141e0a87cdda11c4a778abec210277f3
2018-05-24 11:24:37 -07:00
Rakesh Patnaik
52c980b10c Prometheus alerts, nagios defn - rabbitmq,mariadb,ES
Change-Id: I71bc9f42aebc268ad2383a5a36a3405fc47c6c9e
2018-05-20 15:16:57 +00:00
Rakesh Patnaik
69cd66b7c9 Nagios notificiation on alerts and ceph monitoring
Change-Id: I782f54b5ad8159e7a4375d336a42524f380e65d2
2018-05-20 15:16:42 +00:00
Steve Wilkerson
db89ab8204 Add ldap support to nagios
This adds an apache reverse proxy to the nagios chart, similar
to elasticsearch and kibana. It also adds authentication to
nagios via ldap

Change-Id: I7b17703b5d4c1e041691ffceb984a9f5951cbeb9
2018-05-15 09:21:18 -05:00
Zuul
cc06b57b42 Merge "Nagios chart modifications to use prometheus alert metric for monitoring" 2018-04-22 23:25:24 +00:00
Rakesh Patnaik
adab0e1e30 Nagios chart modifications to use prometheus alert metric for monitoring
Change-Id: I6bb3c7176a725d8f26f3c11ebfb1f6d1d430ab96
2018-04-19 10:55:44 -05:00
Steve Wilkerson
e166432a98 Add manifest for image_repo_sync job
This ps proposes adding a common template for the image_repo_sync
jobs for consumption by the charts

Change-Id: I48476d1e4fd94bd1b08b13b46983e3d999f8d8ca
2018-04-19 14:10:08 +00:00
Zuul
49e9084679 Merge "OSH-Infra: Update labels for chart components" 2018-04-18 18:47:08 +00:00
Zuul
626b94e0c8 Merge "Helm-Toolkit: Kubernetes Entrypoint, simplify image dependencies" 2018-04-17 15:11:00 +00:00
Steve Wilkerson
7757400edc OSH-infra: move charts to use ingress manifest in htk
This moves all relevant charts in osh-infra to use the htk manifest
template for ingresses, bringing them in line with the charts in
openstack-helm

Change-Id: Ic9c3cc6f0051fa66b6f88ec2b2725698b36ce824
2018-04-13 15:41:12 -05:00
Steve Wilkerson
aaffc4caf0 OSH-Infra: Update labels for chart components
This ps adds more granular node selectors for the charts in osh
infra to match what is currently done in osh

Change-Id: I8957a95053b9fb3ea329fd37ff049cd223a7695d
2018-04-13 08:44:33 -05:00
Pete Birley
b9336ca613 Helm-Toolkit: Kubernetes Entrypoint, simplify image dependencies
This PS simplifies the logic for dynamically merging the image management
dependencies into pod deps when active.

Change-Id: I0cf6c93173bc5fbce697ac15be8697d3b1326d0a
2018-04-13 08:42:37 -05:00
Steve Wilkerson
1ebce2424e Nagios: Configure ports with endpoint port lookups
This ps updates the nagios chart to use endpoint port lookups for
port configuration, bringing it in line with the other charts

Change-Id: I500b4741d50132f6c316ded660981e2af8b71e7a
2018-04-02 09:32:15 -05:00
Steve Wilkerson
99befc2484 Nagios Chart
This adds the nagios chart to osh-infra to provide additional
monitoring functionality. It uses helper functions to consume
yaml definitions for services, commands, hosts and hostgroups
to generate the required configurations for those entities in
nagios's configuration

Change-Id: I6238bb8cb1e5c8dc48594ddea50693f3e7b0a176
2018-03-23 13:45:40 +00:00
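
The approach described in 99befc2484, turning YAML definitions into Nagios configuration, can be illustrated with a small sketch. The chart itself does this with Helm template helper functions; the Python below only mirrors the idea, assuming the values have already been parsed into a dict. The function name and the example hostgroup attributes are hypothetical.

```python
def render_nagios_object(kind, attributes):
    """Render one Nagios object definition block from a dict of directives."""
    lines = [f"define {kind} {{"]
    for key, value in attributes.items():
        # One directive per line: the directive name followed by its value.
        lines.append(f"    {key} {value}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical hostgroup definition, as it might look after parsing the values.
print(render_nagios_object(
    "hostgroup",
    {"hostgroup_name": "base-os", "alias": "Base OS hosts"},
))
```

Keeping the definitions as structured values means services, commands, hosts and hostgroups can be overridden per deployment without editing the rendered configuration by hand.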