This removes the min_block_duration and max_block_duration flags
from the Prometheus chart, as the suggested best practice is to
use the defaults (2h min, 10% of retention time as max).
This also updates the scrape target configuration for cadvisor to
match the upstream example endpoint for Kubernetes versions 1.7.3
and later.
Change-Id: I200969d6c4da9d17d0a7d3a34a114ccc5f5ee70f
This updates the Prometheus version to 2.3.2, which includes a fix
for memory leak issues with the kubernetes client and also adds a
dashboard for evaluating prometheus rule evaluation performance
Change-Id: I7b9e7bee114fa149db3733c0dacfefae36be7fa8
This adds authentication to Prometheus with an apache reverse
proxy, similar to elasticsearch, kibana and nagios. This adds an
admin user and password via htpasswd along with adding ldap
support.
This required modifying the grafana chart to configure the
prometheus datasource's basic auth credentials in the data sources
provisioning configuration file by checking whether basic auth is
enabled and injecting the username/password defined in the
corresponding endpoint definition.
This also modifies the nagios chart to use the authenticated
endpoint for prometheus, which is required for nagios to
successfully query the prometheus endpoint for its service
checking mechanism
Change-Id: Ia4ccc3c44a89b2c56594be1f4cc28ac07169bf8c
In most cases, the ingress controller's nodeSelector key and value
are "node-role.kubernetes.io/ingress" and "true".
This uses quote to treat the nodeSelector value as a string.
Change-Id: Ie1745629b90795e4d888d85f35565e6d6350e09b
This updates the default command line flags for Prometheus. It
explicitly sets the HTTP administrative settings to false and
gives a brief explanation of the security concerns associated
with enabling them
This also removes the honor_labels setting where set to false, as
false is the default setting for honor_labels
Change-Id: I69acdbce604864882d642e44c09a5f0b9c454a61
This updates the TLS secret templates to include the backend
service in the dict supplied to the manifest template, as it is
required for the TLS secret to render correctly.
This also removes the readiness probe from the nagios container in
the deployment for the nagios chart, as it wasn't functioning as
intended due to the port not being available for the probe
Change-Id: Iabcfd40c74938e0497d08ffeeebc98ab722fa660
Adds support for TLS on overridden FQDNs for public endpoints for
the services that have them in openstack-helm-infra. Currently this
implementation is limited, in that it does not provide support for
dynamically loading CAs into the containers, or specifying them manually
via configuration. As a result, only well-known CAs or CAs added manually
to containers will be recognised.
Change-Id: I4ab4bbe24b6544b64cd365467e8efb2a421ac3f4
This PS removes the use of the `quote and truncate` approach to
suppress output from gotpl actions in templates and replaces it
with the recommended practice of defining `$_` instead.
Change-Id: I5fedc3471dcbecef37d2fe1302bf9760b3163467
Signed-off-by: Pete Birley <pete@port.direct>
This PS moves to use the current API version for kubernetes rcs'
that were previously using `apps/v1beta1`.
Story: 2002205
Task: 21735
Change-Id: Icb4e7aa2392da6867427a58926be2da6f424bd56
Signed-off-by: Pete Birley <pete@port.direct>
This adds a basic check for capacity utilization for persistent
volume claims. To accomplish this, it adds a basic alerting rule
to prometheus that fires once a persistent volume's usage has
exceeded 80% for 5 minutes. In addition, there is a service check
added to the nagios chart that will query Prometheus to check if
the alarm for that threshold is firing for any of the volume claims.
Change-Id: I862c860ac479a715733202f679bb151885d7aa7c
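The "above 80%, held for 5 minutes" behaviour described in this change can be sketched as follows. This is only an illustration of the semantics, not the rule added to the chart: the function name, the sample format (timestamp in seconds, usage as a ratio) and the default arguments are assumptions made for the example.

```python
def alert_firing(samples, threshold=0.80, hold_seconds=300):
    """Return True if usage has stayed above `threshold` for at least
    `hold_seconds`, measured up to the most recent sample.

    `samples` is a time-ordered list of (timestamp_seconds, usage_ratio)
    tuples. This format is hypothetical and chosen for illustration.
    """
    if not samples:
        return False

    breach_start = None  # timestamp at which the current breach began
    for timestamp, ratio in samples:
        if ratio > threshold:
            if breach_start is None:
                breach_start = timestamp
        else:
            # Usage dropped back under the threshold, so the hold
            # period starts over on the next breach.
            breach_start = None

    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= hold_seconds
```

A volume that briefly dips below the threshold restarts the hold period, so short spikes do not raise the alarm.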
This moves the charts in openstack-helm-infra closer towards a
standard structure. It addresses multiple deviations, including:
missing resources for init containers, incorrect indents for
disabled resources in some charts, incorrect indents for volumes
and volumemounts added via values, missing resources for some
helm test templates, missing helm-toolkit image functions, and
moving the resource template declarations to be under the image
template declarations
Change-Id: I4834a5d476ef7fc69c5583caacc0229050f20a76
This updates the prometheus service discovery configuration
to define the openstack-exporter service discovery separate from
the other services. This allows for relabeling the instance label
for the openstack-exporter service, removing the potential for
multiple data series being returned by the single stat panels in
the Grafana dashboards for the openstack services. As the other
services perform as expected when exporter pods restart, they
remain configured the same as before.
Change-Id: Iad4c56d31fb553a9629f5a6fd1eac5464207add4
Signed-off-by: Steve Wilkerson <wilkers.steve@gmail.com>
This updates the prometheus rule for checking for terminated
containers in pods. The previous rule checked for any terminations,
which raised alarms due to completed containers in jobs
being included, which isn't desired behavior. This changes the
expression to check for any containers that have terminated with
a status other than completed
Change-Id: I88e533a56f81f81bd1a81420ecfb7d43ac9e2d0b
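The intent of the revised expression above can be illustrated with a small filter. This is a sketch only: the dictionary layout and field names are hypothetical stand-ins for container status data, and the actual change is a Prometheus rule expression rather than Python.

```python
def unexpected_terminations(containers):
    """Return the names of containers that terminated for a reason
    other than "Completed".

    Each entry is a dict with hypothetical keys: "name", "state" and
    "reason". Containers from finished jobs report "Completed" and
    are therefore ignored.
    """
    return [
        c["name"]
        for c in containers
        if c.get("state") == "terminated" and c.get("reason") != "Completed"
    ]
```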
This ps removes the namespace selector for discovering alertmanager
instances, as it's not required
Change-Id: Ie4dc40f761096d497293d6d98b2bbb906d382101
Move to v0.3.1 of kubernetes-entrypoint which has 2 breaking changes to
pod dependencies, and also adds support for depending on jobs via
labels.
Change-Id: I2bafc2153ddd46b3833b253a2e7950bccbccf8ed
Updates the service discovery mechanism used by Prometheus to
identify Alertmanager instances to push alerts to. It moves to
use the 'application' label to identify Alertmanager pods instead of
searching for pods by the label 'name', as the previous definition
was resulting in empty results for Alertmanager targets
This also fixes the name of the prometheus label used to track
alerts for kube-controller-manager, as it was defined incorrectly
previously
Change-Id: I1fb194550baf803435722e3a01892e49b44259d1
This ps proposes adding a common template for the image_repo_sync
jobs for consumption by the charts
Change-Id: I48476d1e4fd94bd1b08b13b46983e3d999f8d8ca
This moves all relevant charts in osh-infra to use the htk manifest
template for ingresses, bringing them in line with the charts in
openstack-helm
Change-Id: Ic9c3cc6f0051fa66b6f88ec2b2725698b36ce824
This ps adds more granular node selectors for the charts in osh
infra to match what is currently done in osh
Change-Id: I8957a95053b9fb3ea329fd37ff049cd223a7695d
This PS simplifies the logic for dynamically merging the image management
dependencies into pod deps when active.
Change-Id: I0cf6c93173bc5fbce697ac15be8697d3b1326d0a
This proposes a means for generating the command line flags for
configuring the Prometheus service via the values file instead of
templating out the command line flags used for the service. This
allows flexibility in choosing which flags and values to use when
deploying Prometheus, without needing to modify the chart itself
Change-Id: I74845b96e213403ad743724137a82ce2c78fcd1f
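The idea of driving the flags from values rather than hard-coding them in the template can be sketched as below. The chart itself does this in a Go template; this Python version is an assumption-laden illustration, and the `--name=value` rendering, the sorting, and the example flag names in the usage are chosen for the example rather than taken from the chart.

```python
def render_flags(flags):
    """Render a mapping of flag names to values as command line arguments.

    `flags` stands in for a tree of flag settings from the values file
    (a hypothetical layout). Booleans are rendered as lowercase
    true/false, and flags are sorted so the output is deterministic.
    """
    args = []
    for name in sorted(flags):
        value = flags[name]
        if isinstance(value, bool):
            value = "true" if value else "false"
        args.append(f"--{name}={value}")
    return args
```

With this shape, adding or changing a flag only requires an override of the values, for example `render_flags({"storage.tsdb.retention": "7d"})`.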
This enables the dynamic generation of the list of rules files for
prometheus, driven by the rules added in the appropriate tree under
.Values.conf.prometheus.rules. This removes the necessity of adding
the file name manually in addition to defining the rules in the
rules tree, which should reduce overhead associated with adding
new rules for prometheus to evaluate
Change-Id: Ib768a252c5ea4f2d099df534c3ffcfb2949d7481
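A minimal sketch of deriving the rules file list from the rules tree, so that file names never have to be listed by hand. The directory, the `.rules` suffix, and the decision to skip empty entries are assumptions for illustration; the chart implements this in its templates.

```python
def rule_file_paths(rules, rules_dir="/etc/config/rules"):
    """Build the list of rules file paths from a rules tree.

    `rules` stands in for the tree under .Values.conf.prometheus.rules,
    keyed by rule group name. `rules_dir` and the file suffix are
    hypothetical. Entries with no content are skipped (an assumption).
    """
    return [
        f"{rules_dir}/{name}.rules"
        for name in sorted(rules)
        if rules[name]
    ]
```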
Adds support for a new feature of kubernetes-entrypoint, pod
dependencies, that was added in v0.3.0.
Change-Id: I78d9e0545ca3b837cd2386783386a253f7f5a2d6
This PS moves existing dynamic common dependencies under a
'dynamic.common' key to simplify the yaml tree.
Change-Id: I4332bcfdf11197488e7bd5d8cf4c25565ea1c7b6
This PS moves static dependencies under a 'static' key to allow
expansion to cover dynamic dependencies.
Change-Id: Ia0e853564955e0fbbe5a9e91a8b8924c703b1b02
The clusterrole name for prometheus wasn't referenced correctly in
the clusterrolebinding, which prevented prometheus from operating
correctly
Change-Id: I5b843d8a2b6829356098d71503ffce4a66d3198a
The pvc: key was added back to the prometheus chart as part of the
rbac tidy change. This removes it again
Change-Id: I572a4054d53ce5cb382f8b6608397d4f8a7eabd0
This PS includes the release name in the cluster role to prevent
collision if the chart is deployed multiple times in the same
cluster.
Change-Id: I7166e5ee25b3d4c89879393c5f84c869585a2681
This dynamically adds the rules files for prometheus to the
prometheus-etc configmap, and also dynamically adds volume mounts
to the prometheus statefulset for each rules file
This also removes the empty rules file trees in the prometheus
values.yaml file
Change-Id: I9acbbe57d71a23f69e9e172b2f3ad66985e99574
Adds "helm-toolkit.utils.merge" as a replacement for the upstream
sprig "merge" function, which didn't quite do what we wanted:
it didn't merge slices, it just overrode one with the other.
This PS also updates existing callsites of the sprig merge to use
"helm-toolkit.utils.merge".
Change-Id: I456349558d4cf941d1bcb07fc76d0688b0a10782
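The difference between overriding and merging slices can be illustrated with a small deep merge. This is not the helm-toolkit implementation, which is a Go template; the function name is hypothetical, and plain concatenation of lists is an assumption, since the commit does not say how duplicates or ordering are handled.

```python
import copy


def deep_merge(base, override):
    """Merge `override` into a copy of `base` and return the result.

    Nested dicts are merged recursively. Lists present on both sides
    are concatenated instead of one replacing the other (assumed
    behaviour). Any other value from `override` wins.
    """
    result = copy.deepcopy(base)
    for key, value in override.items():
        current = result.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            result[key] = deep_merge(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            result[key] = current + copy.deepcopy(value)
        else:
            result[key] = copy.deepcopy(value)
    return result
```

For example, merging `{"deps": {"jobs": ["a"]}}` with `{"deps": {"jobs": ["b"]}}` keeps both jobs, whereas an override-style merge would keep only one of the two lists.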