19 Commits

Author SHA1 Message Date
Lo, Chi (cl566n)
1892fca645 Enable TLS for Prometheus
This patchset enables a TLS path for Prometheus when it acts as
a server. Note that TLS is not terminated directly at Prometheus;
it is terminated at an Apache proxy, which in turn routes requests
to Prometheus.

Change-Id: I0db366b6237a34da2e9a31345d96ae8f63815fa2
2021-03-17 17:06:07 -07:00
Steven Fitzpatrick
cdd0f33d0c Revert "Prometheus: Render Rules as Templates"
This reverts commit fb7fc87d237ce569666f7bd041adea6007549138.

I first submitted that as a way to add dynamic capability to the
prometheus rules (they infamously don't support ENV variable
substitution there). However, this can be done easily with another
solution, and doing so would clean up the prometheus chart values
significantly.

Change-Id: Ibec512d92490798ae5522468b915b49e7746806a
2020-10-06 15:21:18 +00:00
Steven Fitzpatrick
fb7fc87d23 Prometheus: Render Rules as Templates
This change allows us to substitute values into our rules files.

Example:

- alert: my_region_is_down
  expr: up{region="{{ $my_region }}"} == 0
  
To support this change, rule annotations that used the expansion
{{ $labels.foo }} had to be surrounded with "{{` ... `}}" to render
correctly.
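A minimal sketch of a rule written for this templated rendering, extending the example above. The `instance` label and the annotation text are illustrative assumptions, and `$my_region` is the same placeholder the example already uses. The whole rule passes through the template engine first, so `{{ $my_region }}` is substituted at render time, while the Go raw-string action around `{{ $labels.instance }}` emits that text unchanged for Prometheus to expand when the alert fires. Note that this approach was later reverted by cdd0f33d0c.

```yaml
# Hypothetical rule; the label and annotation text are illustrative only.
- alert: my_region_is_down
  # Rendered by the chart: $my_region is replaced before Prometheus loads the rule.
  expr: up{region="{{ $my_region }}"} == 0
  annotations:
    # The {{` ... `}} wrapper outputs its contents verbatim, so after rendering
    # Prometheus still receives {{ $labels.instance }} and expands it per alert.
    summary: "Instance {{`{{ $labels.instance }}`}} is down"
```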

Change-Id: Ia7ac891de8261acca62105a3e2636bd747a5fbea
2020-08-10 18:16:35 +00:00
Andrii Ostapenko
731a6b4cfa Enable yamllint checks
- document-end
- document-start
- empty-lines
- hyphens
- indentation
- key-duplicates
- new-line-at-end-of-file
- new-lines
- octal-values

with corresponding code adjustment.
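As a rough illustration, a yamllint configuration enabling the rules listed here together with those from the earlier 67d1409a74 commit might look like the following. The file name and location are assumptions; the repository's actual lint setup is not shown in this log. Each rule keeps its default options, and since the file does not `extends` a preset, only these thirteen rules are applied.

```yaml
---
# Hypothetical yamllint config: enables only the rules named in this commit
# and in 67d1409a74, each with yamllint's default options.
rules:
  braces: enable
  brackets: enable
  colon: enable
  commas: enable
  document-end: enable
  document-start: enable
  empty-lines: enable
  hyphens: enable
  indentation: enable
  key-duplicates: enable
  new-line-at-end-of-file: enable
  new-lines: enable
  octal-values: enable
...
```

Such a file could be passed explicitly with `yamllint -c <config-file> <paths>`. The file itself starts with `---` and ends with `...` so that it passes its own document-start and document-end checks.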

Change-Id: I92d6aa20df82aa0fe198f8ccd535cfcaf613f43a
2020-05-29 19:49:05 +00:00
Andrii Ostapenko
67d1409a74 Enable yamllint checks
- brackets
- braces
- colon
- commas

with corresponding code adjustment.

Change-Id: I8d294cfa8f358431bee6ecb97396dae66f955b86
2020-05-21 14:04:23 +00:00
diwakar thyagaraj
163c5aa780 Enable AppArmor for all osh-infra test pods
Also changed container names to static names.
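For context, Kubernetes at the time configured AppArmor through a per-container pod annotation whose key embeds the container name, which is presumably why static container names make this easier. A minimal pod sketch; the pod name, container name, image, and command are hypothetical and chosen only for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-helm-test            # hypothetical pod name
  annotations:
    # Key format: container.apparmor.security.beta.kubernetes.io/<container-name>
    # The suffix must match a container name in spec.containers exactly.
    container.apparmor.security.beta.kubernetes.io/example-test: runtime/default
spec:
  restartPolicy: Never
  containers:
    - name: example-test             # static name referenced by the annotation above
      image: busybox                 # hypothetical image
      command: ["sh", "-c", "echo ok"]
```

Newer Kubernetes releases expose the same setting as the `securityContext.appArmorProfile` field instead of an annotation.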

Change-Id: I51f53b480d18aaa38a9707429f01052ee122e7e9
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
2020-05-19 15:36:07 +00:00
diwakar thyagaraj
64ac469eb6 Enable AppArmor for Prometheus init containers
Change-Id: Ibea27338437c9c039b10bff02a28d60d3f5cf4b1
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
2020-05-08 17:24:54 +00:00
Zuul
01aa16620b Merge "Prometheus: Status Alerts Scalar/Vector Conversion" 2020-02-18 17:35:43 +00:00
Zuul
57ad8ad603 Merge "Prometheus: Ceph Alerts Scalar/Vector Conversion" 2020-02-18 17:35:42 +00:00
Zuul
3c7a9de243 Merge "Prometheus: Node Alerts Scalar/Vector Conversion" 2020-02-18 17:29:48 +00:00
dt241s@att.com
8bd4a2624a [FIX] Add AppArmor to prometheus.
This also fixes the Elasticsearch AppArmor jobs.

Change-Id: I8f2a9aa12beffe3ca394a2e9dd00aba7e5292f29
2020-02-14 23:13:38 +00:00
Steven Fitzpatrick
a41262e459 Prometheus: Node Alerts Scalar/Vector Conversion
This change converts alert expressions that relied on instant vectors
to use range aggregate functions instead. Only the 'basic_linux'
rules are affected.

Change-Id: I30d6ab71d747b297f522bbeb12b8f4dbfce1eefe
Co-Authored-By: Meghan Heisler <mkheisler93@gmail.com>
2020-02-11 15:14:40 +00:00
Steven Fitzpatrick
f37865d6a0 Prometheus: Ceph Alerts Scalar/Vector Conversion
This change updates the prometheus alerting rules to use range vectors
in their expressions, to avoid situations where missed scrapes would
cause scalar metrics to "go stale", resetting the alert timer.

Only the ceph alerts are affected by this change.
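A sketch of the kind of conversion described, using a hypothetical gauge `ceph_health_status` where the value 2 is assumed to mean an error state (the actual rules converted are not reproduced in this log). In the instant-vector form, a missed scrape can leave the series stale, so the expression returns no result and the pending `for` timer starts over. `min_over_time` over a window still returns a value as long as some samples fall inside that window.

```yaml
groups:
  - name: ceph-sketch               # hypothetical group name
    rules:
      # Before: instant vector. One missed scrape can make the series stale,
      # the expression returns no result, and the pending alert is reset.
      - alert: ceph_health_error_instant
        expr: ceph_health_status == 2   # assumption: 2 means an error state
        for: 5m
      # After: range aggregate over the last 5 minutes. Samples scraped in
      # that window keep the expression returning a value across a gap.
      - alert: ceph_health_error_ranged
        expr: min_over_time(ceph_health_status[5m]) == 2
        for: 5m
```

`min_over_time` fires only if every sample in the window matched; `max_over_time` would instead fire if any sample did, which is the more sensitive choice.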

Change-Id: Ib47866d12616aaa808e6a09c58aa4352e338a152
Co-Authored-By: Meghan Heisler <mkheisler93@gmail.com>
2020-02-11 15:14:35 +00:00
Steven Fitzpatrick
d408bed90d Prometheus: Status Alerts Scalar/Vector Conversion
This change converts alert expressions which relied on instant vectors
to use range aggregate functions instead.

Change-Id: I4df757f961524bed23b6a6ad361779c1749ca2c5
Co-Authored-By: Meghan Heisler <mkheisler93@gmail.com>
2020-02-11 15:14:27 +00:00
Zuul
cc399a08ed Merge "Fix incorrect prometheus alert names in nagios" 2020-01-15 23:43:05 +00:00
Smruti Soumitra Khuntia
2ac08b59b4 Support for local storage
This change adds a means of introducing new storage classes
and local persistent volumes.
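This log does not show how the chart exposes the new settings, but the underlying Kubernetes objects are standard. A minimal sketch of a storage class for statically provisioned local volumes plus one local persistent volume; the names, size, path, and node name are all hypothetical:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                  # hypothetical class name
provisioner: kubernetes.io/no-provisioner
# Delay binding until a pod is scheduled, so the PV's node constraint is honored.
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-node1                 # hypothetical PV name
spec:
  capacity:
    storage: 10Gi                      # hypothetical size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/local-storage/vol1      # hypothetical path; must exist on the node
  nodeAffinity:                        # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node1                # hypothetical node name
```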

Change-Id: I340c75f3d0a1678f3149f3cf62e4ab104823cc49
Co-Authored-By: Steven Fitzpatrick <steven.fitzpatrick@att.com>
2020-01-09 10:24:31 -06:00
Steve Wilkerson
ddd5a74319 Prometheus: Add feature-gate support in deployment scripts
This updates the deployment scripts for Prometheus to leverage the
feature gate functionality rather than generating, in bash, the list
of override files to use for alerting rules.

Change-Id: Ie497ae930f7cc4db690a4ddc812a92e4491cde93
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2020-01-07 22:06:19 +00:00
Steven Fitzpatrick
4fdcff593c Fix incorrect prometheus alert names in nagios
I noticed that some nagios service checks were checking prometheus
alerts that did not exist in our default prometheus configuration.
In one case, a prometheus alert did not match the naming convention
of similar alerts.

One nagios service check, ceph_monitor_clock_skew_high, does not
have a corresponding alert at all, so I've changed it to check the
node_ntmp_clock_skew_high

alert, where a node has the label ceph-mon="enabled".

Change-Id: I2ebf9a4954190b8e2caefc8a61270e28bf24d9fa
2020-01-03 10:30:08 -06:00
Steve Wilkerson
fbd34421f2 Prometheus: Update chart to support federation
This updates the Prometheus chart to support federation. This
moves to defining the Prometheus configuration file via a template
in the values.yaml file instead of through raw yaml. This allows
for overriding the chart's default configuration wholesale, as
this would be required for a hierarchical federated setup. This
also strips out all of the default rules defined in the chart for
the same reason. There are example rules defined for the various
aspects of OSH's infrastructure in the prometheus/values_overrides
directory that are executed as part of the normal CI jobs. This
also adds a non-voting federated-monitoring job that vets the
ability to federate prometheus in a hierarchical fashion with
extremely basic overrides.
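For reference, in a hierarchical setup a higher-tier Prometheus typically scrapes the `/federate` endpoint of lower-tier servers. A sketch of such a scrape job, following the upstream Prometheus federation documentation; the target address and `match[]` selectors are placeholders, and how this would be supplied through the chart's templated configuration in values.yaml is not shown in this log:

```yaml
scrape_configs:
  - job_name: federate
    scrape_interval: 15s
    # Keep the original job/instance labels from the lower-tier server.
    honor_labels: true
    metrics_path: /federate
    params:
      # Series selectors to pull from the lower tier (placeholders).
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 'prometheus-site-a:9090'   # hypothetical lower-tier Prometheus
```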

Change-Id: I0f121ad5e4f80be4c790dc869955c6b299ca9f26
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-11-21 12:39:56 +00:00