138 Commits

Author SHA1 Message Date
Pierre Riteau
13b0f3b861 Make external access to monitoring services configurable
Change-Id: Iaf6bf36ae0adce3342981c36c859fc138b172f6b
2022-06-27 11:57:53 +02:00
Zuul
5865b0e9a8 Merge "Support setting Nova API microversion for openstack-exporter" 2022-06-24 08:48:43 +00:00
Pierre Riteau
41fba3c5df Support setting Nova API microversion for openstack-exporter
Starting from v1.5.0 of the exporter, OS_COMPUTE_API_VERSION can be set
to configure the Nova API version to be used [1]. Microversion 2.1 can
be used to keep metrics unmodified from the previous exporter version
deployed by Kolla (v1.3.0).

Support it with prometheus_openstack_exporter_compute_api_version,
defaulting to using the latest version.

[1] https://github.com/openstack-exporter/openstack-exporter/pull/201

Change-Id: I7605a3f9f74effb29ecec3b28e4709fd5f7f8cd4
2022-06-23 17:11:50 +02:00
Pierre Riteau
06223d651b Fix typo in prometheus-node-exporter restart handler
Change-Id: Ib05569a08e2fe21dae31cdacad3b622d17cb5db3
2022-06-22 16:51:49 +02:00
Radosław Piliszek
72b63dfee7 Further Keystone-related cleanups
Per comments on [1].

[1] https://review.opendev.org/c/openstack/kolla-ansible/+/843727

Change-Id: I60162b54bc06e158534d29311d4474b34750c64d
2022-06-20 08:40:03 +00:00
Zuul
4336ffbe44 Merge "Add support for custom alert notification templates" 2022-06-02 10:05:06 +00:00
Radosław Piliszek
7ca9349b09 Do not use keystone_admin_url et al
Following up on [1].
The 3 variables are only introducing noise after we removed
the reliance on Keystone's admin port.

[1] I5099b08953789b280c915a6b7a22bdd4e3404076

Change-Id: I3f9dab93042799eda9174257e604fd1844684c1c
2022-05-28 18:19:01 +02:00
k-s-dean
fcba927d7b talk TLS to openstack exporter via haproxy
Closes-Bug: #1975598
Change-Id: If4c85f8e960141d08a89accdc11a3271f31974c1
2022-05-24 16:23:42 +01:00
Radosław Piliszek
3e75a33ad4 Use the new image naming scheme
Change-Id: Ib4b15ed4feac82d8492b1c0f0238a752eac668e6
2022-05-23 06:37:25 +00:00
Piotr Parczewski
77e811e25c Add support for custom alert notification templates
Change-Id: I9e4e1287ea69960ae3b13734acc3eb0b087a8d86
2022-05-18 12:03:47 +02:00
k-s-dean
656f6cdb08 Put openstack exporter behind HAproxy so only one is queried at a time
Closes-Bug: #1972818

Change-Id: I9e36b9169b6725bf6db953e464fc099087747778
2022-05-12 07:41:57 +00:00
Zuul
2c15d36fed Merge "Adds prometheus_scrape_interval" 2022-04-21 16:55:35 +00:00
Marcin Juszkiewicz
1620ab5be9 drop install_type from image names
We have only one value for install_type now and it gets removed from
image names.

Change-Id: I8bf95fd7aa9dd26b80d618ca0fcb097003b4cb0a
2022-04-20 12:29:12 +02:00
Zuul
27bf4e9351 Merge "Switch prometheus to active/passive mode" 2022-04-15 13:05:19 +00:00
Will Szumski
6906b275ef Switch prometheus to active/passive mode
This uses the same approach as the mariadb role (and others).

Closes-Bug: #1928193
Co-Authored-By: John Garbutt <johng@stackhpc.com>
Change-Id: I79a7a8c80327cfd9ef31d17fe71f450a181a638c
2022-04-15 10:10:50 +00:00
Nathan Taylor
0f2794a075 Adds etcd endpoints as a Prometheus scrape target
Add "enable_prometheus_etcd_integration" configuration parameter which
can be used to configure Prometheus to scrape etcd metrics endpoints.
The default value of "enable_prometheus_etcd_integration" is set to
the combined values of "enable_prometheus" and "enable_etcd".

Change-Id: I7a0b802c5687e2d508e06baf55e355d9761e806f
2022-03-08 08:42:19 -07:00
Zuul
63706667e1 Merge "Add support for deploying Prometheus libvirt exporter" 2022-02-21 21:35:55 +00:00
Pierre Riteau
b210dcd6e2 Configure node-exporter to report correct file system metrics
Without this configuration, all mount points are reporting the same
utilisation metrics [1]. With the rslave option, all root mounts from
the host are visible in the container, so we can remove the bind mounts
for /proc and /sys.

[1] https://github.com/prometheus/node_exporter#docker

Change-Id: I4087dc81f9d1fa5daa24b9df6daf1f9e1ccd702f
Closes-Bug: #1961438
2022-02-18 18:36:22 +01:00
Pierre Riteau
dcba829792 Allow to define extra parameters for Prometheus exporters
The following variables are added:

* prometheus_blackbox_exporter_cmdline_extras
* prometheus_elasticsearch_exporter_cmdline_extras
* prometheus_haproxy_exporter_cmdline_extras
* prometheus_memcached_exporter_cmdline_extras
* prometheus_mysqld_exporter_cmdline_extras
* prometheus_node_exporter_cmdline_extras
* prometheus_openstack_exporter_cmdline_extras

Change-Id: I5da2031b9367115384045775c515628e2acb1aa4
2022-02-18 10:12:22 +01:00
Will Szumski
033db44f1c Adds prometheus_scrape_interval
Grafana requires the scrape interval to be set to be able to compute
$__rate_interval. The default is 15s which does not match the kolla
default of 60s. The symptom of not setting this is that you will see
"no data" when zooming graphs that use rate queries. This occurs as the
interval will be set to a period shorter than the scrape interval.
The recommendation is that you use a common scrape interval for all
jobs. See:

- https://grafana.com/blog/2020/09/28/new-in-grafana-7.2-__rate_interval-for-prometheus-rate-queries-that-just-work/
- https://stackoverflow.com/questions/66369969/set-scrape-interval-in-provisioned-prometheus-data-source-in-grafana

Change-Id: I7e5c1e20c7b66b64cbd333f669ef8d8da60daaa8
2022-02-14 11:10:44 +00:00
Zuul
2d72fc5da4 Merge "prometheus: add tls_connect blackbox module" 2022-01-28 12:24:35 +00:00
Zuul
2146015cf0 Merge "Revert "Use friendly target names in Prometheus"" 2022-01-25 09:55:44 +00:00
Zuul
dc5eaa4ec7 Merge "Use Volume V3 API in OpenStack exporter" 2022-01-07 19:19:09 +00:00
Doug Szumski
491d418476 Add support for deploying Prometheus libvirt exporter
Add support for deploying the Kolla Prometheus libvirt exporter image to
facilitate gathering metrics from the Nova libvirt service.

Co-Authored-by: Dr. Jens Harbott <harbott@osism.tech>
Change-Id: Ib27e60c39297b86ae674297370f9543ab08cda05
Partially-Implements: blueprint libvirt-exporter
2022-01-05 13:30:45 +01:00
Angelos Kolaitis
4410ca7802
Use Volume V3 API in OpenStack exporter
Kolla has removed the Volume V2 API by default since OpenStack Wallaby.
However, openstack-exporter attempts to use the Volume V2 API by
default, resulting in clean installs failing to fetch Cinder metrics
in Prometheus.

This patch updates the clouds.yml configuration file for
openstack-exporter to use the Volume V3 API instead.

Closes-Bug: #1938194
Change-Id: Ifbb601be3ef1a1e853d5a7e832adf556c0ae38b9
2022-01-05 13:19:08 +02:00
Pierre Riteau
56fc74f231 Move project_name and kolla_role_name to role vars
Role vars have a higher precedence than role defaults. This allows to
import default vars from another role via vars_files without overriding
project_name (see related bug for details).

Change-Id: I3d919736e53d6f3e1a70d1267cf42c8d2c0ad221
Related-Bug: #1951785
2021-12-31 09:26:25 +00:00
Mark Goddard
c358a2d586 Revert "Use friendly target names in Prometheus"
This reverts commit 4ff65b7661ea06e9fa8631c4eb82232e03af77d7.

Reason for revert: adds assumptions about inventory_hostname being resolvable.

Closes-Bug: #1955563
Change-Id: Ifa2b2ea8622f56c34b8f7f37fee53133272ff925
2021-12-23 15:14:33 +00:00
Zuul
612937de0f Merge "Fix privileges for MariaDB 10.5" 2021-10-11 11:15:04 +00:00
Radosław Piliszek
c7c14e1c43 Fix privileges for MariaDB 10.5
"BINLOG MONITOR" and "SLAVE MONITOR" replace
"REPLICATION CLIENT" (which is now an alias for "BINLOG MONITOR").
The validation in Ansible MySQL collection is too simple to
understand aliases and breaks. Hence, let's use the canonical
names and adapt per service according to its needs.

Change-Id: I1175e4846384accd19942620dc155d0c5728e64b
2021-10-07 09:24:31 +00:00
Zuul
01470fc7e9 Merge "Use friendly target names in Prometheus" 2021-10-06 16:27:21 +00:00
Christian Berendt
4f78c696c2 Do not become root when searching for custom prometheus alert rules files
Change-Id: I6da412d6d3e7d067c8d903ee884711ac509d24aa
2021-10-04 09:49:58 +02:00
Piotr Parczewski
4ff65b7661 Use friendly target names in Prometheus
Change-Id: I16fdb2f93ddb656eeacd3f2b84190f9bdcfaa21c
2021-09-22 11:09:32 +02:00
Zuul
7cf30017ea Merge "Add Alertmanger metric target(s)" 2021-09-20 18:08:56 +00:00
Zuul
83c5d95b47 Merge "Support monitoring Fluentd with Prometheus" 2021-08-27 09:34:12 +00:00
Radosław Piliszek
9ff2ecb031 Refactor and optimise image pulling
We get a nice optimisation by using a filtered loop instead
of task skipping per service with 'when'.

Partially-Implements: blueprint performance-improvements
Change-Id: I8f68100870ab90cb2d6b68a66a4c97df9ea4ff52
2021-08-10 11:57:54 +00:00
Doug Szumski
b692ce7af1 Support monitoring Fluentd with Prometheus
This patch adds support for integrating Prometheus with Fluentd.
This can be used to extract useful information about the status
of Fluentd, such as output buffer capacity and logging rate,
and also to extract metrics from logs via custom Fluentd
configuration. More information can be found here in [1].

[1] https://docs.fluentd.org/monitoring-fluentd/monitoring-prometheus

Change-Id: I233d6dd744848ef1f1589a462dbf272ed0f3aaae
2021-08-09 10:12:20 +01:00
Zuul
1a4a8c1615 Merge "Reduce container metrics cardinality" 2021-08-06 14:47:38 +00:00
Piotr Parczewski
0d79d25fe9 Remove support for Prometheus v1
Change-Id: I0d7c7f47e6653cf2903589a9c86798a8c6404af5
2021-08-05 21:07:22 +02:00
Mark Goddard
94c6b8e220 prometheus: add tls_connect blackbox module
This allows checking of TLS servers. It can be useful to check
RabbitMQ TLS, including certificate expiry.

Change-Id: I2192d3481d790c11b110bf10082b3efeade75463
2021-07-23 10:42:30 +01:00
Piotr Parczewski
c2ae21fd97 Reduce container metrics cardinality
Adds support for passing extra runtime options to cAdvisor.
By default new options disable exporting rarely useful metrics
and labels by cAdvisor. This helps reducing the load on Prometheus
and cAdvisor itself.

Change-Id: I81f3845d6cd03a70a0c8569f8d0ea421027df083
2021-07-08 16:31:44 +02:00
Mark Goddard
ade5bfa302 Use ansible_facts to reference facts
By default, Ansible injects a variable for every fact, prefixed with
ansible_. This can result in a large number of variables for each host,
which at scale can incur a performance penalty. Ansible provides a
configuration option [0] that can be set to False to prevent this
injection of facts. In this case, facts should be referenced via
ansible_facts.<fact>.

This change updates all references to Ansible facts within Kolla Ansible
from using individual fact variables to using the items in the
ansible_facts dictionary. This allows users to disable fact variable
injection in their Ansible configuration, which may provide some
performance improvement.

This change disables fact variable injection in the ansible
configuration used in CI, to catch any attempts to use the injected
variables.

[0] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#inject-facts-as-vars

Change-Id: I7e9d5c9b8b9164d4aee3abb4e37c8f28d98ff5d1
Partially-Implements: blueprint performance-improvements
2021-06-23 10:38:06 +01:00
Radosław Piliszek
640dbb03fa Revert "Reduce container metrics cardinality"
This reverts commit c6259158e3eff4aff9770b7044b0179a7de533aa.

Reason for revert: cAdvisor fails with:

invalid value "percpu,referenced_memory,cpu_topology,resctrl,udp,advtcp,sched,hugetlb,memory_numa,tcp,process" for flag -disable_metrics: unsupported metric "referenced_memory" specified in disable_metrics

Change-Id: I1a0eea5c20f95f38c707401b56b7d2454484377d
2021-06-20 13:58:32 +00:00
Piotr Parczewski
c6259158e3 Reduce container metrics cardinality
Adds support for passing extra runtime options to cAdvisor.
By default new options disable exporting rarely useful metrics
and labels by cAdvisor. This helps reducing the load on Prometheus
and cAdvisor itself.

Change-Id: Id0144e8fa518e3236cb94ba2e3961fb455d36443
2021-06-16 08:10:51 +02:00
Piotr Parczewski
b300f7bc40 Disable Alertmanager's peer gossip in non-HA deployments
Reference:

https://github.com/prometheus/alertmanager#turn-off-high-availability

Closes-Bug: #1926463
Change-Id: I60e1dedeac25fa8fe9538a3a8e582bd8cc9324d7
2021-05-11 14:39:29 +00:00
Piotr Parczewski
5a6cafa210 Add Alertmanger metric target(s)
This commit enables scraping of Alertmanager metrics.

Change-Id: I69f4ac7de0f95eff393d9658af396e3c04824c8f
2021-03-22 12:59:44 +01:00
zhubingbing
f486e4930f prometheus: Collect metrics from rabbitmq
The rabbitmq_prometheus plugin is available in RabbitMQ 3.8.

https://www.rabbitmq.com/prometheus.html

Implements: blueprint rabbitmq-prometheus
Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Change-Id: I4d69a93a6c70db8d40626042cdbe773747b238ae
2021-03-15 10:30:08 +00:00
Piotr Parczewski
a50bef0f76 Deprecate Prometheus 1.x
Deprecates support for Prometheus v1.x.
In Xena support for it will be removed from Kolla Ansible.

Change-Id: I027b19621196c698e09f79af294ba1b5dbfc0516
2021-03-02 16:33:35 +01:00
Zuul
031e337898 Merge "Add Prometheus 2.x deployment" 2021-01-15 11:57:52 +00:00
Piotr Parczewski
1bdd8ea984 Add Prometheus 2.x deployment
It is now possible to deploy either 1.x or 2.x version of Prometheus.
The new 2.x version introduces breaking changes in terms of storage
format and command line options.

Change-Id: I80cc6f1947f3740ef04b29839bfa655b14fae146
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2021-01-12 14:17:49 +01:00
Zuul
860c32de76 Merge "Revert "Performance: Use import_tasks in the main plays"" 2020-12-15 19:52:24 +00:00