129 Commits

Author SHA1 Message Date
Radosław Piliszek
3e75a33ad4 Use the new image naming scheme
Change-Id: Ib4b15ed4feac82d8492b1c0f0238a752eac668e6
2022-05-23 06:37:25 +00:00
k-s-dean
656f6cdb08 Put openstack exporter behind HAproxy so only one is queried at a time
Closes-Bug: #1972818

Change-Id: I9e36b9169b6725bf6db953e464fc099087747778
2022-05-12 07:41:57 +00:00
Zuul
2c15d36fed Merge "Adds prometheus_scrape_interval" 2022-04-21 16:55:35 +00:00
Marcin Juszkiewicz
1620ab5be9 drop install_type from image names
We have only one value for install_type now and it gets removed from
image names.

Change-Id: I8bf95fd7aa9dd26b80d618ca0fcb097003b4cb0a
2022-04-20 12:29:12 +02:00
Zuul
27bf4e9351 Merge "Switch prometheus to active/passive mode" 2022-04-15 13:05:19 +00:00
Will Szumski
6906b275ef Switch prometheus to active/passive mode
This uses the same approach as the mariadb role (and others).

Closes-Bug: #1928193
Co-Authored-By: John Garbutt <johng@stackhpc.com>
Change-Id: I79a7a8c80327cfd9ef31d17fe71f450a181a638c
2022-04-15 10:10:50 +00:00
Nathan Taylor
0f2794a075 Adds etcd endpoints as a Prometheus scrape target
Add "enable_prometheus_etcd_integration" configuration parameter which
can be used to configure Prometheus to scrape etcd metrics endpoints.
The default value of "enable_prometheus_etcd_integration" is set to
the combined values of "enable_prometheus" and "enable_etcd".

Change-Id: I7a0b802c5687e2d508e06baf55e355d9761e806f
2022-03-08 08:42:19 -07:00
Zuul
63706667e1 Merge "Add support for deploying Prometheus libvirt exporter" 2022-02-21 21:35:55 +00:00
Pierre Riteau
b210dcd6e2 Configure node-exporter to report correct file system metrics
Without this configuration, all mount points are reporting the same
utilisation metrics [1]. With the rslave option, all root mounts from
the host are visible in the container, so we can remove the bind mounts
for /proc and /sys.

[1] https://github.com/prometheus/node_exporter#docker

Change-Id: I4087dc81f9d1fa5daa24b9df6daf1f9e1ccd702f
Closes-Bug: #1961438
2022-02-18 18:36:22 +01:00
Pierre Riteau
dcba829792 Allow to define extra parameters for Prometheus exporters
The following variables are added:

* prometheus_blackbox_exporter_cmdline_extras
* prometheus_elasticsearch_exporter_cmdline_extras
* prometheus_haproxy_exporter_cmdline_extras
* prometheus_memcached_exporter_cmdline_extras
* prometheus_mysqld_exporter_cmdline_extras
* prometheus_node_exporter_cmdline_extras
* prometheus_openstack_exporter_cmdline_extras

Change-Id: I5da2031b9367115384045775c515628e2acb1aa4
2022-02-18 10:12:22 +01:00
Will Szumski
033db44f1c Adds prometheus_scrape_interval
Grafana requires the scrape interval to be set to be able to compute
$__rate_interval. The default is 15s which does not match the kolla
default of 60s. The symptom of not setting this is that you will see
"no data" when zooming graphs that use rate queries. This occurs as the
interval will be set to a period shorter than the scrape interval.
The recommendation is that you use a common scrape interval for all
jobs. See:

- https://grafana.com/blog/2020/09/28/new-in-grafana-7.2-__rate_interval-for-prometheus-rate-queries-that-just-work/
- https://stackoverflow.com/questions/66369969/set-scrape-interval-in-provisioned-prometheus-data-source-in-grafana

Change-Id: I7e5c1e20c7b66b64cbd333f669ef8d8da60daaa8
2022-02-14 11:10:44 +00:00
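
The "no data" symptom described in 033db44f1c can be sketched in Python. This is a rough illustration, not code from the change: it assumes Grafana's documented rule that $__rate_interval is max($__interval + scrape interval, 4 * scrape interval), and the helper names and the 5s panel interval are made up for the example.

```python
def rate_interval(panel_interval_s: float, scrape_interval_s: float) -> float:
    """Assumed Grafana rule: max($__interval + scrape, 4 * scrape)."""
    return max(panel_interval_s + scrape_interval_s, 4 * scrape_interval_s)


def guaranteed_samples(window_s: float, actual_scrape_s: float) -> int:
    """Samples a window is guaranteed to contain, whatever its alignment."""
    return int(window_s // actual_scrape_s)


ACTUAL_SCRAPE_S = 60   # kolla default named in the commit message
ZOOMED_INTERVAL_S = 5  # hypothetical $__interval on a zoomed-in panel

# Compare Grafana's default (15s) with a value matching the real interval.
for configured in (15, 60):
    window = rate_interval(ZOOMED_INTERVAL_S, configured)
    samples = guaranteed_samples(window, ACTUAL_SCRAPE_S)
    # rate() needs at least two samples inside the window.
    status = "ok" if samples >= 2 else "no data"
    print(f"configured={configured}s window={window}s samples={samples} -> {status}")
```

With the data source left at 15s, the computed window is no longer than one real scrape period, so a rate query cannot rely on finding two samples.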
Zuul
2d72fc5da4 Merge "prometheus: add tls_connect blackbox module" 2022-01-28 12:24:35 +00:00
Zuul
2146015cf0 Merge "Revert "Use friendly target names in Prometheus"" 2022-01-25 09:55:44 +00:00
Zuul
dc5eaa4ec7 Merge "Use Volume V3 API in OpenStack exporter" 2022-01-07 19:19:09 +00:00
Doug Szumski
491d418476 Add support for deploying Prometheus libvirt exporter
Add support for deploying the Kolla Prometheus libvirt exporter image to
facilitate gathering metrics from the Nova libvirt service.

Co-Authored-by: Dr. Jens Harbott <harbott@osism.tech>
Change-Id: Ib27e60c39297b86ae674297370f9543ab08cda05
Partially-Implements: blueprint libvirt-exporter
2022-01-05 13:30:45 +01:00
Angelos Kolaitis
4410ca7802 Use Volume V3 API in OpenStack exporter
Kolla has removed the Volume V2 API by default since OpenStack Wallaby.
However, openstack-exporter attempts to use the Volume V2 API by
default, resulting in clean installs failing to fetch Cinder metrics
in Prometheus.

This patch updates the clouds.yml configuration file for
openstack-exporter to use the Volume V3 API instead.

Closes-Bug: #1938194
Change-Id: Ifbb601be3ef1a1e853d5a7e832adf556c0ae38b9
2022-01-05 13:19:08 +02:00
Pierre Riteau
56fc74f231 Move project_name and kolla_role_name to role vars
Role vars have a higher precedence than role defaults. This allows
importing default vars from another role via vars_files without
overriding project_name (see related bug for details).

Change-Id: I3d919736e53d6f3e1a70d1267cf42c8d2c0ad221
Related-Bug: #1951785
2021-12-31 09:26:25 +00:00
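
The precedence argument in 56fc74f231 can be modelled with a small Python sketch. It assumes Ansible's documented ordering (role defaults lowest, then vars_files, then role vars); the variable values and the resolve() helper are hypothetical and only mimic the lookup order.

```python
from collections import ChainMap

# Hypothetical values; only the layering matters here.
own_project_name = {"project_name": "prometheus"}
imported_vars_files = {"project_name": "other-role", "some_default": "x"}


def resolve(*layers_high_to_low: dict) -> dict:
    """Merge variable layers; earlier arguments take precedence."""
    return dict(ChainMap(*layers_high_to_low))


# Before: project_name is a role default, so the imported vars_files win.
before = resolve(imported_vars_files, own_project_name)

# After: project_name is a role var, which outranks vars_files.
after = resolve(own_project_name, imported_vars_files)

print(before["project_name"], after["project_name"])
```

The imported defaults remain available in both cases; only the variable that must not be overridden moves to the higher layer.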
Mark Goddard
c358a2d586 Revert "Use friendly target names in Prometheus"
This reverts commit 4ff65b7661ea06e9fa8631c4eb82232e03af77d7.

Reason for revert: adds assumptions about inventory_hostname being resolvable.

Closes-Bug: #1955563
Change-Id: Ifa2b2ea8622f56c34b8f7f37fee53133272ff925
2021-12-23 15:14:33 +00:00
Zuul
612937de0f Merge "Fix privileges for MariaDB 10.5" 2021-10-11 11:15:04 +00:00
Radosław Piliszek
c7c14e1c43 Fix privileges for MariaDB 10.5
"BINLOG MONITOR" and "SLAVE MONITOR" replace
"REPLICATION CLIENT" (which is now an alias for "BINLOG MONITOR").
The validation in Ansible MySQL collection is too simple to
understand aliases and breaks. Hence, let's use the canonical
names and adapt per service according to its needs.

Change-Id: I1175e4846384accd19942620dc155d0c5728e64b
2021-10-07 09:24:31 +00:00
Zuul
01470fc7e9 Merge "Use friendly target names in Prometheus" 2021-10-06 16:27:21 +00:00
Christian Berendt
4f78c696c2 Do not become root when searching for custom prometheus alert rules files
Change-Id: I6da412d6d3e7d067c8d903ee884711ac509d24aa
2021-10-04 09:49:58 +02:00
Piotr Parczewski
4ff65b7661 Use friendly target names in Prometheus
Change-Id: I16fdb2f93ddb656eeacd3f2b84190f9bdcfaa21c
2021-09-22 11:09:32 +02:00
Zuul
7cf30017ea Merge "Add Alertmanger metric target(s)" 2021-09-20 18:08:56 +00:00
Zuul
83c5d95b47 Merge "Support monitoring Fluentd with Prometheus" 2021-08-27 09:34:12 +00:00
Radosław Piliszek
9ff2ecb031 Refactor and optimise image pulling
We get a nice optimisation by using a filtered loop instead
of task skipping per service with 'when'.

Partially-Implements: blueprint performance-improvements
Change-Id: I8f68100870ab90cb2d6b68a66a4c97df9ea4ff52
2021-08-10 11:57:54 +00:00
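
The optimisation in 9ff2ecb031 can be pictured with a Python analogy. This is not Ansible code: the service map and function names are invented, and each returned tuple stands in for one task result that Ansible would have to evaluate and report.

```python
# Hypothetical service map, loosely shaped like a role's service dictionary.
services = {
    "prometheus-server": {"enabled": True},
    "prometheus-node-exporter": {"enabled": False},
    "prometheus-cadvisor": {"enabled": False},
}


def pull_with_when(svcs: dict) -> list:
    """Visit every service and skip the disabled ones, like 'when'."""
    results = []
    for name, svc in svcs.items():
        state = "pulled" if svc["enabled"] else "skipped"
        results.append((name, state))
    return results


def pull_with_filtered_loop(svcs: dict) -> list:
    """Filter first, then loop only over what is enabled."""
    enabled = {name: svc for name, svc in svcs.items() if svc["enabled"]}
    return [(name, "pulled") for name in enabled]


print(len(pull_with_when(services)), len(pull_with_filtered_loop(services)))
```

Both variants pull the same images; the filtered loop simply never produces the skipped entries.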
Doug Szumski
b692ce7af1 Support monitoring Fluentd with Prometheus
This patch adds support for integrating Prometheus with Fluentd.
This can be used to extract useful information about the status
of Fluentd, such as output buffer capacity and logging rate,
and also to extract metrics from logs via custom Fluentd
configuration. More information can be found in [1].

[1] https://docs.fluentd.org/monitoring-fluentd/monitoring-prometheus

Change-Id: I233d6dd744848ef1f1589a462dbf272ed0f3aaae
2021-08-09 10:12:20 +01:00
Zuul
1a4a8c1615 Merge "Reduce container metrics cardinality" 2021-08-06 14:47:38 +00:00
Piotr Parczewski
0d79d25fe9 Remove support for Prometheus v1
Change-Id: I0d7c7f47e6653cf2903589a9c86798a8c6404af5
2021-08-05 21:07:22 +02:00
Mark Goddard
94c6b8e220 prometheus: add tls_connect blackbox module
This allows checking of TLS servers. It can be useful to check
RabbitMQ TLS, including certificate expiry.

Change-Id: I2192d3481d790c11b110bf10082b3efeade75463
2021-07-23 10:42:30 +01:00
Piotr Parczewski
c2ae21fd97 Reduce container metrics cardinality
Adds support for passing extra runtime options to cAdvisor.
By default, the new options stop cAdvisor from exporting rarely useful
metrics and labels. This helps reduce the load on Prometheus and
cAdvisor itself.

Change-Id: I81f3845d6cd03a70a0c8569f8d0ea421027df083
2021-07-08 16:31:44 +02:00
Mark Goddard
ade5bfa302 Use ansible_facts to reference facts
By default, Ansible injects a variable for every fact, prefixed with
ansible_. This can result in a large number of variables for each host,
which at scale can incur a performance penalty. Ansible provides a
configuration option [0] that can be set to False to prevent this
injection of facts. In this case, facts should be referenced via
ansible_facts.<fact>.

This change updates all references to Ansible facts within Kolla Ansible
from using individual fact variables to using the items in the
ansible_facts dictionary. This allows users to disable fact variable
injection in their Ansible configuration, which may provide some
performance improvement.

This change disables fact variable injection in the ansible
configuration used in CI, to catch any attempts to use the injected
variables.

[0] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#inject-facts-as-vars

Change-Id: I7e9d5c9b8b9164d4aee3abb4e37c8f28d98ff5d1
Partially-Implements: blueprint performance-improvements
2021-06-23 10:38:06 +01:00
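
The effect of fact injection described in ade5bfa302 can be sketched in Python. This is a simplified model of the per-host variable namespace, not Ansible's implementation; the fact names and values are examples and host_vars() is a hypothetical helper.

```python
def host_vars(facts: dict, inject_facts_as_vars: bool) -> dict:
    """Build a simplified per-host variable namespace."""
    # The ansible_facts dictionary is always present.
    variables = {"ansible_facts": dict(facts)}
    if inject_facts_as_vars:
        # One extra top-level variable per fact, prefixed with ansible_.
        variables.update({f"ansible_{key}": value for key, value in facts.items()})
    return variables


# Example facts; a real host has many more.
facts = {"hostname": "ctl01", "os_family": "Debian", "processor_vcpus": 8}

injected = host_vars(facts, inject_facts_as_vars=True)
plain = host_vars(facts, inject_facts_as_vars=False)

print(len(injected), len(plain))
print(plain["ansible_facts"]["hostname"])
```

References of the form ansible_facts.<fact> work in both modes, which is why the change can switch to them before injection is disabled.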
Radosław Piliszek
640dbb03fa Revert "Reduce container metrics cardinality"
This reverts commit c6259158e3eff4aff9770b7044b0179a7de533aa.

Reason for revert: cAdvisor fails with:

invalid value "percpu,referenced_memory,cpu_topology,resctrl,udp,advtcp,sched,hugetlb,memory_numa,tcp,process" for flag -disable_metrics: unsupported metric "referenced_memory" specified in disable_metrics

Change-Id: I1a0eea5c20f95f38c707401b56b7d2454484377d
2021-06-20 13:58:32 +00:00
Piotr Parczewski
c6259158e3 Reduce container metrics cardinality
Adds support for passing extra runtime options to cAdvisor.
By default, the new options stop cAdvisor from exporting rarely useful
metrics and labels. This helps reduce the load on Prometheus and
cAdvisor itself.

Change-Id: Id0144e8fa518e3236cb94ba2e3961fb455d36443
2021-06-16 08:10:51 +02:00
Piotr Parczewski
b300f7bc40 Disable Alertmanager's peer gossip in non-HA deployments
Reference:

https://github.com/prometheus/alertmanager#turn-off-high-availability

Closes-Bug: #1926463
Change-Id: I60e1dedeac25fa8fe9538a3a8e582bd8cc9324d7
2021-05-11 14:39:29 +00:00
Piotr Parczewski
5a6cafa210 Add Alertmanger metric target(s)
This commit enables scraping of Alertmanager metrics.

Change-Id: I69f4ac7de0f95eff393d9658af396e3c04824c8f
2021-03-22 12:59:44 +01:00
zhubingbing
f486e4930f prometheus: Collect metrics from rabbitmq
The rabbitmq_prometheus plugin is available in RabbitMQ 3.8.

https://www.rabbitmq.com/prometheus.html

Implements: blueprint rabbitmq-prometheus
Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Change-Id: I4d69a93a6c70db8d40626042cdbe773747b238ae
2021-03-15 10:30:08 +00:00
Piotr Parczewski
a50bef0f76 Deprecate Prometheus 1.x
Deprecates support for Prometheus v1.x.
In Xena support for it will be removed from Kolla Ansible.

Change-Id: I027b19621196c698e09f79af294ba1b5dbfc0516
2021-03-02 16:33:35 +01:00
Zuul
031e337898 Merge "Add Prometheus 2.x deployment" 2021-01-15 11:57:52 +00:00
Piotr Parczewski
1bdd8ea984 Add Prometheus 2.x deployment
It is now possible to deploy either 1.x or 2.x version of Prometheus.
The new 2.x version introduces breaking changes in terms of storage
format and command line options.

Change-Id: I80cc6f1947f3740ef04b29839bfa655b14fae146
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2021-01-12 14:17:49 +01:00
Zuul
860c32de76 Merge "Revert "Performance: Use import_tasks in the main plays"" 2020-12-15 19:52:24 +00:00
Mark Goddard
db4fc85c33 Revert "Performance: Use import_tasks in the main plays"
This reverts commit 9cae59be51e8d2d798830042a5fd448a4aa5e7dc.

Reason for revert: This patch was found to introduce issues with fluentd customisation. The underlying issue is not currently fully understood, but could be a sign of other obscure issues.

Change-Id: Ia4859c23d85699621a3b734d6cedb70225576dfc
Closes-Bug: #1906288
2020-12-14 10:36:55 +00:00
Zuul
172bc6eccd Merge "Performance: Use import_tasks in the main plays" 2020-11-24 15:47:35 +00:00
Alban Lecorps
99680b56ef Add override timeout for openstack exporter
Add a scrape_timeout option to the prometheus_openstack_exporter job
to avoid timeouts in large OpenStack environments.

Change-Id: If96034e602bee3b3eea34a2656047355e1d17eec
Closes-Bug: #1903547
2020-11-11 11:14:46 +00:00
Radosław Piliszek
9cae59be51 Performance: Use import_tasks in the main plays
Main plays are action-redirect-stubs, ideal for import_tasks.

This avoids 'include' penalty and makes logs/ara look nicer.

Fixes haproxy and rabbitmq not to check the host group as well.

Change-Id: I46136fc40b815e341befff80b54a91ef431eabc0
Partially-Implements: blueprint performance-improvements
2020-10-27 19:09:32 +01:00
Radosław Piliszek
3411b9e420 Performance: optimize genconfig
Config plays do not need to check containers. This avoids skipping
tasks during the genconfig action.

Ironic and Glance rolling upgrades are handled specially.

Swift and Bifrost do not use the handlers at all.

Partially-Implements: blueprint performance-improvements
Change-Id: I140bf71d62e8f0932c96270d1f08940a5ba4542a
2020-10-12 19:30:06 +02:00
Pierre Riteau
6985e9a67c Apply bool filter to all enable_prometheus_* variables
Change-Id: I639145a709f1d3b9882bbdfb20a754646d1f5270
2020-10-09 18:51:38 +02:00
Pierre Riteau
295f8d1b43 Remove unused configuration for prometheus-openstack-exporter
The Prometheus OpenStack exporter was needlessly configured to use the
prometheus Docker volume and change permissions of /data, which does
not exist in the container image.

This must have been copy-pasted from existing Prometheus code.

Change-Id: I96017c17e68ca7a00a2d5ac41f2f43ef87694514
2020-09-01 14:15:52 +02:00
Mark Goddard
b685ac44e0 Performance: replace unconditional include_tasks with import_tasks
Including tasks has a performance penalty when compared with importing
tasks. If the include has a condition associated with it, then the
overhead of the include may be lower than the overhead of skipping all
imported tasks. For unconditionally included tasks, switching to
import_tasks provides a clear benefit.

Benchmarking of include vs. import is available at [1].

This change switches from include_tasks to import_tasks where there is
no condition applied to the include.

[1] https://github.com/stackhpc/ansible-scaling/blob/master/doc/include-and-import.md#task-include-and-import

Partially-Implements: blueprint performance-improvements

Change-Id: Ia45af4a198e422773d9f009c7f7b2e32ce9e3b97
2020-08-28 16:12:03 +00:00
Rafael Weingärtner
f425c0678f Standardize use and construction of endpoint URLs
The goal of this pull request is to normalize the construction and use
of internal, external, and admin URLs. While extending Kolla-ansible
to enable a more flexible method to manage external URLs, we noticed
that the same URL was constructed multiple times in different parts
of the code. This makes these URLs harder to work with and, over time,
creates inconsistencies in a large code base. Therefore, we propose
a single Kolla-ansible variable per endpoint URL, which makes it
easier to override or extend these URLs.

As an example, we extended Kolla-ansible to allow overriding public
(external) URLs with the following standard:
"<component/serviceName>.<companyBaseUrl>".
The "NAT/redirect" in the SSL termination system (HAproxy, HTTPD, or
some other) is then done via the service name, not the port. This
allows operators to easily and automatically create friendlier URL
names. To develop this feature, we first applied the patch that we are
now sending to the community. We did that to reduce the surface of
changes in Kolla-ansible.

Another example is the integration of Kolla-ansible and Consul, which
we also implemented internally and which also requires URL changes.
Therefore, this PR is essential to reduce code duplication and to
make it easier for users/developers to work with and customize the
service URLs.

Change-Id: I73d483e01476e779a5155b2e18dd5ea25f514e93
Signed-off-by: Rafael Weingärtner <rafael@apache.org>
2020-08-19 07:22:17 +00:00
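
The "single variable per endpoint URL" idea in f425c0678f can be illustrated with a Python sketch. Everything here is hypothetical (class, function, host names, port); it only shows a URL being built once and reused, plus a name-based override following the "<component/serviceName>.<companyBaseUrl>" pattern quoted in the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One endpoint definition; every consumer reads the same url."""
    protocol: str
    host: str
    port: int

    @property
    def url(self) -> str:
        return f"{self.protocol}://{self.host}:{self.port}"


def public_url_by_name(service: str, base_domain: str, protocol: str = "https") -> str:
    """Override following <serviceName>.<companyBaseUrl>, with no port."""
    return f"{protocol}://{service}.{base_domain}"


# Hypothetical internal endpoint, constructed in exactly one place.
prometheus_internal = Endpoint("http", "10.0.0.10", 9091)

# Two consumers reuse the single definition instead of rebuilding it.
scrape_target = prometheus_internal.url
datasource_url = prometheus_internal.url

print(scrape_target)
print(public_url_by_name("prometheus", "example.com"))
```

Because consumers only read the final URL, an operator override changes one value rather than every place the URL used to be assembled.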