605 Commits

Author SHA1 Message Date
Zuul
63706667e1 Merge "Add support for deploying Prometheus libvirt exporter" 2022-02-21 21:35:55 +00:00
Zuul
83fa907961 Merge "Add support for VMware First Class Disk (FCD)" 2022-02-21 11:07:00 +00:00
Zuul
b668e27356 Merge "Add support for VMware NSXP" 2022-02-18 12:04:41 +00:00
alecorps
812e03f75e Add support for VMware First Class Disk (FCD)
An FCD, also known as an Improved Virtual Disk (IVD) or
Managed Virtual Disk, is a named virtual disk independent of
a virtual machine. Using FCDs for Cinder volumes eliminates
the need for shadow virtual machines.
This patch adds Kolla support.

Change-Id: Ic0b66269e6d32762e786c95cf6da78cb201d2765
2022-02-18 11:15:14 +00:00
Pierre Riteau
dcba829792 Allow to define extra parameters for Prometheus exporters
The following variables are added:

* prometheus_blackbox_exporter_cmdline_extras
* prometheus_elasticsearch_exporter_cmdline_extras
* prometheus_haproxy_exporter_cmdline_extras
* prometheus_memcached_exporter_cmdline_extras
* prometheus_mysqld_exporter_cmdline_extras
* prometheus_node_exporter_cmdline_extras
* prometheus_openstack_exporter_cmdline_extras
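
The variable names above follow one pattern, which a short Python sketch can make explicit. The exporter names come from the list; the `append_extras` helper and the example command are hypothetical, and they assume the extras string is simply appended to the exporter command line, which this message does not confirm.

```python
# Exporter names taken from the variable list above.
EXPORTERS = [
    "blackbox",
    "elasticsearch",
    "haproxy",
    "memcached",
    "mysqld",
    "node",
    "openstack",
]


def cmdline_extras_variable(exporter: str) -> str:
    """Return the name of the extras variable for a given exporter."""
    return f"prometheus_{exporter}_exporter_cmdline_extras"


def append_extras(base_command: str, extras: str = "") -> str:
    """Hypothetical helper: append extra parameters to a command line.

    Assumes the extras are plain space-separated arguments; an empty
    string leaves the base command unchanged.
    """
    extras = extras.strip()
    return f"{base_command} {extras}" if extras else base_command


if __name__ == "__main__":
    for name in EXPORTERS:
        print(cmdline_extras_variable(name))
    # "example_exporter" and "--example.flag=1" are placeholders only.
    print(append_extras("example_exporter --listen :9100", "--example.flag=1"))
```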

Change-Id: I5da2031b9367115384045775c515628e2acb1aa4
2022-02-18 10:12:22 +01:00
Alban Lecorps
458c8b13df Add support for VMware NSXP
NSXP is the OpenStack support for the NSX Policy platform.
Neutron has supported it since the Stein release. This patch
adds Kolla support.

This adds a new neutron_plugin_agent type 'vmware_nsxp'. The plugin
does not run any neutron agents.

Change-Id: I9e9d8f07e586bdc143d293e572031368af7f3fca
2022-02-17 08:59:14 +00:00
likui
825ef7acd4 update the default value of node_custom_config
The value of node_custom_config should be {{ node_config }}/config
when specified using --configdir.

Change-Id: I076b7d2c8980ddd3baa28f998f84a6b7005dc352
2022-01-25 16:07:57 +08:00
Doug Szumski
491d418476 Add support for deploying Prometheus libvirt exporter
Add support for deploying the Kolla Prometheus libvirt exporter image to
facilitate gathering metrics from the Nova libvirt service.

Co-Authored-by: Dr. Jens Harbott <harbott@osism.tech>
Change-Id: Ib27e60c39297b86ae674297370f9543ab08cda05
Partially-Implements: blueprint libvirt-exporter
2022-01-05 13:30:45 +01:00
Radosław Piliszek
8cc569306a Deprecate storage_interface variable
Per [1] and exchange on IRC.

[1] http://lists.openstack.org/pipermail/openstack-discuss/2021-December/026437.html

Change-Id: I322500e7204eb129d7bf085006627e8c4aaaa934
2021-12-23 15:37:03 +00:00
Radosław Piliszek
0cbdedd0a3 Drop vmtp
Details in the attached reno.

Change-Id: I438a453ca522493524fdb9760c1edb330916084b
2021-12-21 07:29:32 +00:00
likui
42035e211f The deprecated iscsi deploy interface has been removed since xena
[1] https://docs.openstack.org/releasenotes/ironic/xena.html

Change-Id: Ic0dd9fa7ef76b647682e124b1bae52e931a38225
2021-11-15 18:30:59 +08:00
Zuul
948088abe2 Merge "Update Manila deploy steps for Wallaby" 2021-10-20 09:36:35 +00:00
Maksim Malchuk
37e4dba879 Add support for Ironic inspection through DHCP-relay
This change updates documentation, examples and tests to support
Ironic inspection through DHCP-relay. The dnsmasq service should be
configured with more specific format set in the variable
``ironic_dnsmasq_dhcp_range``. See the dnsmasq manual page [1].

[1] https://thekelleys.org.uk/dnsmasq/docs/dnsmasq-man.html

Change-Id: I9488a72db588e31289907668f1997596a8ccdec6
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
2021-10-12 22:16:04 +03:00
wu.chunyang
1f71df1a8b Remove chrony role from kolla
chrony is not supported in the Xena cycle, so remove it from kolla.

Moved tasks from chrony role to chrony-cleanup.yml playbook to avoid a
vestigial chrony role.

Co-Authored-By: Mark Goddard <mark@stackhpc.com>

Change-Id: I5a730d55afb49d517c85aeb9208188c81e2c84cf
2021-09-30 18:56:14 +02:00
Zuul
bfba65f286 Merge "Add support for Ceph RadosGW integration" 2021-09-30 16:06:48 +00:00
Mark Goddard
8c5012e940 Add support for Ceph RadosGW integration
* Register Swift-compatible endpoints in Keystone
* Load balance across RadosGW API servers using HAProxy

The support is exercised in the cephadm CI jobs, but since RGW is
not currently enabled via cephadm, it is not yet tested.

https://docs.ceph.com/en/latest/radosgw/keystone/

Implements: blueprint ceph-rgw

Change-Id: I891c3ed4ed93512607afe65a42dd99596fd4dbf9
2021-09-30 13:08:13 +00:00
Mark Goddard
66c84843e4 Deploy source type images by default
Source images get the most test coverage, so it makes sense to deploy
these by default.

Change-Id: I8d0c8750e2c1600e84cc2e677a4eae0e9f502dac
2021-09-30 08:07:48 +00:00
Zuul
f99bf8325f Merge "Never make Docker registry insecure by default" 2021-09-09 10:49:03 +00:00
Zuul
83c5d95b47 Merge "Support monitoring Fluentd with Prometheus" 2021-08-27 09:34:12 +00:00
Radosław Piliszek
802f7c6218 Never make Docker registry insecure by default
To follow best security practices and help fellow operators.

More details inline and in the linked bug report.

Closes-Bug: #1940547
Change-Id: Ide9e9009a6e272f20a43319f27d257efdf315f68
2021-08-20 18:23:56 +00:00
Zuul
a98076f11c Merge "Use more RMQ flags for less busy wait" 2021-08-19 18:20:13 +00:00
Skylar Kelty
8d5dde3723 Update Manila deploy steps for Wallaby
Manila has changed from using subfolders to subvolumes.
We need a bit of a tidy up to prevent deploy errors.
This change also adds the ability to specify the ceph FS
Manila uses instead of relying on the default "first found".

Closes-Bug: #1938285
Closes-Bug: #1935784
Change-Id: I1d0d34919fbbe74a4022cd496bf84b8b764b5e0f
2021-08-17 10:01:58 +01:00
Doug Szumski
b692ce7af1 Support monitoring Fluentd with Prometheus
This patch adds support for integrating Prometheus with Fluentd.
This can be used to extract useful information about the status
of Fluentd, such as output buffer capacity and logging rate,
and also to extract metrics from logs via custom Fluentd
configuration. More information can be found in [1].

[1] https://docs.fluentd.org/monitoring-fluentd/monitoring-prometheus

Change-Id: I233d6dd744848ef1f1589a462dbf272ed0f3aaae
2021-08-09 10:12:20 +01:00
Zuul
1a4a8c1615 Merge "Reduce container metrics cardinality" 2021-08-06 14:47:38 +00:00
Zuul
bb05cf1150 Merge "Remove support for Prometheus v1" 2021-08-06 14:12:18 +00:00
Zuul
295c69b5ee Merge "Remove tempest role" 2021-08-06 14:04:55 +00:00
Piotr Parczewski
0d79d25fe9 Remove support for Prometheus v1
Change-Id: I0d7c7f47e6653cf2903589a9c86798a8c6404af5
2021-08-05 21:07:22 +02:00
Radosław Piliszek
d7cdad5325 Use more RMQ flags for less busy wait
As mentioned in the Iced014acee7e590c10848e73feca166f48b622dc
commit message, in Ussuri+ we can use ``+sbwtdcpu none
+sbwtdio none`` as well. This is due to relying on RMQ-provided
erlang in version 23.x.

This change adds the extra arguments by default.
It should be backported down to Ussuri before we do a release with
Iced014acee7e590c10848e73feca166f48b622dc.

Change-Id: I32e247a6cb34d7f6763b544f247fd408dce2b3a2
2021-07-28 19:14:43 +00:00
Piotr Parczewski
c2ae21fd97 Reduce container metrics cardinality
Adds support for passing extra runtime options to cAdvisor.
By default new options disable exporting rarely useful metrics
and labels by cAdvisor. This helps reduce the load on Prometheus
and cAdvisor itself.

Change-Id: I81f3845d6cd03a70a0c8569f8d0ea421027df083
2021-07-08 16:31:44 +02:00
wu.chunyang
5261998467 Remove tempest role
Remove tempest role as planned

Change-Id: If3cf073e88c83f670c867a49afe48845f9e81008
2021-07-07 21:58:39 +08:00
Rafael Weingärtner
15f2fdcd5d Make setup module arguments configurable
Ansible facts can have a large impact on the performance of the Ansible
control host. This patch introduces some control over which facts are
gathered (kolla_ansible_setup_gather_subset) and which facts are stored
(kolla_ansible_setup_filter). By default we do not change the default
values of these arguments to the setup module. The flexibility of these
arguments is limited, but they do provide enough for a large performance
improvement in a typical moderate to large OpenStack cloud.

In particular, the large complex dict fact for each interface has a
large effect, and on an OpenStack controller or hypervisor there may be
many virtual interfaces. We can use the kolla_ansible_setup_filter
variable to help:

    kolla_ansible_setup_filter: 'ansible_[!qt]*'

This causes Ansible to collect all facts but store only those matching
the pattern, which excludes facts whose names begin with ansible_q or
ansible_t, including the virtual interface facts. Currently we are not
referencing any other excluded facts within Kolla Ansible.
Note that including the 'ansible_' prefix causes meta facts module_setup
and gather_subset to be filtered, but this seems to be the only way to
get a good match on the interface facts. To work around this, we use
ansible_facts rather than module_setup to detect whether facts exist in
the cache.
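
As a rough illustration, the pattern can be tried against a few fact names with Python's `fnmatch` module, which implements the same shell-style matching. This only approximates what the setup module does, and the fact names below are hypothetical examples rather than output from a real host. Names that do not match, such as `module_setup`, are the ones filtered out, as noted above for the meta facts.

```python
from fnmatch import fnmatchcase

# The filter value quoted in the commit message.
PATTERN = "ansible_[!qt]*"

# Hypothetical fact names, chosen only to illustrate the matching rules.
FACT_NAMES = [
    "ansible_hostname",
    "ansible_eth0",
    "ansible_qbr1234",   # begins with ansible_q
    "ansible_tap5678",   # begins with ansible_t
    "module_setup",      # meta fact without the ansible_ prefix
    "gather_subset",     # meta fact without the ansible_ prefix
]


def split_by_pattern(names, pattern):
    """Return (matching, non_matching) lists for a shell-style pattern."""
    matching = [n for n in names if fnmatchcase(n, pattern)]
    non_matching = [n for n in names if not fnmatchcase(n, pattern)]
    return matching, non_matching


if __name__ == "__main__":
    matching, non_matching = split_by_pattern(FACT_NAMES, PATTERN)
    print("matching:    ", matching)
    print("not matching:", non_matching)
```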

The exact improvement will vary, but has been reported to be as large as
18x on systems with many virtual interfaces.

For reference, here are some other tunings tried:

* Increased the number of forks (great speedup depending on the size of
  the deployment)
* Use `strategy = mitogen_linear` (cut processing time in half)
* Ansible caching (little speedup)
* SSH tuning (little speedup)

Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Closes-Bug: #1921538
Change-Id: Iae8ca4aae945892f1dc65e1b10381d2e26e88805
2021-07-02 10:30:35 -03:00
Zuul
3d7bcca990 Merge "Drop support for Cinder ZFSSA backend" 2021-06-22 02:43:58 +00:00
Zuul
2237e45db3 Merge "Revert "Reduce container metrics cardinality"" 2021-06-21 12:47:19 +00:00
Radosław Piliszek
0158221fd2 Drop support for Cinder ZFSSA backend
Following upstream which removed ZFSSA support in Ussuri [1].

[1] https://review.opendev.org/c/openstack/cinder/+/690137

Change-Id: Idb311e18b437fba696759ecb1cf2a6b4803aa5c5
2021-06-21 09:53:01 +00:00
Radosław Piliszek
640dbb03fa Revert "Reduce container metrics cardinality"
This reverts commit c6259158e3eff4aff9770b7044b0179a7de533aa.

Reason for revert: cAdvisor fails with:

invalid value "percpu,referenced_memory,cpu_topology,resctrl,udp,advtcp,sched,hugetlb,memory_numa,tcp,process" for flag -disable_metrics: unsupported metric "referenced_memory" specified in disable_metrics

Change-Id: I1a0eea5c20f95f38c707401b56b7d2454484377d
2021-06-20 13:58:32 +00:00
Zuul
663be549e0 Merge "Reduce container metrics cardinality" 2021-06-20 11:10:48 +00:00
Piotr Parczewski
c6259158e3 Reduce container metrics cardinality
Adds support for passing extra runtime options to cAdvisor.
By default new options disable exporting rarely useful metrics
and labels by cAdvisor. This helps reduce the load on Prometheus
and cAdvisor itself.

Change-Id: Id0144e8fa518e3236cb94ba2e3961fb455d36443
2021-06-16 08:10:51 +02:00
wu.chunyang
3009109616 Remove rally deployment
Remove rally role as planned

Change-Id: Ic898efe42b21b01c45d4621af2cf90ecd7afc398
2021-06-16 09:12:34 +08:00
Zuul
f5fa171983 Merge "Add ability to use the Neutron packet logging framework" 2021-06-14 14:44:53 +00:00
Zuul
4dcea739d5 Merge "Remove support for panko" 2021-06-11 20:56:40 +00:00
Matthias Runge
ccf8cc5dca Remove support for panko
The project is deprecated and in the process of being removed
from OpenStack upstream.

Change-Id: I9d5ebed293a5fb25f4cd7daa473df152440e8b50
2021-06-11 18:00:05 +02:00
John Garbutt
70f6f8e4c0 Reduce RabbitMQ busy waiting, lowering CPU load
On machines with many cores, we were seeing excessive CPU load on systems
that were not very busy. With the following Erlang VM argument we saw
RabbitMQ CPU usage drop from about 150% to around 20%, on a system with
40 hyperthreads.

    +S 2:2

By default RabbitMQ starts N schedulers where N is the number of CPU
cores, including hyper-threaded cores. This is fine when you assume all
your CPUs are dedicated to RabbitMQ. It's not a good idea in a typical
Kolla Ansible setup. Here we go for two scheduler threads.
More details can be found here:
https://www.rabbitmq.com/runtime.html#scheduling
and here:
https://erlang.org/doc/man/erl.html#emulator-flags

    +sbwt none

This stops busy waiting of the scheduler, for more details see:
https://www.rabbitmq.com/runtime.html#busy-waiting
Newer versions of rabbit may need additional flags:
"+sbwt none +sbwtdcpu none +sbwtdio none"
But this patch should be backportable to older versions of RabbitMQ
used in Train and Stein.

Note that information on this tuning was found by looking at data from:
rabbitmq-diagnostics runtime_thread_stats
More details on that can be found here:
https://www.rabbitmq.com/runtime.html#thread-stats
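
The flags discussed above can be assembled into a single argument string, sketched here in Python. The `erlang_vm_args` helper and its parameters are hypothetical, and how the resulting string is handed to RabbitMQ is not shown in this message, so treat this as an illustration of the flag combinations only.

```python
def erlang_vm_args(schedulers: int = 2, extra_busy_wait_flags: bool = False) -> str:
    """Build the Erlang VM argument string described in this message.

    schedulers: number of scheduler threads, used for both halves of "+S N:N".
    extra_busy_wait_flags: also add the flags that newer RabbitMQ versions
        may need ("+sbwtdcpu none +sbwtdio none").
    """
    if schedulers < 1:
        raise ValueError("schedulers must be a positive integer")

    args = [f"+S {schedulers}:{schedulers}", "+sbwt none"]
    if extra_busy_wait_flags:
        args += ["+sbwtdcpu none", "+sbwtdio none"]
    return " ".join(args)


if __name__ == "__main__":
    # Flags used by this patch: two schedulers, no scheduler busy waiting.
    print(erlang_vm_args())
    # Variant with the additional flags mentioned for newer versions.
    print(erlang_vm_args(extra_busy_wait_flags=True))
```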

Related-Bug: #1846467

Change-Id: Iced014acee7e590c10848e73feca166f48b622dc
2021-06-07 13:18:39 +01:00
Florian LEDUC
e923236001 Add ability to use the Neutron packet logging framework
* Enables the Neutron packet logging framework for OVS
(https://docs.openstack.org/neutron/latest/admin/config-logging.html).
* Adds a toggle variable "enable_neutron_packet_logging"

Change-Id: Ica3594cdac634b496949a06ed813dccd18090af4
Implements: blueprint neutron-log-service-plugin
2021-05-11 13:50:49 +02:00
Doug Szumski
82cf40edf2 Remove Monasca Grafana service
In the Xena cycle it was decided to remove the Monasca
Grafana fork due to lack of maintenance. This commit removes
the service and provides a limited workaround using the
Monasca Grafana datasource with vanilla Grafana.

Depends-On: I9db7ec2df050fa20317d84f6cea40d1f5fd42e60
Change-Id: I4917ece1951084f6665722ba9a91d47764d3709a
2021-04-27 11:06:25 +00:00
Mark Goddard
db517a44e4 masakari: support host monitor
Change-Id: I3f43df7766c57622ab8d01a759fbeeef0a0c2b93
Implements: blueprint masakari-hostmonitor
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2021-04-08 16:39:47 +00:00
Gaëtan Trellu
9f578c85e0 Add HAcluster Ansible role
Adds HAcluster Ansible role. This role contains a High Availability
clustering solution composed of Corosync, Pacemaker and Pacemaker Remote.

HAcluster is added as a helper role for Masakari which requires it for
its host monitoring, which makes it possible to provide HA to instances
on a failed compute host.

Kolla hacluster images merged in [1].

[1] https://review.opendev.org/#/c/668765/

Change-Id: I91e5c1840ace8f567daf462c4eb3ec1f0c503823
Implements: blueprint ansible-pacemaker-support
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Co-Authored-By: Mark Goddard <mark@stackhpc.com>
2021-04-08 06:39:19 +00:00
Radosław Piliszek
b647cb4128 Deprecate and disable chrony by default
Per [1].

[1] http://lists.openstack.org/pipermail/openstack-discuss/2021-February/020707.html

Change-Id: Id6f3cd158bf5d01750971249b11364b6a8631789
Closes-Bug: #1885689
2021-04-06 09:17:51 +00:00
Michal Nasiadka
7a066f7154 Add missing octavia-driver-agent
To use 3rd party Octavia providers (such as the OVN provider), an
octavia-driver-agent container must be running to expose those
providers.

OVN CI job has been extended with deploying Octavia and testing OVN Load
Balancer.

Closes-Bug: #1903506
Depends-On: https://review.opendev.org/c/openstack/kolla/+/771191

Change-Id: Ibafa8b7307981f2a51e630cc113d18af6162171c
2021-03-24 16:36:44 +00:00
Zuul
0bd235dffc Merge "don't use the same CIDR in octavia_amp_network_cidr and init-run-once" 2021-03-17 16:31:28 +00:00
Zuul
261cce4f45 Merge "Add missing elasticsearch cloudkitty storage and prometheus collector backend support." 2021-03-09 20:18:28 +00:00