1265 Commits

Author SHA1 Message Date
Zuul
5d5c3921c7 Merge "Fix ironic_ipxe healthcheck on Debian/Ubuntu" 2021-07-22 11:57:08 +00:00
Zuul
646be6589d Merge "Fix Masakari host monitor default config" 2021-07-22 02:23:43 +00:00
Mark Goddard
aa28675ca9 Fix ironic_ipxe healthcheck on Debian/Ubuntu
The healthcheck checks for a process called httpd, but these distros
call it apache2.  This results in the ironic_ipxe container being marked
as unhealthy.

This change fixes the issue by making the process name distro dependent.

Change-Id: I0b0126e3071146e7f8593ba970ecbed65b36fcfa
Closes-Bug: #1937037
2021-07-21 10:03:44 +01:00
Pierre Riteau
cccae8a654 Fix typos in release note
Backports: Wallaby, Victoria.

Change-Id: Ib9a5058d6290d805670001ac2b5a42630aeec2b2
2021-07-19 17:19:02 +02:00
Zuul
a43b815b34 Merge "Make setup module arguments configurable" 2021-07-19 12:30:29 +00:00
Zuul
2ecf0a8783 Merge "Support storing passwords in Hashicorp Vault" 2021-07-02 20:28:44 +00:00
Rafael Weingärtner
15f2fdcd5d Make setup module arguments configurable
Ansible facts can have a large impact on the performance of the Ansible
control host. This patch introduces some control over which facts are
gathered (kolla_ansible_setup_gather_subset) and which facts are stored
(kolla_ansible_setup_filter). By default we do not change the default
values of these arguments to the setup module. The flexibility of these
arguments is limited, but they do provide enough for a large performance
improvement in a typical moderate to large OpenStack cloud.

In particular, the large complex dict fact for each interface has a
large effect, and on an OpenStack controller or hypervisor there may be
many virtual interfaces. We can use the kolla_ansible_setup_filter
variable to help:

    kolla_ansible_setup_filter: 'ansible_[!qt]*'

This causes Ansible to collect but not store facts matching that
pattern, which includes the virtual interface facts. Currently we are
not referencing other facts matching the pattern within Kolla Ansible.
Note that including the 'ansible_' prefix causes meta facts module_setup
and gather_subset to be filtered, but this seems to be the only way to
get a good match on the interface facts. To work around this, we use
ansible_facts rather than module_setup to detect whether facts exist in
the cache.

The exact improvement will vary, but has been reported to be as large as
18x on systems with many virtual interfaces.

For reference, here are some other tunings tried:

* Increased the number of forks (great speedup depending of the size of
  the deployment)
* Use `strategy = mitogen_linear` (cut processing time in half)
* Ansible caching (little speed up)
* SSH tunning (little speed up)

Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Closes-Bug: #1921538
Change-Id: Iae8ca4aae945892f1dc65e1b10381d2e26e88805
2021-07-02 10:30:35 -03:00
Radosław Piliszek
f71646da18 Fix Masakari host monitor default config
Closes-Bug: #1933209
Change-Id: I644ad475ca88aac0c22b14163d33a30193fe706a
2021-07-01 18:22:10 +00:00
Zuul
a6f52759b3 Merge "reno: fix typo" 2021-07-01 10:47:29 +00:00
Zuul
ad6dbab260 Merge "magnum: Add CA certificate configuration for internal TLS" 2021-07-01 10:14:33 +00:00
Zuul
bc060c2049 Merge "Use ansible_facts to reference facts" 2021-07-01 02:37:09 +00:00
Scott Solkhon
6bf74aa20d Support storing passwords in Hashicorp Vault
This commit adds two new cli commands to allow an operator
to read and write passwords into a configured Hashicorp Vault
KV.

Change-Id: Icf0eaf7544fcbdf7b83f697cc711446f47118a4d
2021-06-30 15:16:12 +01:00
Zuul
283bbc6663 Merge "Use Docker healthchecks for kafka services" 2021-06-25 01:23:34 +00:00
Zuul
f80bc6d998 Merge "Use Docker healthchecks for rabbitmq services" 2021-06-24 13:17:27 +00:00
wu.chunyang
5316047575 reno: fix typo
follow: https://review.opendev.org/c/openstack/kolla-ansible/+/794359
trivial fix

Change-Id: I7bd24dd660939c81f15d46679454a2663137bdde
2021-06-24 09:10:41 +08:00
Zuul
18fd27feff Merge "Allow user to set sysctl_net_ipv4_tcp_retries2" 2021-06-23 13:57:13 +00:00
Zuul
b22a7726aa Merge "Make it possible to override automatic fluentd version detection" 2021-06-23 13:02:49 +00:00
Michal Arbet
09d0409ed4 Allow user to set sysctl_net_ipv4_tcp_retries2
This patch is adding configuration option to
manipulate with kernel option sysctl_net_ipv4_tcp_retries2.

More informations about kernel option in [1][2]
and RedHat suggestion [3] to set for DBs and HA.

[1]: https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html
[2]: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
[3]: https://access.redhat.com/solutions/726753

Closes-Bug: #1917068
Change-Id: Ia0decbbfa4e33b1889b635f8bb1c9094567a2ce6
2021-06-23 10:34:12 +00:00
Mark Goddard
ade5bfa302 Use ansible_facts to reference facts
By default, Ansible injects a variable for every fact, prefixed with
ansible_. This can result in a large number of variables for each host,
which at scale can incur a performance penalty. Ansible provides a
configuration option [0] that can be set to False to prevent this
injection of facts. In this case, facts should be referenced via
ansible_facts.<fact>.

This change updates all references to Ansible facts within Kolla Ansible
from using individual fact variables to using the items in the
ansible_facts dictionary. This allows users to disable fact variable
injection in their Ansible configuration, which may provide some
performance improvement.

This change disables fact variable injection in the ansible
configuration used in CI, to catch any attempts to use the injected
variables.

[0] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#inject-facts-as-vars

Change-Id: I7e9d5c9b8b9164d4aee3abb4e37c8f28d98ff5d1
Partially-Implements: blueprint performance-improvements
2021-06-23 10:38:06 +01:00
Mark Goddard
48f0957a1c magnum: Add CA certificate configuration for internal TLS
Magnum has various sections in its configuration file for OpenStack
clients. When internal TLS is enabled, these may need a CA certificate
to be specified.

This change adds a CA certificate configuration, based on
openstack_cacert, for all clients using internal endpoints.

Note: we are explicitly not adding the configuration for the
[magnum_client] ca_file and [drivers] openstack_ca_file options, since
these use the public endpoint by default. These options may be
provided via custom configuration if necessary.

Change-Id: Ie59b3777c0a2c142b580addd67e279bc4b2f2c90
Co-Authored-By: Kyle Dean
Closes-Bug: #1919389
2021-06-23 10:19:13 +01:00
Zuul
46e4f5a33a Merge "Add missing region_name in keystoneauth sections" 2021-06-22 11:08:56 +00:00
Michal Arbet
7da770d290 Add missing region_name in keystoneauth sections
Closes-Bug: #1933025

Change-Id: Ib67d715ddfa986a5b70a55fdda39e6d0e3333162
2021-06-22 08:35:35 +02:00
Zuul
3d7bcca990 Merge "Drop support for Cinder ZFSSA backend" 2021-06-22 02:43:58 +00:00
Radosław Piliszek
3a7440b370 Fix host bootstrap pkg removal on Debian
The variable names are awful but this all agrees with the docs now.

Closes-Bug: #1933122
Change-Id: Icd3d140473886ba3c4847859cddccdb3c1376818
2021-06-21 15:40:46 +00:00
Zuul
2237e45db3 Merge "Revert "Reduce container metrics cardinality"" 2021-06-21 12:47:19 +00:00
Radosław Piliszek
0158221fd2 Drop support for Cinder ZFSSA backend
Following upstream which removed ZFSSA support in Ussuri [1].

[1] https://review.opendev.org/c/openstack/cinder/+/690137

Change-Id: Idb311e18b437fba696759ecb1cf2a6b4803aa5c5
2021-06-21 09:53:01 +00:00
Radosław Piliszek
18a0af6954 Do not set pid file for iscsid
Kolla Ansible runs iscsid in the foreground (-f) and
a recent change to iscsid in CentOS 8 (both Linux and Stream)
caused it to reject setting pid file in such a case.
PID file is irrelevant in this scenario so this commit
removes its parameter.

Closes-Bug: #1933033
Change-Id: Ic0c4beae0c812f3ca68a6ee5cc4daa2fee0f277d
2021-06-20 14:14:53 +00:00
Radosław Piliszek
640dbb03fa Revert "Reduce container metrics cardinality"
This reverts commit c6259158e3eff4aff9770b7044b0179a7de533aa.

Reason for revert: cAdvisor fails with:

invalid value "percpu,referenced_memory,cpu_topology,resctrl,udp,advtcp,sched,hugetlb,memory_numa,tcp,process" for flag -disable_metrics: unsupported metric "referenced_memory" specified in disable_metrics

Change-Id: I1a0eea5c20f95f38c707401b56b7d2454484377d
2021-06-20 13:58:32 +00:00
Zuul
663be549e0 Merge "Reduce container metrics cardinality" 2021-06-20 11:10:48 +00:00
Zuul
754206477c Merge "Remove rally deployment" 2021-06-20 10:55:30 +00:00
Zuul
b113507e05 Merge "chronyd crash loop if Debian server is rebooted" 2021-06-17 09:15:04 +00:00
Zuul
b660f97a5b Merge "Persist nova libvirt secrets in a Docker volume" 2021-06-17 09:14:18 +00:00
Zuul
ffd200f5f1 Merge "octavia: Ensure service auth project exists" 2021-06-17 08:44:02 +00:00
Zuul
4f8a716b1e Merge "baremetal: fix /etc/hosts generation when api_interface has dashes" 2021-06-17 08:42:50 +00:00
Piotr Parczewski
c6259158e3 Reduce container metrics cardinality
Adds support for passing extra runtime options to cAdvisor.
By default new options disable exporting rarely useful metrics
and labels by cAdvisor. This helps reducing the load on Prometheus
and cAdvisor itself.

Change-Id: Id0144e8fa518e3236cb94ba2e3961fb455d36443
2021-06-16 08:10:51 +02:00
wu.chunyang
3009109616 Remove rally deployment
Remove rally role as planned

Change-Id: Ic898efe42b21b01c45d4621af2cf90ecd7afc398
2021-06-16 09:12:34 +08:00
Mark Goddard
3f9662278c Reno follow up for docker_disable_ip_forward
Follow up to I5129136c066489fdfaa4d93741c22e5010b7e89d, adding upgrade
notes.

Related-Bug: #1931615
Change-Id: I2f88b8fc2c6924de9f6bc1840b183ee024c5c1e9
2021-06-15 09:49:45 +01:00
Zuul
3675b442c9 Merge "Disable docker's ip-forward when iptables disabled" 2021-06-14 16:30:09 +00:00
Zuul
f5fa171983 Merge "Add ability to use the Neutron packet logging framework" 2021-06-14 14:44:53 +00:00
Zuul
4dcea739d5 Merge "Remove support for panko" 2021-06-11 20:56:40 +00:00
Matthias Runge
ccf8cc5dca Remove support for panko
the project is deprecated and in the process of being removed
from OpenStack upstream.

Change-Id: I9d5ebed293a5fb25f4cd7daa473df152440e8b50
2021-06-11 18:00:05 +02:00
Zuul
01142ecf2d Merge "Reduce RabbitMQ busy waiting, lowering CPU load" 2021-06-11 09:35:24 +00:00
Radosław Piliszek
0fa4ee56eb Disable docker's ip-forward when iptables disabled
With the new default since Wallaby, starting Docker makes it
enable forwarding and not filter it at all.
This may pose a security risk and should be mitigated.

Closes-Bug: #1931615
Change-Id: I5129136c066489fdfaa4d93741c22e5010b7e89d
2021-06-10 19:02:33 +00:00
Zuul
aa8b8798ac Merge "Fix RabbitMQ restart ordering" 2021-06-08 17:53:11 +00:00
Zuul
8e9b4ced7e Merge "Add forgotten 'Restart container' handler for swift" 2021-06-08 09:51:55 +00:00
Mark Goddard
0cd5b027c9 Fix RabbitMQ restart ordering
The host list order seen during Ansible handlers may differ to the usual
play host list order, due to race conditions in notifying handlers. This
means that restart_services.yml for RabbitMQ may be included in a
different order than the rabbitmq group, resulting in a node other than
the 'first' being restarted first. This can cause some nodes to fail to
join the cluster. The include_tasks loop was introduced in [1].

This change fixes the issue by splitting the handler into two tasks, and
restarting the first node before all others.

[1] https://review.opendev.org/c/openstack/kolla-ansible/+/763137

Change-Id: I1823301d5889589bfd48326ed7de03c6061ea5ba
Closes-Bug: #1930293
2021-06-08 08:20:46 +00:00
Maksim Malchuk
5c19f9a5e0 Add forgotten 'Restart container' handler for swift
Since I0474324b60a5f792ef5210ab336639edf7a8cd9e swift role uses the new
service-cert-copy role introduced in the
I6351147ddaff8b2ae629179a9bc3bae2ebac9519 but the swift role itself
doesn't contain the handler used in the service-cert-copy. Right now,
restarting the swift container isn't necessary, but the handler should
exist. Also we should fix the name of the service used.

Closes-Bug: #1931097
Change-Id: I2d0615ce6914e1f875a2647c8a95b86dd17eeb22
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
2021-06-08 02:48:40 +03:00
John Garbutt
70f6f8e4c0 Reduce RabbitMQ busy waiting, lowering CPU load
On machines with many cores, we were seeing excessive CPU load on systems
that were not very busy. With the following Erlang VM argument we saw
RabbitMQ CPU usage drop from about 150% to around 20%, on a system with
40 hyperthreads.

    +S 2:2

By default RabbitMQ starts N schedulers where N is the number of CPU
cores, including hyper-threaded cores. This is fine when you assume all
your CPUs are dedicated to RabbitMQ. Its not a good idea in a typical
Kolla Ansible setup. Here we go for two scheduler threads.
More details can be found here:
https://www.rabbitmq.com/runtime.html#scheduling
and here:
https://erlang.org/doc/man/erl.html#emulator-flags

    +sbwt none

This stops busy waiting of the scheduler, for more details see:
https://www.rabbitmq.com/runtime.html#busy-waiting
Newer versions of rabbit may need additional flags:
"+sbwt none +sbwtdcpu none +sbwtdio none"
But this patch should be back portable to older versions of RabbitMQ
used in Train and Stein.

Note that information on this tuning was found by looking at data from:
rabbitmq-diagnostics runtime_thread_stats
More details on that can be found here:
https://www.rabbitmq.com/runtime.html#thread-stats

Related-Bug: #1846467

Change-Id: Iced014acee7e590c10848e73feca166f48b622dc
2021-06-07 13:18:39 +01:00
Zuul
dda787fca9 Merge "Bump min Docker version" 2021-06-07 09:00:26 +00:00
Zuul
3337e9873a Merge "chrony: allow to remove the container" 2021-06-07 08:55:19 +00:00