Ansible facts can have a large impact on the performance of the Ansible
control host. This patch introduces some control over which facts are
gathered (kolla_ansible_setup_gather_subset) and which facts are stored
(kolla_ansible_setup_filter). By default we do not modify the
values of these arguments to the setup module. The flexibility of these
arguments is limited, but they provide enough control for a large
performance improvement in a typical moderate to large OpenStack cloud.
In particular, the large, complex dict fact generated for each network
interface has a significant effect, and on an OpenStack controller or
hypervisor there may be many virtual interfaces. We can use the
kolla_ansible_setup_filter variable to help:
kolla_ansible_setup_filter: 'ansible_[!qt]*'
This causes Ansible to collect all facts but store only those matching
the pattern, which drops the virtual interface facts (typically named
ansible_q* and ansible_t*). Kolla Ansible does not currently reference
any other facts that this pattern filters out.
Note that requiring the 'ansible_' prefix also filters out the meta
facts module_setup and gather_subset, but this seems to be the only way
to get a good match on the interface facts. To work around this, we use
ansible_facts rather than module_setup to detect whether facts exist in
the cache.
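For example, in globals.yml (a sketch; the gather_subset value shown is
only an illustration of the setup module's syntax, not a
recommendation):

# Store only facts matching this pattern; this drops the per-interface
# facts for virtual interfaces named q* and t* (e.g. qbr*, tap*).
kolla_ansible_setup_filter: 'ansible_[!qt]*'
# Optionally restrict which subsets of facts are gathered (example
# value only).
kolla_ansible_setup_gather_subset: 'all,!facter,!ohai'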
The exact improvement will vary, but has been reported to be as large as
18x on systems with many virtual interfaces.
For reference, here are some other tunings tried:
* Increasing the number of forks (a great speedup, depending on the
size of the deployment; see the sketch after this list)
* Using `strategy = mitogen_linear` (cut processing time in half)
* Ansible caching (little speedup)
* SSH tuning (little speedup)
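A sketch of the ansible.cfg settings for the forks and Mitogen tunings
(the forks value and the Mitogen plugin path are placeholders):

[defaults]
# Raise the number of parallel worker processes (the default is 5).
forks = 20
# Point Ansible at the Mitogen strategy plugin and enable it.
strategy_plugins = /path/to/mitogen/ansible_mitogen/plugins/strategy
strategy = mitogen_linear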
Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Closes-Bug: #1921538
Change-Id: Iae8ca4aae945892f1dc65e1b10381d2e26e88805
Adds a new variable, 'disable_firewall', which defaults to true. If set
to false, then the host firewall will not be disabled during
kolla-ansible bootstrap-servers.
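For example, in globals.yml (a minimal sketch):

# Do not touch the host firewall during bootstrap-servers.
disable_firewall: false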
Change-Id: Ie5131013012f89c8c3b91ca359ad17d9cb77efc8
This commit adds two new CLI commands to allow an operator
to read and write passwords to a configured HashiCorp Vault
KV store.
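A usage sketch, assuming the two commands are kolla-writepwd and
kolla-readpwd (the names and flags here are assumptions; consult each
command's --help output):

# Write passwords from passwords.yml into the Vault KV store.
kolla-writepwd --passwords /etc/kolla/passwords.yml \
    --vault-addr https://vault.example.com:8200
# Read passwords from Vault back into passwords.yml.
kolla-readpwd --passwords /etc/kolla/passwords.yml \
    --vault-addr https://vault.example.com:8200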
Change-Id: Icf0eaf7544fcbdf7b83f697cc711446f47118a4d
By default, Ansible injects a variable for every fact, prefixed with
ansible_. This can result in a large number of variables for each host,
which at scale can incur a performance penalty. Ansible provides a
configuration option [0] that can be set to False to prevent this
injection of facts. In this case, facts should be referenced via
ansible_facts.<fact>.
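For example, with fact injection disabled in ansible.cfg:

[defaults]
inject_facts_as_vars = False

a reference such as {{ ansible_os_family }} must be rewritten as
{{ ansible_facts.os_family }} (fact names lose their ansible_ prefix
inside the ansible_facts dictionary).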
This change updates all references to Ansible facts within Kolla Ansible
from using individual fact variables to using the items in the
ansible_facts dictionary. This allows users to disable fact variable
injection in their Ansible configuration, which may provide some
performance improvement.
This change disables fact variable injection in the Ansible
configuration used in CI, to catch any attempts to use the injected
variables.
[0] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#inject-facts-as-vars
Change-Id: I7e9d5c9b8b9164d4aee3abb4e37c8f28d98ff5d1
Partially-Implements: blueprint performance-improvements
On machines with many cores, we were seeing excessive CPU load on systems
that were not very busy. With the following Erlang VM argument we saw
RabbitMQ CPU usage drop from about 150% to around 20%, on a system with
40 hyperthreads.
+S 2:2
By default RabbitMQ starts N schedulers, where N is the number of CPU
cores, including hyper-threaded cores. This is fine when all of your
CPUs are dedicated to RabbitMQ, but it's not a good idea in a typical
Kolla Ansible setup. Here we go for two scheduler threads.
More details can be found here:
https://www.rabbitmq.com/runtime.html#scheduling
and here:
https://erlang.org/doc/man/erl.html#emulator-flags
+sbwt none
This stops the schedulers from busy waiting; for more details, see:
https://www.rabbitmq.com/runtime.html#busy-waiting
Newer versions of RabbitMQ may need additional flags:
"+sbwt none +sbwtdcpu none +sbwtdio none"
But this patch should be backportable to older versions of RabbitMQ
used in Train and Stein.
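Outside of Kolla Ansible, a sketch of how these flags are typically
passed to the Erlang VM via rabbitmq-env.conf:

# Start 2 scheduler threads (2 online) and disable scheduler busy
# waiting.
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 2:2 +sbwt none"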
Note that information on this tuning was found by looking at data from:
rabbitmq-diagnostics runtime_thread_stats
More details on that can be found here:
https://www.rabbitmq.com/runtime.html#thread-stats
Related-Bug: #1846467
Change-Id: Iced014acee7e590c10848e73feca166f48b622dc
In the Xena cycle it was decided to remove the Monasca
Grafana fork due to lack of maintenance. This commit removes
the service and provides a limited workaround using the
Monasca Grafana datasource with vanilla Grafana.
Depends-On: I9db7ec2df050fa20317d84f6cea40d1f5fd42e60
Change-Id: I4917ece1951084f6665722ba9a91d47764d3709a
The current behaviour supports supplying a single
folder of Grafana dashboards, which is then loaded
into a single folder in Grafana. Some users may wish
to have sub-folders of dashboards, and load these into
separate dashboard folders in Grafana via a custom
provisioning file. For example, a user may have a
sub-folder of Ceph dashboards that they wish to keep
separate from OpenStack dashboards. This patch supports
sub-folders whilst not affecting the original mechanism.
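For example, a custom provisioning file mapping two dashboard
sub-folders to separate folders in Grafana might look like this (a
sketch; provider names and paths are illustrative):

apiVersion: 1
providers:
  - name: ceph
    folder: Ceph
    type: file
    options:
      path: /var/lib/grafana/dashboards/ceph
  - name: openstack
    folder: OpenStack
    type: file
    options:
      path: /var/lib/grafana/dashboards/openstack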
Trivial-Fix
Change-Id: I9cd289a1ea79f00cee4d2ef30cbb508ac73f9767
Allow users to import custom Grafana dashboards.
Dashboards, as JSON files, should be placed in the
"{{ node_custom_config }}/grafana/dashboards/" folder.
Change-Id: Id0f83b8d08541b3b74649f097b10c9450201b426
This change allows a user to forward control plane logs
directly to Elasticsearch from Fluentd, rather than via
the Monasca Log API when Monasca is enabled. The Monasca
Log API can continue to handle tenant logs.
For many use cases this is simpler, reduces resource
consumption and helps to decouple control plane logging
services from tenant logging services.
It may not always be desired, so it is optional and off by
default.
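As an illustration only (the variable name below is hypothetical; the
actual option is defined by this change):

# globals.yml: hypothetical toggle to stop routing control plane logs
# via the Monasca Log API, so Fluentd forwards them directly to
# Elasticsearch.
monasca_ingest_control_plane_logs: "no"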
Change-Id: I195e8e4b73ca8f573737355908eb30a3ef13b0d6
The Monasca alerting pipeline provides multi-tenancy alerts and
notifications. It runs as an Apache Storm topology and generally
places a significant memory and CPU burden on monitoring hosts,
particularly when there are a lot of metrics. This is fine if the
alerting service is in use, but sometimes it is not. For example
you may use Prometheus for monitoring the control plane, and
wish to offer tenants a monitoring service via Monasca without
alerting and notification functionality. In this case it makes
sense to disable this part of the Monasca pipeline and this patch
adds support for that.
If the service is ever re-enabled, all alerts and notifications
should be restored automatically, since they are persisted in the
central MySQL database cluster.
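As an illustration only (the variable name below is hypothetical; the
actual option is defined by this change):

# globals.yml: hypothetical toggle to disable the Monasca alerting
# and notification pipeline.
monasca_enable_alerting_pipeline: "no"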
Change-Id: I84aa04125c621712f805f41c8efbc92c8e156db9