kolla-ansible/doc/source/user/ansible-tuning.rst
Mark Goddard af6e1ca4fd Support Ansible max_fail_percentage
This allows us to continue execution until a certain proportion of hosts
fail. This can be useful at scale, where failures are common, and
restarting a deployment is time-consuming.

The default max failure percentage is 100, keeping the default
behaviour. A global max failure percentage may be set via
kolla_max_fail_percentage, and individual services may define a max
failure percentage via <service>_max_fail_percentage.

Note that all hosts in the inventory must be reachable for fact
gathering, even those not included in a --limit.

Closes-Bug: #1833737
Change-Id: I808474a75c0f0e8b539dc0421374b06cea44be4f
2023-12-05 11:49:42 +01:00

Ansible tuning

In this section we cover some options for tuning Ansible for performance and scale.

SSH pipelining

SSH pipelining is disabled in Ansible by default, but it is generally safe to enable. It provides a reasonable performance improvement by reducing the number of SSH operations needed to execute each module on a remote host.

[ssh_connection]
pipelining = True
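This setting, like the others in this section, belongs in an Ansible configuration file such as ansible.cfg. The following is a minimal sketch for applying it programmatically. It assumes you manage that file with Python's standard configparser module, and the helper name enable_pipelining is hypothetical. Interpolation is disabled because Ansible values such as control_path can legitimately contain % characters.

```python
import configparser


def enable_pipelining(cfg_path):
    """Set [ssh_connection] pipelining = True in an Ansible config file.

    Hypothetical helper. Note that configparser does not preserve
    comments when it rewrites the file.
    """
    # interpolation=None: treat '%' in existing values literally
    config = configparser.ConfigParser(interpolation=None)
    config.read(cfg_path)  # a missing file is treated as empty
    if not config.has_section("ssh_connection"):
        config.add_section("ssh_connection")
    config.set("ssh_connection", "pipelining", "True")
    with open(cfg_path, "w") as f:
        config.write(f)
```

Other sections already in the file, such as [defaults], are preserved. Any comments in the file are not.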

Forks

By default, Ansible executes tasks using a fairly conservative 5 forks (parallel worker processes). This limits the parallelism that allows Ansible to scale. Most Ansible control hosts can handle far more forks than this, but you will need to experiment to find the CPU, memory and I/O limits of your machine.

For example, to increase the number of forks to 20:

[defaults]
forks = 20
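The effect of the fork count can be shown with an idealized model. Assume each task takes the same time on every host, and that Ansible runs it on at most forks hosts at a time. This is only an approximation: Ansible uses a worker pool rather than strict rounds, and real task times vary. The function name is hypothetical.

```python
import math


def estimated_task_seconds(num_hosts, forks, seconds_per_host):
    """Idealized wall-clock time for one task across all hosts.

    Assumes every host takes `seconds_per_host` and at most `forks`
    hosts run concurrently, giving ceil(num_hosts / forks) rounds.
    """
    if forks < 1:
        raise ValueError("forks must be at least 1")
    rounds = math.ceil(num_hosts / forks)
    return rounds * seconds_per_host


# 100 hosts, 2 seconds per host
print(estimated_task_seconds(100, 5, 2))   # 40 (default of 5 forks)
print(estimated_task_seconds(100, 20, 2))  # 10 (forks = 20)
```

The model ignores control host limits. Past the point where the CPU, memory or I/O of the machine saturates, extra forks stop helping.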

Fact caching

By default, Ansible gathers facts for each host at the beginning of every play, unless gather_facts is set to false. With a large number of hosts this can result in a significant amount of time spent gathering facts.

One way to improve this is through Ansible's support for fact caching. In order to make this work with Kolla Ansible, it is necessary to change Ansible's gathering configuration option to smart.

Example

In the following example, we configure Kolla Ansible to cache facts using the jsonfile cache plugin.

[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible-facts

You may also wish to set the cache expiration timeout, in seconds, via the fact_caching_timeout option in the [defaults] section. Ansible's default is 86400 seconds (24 hours).
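With gathering = smart, Ansible gathers facts for a host only when they are not already available. The cache contents and timeout therefore decide which hosts are contacted. The following is a simplified sketch of that decision. It assumes the jsonfile plugin stores one JSON file per inventory hostname in the fact_caching_connection directory, with no fact_caching_prefix set. It also assumes entries expire based on file modification time, and that a timeout of 0 means never expire. The function names are hypothetical.

```python
import json
import time
from pathlib import Path


def load_cached_facts(cache_dir, host, timeout=86400, now=None):
    """Return cached facts for `host`, or None if absent or expired.

    Assumes one JSON file per host, named after the inventory hostname.
    A timeout of 0 is treated as "never expire".
    """
    path = Path(cache_dir) / host
    if not path.is_file():
        return None
    now = time.time() if now is None else now
    age = now - path.stat().st_mtime  # seconds since the entry was written
    if timeout > 0 and age > timeout:
        return None
    with path.open() as f:
        return json.load(f)


def hosts_to_gather(cache_dir, hosts, timeout=86400, now=None):
    """Hosts that smart gathering would need to contact (simplified)."""
    return [h for h in hosts
            if load_cached_facts(cache_dir, h, timeout, now) is None]
```

For example, hosts_to_gather("/tmp/ansible-facts", ["control01", "compute01"]) lists the hosts whose cache entries are missing or older than the timeout.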

Populating the cache

In some situations it may be helpful to populate the fact cache on demand. The kolla-ansible gather-facts command may be used to do this.

One specific case where this may be helpful is when running kolla-ansible with a --limit argument. In that case, hosts that match the limit will gather facts for hosts that fall outside the limit. In the extreme case of a limit that matches only one host, that host will gather facts for all other hosts serially. To avoid this issue, run kolla-ansible gather-facts without a limit to populate the fact cache in parallel, then run the required command with a limit. For example:

kolla-ansible gather-facts
kolla-ansible deploy --limit control01
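If you run this sequence regularly, it can be scripted. The following is a minimal sketch in Python, assuming kolla-ansible is on the PATH. The helper name gather_then_limit and its dry_run parameter are hypothetical. Each step uses check=True, so the limited command is not run if fact gathering fails.

```python
import shlex
import subprocess


def gather_then_limit(command, limit, dry_run=False):
    """Populate the fact cache for all hosts, then run a limited command.

    Hypothetical helper wrapping the two kolla-ansible invocations.
    Returns the commands so they can be inspected.
    """
    steps = [
        ["kolla-ansible", "gather-facts"],             # no limit: all hosts, in parallel
        ["kolla-ansible", command, "--limit", limit],  # reuses the cached facts
    ]
    for step in steps:
        if dry_run:
            print(shlex.join(step))
        else:
            # check=True raises CalledProcessError and stops on failure
            subprocess.run(step, check=True)
    return steps


gather_then_limit("deploy", "control01", dry_run=True)
```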

Fact variable injection

By default, Ansible injects a variable for every fact, prefixed with ansible_. This can result in a large number of variables for each host, which at scale can incur a performance penalty. Setting the inject_facts_as_vars configuration option to False prevents this injection. In this case, facts must be referenced via ansible_facts.<fact>. Recent releases of Kolla Ansible reference facts via ansible_facts, which allows users to disable fact variable injection.

[defaults]
inject_facts_as_vars = False
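The difference can be illustrated with a simplified model of a host's variables. Keys in ansible_facts are stored without the ansible_ prefix. When injection is enabled, each one is also exposed as a top-level ansible_<name> variable. This sketch shows the effect on variable count and naming only; it is not Ansible's actual implementation, and the function and sample facts are hypothetical.

```python
def host_variables(facts, inject_facts_as_vars=True):
    """Simplified model of the fact-related variables visible to a host."""
    variables = {"ansible_facts": facts}  # always available
    if inject_facts_as_vars:
        # One extra top-level variable per fact, e.g. ansible_hostname.
        variables.update({f"ansible_{name}": value for name, value in facts.items()})
    return variables


facts = {"hostname": "control01", "distribution": "Rocky"}
print(sorted(host_variables(facts)))
# ['ansible_distribution', 'ansible_facts', 'ansible_hostname']
print(sorted(host_variables(facts, inject_facts_as_vars=False)))
# ['ansible_facts']
```

With injection disabled, a template must use ansible_facts.hostname rather than ansible_hostname.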

Fact filtering

Fact filtering can be used to speed up Ansible. Environments with many network interfaces on the network and compute nodes can experience very slow processing with Kolla Ansible, because the large per-interface facts are processed with each task. To avoid storing certain facts, use the kolla_ansible_setup_filter variable, which is passed as the filter argument to the setup module. For example, to avoid storing facts for virtual interfaces beginning with q or t:

kolla_ansible_setup_filter: "ansible_[!qt]*"

This causes Ansible to collect, but not store, facts that do not match the pattern. Those are the facts beginning with ansible_q or ansible_t, which include the virtual interface facts. Kolla Ansible does not currently reference any other facts excluded by this pattern. Note that including the ansible_ prefix in the pattern also filters out the meta facts module_setup and gather_subset. However, this seems to be the only way to get a good match on the interface facts.
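The setup module matches its filter using shell-style (fnmatch) wildcards. The following sketch shows which facts the pattern above keeps; the fact names are hypothetical. It uses Python's fnmatchcase so the result is case-sensitive on every platform, as fnmatch is on Linux.

```python
from fnmatch import fnmatchcase

SETUP_FILTER = "ansible_[!qt]*"  # value of kolla_ansible_setup_filter above


def kept_facts(fact_names, pattern=SETUP_FILTER):
    """Return the fact names that a setup filter pattern would keep."""
    return [name for name in fact_names if fnmatchcase(name, pattern)]


names = [
    "ansible_eth0",      # physical interface: kept
    "ansible_hostname",  # other facts: kept
    "ansible_qvo1a2b",   # begins with q: dropped
    "ansible_tap3c4d",   # begins with t: dropped
    "module_setup",      # meta fact without the ansible_ prefix: dropped
    "gather_subset",     # meta fact without the ansible_ prefix: dropped
]
print(kept_facts(names))  # ['ansible_eth0', 'ansible_hostname']
```

The pattern drops any fact whose name starts with ansible_q or ansible_t, not just interface facts. It is therefore only safe while nothing in the deployment relies on such facts.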

The exact improvement will vary, but has been reported to be as large as 18x on systems with many virtual interfaces.

Fact gathering subsets

It is also possible to configure which subsets of facts are gathered, via kolla_ansible_setup_gather_subset, which is used as the gather_subset argument to the setup module. For example, if one wants to avoid collecting facts via facter:

kolla_ansible_setup_gather_subset: "all,!facter"
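The value is a comma-separated list in which a leading ! excludes a subset. The following is a simplified sketch of how such a value resolves. It assumes a reduced list of top-level subset names; Ansible's real list is longer and treats some values, such as min, specially. The function and the ALL_SUBSETS constant are hypothetical.

```python
# Reduced, illustrative list; not Ansible's complete set of subsets.
ALL_SUBSETS = {"min", "hardware", "network", "virtual", "ohai", "facter"}


def resolve_gather_subset(spec, all_subsets=ALL_SUBSETS):
    """Resolve a gather_subset string such as "all,!facter" (simplified)."""
    include, exclude = set(), set()
    for item in (part.strip() for part in spec.split(",")):
        if not item:
            continue
        if item.startswith("!"):
            exclude.add(item[1:])   # "!facter" excludes facter
        elif item == "all":
            include |= all_subsets  # "all" expands to every known subset
        else:
            include.add(item)
    return include - exclude


print(sorted(resolve_gather_subset("all,!facter")))
# ['hardware', 'min', 'network', 'ohai', 'virtual']
```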

Max failure percentage

It is possible to specify a maximum failure percentage using kolla_max_fail_percentage. By default this is undefined, which is equivalent to a value of 100, meaning that Ansible will continue execution until all hosts have failed or completed. For example:

kolla_max_fail_percentage: 50
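The following is a minimal sketch of the threshold check. Ansible aborts when the percentage of failed hosts exceeds the configured value; merely reaching it is not enough. The sketch uses integer arithmetic to avoid floating-point rounding. It is a simplification that considers failures within a single batch of hosts and ignores details such as how unreachable hosts are counted. The function name is hypothetical.

```python
def should_abort(failed_hosts, batch_size, max_fail_percentage=100):
    """Return True if failures exceed max_fail_percentage of the batch.

    Equivalent to failed / batch * 100 > max_fail_percentage,
    computed without floating point.
    """
    return failed_hosts * 100 > max_fail_percentage * batch_size


# kolla_max_fail_percentage: 50 with a batch of 4 hosts
print(should_abort(2, 4, 50))  # False: exactly 50% is tolerated
print(should_abort(3, 4, 50))  # True: 75% exceeds 50%
```

With the default of 100, the comparison can never be true. Execution therefore continues until every host has failed or completed.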

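A max fail percentage may be set for specific services using <service>_max_fail_percentage, which overrides kolla_max_fail_percentage for that service. For example, the following uses a threshold of 25% for nova and 50% for all other services: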

kolla_max_fail_percentage: 50
nova_max_fail_percentage: 25
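The following is a minimal sketch of the lookup order. It assumes a plain dictionary of variables, and the function name is hypothetical. The service-specific value is used when defined, then the global kolla_max_fail_percentage, and finally 100 when neither is set.

```python
def effective_max_fail_percentage(service, variables):
    """Resolve the max fail percentage for a service (sketch).

    Lookup order: <service>_max_fail_percentage, then
    kolla_max_fail_percentage, then 100 (no early abort).
    """
    for name in (f"{service}_max_fail_percentage", "kolla_max_fail_percentage"):
        if name in variables:
            return variables[name]
    return 100


variables = {"kolla_max_fail_percentage": 50, "nova_max_fail_percentage": 25}
print(effective_max_fail_percentage("nova", variables))    # 25
print(effective_max_fail_percentage("glance", variables))  # 50
print(effective_max_fail_percentage("nova", {}))           # 100
```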