69 Commits

Author SHA1 Message Date
Mark Goddard
832989d0a6 nova: use any_errors_fatal for once-per-cell tasks
We run some nova tasks once per cell, using a condition to match a
single host in the cell. In other similar tasks, we use run_once, which
will fail all hosts if the task fails. Typically these tasks are
critical, and that is desirable. However, with the approach used in
nova-cell to support multiple cells, if a once-per-cell task fails, then
other hosts will continue to execute, which could lead to unexpected
results.

This change adds any_errors_fatal to the plays or blocks that run these
tasks.
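
As a rough illustration of the behaviour described above, the following Python sketch models which hosts keep executing after a once-per-cell task fails. It is a toy model and does not use Ansible; the function name and host names are hypothetical.

```python
def hosts_still_running(hosts, failed_host, any_errors_fatal):
    """Return the hosts that continue after a task fails on failed_host.

    Toy model only: with any_errors_fatal, one failure stops every host in
    the play or block; without it, only the failed host is removed.
    """
    if any_errors_fatal:
        return []
    return [host for host in hosts if host != failed_host]


cell_hosts = ["compute01", "compute02", "compute03"]  # hypothetical names
# Without any_errors_fatal, the other hosts carry on after the failure.
print(hosts_still_running(cell_hosts, "compute01", any_errors_fatal=False))
# With any_errors_fatal, no host continues.
print(hosts_still_running(cell_hosts, "compute01", any_errors_fatal=True))
```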

Closes-Bug: #1948694

Change-Id: I2a5871ccd4e8198171ef3239ce95f475f3e4b051
2022-04-22 10:37:25 +00:00
Zuul
1de1e0f36c Merge "nova: improve compute service registration failure handling" 2022-04-21 21:23:22 +00:00
Mark Goddard
188b328566 libvirt: Fix nova-libvirt-cleanup command
This change addresses an issue in the nova-libvirt-cleanup command,
added in I46854ed7eaf1d5b5e3ccd8531c963427848bdc99.

Check for rc=1 from the pgrep command, since a lack of matches is a pass.

Also, use bash for set -o pipefail.
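
The exit status handling can be sketched in Python as follows. This is a minimal sketch, not the command's actual implementation: the helper names are hypothetical, and it runs pgrep directly rather than through a bash pipeline, so the pipefail aspect is not covered.

```python
import subprocess


def pgrep_matched(returncode):
    """Interpret a pgrep exit status: 0 means matches, 1 means none."""
    if returncode == 0:
        return True
    if returncode == 1:
        return False
    # Any other status indicates a real error rather than "no matches".
    raise RuntimeError(f"pgrep failed with exit status {returncode}")


def any_process_matches(pattern):
    """Run pgrep and report whether any process matches the pattern."""
    result = subprocess.run(
        ["pgrep", "-f", pattern],
        stdout=subprocess.DEVNULL,
        check=False,  # rc=1 is expected when nothing matches
    )
    return pgrep_matched(result.returncode)
```

Treating rc=1 as a normal outcome keeps "nothing is running" distinct from a genuine pgrep failure.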

Change-Id: Iffda0dfffce8768324ffec55e629134c70e2e996
2022-04-05 08:09:14 +00:00
Mark Goddard
f1d3ff11d0 nova: improve compute service registration failure handling
If any nova compute service fails to register itself, Kolla Ansible will
fail the host that queries the Nova API. This is the first compute host
in the inventory, and fails in the task:

    Waiting for nova-compute services to register themselves

Other hosts continue, often leading to further errors later on. Clearly
this is not ideal.

This change modifies the behaviour to query the compute service list
until all expected hosts are present, but does not fail the querying
host if they are not. A new task is added that executes for all hosts,
and fails only those hosts that have not registered successfully.

Alternatively, to fail all hosts in a cell when any compute service
fails to register, set nova_compute_registration_fatal to true.
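
The failure selection described above can be sketched in Python. This is an illustrative model only: the function and argument names are hypothetical, apart from the registration_fatal flag, which mirrors nova_compute_registration_fatal.

```python
def hosts_to_fail(expected_hosts, registered_hosts, registration_fatal=False):
    """Return the set of hosts that should be failed.

    By default only hosts missing from the compute service list fail.
    With registration_fatal, any missing host fails every expected host.
    """
    missing = set(expected_hosts) - set(registered_hosts)
    if not missing:
        return set()
    if registration_fatal:
        return set(expected_hosts)
    return missing


expected = ["compute01", "compute02", "compute03"]  # hypothetical hosts
registered = ["compute01", "compute03"]
print(sorted(hosts_to_fail(expected, registered)))
print(sorted(hosts_to_fail(expected, registered, registration_fatal=True)))
```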

Change-Id: I12c1928cf1f1fb9e28f1741e7fe4968004ea1816
Closes-Bug: #1940119
2022-03-29 11:26:44 +01:00
Mark Goddard
80b311bef7 libvirt: add nova-libvirt-cleanup command
Change Ia1239069ccee39416b20959cbabad962c56693cf added support for
running a libvirt daemon on the host, rather than using the nova_libvirt
container. It did not cover migration of existing hosts from using a
container to using a host daemon.

This change adds a kolla-ansible nova-libvirt-cleanup command which may
be used to clean up the nova_libvirt container, volumes and related
items on hosts, once it has been disabled.

The playbook assumes that compute hosts have been emptied of VMs before
it runs. A future extension could support migration of existing VMs, but
this is currently out of scope.

Change-Id: I46854ed7eaf1d5b5e3ccd8531c963427848bdc99
2022-03-21 11:54:54 +00:00
Mark Goddard
4e41acd8f0 libvirt: make it possible to run libvirt on the host
In some cases it may be desirable to run the libvirt daemon on the host.
For example, when mixing host and container OS distributions or
versions.

This change makes it possible to disable the nova_libvirt container, by
setting enable_nova_libvirt_container to false. The default values of
some Docker mounts and other paths have been updated to point to default
host directories rather than Docker volumes when using a host libvirt
daemon.

This change does not handle migration of existing systems from using
a nova_libvirt container to libvirt on the host.

Depends-On: https://review.opendev.org/c/openstack/ansible-collection-kolla/+/830504

Change-Id: Ia1239069ccee39416b20959cbabad962c56693cf
2022-03-21 11:54:31 +00:00
Zuul
ed148cd8dd Merge "[external-ceph] Use template instead of copy" 2022-03-19 00:04:33 +00:00
Imran Hussain
4c221be86e [external-ceph] Use template instead of copy
Consistently use template instead of copy. This has the added
advantage of allowing variables inside ceph conf files and keyrings.

Closes-Bug: 1959565

Signed-off-by: Imran Hussain <ih@imranh.co.uk>
Change-Id: Ibd0ff2641a54267ff06d3c89a26915a455dff1c1
2022-03-18 15:09:30 +00:00
Mark Goddard
d2d4b53d47 libvirt: support SASL authentication
In Kolla Ansible OpenStack deployments, by default, libvirt is
configured to allow read-write access via an unauthenticated,
unencrypted TCP connection, using the internal API network.  This is to
facilitate migration between hosts.

By default, Kolla Ansible does not use encryption for services on the
internal network (and did not support it until Ussuri). However, most
other services on the internal network are at least authenticated
(usually via passwords), ensuring that they cannot be used by anyone
with access to the network, unless they have credentials.

The main issue here is the lack of authentication. Any client with
access to the internal network is able to connect to the libvirt TCP
port and make arbitrary changes to the hypervisor. This could include
starting a VM, modifying an existing VM, etc. Given the flexibility of
the domain options, it could be seen as equivalent to having root access
to the hypervisor.

Kolla Ansible has supported libvirt TLS [1] since the Train release, using
client and server certificates for mutual authentication and encryption.
However, this feature is not enabled by default, and requires
certificates to be generated for each compute host.

This change adds support for libvirt SASL authentication, and enables it
by default. This provides a base level of security. Deployments requiring
further security should use libvirt TLS.

[1] https://docs.openstack.org/kolla-ansible/latest/reference/compute/libvirt-guide.html#libvirt-tls

Depends-On: https://review.opendev.org/c/openstack/kolla/+/833021
Closes-Bug: #1964013
Change-Id: Ia91ceeb609e4cdb144433122b443028c0278b71e
2022-03-10 16:57:16 +00:00
Radosław Piliszek
75b69ea745 Make nova_ssh listen on api_interface as well
This is required as nova_compute tries to reach my_ip of the other
node when resizing an instance and my_ip is set to
api_interface_address.

This potential issue was introduced with [1].

[1] https://review.opendev.org/c/openstack/kolla-ansible/+/569131

Closes-Bug: #1956976
Change-Id: Id57a672c69a2d5aa74e55f252d05bb756bbc945a
2022-01-10 17:10:46 +00:00
Mark Goddard
c93f59cd8e Revert "Do not load br_netfilter"
This reverts commit 15259002beb6b9f35f8eee6529132c6e1a126902.

Reason for revert: The iptables_firewall produces warnings without it.

Change-Id: Id046a3048436c4c18dd1fd9700ac9971d8c42c57
2021-10-27 15:48:43 +00:00
Radosław Piliszek
15259002be Do not load br_netfilter
Nor set related sysctls.
More details in the reno.

Change-Id: I898548ecc6df3caa094c3222159b7ba1e16dc211
Closes-Bug: #1945789
2021-10-01 13:23:54 +00:00
Niklas Hagman
2e933dceb5 Transition Keystone admin user to system scope
A system-scoped token implies the user has authorization to act on the
deployment system. These tokens are useful for interacting with
resources that affect the deployment as a whole, or that expose resources
that may otherwise violate project or domain isolation.

Since Queens, the keystone-manage bootstrap command assigns the admin
role to the admin user with system scope, as well as in the admin
project. This patch transitions the Keystone admin user from
authenticating using project scoped tokens to system scoped tokens.
This is a necessary step towards being able to enable the updated oslo
policies in services that allow finer grained access to system-level
resources and APIs.

An etherpad with discussion about the transition to the new oslo
service policies is:

https://etherpad.opendev.org/p/enabling-system-scope-in-kolla-ansible

Change-Id: Ib631e2211682862296cce9ea179f2661c90fa585
Signed-off-by: Niklas Hagman <ubuntu@post.blinkiz.com>
2021-09-28 09:45:06 -07:00
Michal Arbet
85879afc0b Trivial fix nova's healthchecks
The Kolla Ansible upgrade task calls different handlers
from the deploy task, and these handlers are missing
the healthcheck key. This patch adds the missing
key.

Closes-Bug: #1939679
Change-Id: Id83d20bfd89c27ccf70a3a79938f428cdb5d40fc
2021-08-12 13:39:50 +02:00
Radosław Piliszek
9ff2ecb031 Refactor and optimise image pulling
We get a nice optimisation by using a filtered loop instead
of task skipping per service with 'when'.
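
The idea of filtering before looping can be sketched in Python. The service definitions and image names below are hypothetical, and the sketch only models the selection step, not the pull itself.

```python
def images_to_pull(services):
    """Return the images of enabled services only.

    Filtering up front means disabled services are never visited, instead
    of each one being visited and then skipped.
    """
    return [svc["image"] for svc in services.values() if svc["enabled"]]


# Hypothetical service map; names and images are placeholders.
services = {
    "service-a": {"enabled": True, "image": "example/service-a"},
    "service-b": {"enabled": False, "image": "example/service-b"},
    "service-c": {"enabled": True, "image": "example/service-c"},
}
print(images_to_pull(services))
```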

Partially-Implements: blueprint performance-improvements
Change-Id: I8f68100870ab90cb2d6b68a66a4c97df9ea4ff52
2021-08-10 11:57:54 +00:00
Zuul
01c62fa1b6 Merge "Fix nova deployment failure when rabbitmq is disabled" 2021-08-04 13:12:14 +00:00
Michal Arbet
c281a018c4 Fix freezed spice console in horizon
This trivial patch sets "timeout tunnel" in haproxy's
configuration for spicehtml5proxy. This option extends the time
before spice's websocket connection is closed, so that the spice
console does not freeze. The default value is set to 1h, as it is for novnc.

Closes-Bug: #1938549
Change-Id: I3a5cd98ecf4916ebd0748e7c08111ad0e4dca0b2
2021-08-02 09:55:46 +02:00
wu.chunyang
24d08142d2 Fix nova deployment failure when rabbitmq is disabled
Nova always tries to create the rabbitmq user regardless of
whether RabbitMQ is enabled or not.
This patch set also adds an external RabbitMQ doc.

Change-Id: Iec517226e4c82ea351889b55689a3efceaadcc76
2021-07-27 22:07:08 +08:00
Mark Goddard
ade5bfa302 Use ansible_facts to reference facts
By default, Ansible injects a variable for every fact, prefixed with
ansible_. This can result in a large number of variables for each host,
which at scale can incur a performance penalty. Ansible provides a
configuration option [0] that can be set to False to prevent this
injection of facts. In this case, facts should be referenced via
ansible_facts.<fact>.

This change updates all references to Ansible facts within Kolla Ansible
from using individual fact variables to using the items in the
ansible_facts dictionary. This allows users to disable fact variable
injection in their Ansible configuration, which may provide some
performance improvement.
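
A toy Python model of the effect on the per-host variable namespace follows. It is not Ansible's implementation; the function name and fact values are hypothetical, and only the ansible_ prefix and the ansible_facts dictionary come from the description above.

```python
def build_host_vars(facts, inject_facts_as_vars=True):
    """Build a simplified per-host variable namespace from gathered facts."""
    variables = {"ansible_facts": dict(facts)}
    if inject_facts_as_vars:
        # Each fact is duplicated as a top-level ansible_<fact> variable.
        for name, value in facts.items():
            variables[f"ansible_{name}"] = value
    return variables


facts = {"hostname": "compute01", "os_family": "Debian"}  # sample values
with_injection = build_host_vars(facts, inject_facts_as_vars=True)
without_injection = build_host_vars(facts, inject_facts_as_vars=False)

# The fact stays reachable through the dictionary in both cases.
print(without_injection["ansible_facts"]["hostname"])
# Only the injected namespace grows with the number of facts.
print(len(with_injection), len(without_injection))
```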

This change disables fact variable injection in the ansible
configuration used in CI, to catch any attempts to use the injected
variables.

[0] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#inject-facts-as-vars

Change-Id: I7e9d5c9b8b9164d4aee3abb4e37c8f28d98ff5d1
Partially-Implements: blueprint performance-improvements
2021-06-23 10:38:06 +01:00
Radosław Piliszek
9a77fb1ca0 Add support for Debian Bullseye (11) as host distro
Makes nova-libvirt container always run in 'host' CgroupnsMode
to ensure it works.

Change-Id: I75105baf434977c68bc5c8ca1f5213e602c52c8c
2021-05-30 18:40:12 +00:00
Michał Nasiadka
dbc63244ab nova-cell: Stop printing ceph keys in output
Change-Id: Ib6719a033b37be3e248b682795b7243c60b22b84
2021-03-02 16:24:39 +01:00
Zuul
860c32de76 Merge "Revert "Performance: Use import_tasks in the main plays"" 2020-12-15 19:52:24 +00:00
Mark Goddard
db4fc85c33 Revert "Performance: Use import_tasks in the main plays"
This reverts commit 9cae59be51e8d2d798830042a5fd448a4aa5e7dc.

Reason for revert: This patch was found to introduce issues with fluentd customisation. The underlying issue is not currently fully understood, but could be a sign of other obscure issues.

Change-Id: Ia4859c23d85699621a3b734d6cedb70225576dfc
Closes-Bug: #1906288
2020-12-14 10:36:55 +00:00
Radosław Piliszek
71e9c603b8 Do not set 'always' tag where unnecessary
Makes 'import_tasks' not change behaviour compared to
'include_tasks'.

Change-Id: I600be7c3bd763b3b924bd4a45b4e7b4dca7a33e3
2020-10-27 19:51:46 +01:00
Radosław Piliszek
9cae59be51 Performance: Use import_tasks in the main plays
Main plays are action-redirect-stubs, ideal for import_tasks.

This avoids 'include' penalty and makes logs/ara look nicer.

Fixes haproxy and rabbitmq not to check the host group as well.

Change-Id: I46136fc40b815e341befff80b54a91ef431eabc0
Partially-Implements: blueprint performance-improvements
2020-10-27 19:09:32 +01:00
Radosław Piliszek
3411b9e420 Performance: optimize genconfig
Config plays do not need to check containers. This avoids skipping
tasks during the genconfig action.

Ironic and Glance rolling upgrades are handled specially.

Swift and Bifrost do not use the handlers at all.

Partially-Implements: blueprint performance-improvements
Change-Id: I140bf71d62e8f0932c96270d1f08940a5ba4542a
2020-10-12 19:30:06 +02:00
Zuul
6c5e9321e4 Merge "Allow to skip and unset sysctl vars" 2020-10-08 10:21:31 +00:00
Zuul
21a96db1be Merge "Add support for changing sysctl.conf path" 2020-10-07 16:33:31 +00:00
Michal Nasiadka
c52a89ae04 Use Docker healthchecks for core services
This change enables the use of Docker healthchecks for core OpenStack
services.
Also check-failures.sh has been updated to treat containers with
unhealthy status as failed.

Implements: blueprint container-health-check
Change-Id: I79c6b11511ce8af70f77e2f6a490b59b477fefbb
2020-10-05 08:35:47 +00:00
Radosław Piliszek
bce266201b Allow to skip and unset sysctl vars
via KOLLA_SKIP and KOLLA_UNSET

Change-Id: I7d9af21c2dd8c303066eb1ee4dff7a72bca24283
Related-Bug: #1837551
2020-09-21 13:13:58 +02:00
Radosław Piliszek
6be51fa67a Add support for changing sysctl.conf path
via kolla_sysctl_conf_path

Change-Id: I09b20fa008a7fecedcb599b4792f24215179b853
2020-09-21 11:47:05 +02:00
wu.chunyang
88de8feb7b replace internal with openstack_interface
Replace hardcoded 'internal' with {{ openstack_interface }}.

Change-Id: I885622967ffde2a7a1a08fedbde2eb0e4e330e22
2020-09-18 21:42:52 +08:00
Mark Goddard
3c02c966cb Performance: remove one include_tasks in nova-cell
Including tasks has a performance penalty when compared with importing
tasks. The nova-cell role uses include_tasks twice when generating
certificates and keys for libvirt TLS. While a dynamic include makes
sense here for a non-default feature, we can use one include rather than
two with the same effect. Since this task runs against compute nodes the
overhead is significant.

See [1] for benchmarks of include_tasks and import_tasks.

[1] https://github.com/stackhpc/ansible-scaling/blob/master/doc/include-and-import.md

Partially-Implements: blueprint performance-improvements

Change-Id: Ic687d2f7d4625aede386e576ebb174da72142756
2020-08-28 16:16:56 +00:00
Mark Goddard
b685ac44e0 Performance: replace unconditional include_tasks with import_tasks
Including tasks has a performance penalty when compared with importing
tasks. If the include has a condition associated with it, then the
overhead of the include may be lower than the overhead of skipping all
imported tasks. For unconditionally included tasks, switching to
import_tasks provides a clear benefit.

Benchmarking of include vs. import is available at [1].

This change switches from include_tasks to import_tasks where there is
no condition applied to the include.

[1] https://github.com/stackhpc/ansible-scaling/blob/master/doc/include-and-import.md#task-include-and-import

Partially-Implements: blueprint performance-improvements

Change-Id: Ia45af4a198e422773d9f009c7f7b2e32ce9e3b97
2020-08-28 16:12:03 +00:00
wu.chunyang
817cf80702 replace os-tenant-name with os-project-name in openstackclient
openstackclient doesn't support the os-tenant-name parameter;
use os-project-name instead of os-tenant-name.

https://docs.openstack.org/python-openstackclient/ussuri/cli/man/openstack.html

Change-Id: Ibf17424c49118b4c3b7e621e04b43c8cdcf308a4
2020-08-22 23:02:30 +08:00
Mark Goddard
9702d4c3c3 Performance: use import_tasks for check-containers.yml
Including tasks has a performance penalty when compared with importing
tasks. If the include has a condition associated with it, then the
overhead of the include may be lower than the overhead of skipping all
imported tasks. In the case of the check-containers.yml include, the
included file only has a single task, so the overhead of skipping this
task will not be greater than the overhead of the task import. It
therefore makes sense to switch to use import_tasks there.

Partially-Implements: blueprint performance-improvements

Change-Id: I65d911670649960708b9f6a4c110d1a7df1ad8f7
2020-07-28 12:10:59 +01:00
Zuul
b0407ffb17 Merge "Make /dev/kvm permissions handling more robust" 2020-07-22 12:32:40 +00:00
Radosław Piliszek
202365e702 Make /dev/kvm permissions handling more robust
This makes use of udev rules to make it smarter and override
host-level package settings.
Additionally, this masks an Ubuntu-only service that is another
pain point in terms of /dev/kvm permissions.
Fingers crossed for no further surprises.

Change-Id: I61235b51e2e1325b8a9b4f85bf634f663c7ec3cc
Closes-bug: #1681461
2020-07-17 17:51:18 +00:00
Mark Goddard
2f91be9f39 Load br_netfilter module in nova-cell role
The nova-cell role sets the following sysctls on compute hosts, which
require the br_netfilter kernel module to be loaded:

    net.bridge.bridge-nf-call-iptables
    net.bridge.bridge-nf-call-ip6tables

If it is not loaded, then we see the following errors:

    Failed to reload sysctl:
    sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory
    sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory

Loading the br_netfilter module resolves this issue.

Typically we do not see this since installing Docker and configuring it
to manage iptables rules causes the br_netfilter module to be loaded.
There are good reasons [1] to disable Docker's iptables management
however, in which case we are likely to hit this issue.

This change loads the br_netfilter module in the nova-cell role for
compute hosts.
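
A quick way to check whether a host would hit the error above is to test for the sysctl files directly. The Python sketch below is illustrative only; the helper name and the proc_sys_root parameter are hypothetical, while the two sysctl names are those listed above.

```python
from pathlib import Path

BRIDGE_SYSCTLS = (
    "net.bridge.bridge-nf-call-iptables",
    "net.bridge.bridge-nf-call-ip6tables",
)


def missing_bridge_sysctls(proc_sys_root="/proc/sys"):
    """Return the bridge sysctls whose files are absent under proc_sys_root.

    The files only appear once the br_netfilter module is loaded.
    """
    root = Path(proc_sys_root)
    return [
        name
        for name in BRIDGE_SYSCTLS
        # A sysctl name maps to a path by replacing dots with slashes.
        if not (root / name.replace(".", "/")).exists()
    ]
```

An empty result means both sysctls can be set; a non-empty result suggests the module is not loaded.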

[1] https://bugs.launchpad.net/kolla-ansible/+bug/1849275

Co-Authored-By: Dincer Celik <hello@dincercelik.com>

Change-Id: Id52668ba8dab460ad4c33fad430fc8611e70825e
2020-07-08 11:13:39 +01:00
gugug
f220970d46 Clean up the unnecessary "" for include_tasks
The double quotes are not necessary for include_tasks; this
patch set cleans them up.

Change-Id: I0701035d185fdf19286cced7fe51fc277511e4c1
2020-06-16 23:36:42 +08:00
Zuul
e74cada7c1 Merge "permission denied when enable_kolla_dev_mod" 2020-06-10 02:32:45 +00:00
Christian Berendt
60e03d7bf3 Remove XenAPI integration
Change-Id: Iea3f4f3d2e5c6040c1e0bc7bfae8719cc7d8ac55
2020-06-09 13:56:17 +02:00
wu.chunyang
3e9a648601 permission denied when enable_kolla_dev_mod
A non-root user has no permission to create a directory under the /opt
directory. Use "become: true" to resolve it.

Change-Id: I155efc4b1e0691da0aaf6ef19ca709e9dc2d9168
2020-06-07 19:36:42 +08:00
Zuul
76d69cae0e Merge "Fix nova cell message queue URL with separate notification queue" 2020-04-26 16:46:35 +00:00
Zuul
7a193d1f06 Merge "Ansible lint: lines longer than 160 chars" 2020-04-17 09:29:00 +00:00
Zuul
87984f5425 Merge "Add Ansible group check to prechecks" 2020-04-16 15:33:46 +00:00
Zuul
2e2672e753 Merge "Fix nova compute addition with limit" 2020-04-16 15:33:44 +00:00
Zuul
7f42813159 Merge "Refactor copy certificates task" 2020-04-16 14:03:37 +00:00
Michal Nasiadka
d403690b88 Ansible lint: lines longer than 160 chars
Change-Id: I500cc8800c412bc0e95edb15babad5c1189e6ee4
2020-04-16 15:59:06 +02:00
Mark Goddard
e8ad5f37d4 Fix nova cell message queue URL with separate notification queue
If using a separate message queue for nova notifications, i.e.
nova_cell_notify_transport_url is different from
nova_cell_rpc_transport_url, then Kolla Ansible will unnecessarily
update the cell. This should not cause any issues since the URL is taken
from nova.conf.

This change fixes the comparison to use the correct URL.
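
The effect of comparing against the wrong URL can be sketched in Python. This sketch assumes that the URL recorded for the cell is the RPC transport URL; the function name and the URLs are hypothetical placeholders.

```python
def cell_needs_update(registered_url, expected_url):
    """Return True when the URL stored for the cell differs from expected."""
    return registered_url != expected_url


# Placeholder URLs for a deployment with a separate notification queue.
rpc_url = "rabbit://nova:secret@rpc-host:5672//"
notify_url = "rabbit://nova:secret@notify-host:5672//"
registered_url = rpc_url  # assumption: the cell record holds the RPC URL

# Comparing against the notification URL reports a spurious change.
print(cell_needs_update(registered_url, notify_url))
# Comparing against the RPC URL reports no change.
print(cell_needs_update(registered_url, rpc_url))
```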

Change-Id: I5f0e30957bfd70295f2c22c86349ebbb4c1fb155
Closes-Bug: #1873255
2020-04-16 12:32:40 +01:00