This reverts commit 9cae59be51e8d2d798830042a5fd448a4aa5e7dc.
Reason for revert: This patch was found to introduce issues with fluentd customisation. The underlying issue is not currently fully understood, but could be a sign of other obscure issues.
Change-Id: Ia4859c23d85699621a3b734d6cedb70225576dfc
Closes-Bug: #1906288
Main plays are action-redirect-stubs, ideal for import_tasks.
This avoids 'include' penalty and makes logs/ara look nicer.
Fixes haproxy and rabbitmq not to check the host group as well.
Change-Id: I46136fc40b815e341befff80b54a91ef431eabc0
Partially-Implements: blueprint performance-improvements
Config plays do not need to check containers. This avoids skipping
tasks during the genconfig action.
Ironic and Glance rolling upgrades are handled specially.
Swift and Bifrost do not use the handlers at all.
Partially-Implements: blueprint performance-improvements
Change-Id: I140bf71d62e8f0932c96270d1f08940a5ba4542a
This change enables the use of Docker healthchecks for core OpenStack
services.
Also check-failures.sh has been updated to treat containers with
unhealthy status as failed.
Implements: blueprint container-health-check
Change-Id: I79c6b11511ce8af70f77e2f6a490b59b477fefbb
When the internal VIP is moved in the event of a failure of the active
controller, OpenStack services can become unresponsive as they try to
talk with MariaDB using connections from the SQLAlchemy pool.
It has been argued that OpenStack doesn't really need to use connection
pooling with MariaDB [1]. This commit reduces the use of connection
pooling via two configuration options:
- max_pool_size is set to 1 to allow only a single connection in the
  pool (it is not possible to disable connection pooling entirely via
  oslo.db, and max_pool_size = 0 means unlimited pool size)
- lower connection_recycle_time from the default of one hour to 10
  seconds, which means the single connection in the pool will be
  recreated regularly
With these settings, the system has been observed to react better in
the event of a failover.
[1] http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html
Change-Id: Ib6a62d4428db9b95569314084090472870417f3d
Closes-Bug: #1896635
This change adds support for encryption of communication between
OpenStack services and RabbitMQ. Server certificates are supported, but
currently client certificates are not.
The kolla-ansible certificates command has been updated to support
generating certificates for RabbitMQ for development and testing.
RabbitMQ TLS is enabled in the all-in-one source CI jobs, or when
the Zuul 'tls_enabled' variable is true.
Change-Id: I4f1d04150fb2b5af085b762890092f87ae6076b5
Implements: blueprint message-queue-ssl-support
Including tasks has a performance penalty when compared with importing
tasks. The nova-cell role uses include_tasks twice when generating
certificates and keys for libvirt TLS. While a dynamic include makes
sense here for a non-default feature, we can use one include rather than
two with the same effect. Since this task runs against compute nodes the
overhead is significant.
See [1] for benchmarks of include_tasks and import_tasks.
[1] https://github.com/stackhpc/ansible-scaling/blob/master/doc/include-and-import.md
Partially-Implements: blueprint performance-improvements
Change-Id: Ic687d2f7d4625aede386e576ebb174da72142756
Including tasks has a performance penalty when compared with importing
tasks. If the include has a condition associated with it, then the
overhead of the include may be lower than the overhead of skipping all
imported tasks. For unconditionally included tasks, switching to
import_tasks provides a clear benefit.
Benchmarking of include vs. import is available at [1].
This change switches from include_tasks to import_tasks where there is
no condition applied to the include.
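As a minimal sketch of the distinction (the file names and the
example_feature_enabled variable are hypothetical, not taken from this
change):

```yaml
# Unconditional: a static import is resolved when the playbook is parsed,
# so no extra include task runs against every host.
- import_tasks: example-config.yml

# Conditional: a dynamic include evaluates the condition once per host for
# the include itself. With import_tasks the same condition would instead be
# applied to, and skipped by, every imported task individually.
- include_tasks: example-optional-feature.yml
  when: example_feature_enabled | bool
```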
[1] https://github.com/stackhpc/ansible-scaling/blob/master/doc/include-and-import.md#task-include-and-import
Partially-Implements: blueprint performance-improvements
Change-Id: Ia45af4a198e422773d9f009c7f7b2e32ce9e3b97
* The multipath daemon allows block devices to be reached
via multiple paths for better resiliency and performance.
Multipathd periodically checks failed iSCSI paths
and maintains a list of valid paths. Libvirt can use more
than one iSCSI path when the volume_use_multipath option is set
and multipathd is enabled.
Change-Id: I54629656803c4989f7673e8c69d2a820609b5960
Implements: blueprint nova-libvirt-multipath-iscsi
The goal of this patch is to normalize the construction and use of
internal, external, and admin URLs. While extending Kolla-ansible to
enable a more flexible method of managing external URLs, we noticed
that the same URL was constructed multiple times in different parts
of the code. This makes these URLs harder to work with and, over time,
can create inconsistencies in a large code base. We therefore propose
a single Kolla-ansible variable per endpoint URL, which makes it easier
to override or extend these URLs.
As an example, we extended Kolla-ansible to allow public (external)
URLs to be overridden using the following pattern:
"<component/serviceName>.<companyBaseUrl>".
The "NAT/redirect" in the SSL termination system (HAProxy, HTTPD or
some other) is then done via the service name rather than the port.
This allows operators to easily and automatically create friendlier
URL names. To develop this feature, we first applied the patch that
we are now sending to the community, in order to reduce the surface
of changes in Kolla-ansible.
Another example is the integration of Kolla-ansible and Consul, which
we also implemented internally and which also requires URL changes.
This patch is therefore essential to reduce code duplication and to
make it easier for users and developers to work with and customize
service URLs.
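As an illustrative sketch of the one-variable-per-endpoint idea (every
variable name and value below is hypothetical and not taken from this
patch), the URL is built once and each consumer references that single
variable:

```yaml
# Hypothetical building blocks, for illustration only.
example_internal_protocol: "https"
example_internal_fqdn: "internal.example.com"
example_api_port: "9999"

# The endpoint URL is constructed in exactly one place...
example_internal_endpoint: "{{ example_internal_protocol }}://{{ example_internal_fqdn }}:{{ example_api_port }}"

# ...and reused wherever it is needed, e.g. for endpoint registration and
# in templated client configuration, instead of being rebuilt each time.
example_registered_endpoint_url: "{{ example_internal_endpoint }}"
```

An operator who wants a different URL scheme would then only need to
override the one endpoint variable.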
Change-Id: I73d483e01476e779a5155b2e18dd5ea25f514e93
Signed-off-by: Rafael Weingärtner <rafael@apache.org>
Previously we mounted /etc/timezone if the kolla_base_distro is debian
or ubuntu. This would fail prechecks if debian or ubuntu images were
deployed on CentOS. While this is not a supported combination, for
correctness we should fix the condition to reference the host OS rather
than the container OS, since that is where the /etc/timezone file is
located.
Change-Id: Ifc252ae793e6974356fcdca810b373f362d24ba5
Closes-Bug: #1882553
This patch is a continuation of
I6a174468bd91d214c08477b93c88032a45c137be for the nova-cell role, which
was missed.
Castellan (the Barbican client) uses different parameters to control
which CA file is used.
This patch sets them.
Moreover, this aligns Barbican with other services by defaulting
its client config to the internal endpoint.
See also [1].
[1] https://bugs.launchpad.net/castellan/+bug/1876102
Closes-Bug: #1886615
Change-Id: I056f3eebcf87bcbaaf89fdd0dc1f46d143db7785
Including tasks has a performance penalty when compared with importing
tasks. If the include has a condition associated with it, then the
overhead of the include may be lower than the overhead of skipping all
imported tasks. In the case of the check-containers.yml include, the
included file only has a single task, so the overhead of skipping this
task will not be greater than the overhead of the task import. It
therefore makes sense to switch to use import_tasks there.
Partially-Implements: blueprint performance-improvements
Change-Id: I65d911670649960708b9f6a4c110d1a7df1ad8f7
This makes use of udev rules to make it smarter and override
host-level package settings.
Additionally, this masks an Ubuntu-only service that is another
pain point in terms of /dev/kvm permissions.
Fingers crossed for no further surprises.
Change-Id: I61235b51e2e1325b8a9b4f85bf634f663c7ec3cc
Closes-Bug: #1681461
The nova-cell role sets the following sysctls on compute hosts, which
require the br_netfilter kernel module to be loaded:
    net.bridge.bridge-nf-call-iptables
    net.bridge.bridge-nf-call-ip6tables
If it is not loaded, then we see the following errors:
    Failed to reload sysctl:
    sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory
    sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory
Loading the br_netfilter module resolves this issue.
Typically we do not see this since installing Docker and configuring it
to manage iptables rules causes the br_netfilter module to be loaded.
However, there are good reasons [1] to disable Docker's iptables
management, in which case we are likely to hit this issue.
This change loads the br_netfilter module in the nova-cell role for
compute hosts.
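A minimal sketch of the ordering involved, using the stock modprobe and
sysctl modules rather than whatever mechanism the role actually uses
(the task names and the sysctl value of 1 are assumptions):

```yaml
# The module must be loaded first, otherwise /proc/sys/net/bridge/ does
# not exist and the sysctl task below fails with "cannot stat".
- name: Load the br_netfilter kernel module
  become: true
  modprobe:
    name: br_netfilter
    state: present

- name: Set bridge netfilter sysctls
  become: true
  sysctl:
    name: "{{ item }}"
    value: "1"          # assumed value, for illustration
    sysctl_set: true
  loop:
    - net.bridge.bridge-nf-call-iptables
    - net.bridge.bridge-nf-call-ip6tables
```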
[1] https://bugs.launchpad.net/kolla-ansible/+bug/1849275
Co-Authored-By: Dincer Celik <hello@dincercelik.com>
Change-Id: Id52668ba8dab460ad4c33fad430fc8611e70825e
The common role was previously added as a dependency to all other roles.
It would set a fact after running on a host to avoid running twice. This
had the nice effect that deploying any service would automatically pull
in the common services for that host. When using tags, any services with
matching tags would also run the common role. This could be both
surprising and sometimes useful.
When using Ansible at large scale, there is a penalty associated with
executing a task against a large number of hosts, even if it is skipped.
The common role introduces some overhead, just in determining that it
has already run.
This change extracts the common role into a separate play, and removes
the dependency on it from all other roles. New groups have been added
for cron, fluentd, and kolla-toolbox, similar to other services. This
changes the behaviour in the following ways:
* The common role is now run for all hosts at the beginning, rather than
  prior to their first enabled service
* Hosts must be in the necessary group for each of the common services
  in order to have that service deployed. This is mostly to avoid
  deploying on localhost or the deployment host
* If tags are specified for another service e.g. nova, the common role
  will *not* automatically run for matching hosts. The common tag must
  be specified explicitly
The last of these is probably the largest behaviour change. While it
would be possible to determine which hosts should automatically run the
common role, it would be quite complex, and would introduce some
overhead that would probably negate the benefit of splitting out the
common role.
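A rough sketch of what a separate play for the common role could look
like (the play name and exact layout are assumptions; only the group
names and the common tag come from the description above):

```yaml
# Runs once, up front, against hosts in the common service groups,
# instead of being pulled in as a dependency of every other role.
- name: Apply role common
  hosts:
    - cron
    - fluentd
    - kolla-toolbox
  roles:
    - role: common
      tags: common
```

With a layout like this, running with only the nova tag would skip the
play above, which is why the common tag must be given explicitly.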
Partially-Implements: blueprint performance-improvements
Change-Id: I6a4676bf6efeebc61383ec7a406db07c7a868b2a
Change I810aad7d49db3f5a7fd9a2f0f746fd912fe03917 for supporting multiple
Nova cells updated the list of containers that require a policy file to
only include nova-api, nova-compute, and nova-compute-ironic.
The nova-conductor config.json template was left unchanged and fails to
copy the nova policy file into its container. This can be seen on a
fresh deployment, but might be missed on an upgrade if an older policy
file is still available in /etc/kolla/nova-conductor.
This commit removes the nova_policy_file block from the nova-conductor
config.json template, as it shouldn't be required.
Backport: ussuri, train
Change-Id: I17256b182d207aeba3f92c65a6d7cf3611180558
Closes-Bug: #1886170
When kolla_dev_mode is enabled, the nova-cell role fails to clone the
code because it uses the nova-cell repository, which does not exist.
The nova-cell role should use the nova repository instead.
Change-Id: I7fa62726d0d5b0aeb3bd5fa06dc0e59667f94fa0
A non-root user has no permission to create directories under /opt.
Use "become: true" to resolve this.
Change-Id: I155efc4b1e0691da0aaf6ef19ca709e9dc2d9168
The RabbitMQ 'openstack' user has the 'administrator' tag assigned via
the RabbitMQ definitions.json file.
Since the Train release, the nova-cell role also configures the RabbitMQ
user, but omits the tag. This causes the tag to be removed from the
user, which prevents it from accessing the management UI and API.
This change adds support for configuring user tags to the
service-rabbitmq role, and sets the administrator tag by default.
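As a hedged sketch, this is how user tags can be passed when calling
the rabbitmq_user module directly (the password variable is
hypothetical, and the role may invoke the module differently):

```yaml
- name: Ensure the RabbitMQ user exists with the expected tags
  rabbitmq_user:
    user: openstack
    password: "{{ example_rabbitmq_password }}"  # hypothetical variable
    vhost: /
    configure_priv: ".*"
    read_priv: ".*"
    write_priv: ".*"
    # Passing the tag explicitly keeps it in place on the user.
    tags: administrator
    state: present
```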
Change-Id: I7a5d6fe324dd133e0929804d431583e5b5c1853d
Closes-Bug: #1875786
Nova cells support introduced a slight regression that triggered
odd behaviour when we tried switching to Apache (httpd) [1].
Bootstrap no longer applied permissions recursively to all log
files, creating a discrepancy between normal and bootstrap runs,
and also between Nova and other services such as Cinder (regarding
bootstrap logging).
This patch fixes it.
Backport to Train.
Not creating reno nor a bug record because it does not affect
any current standard usage in any currently known way.
Note this only really hides (standardizes?) the global issue that
we don't control file permissions on newly created files too well.
[1] https://review.opendev.org/724793
Change-Id: I35e9924ccede5edd2e1307043379aba944725143
Needed-By: https://review.opendev.org/724793
If using a separate message queue for nova notifications, i.e.
nova_cell_notify_transport_url is different from
nova_cell_rpc_transport_url, then Kolla Ansible will unnecessarily
update the cell. This should not cause any issues since the URL is taken
from nova.conf.
This change fixes the comparison to use the correct URL.
Change-Id: I5f0e30957bfd70295f2c22c86349ebbb4c1fb155
Closes-Bug: #1873255