The nova-cell role sets the following sysctls on compute hosts, which
require the br_netfilter kernel module to be loaded:
net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-ip6tables
If it is not loaded, then we see the following errors:
Failed to reload sysctl:
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory
Loading the br_netfilter module resolves this issue.
Typically we do not see this, since installing Docker and configuring it
to manage iptables rules causes the br_netfilter module to be loaded.
However, there are good reasons [1] to disable Docker's iptables
management, in which case we are likely to hit this issue.
This change loads the br_netfilter module in the nova-cell role for
compute hosts.
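A minimal sketch of the approach in Ansible (task names are
illustrative, and the value of "1" assumes both call flags are being
enabled):

    - name: Load br_netfilter kernel module
      modprobe:
        name: br_netfilter
        state: present

    - name: Set bridge netfilter sysctls
      sysctl:
        name: "{{ item }}"
        value: "1"
      loop:
        - net.bridge.bridge-nf-call-iptables
        - net.bridge.bridge-nf-call-ip6tables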
[1] https://bugs.launchpad.net/kolla-ansible/+bug/1849275
Co-Authored-By: Dincer Celik <hello@dincercelik.com>
Change-Id: Id52668ba8dab460ad4c33fad430fc8611e70825e
The value should be the full path to the keyring file, for example
/etc/ceph/ceph.client.gnocchi.keyring, not just the file name. Without
this fix Gnocchi fails to connect to Ceph.
Change-Id: Iaa69b2096b09a448345de50911e21436875d48d6
Closes-Bug: #1886711
The common role was previously added as a dependency to all other roles.
It would set a fact after running on a host to avoid running twice. This
had the nice effect that deploying any service would automatically pull
in the common services for that host. When using tags, any services with
matching tags would also run the common role. This could be both
surprising and sometimes useful.
When using Ansible at large scale, there is a penalty associated with
executing a task against a large number of hosts, even if it is skipped.
The common role introduces some overhead, just in determining that it
has already run.
This change extracts the common role into a separate play, and removes
the dependency on it from all other roles. New groups have been added
for cron, fluentd, and kolla-toolbox, similar to other services. This
changes the behaviour in the following ways:
* The common role is now run for all hosts at the beginning, rather than
prior to their first enabled service
* Hosts must be in the necessary group for each of the common services
in order to have that service deployed. This is mostly to avoid
deploying on localhost or the deployment host
* If tags are specified for another service e.g. nova, the common role
will *not* automatically run for matching hosts. The common tag must
be specified explicitly
The last of these is probably the largest behaviour change. While it
would be possible to determine which hosts should automatically run the
common role, it would be quite complex, and would introduce some
overhead that would probably negate the benefit of splitting out the
common role.
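A rough sketch of the extracted play (host list and layout are
illustrative, not the exact implementation):

    - name: Apply role common
      gather_facts: false
      hosts:
        - cron
        - fluentd
        - kolla-toolbox
      tags: common
      roles:
        - common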
Partially-Implements: blueprint performance-improvements
Change-Id: I6a4676bf6efeebc61383ec7a406db07c7a868b2a
Change I810aad7d49db3f5a7fd9a2f0f746fd912fe03917 for supporting multiple
Nova cells updated the list of containers that require a policy file to
only include nova-api, nova-compute, and nova-compute-ironic.
The nova-conductor config.json template was left unchanged and fails to
copy the nova policy file into its container. This can be seen on a
fresh deployment, but might be missed on an upgrade if an older policy
file is still available in /etc/kolla/nova-conductor.
This commit removes the nova_policy_file block from the nova-conductor
config.json template, as it shouldn't be required.
Backport: ussuri, train
Change-Id: I17256b182d207aeba3f92c65a6d7cf3611180558
Closes-Bug: #1886170
In Fluentd v0.12, both the in-memory and file buffer chunk sizes default
to 8MB. In v1.0 the file buffer chunk size defaults to 256MB, which can
exceed the maximum chunk size accepted by the Monasca Log or Unified
API, set to 10MB. This can result in logs being rejected and filling
the local buffer on disk.
Change-Id: I9c495773db726a3c5cd94b819dff4141737a1d6e
Closes-Bug: #1885885
Co-Authored-By: Sebastian Luna Valero <sebastian.luna.valero@gmail.com>
While all other clients should use internalURL, the Magnum client itself
and the Keystone interface for trustee credentials should be publicly
accessible (the upstream default when no config is specified), since
instances need to be able to reach them.
Closes-Bug: #1885420
Change-Id: I74359cec7147a80db24eb4aa4156c35d31a026bf
There were two issues with it: /usr/local/bin missing from PATH on
CentOS, and a wrong crontab path on Ubuntu/Debian.
This patch mirrors how this is handled in keystone.
Change-Id: Ib54b261e12c409d66b792648807646015826e83c
Closes-Bug: #1885732
The etcd service protocol is currently configured with internal_protocol.
The etcd service is not load balanced by an HAProxy container, so
there is no proxy layer to perform TLS termination when
internal_protocol is configured to be "https".
Until the etcd service is configured to deploy with native TLS
termination, the protocol etcd uses should be independent of
internal_protocol, and "http" by default.
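A sketch of the decoupled default (the variable name is an assumption):

    # Previously effectively tied to internal_protocol; now independent:
    etcd_protocol: "http"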
Change-Id: I730c02331514244e44004aa06e9399c01264c65d
Closes-Bug: 1884137
Currently openvswitch sets the system-id based on inventory_hostname,
but when the Ansible inventory contains IP addresses it will only take
the first IP octet - resulting in multiple OVN chassis being named
e.g. "10".
Neutron and OVN then have problems functioning, because a chassis named
"10" is created and deleted multiple times per second - this ends up
with the ovsdb and neutron-server processes using up to 100% CPU.
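The first-octet naming is what results when the system-id is derived
from the first dot-separated component of inventory_hostname, for
example (illustrative Jinja2 expression, not the exact template):

    # "compute01.example.com" yields the intended "compute01",
    # but "10.0.0.2" yields just "10".
    ovs_system_id: "{{ inventory_hostname.split('.')[0] }}"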
This change also adds the openvswitch role to the OVN CI job triggers.
Change-Id: Id22eb3e74867230da02543abd93234a5fb12b31d
Closes-Bug: #1884734
Currently, if internal TLS communication is enabled, Kibana to
Elasticsearch communication is unverified. This is because we set
elasticsearch.ssl.verificationMode to 'none' by default (via
kibana_elasticsearch_ssl_verify). This is a poor security posture.
This change changes the default value of
'kibana_elasticsearch_ssl_verify' to 'true'.
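The resulting default (variable location is illustrative):

    # ansible/group_vars/all.yml
    kibana_elasticsearch_ssl_verify: "true"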
Change-Id: Ie4fa8e3a60d69cf5c4bdd975030c92be8113ffb1
Closes-Bug: #1885110
Currently there is no way to configure a CA certificate bundle file for
fluentd to Elasticsearch communication. This change adds a new variable,
'fluentd_elasticsearch_cacert' with a default value set to the value of
'openstack_cacert'.
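The new default, as described above:

    fluentd_elasticsearch_cacert: "{{ openstack_cacert }}"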
Closes-Bug: #1885109
Change-Id: I5bbf55a4dd4ccce9fa2635cee720139c088268e3
Change openvswitch and neutron-openvswitch-agent to deploy only with
the manila generic backend, which uses ovs-vsctl functionality when
configuring share servers.
Change-Id: I124108cda62b38ea498612ff9ddb07d6122a330c
Closes-Bug: #1884939
Magnum, Cinder and Octavia clients in Magnum now use endpoint_type of
internalURL by default, consistent with other clients also used by the
conductor. Additionally, they also use the globally defined
`openstack_region_name` for region_name.
Closes-Bug: #1885096
Change-Id: Ibec511013760cc4f681a2ec1b769b532be3daf2d
Added a spec file for this blueprint.
Changed the kolla-ansible script to accept more than one globals file:
globals.yml will still be the main one, but operators will be able to
create more under the /etc/kolla/globals.d directory.
Also added some paragraphs about this in the quickstart documentation.
Finally, added a release note.
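For example, an operator could keep a single setting in its own drop-in
file (file name and contents are illustrative):

    # /etc/kolla/globals.d/logging.yml
    enable_central_logging: "yes"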
Change-Id: I34eb91d0e2ed80694594b8fc6801cf8ad77da754
Implements: blueprint multiple-globals-files
Recently a patch [1] was merged to stop adding the octavia user to the
admin project, and remove it on upgrade. However, the octavia
configuration was not updated to use the service project, causing load
balancer creation to fail.
There is also an issue for existing deployments in simply switching to
the service project. While existing load balancers appear to continue to
work, creating new load balancers fails due to the security group
belonging to the admin project. At a minimum, the deployer needs to
create a security group in the service project, and update
'octavia_amp_secgroup_list' to match its ID. Ideally the flavor and
network would also be recreated in the service project, although leaving
them in the admin project does not seem to impact operation, while
recreating them would result in downtime for existing Amphorae.
This change adds a new variable, 'octavia_service_auth_project', that
can be used to set the project. The default in Ussuri is 'service',
switching to the new behaviour. For backports of this patch it should be
switched to 'admin' to maintain compatibility.
If a deployer sets 'octavia_service_auth_project' to 'admin', the
octavia user will be assigned the admin role in the admin project, as
was done previously.
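For example, to retain the previous behaviour in globals.yml:

    octavia_service_auth_project: "admin"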
Closes-Bug: #1882643
Related-Bug: #1873176
[1] https://review.opendev.org/720243/
Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Change-Id: I1efd0154ebaee69373ae5bccd391ee9c68d09b30
Replaced "kolla_external_fqdn_cacert" and "kolla_internal_fqdn_cacert" with
"kolla_admin_openrc_cacert". OS_CACERT is now set to the value of
"kolla_admin_openrc_cacert" in the generated admin-openrc.sh file.
Change-Id: If195d5402579cee9a14b91f63f5fde84eb84cccf
Partially-Implements: blueprint add-ssl-internal-network
Depends-On: https://review.opendev.org/#/c/731344/
Update the certificate generation task to create a root CA for the
self-signed certificates. The internal and external facing certificates
are then generated using the root CA.
Updated openstack_cacert in CI tests to use the system CA trust store
certificate by default.
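A hedged sketch of the two-step generation, in the style of the
community.crypto Ansible modules (paths and task names are illustrative,
not the exact implementation):

    - name: Generate self-signed root CA certificate
      x509_certificate:
        path: /etc/kolla/certificates/root.crt
        privatekey_path: /etc/kolla/certificates/root.key
        csr_path: /etc/kolla/certificates/root.csr
        provider: selfsigned

    - name: Generate internal/external certificate signed by the root CA
      x509_certificate:
        path: /etc/kolla/certificates/haproxy.crt
        csr_path: /etc/kolla/certificates/haproxy.csr
        ownca_path: /etc/kolla/certificates/root.crt
        ownca_privatekey_path: /etc/kolla/certificates/root.key
        provider: ownca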
Change-Id: I6c2adff7d0128146cf086103ff6060b0dcefa37b
Partially-Implements: blueprint add-ssl-internal-network
During an upgrade from Stein to Train, Kolla Ansible fails while running
TASK [cinder : Running Cinder online schema migration]
This is because the `--max_count 10` option is used, which causes the
command to exit with status 1 while there are still migrations to
process. According to the upgrade documentation,
the command should be rerun while the exit status is 1:
https://docs.openstack.org/cinder/train/upgrade.html
This issue was introduced by a change to the image [1] which fixed a bug
in the way that the max count was interpreted, but exposed an issue in
using the max count.
This change fixes the issue by ceasing to pass MAX_NUMBER, which will
cause all migrations to occur in a single pass.
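For illustration, the rerun-while-busy behaviour described by the
upgrade documentation could be expressed as an Ansible task like the
following (retry values are assumptions; the actual fix simply stops
passing the max count):

    - name: Run Cinder online schema migrations until complete
      command: cinder-manage db online_data_migrations --max_count 10
      register: result
      failed_when: result.rc not in [0, 1]
      changed_when: result.rc == 1
      until: result.rc == 0
      retries: 20
      delay: 5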
[1] https://review.opendev.org/#/c/712055
Change-Id: Ia786d037f5484f18294188639c956d4ed5ffbc2a
Closes-Bug: #1880753