This change enables the use of Docker healthchecks for core OpenStack
services.
Also check-failures.sh has been updated to treat containers with
unhealthy status as failed.
Implements: blueprint container-health-check
Change-Id: I79c6b11511ce8af70f77e2f6a490b59b477fefbb
The goal for this push request is to normalize the construction and use
of internal, external, and admin URLs. While extending Kolla-ansible
to enable a more flexible method to manage external URLs, we noticed
that the same URL was constructed multiple times in different parts
of the code. This can make it difficult for people that want to work
with these URLs and create inconsistencies in a large code base with
time. Therefore, we are proposing here the use of
"single Kolla-ansible variable" per endpoint URL, which facilitates
for people that are interested in overriding/extending these URLs.
As an example, we extended Kolla-ansible to facilitate the "override"
of public (external) URLs with the following standard
"<component/serviceName>.<companyBaseUrl>".
Therefore, the "NAT/redirect" in the SSL termination system (HAproxy,
HTTPD or some other) is done via the service name, and not by the port.
This allows operators to easily and automatically create more friendly
URL names. To develop this feature, we first applied this patch that
we are sending now to the community. We did that to reduce the surface
of changes in Kolla-ansible.
Another example is the integration of Kolla-ansible and Consul, which
we also implemented internally, and also requires URLs changes.
Therefore, this PR is essential to reduce code duplicity, and to
facility users/developers to work/customize the services URLs.
Change-Id: I73d483e01476e779a5155b2e18dd5ea25f514e93
Signed-off-by: Rafael Weingärtner <rafael@apache.org>
This patch introduces an optional backend encryption for the Nova API
service. When used in conjunction with enabling TLS for service API
endpoints, network communcation will be encrypted end to end, from
client through HAProxy to the Nova service.
Change-Id: I48e1540b973016079d5686b328e82239dcffacfd
Partially-Implements: blueprint add-ssl-internal-network
Previously we mounted /etc/timezone if the kolla_base_distro is debian
or ubuntu. This would fail prechecks if debian or ubuntu images were
deployed on CentOS. While this is not a supported combination, for
correctness we should fix the condition to reference the host OS rather
than the container OS, since that is where the /etc/timezone file is
located.
Change-Id: Ifc252ae793e6974356fcdca810b373f362d24ba5
Closes-Bug: #1882553
Some services look for /etc/timezone on Debian/Ubuntu, so we should
introduce it to the containers.
In addition, added prechecks for /etc/localtime and /etc/timezone.
Closes-Bug: #1821592
Change-Id: I9fef14643d1bcc7eee9547eb87fa1fb436d8a6b3
In dev mode currently the python source is mounted under python2.7
site-packages. This change fixes this to use the distro_python_version
variable to ensure dev mode works with Python 3 images.
Change-Id: Ieae3778a02f1b79023b4f1c20eff27b37f481077
Partially-Implements: blueprint python-3
For the CentOS 7 to 8 transition, we will have a period where both
CentOS 7 and 8 images are available. We differentiate these images via a
tag - the CentOS 8 images will have a tag of train-centos8 (or
master-centos8 temporarily).
To achieve this, and maintain backwards compatibility for the
openstack_release variable, we introduce a new 'openstack_tag' variable.
This variable is based on openstack_release, but has a suffix of
'openstack_tag_suffix', which is empty except on CentOS 8 where it has a
value of '-centos8'.
Change-Id: I12ce4661afb3c255136cdc1aabe7cbd25560d625
Partially-Implements: blueprint centos-rhel-8
This patch adds initial support for deploying multiple Nova cells.
Splitting a nova-cell role out from the Nova role allows a more granular
approach to deploying and configuring Nova services.
A new enable_cells flag has been added that enables the support of
multiple cells via the introduction of a super conductor in addition to
cell-specific conductors. When this flag is not set (the default), nova
is configured in the same manner as before - with a single conductor.
The nova role now deploys the global services:
* nova-api
* nova-scheduler
* nova-super-conductor (if enable_cells is true)
The nova-cell role handles services specific to a cell:
* nova-compute
* nova-compute-ironic
* nova-conductor
* nova-libvirt
* nova-novncproxy
* nova-serialproxy
* nova-spicehtml5proxy
* nova-ssh
This patch does not support using a single cell controller for managing
more than one cell. Support for sharing a cell controller will be added
in a future patch.
This patch should be backwards compatible and is tested by existing CI
jobs. A new CI job has been added that tests a multi-cell environment.
ceph-mon has been removed from the play hosts list as it is not
necessary - delegate_to does not require the host to be in the play.
Documentation will be added in a separate patch.
Partially Implements: blueprint support-nova-cells
Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Change-Id: I810aad7d49db3f5a7fd9a2f0f746fd912fe03917
Introduce kolla_address filter.
Introduce put_address_in_context filter.
Add AF config to vars.
Address contexts:
- raw (default): <ADDR>
- memcache: inet6:[<ADDR>]
- url: [<ADDR>]
Other changes:
globals.yml - mention just IP in comment
prechecks/port_checks (api_intf) - kolla_address handles validation
3x interface conditional (swift configs: replication/storage)
2x interface variable definition with hostname
(haproxy listens; api intf)
1x interface variable definition with hostname with bifrost exclusion
(baremetal pre-install /etc/hosts; api intf)
neutron's ml2 'overlay_ip_version' set to 6 for IPv6 on tunnel network
basic multinode source CI job for IPv6
prechecks for rabbitmq and qdrouterd use proper NSS database now
MariaDB Galera Cluster WSREP SST mariabackup workaround
(socat and IPv6)
Ceph naming workaround in CI
TODO: probably needs documenting
RabbitMQ IPv6-only proto_dist
Ceph ms switch to IPv6 mode
Remove neutron-server ml2_type_vxlan/vxlan_group setting
as it is not used (let's avoid any confusion)
and could break setups without proper multicast routing
if it started working (also IPv4-only)
haproxy upgrade checks for slaves based on ipv6 addresses
TODO:
ovs-dpdk grabs ipv4 network address (w/ prefix len / submask)
not supported, invalid by default because neutron_external has no address
No idea whether ovs-dpdk works at all atm.
ml2 for xenapi
Xen is not supported too well.
This would require working with XenAPI facts.
rp_filter setting
This would require meddling with ip6tables (there is no sysctl param).
By default nothing is dropped.
Unlikely we really need it.
ironic dnsmasq is configured IPv4-only
dnsmasq needs DHCPv6 options and testing in vivo.
KNOWN ISSUES (beyond us):
One cannot use IPv6 address to reference the image for docker like we
currently do, see: https://github.com/moby/moby/issues/39033
(docker_registry; docker API 400 - invalid reference format)
workaround: use hostname/FQDN
RabbitMQ may fail to bind to IPv6 if hostname resolves also to IPv4.
This is due to old RabbitMQ versions available in images.
IPv4 is preferred by default and may fail in the IPv6-only scenario.
This should be no problem in real life as IPv6-only is indeed IPv6-only.
Also, when new RabbitMQ (3.7.16/3.8+) makes it into images, this will
no longer be relevant as we supply all the necessary config.
See: https://github.com/rabbitmq/rabbitmq-server/pull/1982
For reliable runs, at least Ansible 2.8 is required (2.8.5 confirmed
to work well). Older Ansible versions are known to miss IPv6 addresses
in interface facts. This may affect redeploys, reconfigures and
upgrades which run after VIP address is assigned.
See: https://github.com/ansible/ansible/issues/63227
Bifrost Train does not support IPv6 deployments.
See: https://storyboard.openstack.org/#!/story/2006689
Change-Id: Ia34e6916ea4f99e9522cd2ddde03a0a4776f7e2c
Implements: blueprint ipv6-control-plane
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Using profiles in cephx is the recommended way since Mimic,
this also adds support for blacklist ops.
Change-Id: Ib9f65644637a5761c6cd7ca8925afc6bb2b8d5f5
Closes-Bug: #1760065
The rolling upgrade has been the default since Stein. The legacy
upgrade has been removed because it doesn't follow the upgrade
guide [1].
[1] https://docs.openstack.org/nova/latest/user/upgrade.html
Change-Id: I2aa879699cb4e9955bf5c38053eada5a53fb6211
To securely support live migration between computenodes we should enable
tls, with cert auth, instead of TCP with no auth support.
Implements: blueprint libvirt-tls
Change-Id: I22ea6233933c840b853fdcc8e03400b2bf577271
Use upstream Ansible modules for registration of services, endpoints,
users, projects, roles, and role grants.
Change-Id: I7c9138d422cc91c177fd8992347176bb54156b5a
Nova-consoleauth support was removed in
I099080979f5497537e390f531005a517ab12aa7a, but these variables were
left.
Change-Id: I1ce1631119bba991225835e8e409f11d53276550
During an upgrade, nova pins the version of RPC calls to the minimum
seen across all services. This ensures that old services do not receive
data they cannot handle. After the upgrade is complete, all nova
services are supposed to be reloaded via SIGHUP to cause them to check
again the RPC versions of services and use the new latest version which
should now be supported by all running services.
Due to a bug [1] in oslo.service, sending services SIGHUP is currently
broken. We replaced the HUP with a restart for the nova_compute
container for bug 1821362, but not other nova services. It seems we need
to restart all nova services to allow the RPC version pin to be removed.
Testing in a Queens to Rocky upgrade, we find the following in the logs:
Automatically selected compute RPC version 5.0 from minimum service
version 30
However, the service version in Rocky is 35.
There is a second issue in that it takes some time for the upgraded
services to update the nova services database table with their new
version. We need to wait until all nova-compute services have done this
before the restart is performed, otherwise the RPC version cap will
remain in place. There is currently no interface in nova available for
checking these versions [2], so as a workaround we use a configurable
delay with a default duration of 30 seconds. Testing showed it takes
about 10 seconds for the version to be updated, so this gives us some
headroom.
This change restarts all nova services after an upgrade, after a 30
second delay.
[1] https://bugs.launchpad.net/oslo.service/+bug/1715374
[2] https://bugs.launchpad.net/nova/+bug/1833542
Change-Id: Ia6fc9011ee6f5461f40a1307b72709d769814a79
Closes-Bug: #1833069
Related-Bug: #1833542
When integrating 3rd party component into openstack with kolla-ansible,
maybe have to mount some extra volumes to container.
Change-Id: I69108209320edad4c4ffa37dabadff62d7340939
Implements: blueprint support-extra-volumes
After upgrading from Rocky to Stein, nova-compute services fail to start
new instances with the following error message:
Failed to allocate the network(s), not rescheduling.
Looking in the nova-compute logs, we also see this:
Neutron Reported failure on event
network-vif-plugged-60c05a0d-8758-44c9-81e4-754551567be5 for instance
32c493c4-d88c-4f14-98db-c7af64bf3324: NovaException: In shutdown, no new
events can be scheduled
During the upgrade process, we send nova containers a SIGHUP to cause
them to reload their object version state. Speaking to the nova team in
IRC, there is a known issue with this, caused by oslo.service performing
a full shutdown in response to a SIGHUP, which breaks nova-compute.
There is a patch [1] in review to address this.
The workaround employed here is to restart the nova compute service.
[1] https://review.openstack.org/#/c/641907
Change-Id: Ia4fcc558a3f62ced2d629d7a22d0bc1eb6b879f1
Closes-Bug: #1821362
This allows nova service endpoints to use custom hostnames, and adds the
following variables:
* nova_internal_fqdn
* nova_external_fqdn
* placement_internal_fqdn
* placement_external_fqdn
* nova_novncproxy_fqdn
* nova_spicehtml5proxy_fqdn
* nova_serialproxy_fqdn
These default to the old values of kolla_internal_fqdn or
kolla_external_fqdn.
This also adds the following variables:
* nova_api_listen_port
* nova_metadata_listen_port
* nova_novncproxy_listen_port
* nova_spicehtml5proxy_listen_port
* nova_serialproxy_listen_port
* placement_api_listen_port
These default to <service>_port, e.g. nova_api_port, for backward
compatibility.
These options allow the user to differentiate between the port the
service listens on, and the port the service is reachable on. This is
useful for external load balancers which live on the same host as the
service itself.
Change-Id: I7bcce56a2138eeadcabac79dd07c8dba1c5af644
Implements: blueprint service-hostnames
bump up the max_files to 32768 and max_processes to 131072.
when nova used ceph as backend, the default limit 1024 is not enough.
each connection from rbd image to osd needs 1 fd and 2 threads. if we
have 200 osds, we need 200 fds and 400 threads for 1 image.
Change-Id: I94c3ec111473ea2ccacdea5dbbf3fdc9c569859f
Add an enable_cinder_backend_quobyte option to etc/kolla/globals.yml to
enable use the Quobyte Cinder backend.
Change the bind mounts for /var/lib/nova/mnt to include the shared
propogation if Quobyte is enabled.
Update the documentation to include a section on configuring the Cinder.
Implements: blueprint cinder-quobyte-backend
Change-Id: I364939407ad244fe81cea40f880effdbcaa8a20d
According [1], vitrage notification has to be configured in Nova,
Neutron, Cinder & Aodh config file.
[1] https://review.openstack.org/#/c/302802/
Change-Id: Iaf8cd7d40e6eb988adf4d208e6ad784f1004caa5
Currently, the serial consoles as accessed through Horizon,
timeout after the haproxy_client_timeout (default: 1m) of
inactivity. This change allows you to set a larger timeout.
Change-Id: I2a9923cb69d5db976395146685aded83922c4120
Closes-Bug: #1800643
Kolla-ansible provides support for the dev mode for some projects
of openstack, but there are still some projects that do not yet
support specific release tag. This patch will implement this function
for these project.
Change-Id: I917b27dd61295b542457a21b240afe2cd4e83e58
The cephx keys are missing a default permission
to allow to see blacklisted clients.
This permission ensures that in the event of a client
crash (kill -9/hard shutdown/power outage) the client
can re-connect and write to any devices after reboot.
Closes-Bug: 1773449
Change-Id: I44d3982233f892d2c0ce3b9964194d8098453978
Signed-off-by: Jorge Niedbalski <jorge.niedbalski@linaro.org>
Having all services in one giant haproxy file makes altering
configuration for a service both painful and dangerous. Each service
should be configured with a simple set of variables and rendered with a
single unified template.
Available are two new templates:
* haproxy_single_service_listen.cfg.j2: close to the original style, but
only one service per file
* haproxy_single_service_split.cfg.j2: using the newer haproxy syntax
for separated frontend and backend
For now the default will be the single listen block, for ease of
transition.
Change-Id: I6e237438fbc0aa3c89a3c8bd706a53b74e71904b
Add a possibility to mount sources as volumes to containers,
in "more than documentation" way. That will let us to use kolla
as a replacement for devstack.
Partially implements: blueprint mount-sources
Co-Authored-By: zhulingjie <easyzlj@gmail.com>
Change-Id: I10677e5ad22f2107a0657feeeaf32287ab9f8e28
Enables setting rp_filter mode on Neutron L3 agent and Nova compute
hosts whilst maintaining the default that it is disabled.
Closes-Bug: #1782799
Change-Id: I93e53bad9727beb786b00bd7fcd6d78785c619c2
This service is only required if you want to support cold migration.
In some instances that is not a needed feature, and avoiding having
another key to manage is an advantage.
Co-Authored-By: Adam Harwell <flux.adam@gmail.com>
Change-Id: I0a55a91673d9178933f134832df4bd849ddf5af4
This commit will constrain the dimensions of service `Nova`
and sub-containers deployed along with it.
A user can give the dimension values in `/etc/kolla/globals.yml`
the data-types just like stated in this commit.
Reference-Docs:
https://docs.docker.com/config/containers/resource_constraints/
Added Test-cases for the same.
Partially-Implements: blueprint resource-constraints
Change-Id: I6458d8fb7b26a6e7c3a9fd0d674d9cf129b0bf5d
User can use custom directory for nova instance.
For example using a shared file system as backend.
Change-Id: I11fe4891719a2e2a34888d8b798df5602e294e4f
NSXV3 is the OpenStack support for the NSX Transformers platform.
This is supported from neutron in the Mitaka version. This patch
adds Kolla support
This adds a new neutron_plugin_agent type 'vmware_nsxv3'. The plugin
does not run any neutron agents.
Change-Id: I1ecd7e5f3471e4ff03cfe8c9a3aff17af3fe1842
This patch allows configuration of the Infoblox
pluggable IPAM driver in neutron [0].
When 'infoblox' is chosen as the driver, an Infoblox
IPAM agent can be started as well. The agent
allows for enhanced DNS capabilities by listening
for neutron and nova notifications.
[0] https://github.com/openstack/networking-infoblox/blob/master/README.rst
Change-Id: I4f863750a7806a7b6eaf13900d44e5f063afe3de
Depends-On: Ia44f0e0d7a0d60cebf0857ad51700e02eba5099b
Partially-Implements: blueprint neutron-ipam-driver-infoblox
This patchset implements yamllint test to all *.yml
files.
Also fixes syntax errors to make jobs to pass.
Change-Id: I3186adf9835b4d0cada272d156b17d1bc9c2b799
This change allows the following use cases:
1. Using an already-configured MariaDB / MySQL server / Cluster
2. Using already-created DB users, without requiring root DB access.
Update: added external mariadb precheck
Change-Id: I78b0d178306d7c5293b0bf53e445f19f18b4b824
Implements: blueprint external-mariadb-support.
Closes-Bug: #1603121