Rendering {{ openstack_service_workers }} for the workers
of each OpenStack service is not enough. Several services
have to have more workers because more requests are sent
to them.
This patch adds a per-service workers value for
each service and sets {{ openstack_service_workers }} as its
default, so the value can be overridden in host vars per server.
Nothing changes for a typical user.
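As a sketch, a per-host override could then look like this (the
per-service variable name, e.g. nova_api_workers, is illustrative):

  # host_vars/control01.yml - hypothetical per-host override
  nova_api_workers: 8
  # other services keep the default of "{{ openstack_service_workers }}"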
Change-Id: Ifa5863f8ec865bbf8e39c9b2add42c92abe40616
Fixes an issue where access rules failed to validate:
Cannot validate request with restricted access rules. Set
service_type in [keystone_authtoken] to allow access rule validation
I've used the values from the endpoint. This was mostly a
straightforward copy and paste, except:
- versioned endpoints, e.g. cinderv3, where I stripped the version
- monasca has multiple endpoints associated with a single service. For
this, I concatenated logging and monitoring to be logging-monitoring.
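Following that convention, the rendered section gains, for example
(cinder shown; the value follows the endpoint name as described above):

  [keystone_authtoken]
  service_type = cinder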
Closes-Bug: #1965111
Change-Id: Ic4b3ab60abad8c3dd96cd4923a67f2a8f9d195d7
Following up on [1].
The three variables only introduce noise now that we have
removed the reliance on Keystone's admin port.
[1] I5099b08953789b280c915a6b7a22bdd4e3404076
Change-Id: I3f9dab93042799eda9174257e604fd1844684c1c
NSXP is the OpenStack support for the NSX Policy platform,
which Neutron supports since the Stein release. This patch
adds Kolla support.
This adds a new neutron_plugin_agent type 'vmware_nsxp'. The plugin
does not run any neutron agents.
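Opting in is then a one-line change (a sketch; the value comes from
this patch, everything else is a standard deployment):

  # globals.yml
  neutron_plugin_agent: "vmware_nsxp"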
Change-Id: I9e9d8f07e586bdc143d293e572031368af7f3fca
Nova provides a mechanism to set static vendordata via a file [1].
This patch provides support in Kolla Ansible for using this
feature.
Arguably this could be part of a generic mechanism for copying
arbitrary config, but:
- It's not clear if there is anything else that would take
advantage of this
- One size might not fit all
[1] https://docs.openstack.org/nova/latest/configuration/config.html#api.vendordata_jsonfile_path
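The nova.conf options involved look roughly like this (the file path
is illustrative; see [1] for the option):

  [api]
  vendordata_providers = StaticJSON
  vendordata_jsonfile_path = /etc/nova/vendordata.json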
Change-Id: Id420376d96d0c40415c369ae8dd36e845a781820
When only the directory is specified, separate log files
are created for the Nova API / metadata services with a
-wsgi suffix. This changes the 'programname' field in
Fluentd, which affects the processing of these logs. This
is a regression.
When the log file name is specified, the directory is
not required.
Closes-Bug: #1950185
Change-Id: I8fec8b787349f83c05d8af7f52fc58da7c3e9cc4
In services which use the Apache HTTP server to service HTTP requests,
there exists a TimeOut directive [1] which defaults to 60 seconds. APIs
which come under heavy load, such as Cinder, can sometimes exceed this
which results in an HTTP 504 Gateway Timeout, or similar. However, the
request can still be serviced without error. For example, if Nova calls
the Cinder API to detach a volume, and this operation takes longer
than the shortest of the two timeouts, Nova will emit a stack trace
with a 504 Gateway timeout. At some time later, the request to detach
the volume will succeed. The Nova and Cinder DBs then become
out-of-sync with each other, and frequently DB surgery is required.
Although strictly this category of bugs should be fixed in OpenStack
services, it is not realistic to expect this to happen in the short
term. Therefore, this change makes it easier to set the Apache HTTP
timeout via a new variable.
An example of a related bug is here:
https://bugs.launchpad.net/nova/+bug/1888665
Whilst this timeout can currently be set by overriding the WSGI
config for individual services, this change makes it much easier.
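As a sketch, assuming the new variable is named kolla_httpd_timeout:

  # globals.yml
  kolla_httpd_timeout: 600   # rendered as Apache's 'TimeOut 600'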
Change-Id: Ie452516655cbd40d63bdad3635fd66693e40ce34
Closes-Bug: #1917648
When the internal VIP is moved in the event of a failure of the active
controller, OpenStack services can become unresponsive as they try to
talk with MariaDB using connections from the SQLAlchemy pool.
It has been argued that OpenStack doesn't really need to use connection
pooling with MariaDB [1]. This commit reduces the use of connection
pooling via two configuration options:
- max_pool_size is set to 1 to allow only a single connection in the
pool (it is not possible to disable connection pooling entirely via
oslo.db, and max_pool_size = 0 means unlimited pool size)
- lower connection_recycle_time from the default of one hour to 10
seconds, which means the single connection in the pool will be
recreated regularly
These settings have shown improved responsiveness of the system in the
event of a failover.
[1] http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html
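The resulting oslo.db settings are:

  [database]
  max_pool_size = 1
  connection_recycle_time = 10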
Change-Id: Ib6a62d4428db9b95569314084090472870417f3d
Closes-Bug: #1896635
This change adds support for encryption of communication between
OpenStack services and RabbitMQ. Server certificates are supported, but
currently client certificates are not.
The kolla-ansible certificates command has been updated to support
generating certificates for RabbitMQ for development and testing.
RabbitMQ TLS is enabled in the all-in-one source CI jobs, or when
the Zuul 'tls_enabled' variable is true.
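As a sketch, assuming the switch is named rabbitmq_enable_tls:

  # globals.yml
  rabbitmq_enable_tls: "yes"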
Change-Id: I4f1d04150fb2b5af085b762890092f87ae6076b5
Implements: blueprint message-queue-ssl-support
The goal of this change is to normalize the construction and use
of internal, external, and admin URLs. While extending Kolla-Ansible
to enable a more flexible method of managing external URLs, we noticed
that the same URL was constructed multiple times in different parts
of the code. This makes it difficult for people who want to work
with these URLs and creates inconsistencies in a large code base over
time. Therefore, we propose here the use of a
single Kolla-Ansible variable per endpoint URL, which helps
people who are interested in overriding/extending these URLs.
As an example, we extended Kolla-Ansible to facilitate overriding
public (external) URLs with the following standard:
"<component/serviceName>.<companyBaseUrl>".
The NAT/redirect in the SSL termination system (HAProxy,
httpd, or some other) is then done via the service name, and not by
the port. This allows operators to easily and automatically create
friendlier URL names. To develop this feature, we first applied this
patch that we are now sending to the community. We did that to reduce
the surface of changes in Kolla-Ansible.
Another example is the integration of Kolla-Ansible and Consul, which
we also implemented internally, and which also requires URL changes.
Therefore, this change is essential to reduce code duplication and to
make it easier for users/developers to work with and customize the
service URLs.
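As a sketch of the single-variable-per-endpoint idea (variable names
and values are illustrative):

  # group_vars - one variable per endpoint URL
  nova_internal_endpoint: "{{ internal_protocol }}://{{ kolla_internal_fqdn }}:{{ nova_api_port }}"
  nova_public_endpoint: "https://nova.example.com"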
Change-Id: I73d483e01476e779a5155b2e18dd5ea25f514e93
Signed-off-by: Rafael Weingärtner <rafael@apache.org>
This patch introduces an optional backend encryption for the Nova API
service. When used in conjunction with enabling TLS for service API
endpoints, network communication will be encrypted end to end, from
client through HAProxy to the Nova service.
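As a sketch, assuming the usual kolla naming for backend TLS switches:

  # globals.yml
  nova_enable_tls_backend: "yes"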
Change-Id: I48e1540b973016079d5686b328e82239dcffacfd
Partially-Implements: blueprint add-ssl-internal-network
Castellan (the Barbican client library) uses different parameters to
control which CA file is used.
This patch sets them.
Moreover, this aligns Barbican with other services by defaulting
its client config to the internal endpoint.
See also [1].
[1] https://bugs.launchpad.net/castellan/+bug/1876102
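The Castellan client options involved look roughly like this (option
names are an assumption; see [1]):

  [barbican]
  barbican_endpoint_type = internal
  verify_ssl = true
  verify_ssl_path = {{ openstack_cacert }}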
Closes-Bug: #1886615
Change-Id: I6a174468bd91d214c08477b93c88032a45c137be
Nova cells support introduced a slight regression that triggered
odd behaviour when we tried switching to Apache (httpd) [1].
Bootstrap no longer applied permissions recursively to all log
files, creating a discrepancy between normal and bootstrap runs
and also Nova and other services such as Cinder (regarding
bootstrap logging).
This patch fixes it.
Backport to Train.
Not creating reno nor a bug record because it does not affect
any current standard usage in any currently known way.
Note this only really hides (standardizes?) the global issue that
we don't control file permissions on newly created files too well.
[1] https://review.opendev.org/724793
Change-Id: I35e9924ccede5edd2e1307043379aba944725143
Needed-By: https://review.opendev.org/724793
The use of default(omit) is for module parameters, not templates. We
define a default value for openstack_cacert, so it should never be
undefined anyway.
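For reference, default(omit) only makes sense when the result feeds a
module parameter, e.g. (illustrative):

  - name: Set file mode only when defined
    file:
      path: /tmp/example
      mode: "{{ example_mode | default(omit) }}"
  # In a Jinja2 template the same filter would render an opaque
  # placeholder string instead of omitting anything.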
Change-Id: Idfa73097ca168c76559dc4f3aa8bb30b7113ab28
Include a reference to the globally configured Certificate Authority to
all services. Services use the CA to verify HTTPS connections.
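Concretely, service configs gain a CA reference along these lines (a
sketch):

  [keystone_authtoken]
  cafile = {{ openstack_cacert }}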
Change-Id: I38da931cdd7ff46cce1994763b5c713652b096cc
Partially-Implements: blueprint support-trusted-ca-certificate-file
The [placement].os_interface option was replaced by
[placement].valid_interfaces in Queens and was removed in Rocky.
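The replacement looks like:

  [placement]
  valid_interfaces = internal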
Change-Id: I306c57305b9088159dd18af4aa373bbc39a8b881
Closes-Bug: #1853621
If "reclaim_instance_interval" has been set in nova conf,
attched volume may not be delete while instacne deleted.
Adding cinder auth in nova conf can solve the problem.
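A sketch of the added nova.conf section (option names are nova's
[cinder] keystoneauth options; exact values are illustrative):

  [cinder]
  auth_url = {{ keystone_internal_url }}
  auth_type = password
  project_name = service
  username = cinder
  password = {{ cinder_keystone_password }}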
Change-Id: I9eb3a74c2f6976043cc35a94915f1fcecb9ef601
Closes-Bug: 1850279
This affects configurations with Blazar and fake Nova only;
the default filter list does not include it.
Upstream docs:
RetryFilter - Deprecated since version 20.0.0 (Train)
Since the 17.0.0 (Queens) release, the scheduler has provided
alternate hosts for rescheduling so the scheduler does not need to
be called during a reschedule which makes the RetryFilter useless.
Change-Id: I26bf45997005124e9166b5bf1d44cb276624430b
This patch adds initial support for deploying multiple Nova cells.
Splitting a nova-cell role out from the Nova role allows a more granular
approach to deploying and configuring Nova services.
A new enable_cells flag has been added that enables the support of
multiple cells via the introduction of a super conductor in addition to
cell-specific conductors. When this flag is not set (the default), nova
is configured in the same manner as before - with a single conductor.
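Enabling multi-cell support is then a one-line change:

  # globals.yml
  enable_cells: "yes"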
The nova role now deploys the global services:
* nova-api
* nova-scheduler
* nova-super-conductor (if enable_cells is true)
The nova-cell role handles services specific to a cell:
* nova-compute
* nova-compute-ironic
* nova-conductor
* nova-libvirt
* nova-novncproxy
* nova-serialproxy
* nova-spicehtml5proxy
* nova-ssh
This patch does not support using a single cell controller for managing
more than one cell. Support for sharing a cell controller will be added
in a future patch.
This patch should be backwards compatible and is tested by existing CI
jobs. A new CI job has been added that tests a multi-cell environment.
ceph-mon has been removed from the play hosts list as it is not
necessary - delegate_to does not require the host to be in the play.
Documentation will be added in a separate patch.
Partially Implements: blueprint support-nova-cells
Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Change-Id: I810aad7d49db3f5a7fd9a2f0f746fd912fe03917
Introduce kolla_address filter.
Introduce put_address_in_context filter.
Add AF config to vars.
Address contexts:
- raw (default): <ADDR>
- memcache: inet6:[<ADDR>]
- url: [<ADDR>]
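Usage looks roughly like this (a sketch; filter arguments may vary):

  {# raw (default) context: <ADDR> #}
  {{ 'api' | kolla_address }}
  {# url context: [<ADDR>] #}
  {{ 'api' | kolla_address | put_address_in_context('url') }}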
Other changes:
globals.yml - mention just IP in comment
prechecks/port_checks (api_intf) - kolla_address handles validation
3x interface conditional (swift configs: replication/storage)
2x interface variable definition with hostname
(haproxy listens; api intf)
1x interface variable definition with hostname with bifrost exclusion
(baremetal pre-install /etc/hosts; api intf)
neutron's ml2 'overlay_ip_version' set to 6 for IPv6 on tunnel network
basic multinode source CI job for IPv6
prechecks for rabbitmq and qdrouterd use proper NSS database now
MariaDB Galera Cluster WSREP SST mariabackup workaround
(socat and IPv6)
Ceph naming workaround in CI
TODO: probably needs documenting
RabbitMQ IPv6-only proto_dist
Ceph ms switch to IPv6 mode
Remove neutron-server's ml2_type_vxlan/vxlan_group setting,
as it is not used (let's avoid any confusion)
and could break setups without proper multicast routing
if it started working (it is also IPv4-only).
haproxy upgrade checks for slaves based on ipv6 addresses
TODO:
ovs-dpdk grabs ipv4 network address (w/ prefix len / submask)
not supported, invalid by default because neutron_external has no address
No idea whether ovs-dpdk works at all atm.
ml2 for xenapi
Xen is not supported too well.
This would require working with XenAPI facts.
rp_filter setting
This would require meddling with ip6tables (there is no sysctl param).
By default nothing is dropped.
Unlikely we really need it.
ironic dnsmasq is configured IPv4-only
dnsmasq needs DHCPv6 options and testing in vivo.
KNOWN ISSUES (beyond us):
One cannot use IPv6 address to reference the image for docker like we
currently do, see: https://github.com/moby/moby/issues/39033
(docker_registry; docker API 400 - invalid reference format)
workaround: use hostname/FQDN
RabbitMQ may fail to bind to IPv6 if hostname resolves also to IPv4.
This is due to old RabbitMQ versions available in images.
IPv4 is preferred by default and may fail in the IPv6-only scenario.
This should be no problem in real life as IPv6-only is indeed IPv6-only.
Also, when new RabbitMQ (3.7.16/3.8+) makes it into images, this will
no longer be relevant as we supply all the necessary config.
See: https://github.com/rabbitmq/rabbitmq-server/pull/1982
For reliable runs, at least Ansible 2.8 is required (2.8.5 confirmed
to work well). Older Ansible versions are known to miss IPv6 addresses
in interface facts. This may affect redeploys, reconfigures and
upgrades which run after the VIP address is assigned.
See: https://github.com/ansible/ansible/issues/63227
Bifrost Train does not support IPv6 deployments.
See: https://storyboard.openstack.org/#!/story/2006689
Change-Id: Ia34e6916ea4f99e9522cd2ddde03a0a4776f7e2c
Implements: blueprint ipv6-control-plane
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
To securely support live migration between compute nodes we should
enable TLS with certificate authentication, instead of TCP with no
authentication support.
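A sketch of enabling it (the variable name is an assumption):

  # globals.yml
  libvirt_tls: "yes"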
Implements: blueprint libvirt-tls
Change-Id: I22ea6233933c840b853fdcc8e03400b2bf577271
Also fixes similar issues introduced by the same recent change.
Added a FIXME note about possible TLS malfunction regarding Horizon.
Change-Id: I5f46a9306139eb550d3849757c8bdf0767537c78
Closes-Bug: #1844016
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
nova.conf currently uses the [neutron] "url" parameter which has been
deprecated since 17.0.0. In multi-region environments this can
cause Nova to look up the Neutron endpoint for a different region.
Remove this parameter and set region_name and
valid_interfaces to allow the correct lookup to be performed.
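The replacement configuration is roughly:

  [neutron]
  region_name = {{ openstack_region_name }}
  valid_interfaces = internal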
Change-Id: I1bbc73728439a460447bc8edd264f9f2d3c814e0
Closes-Bug: #1836952
This resolves an issue where the web browser would complain that it
was trying to connect to an insecure websocket when using HTTPS with
Horizon.
Change-Id: Ib75cc2bc1b3811bc31badd5fda3db3ed0c59b119
Closes-Bug: #1841914
This review is the first in a series of patches; it introduces an
optional encryption for internal OpenStack endpoints, implementing
part of the add-ssl-internal-network spec.
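As a sketch, assuming the usual kolla TLS switch naming:

  # globals.yml
  kolla_enable_tls_internal: "yes"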
Change-Id: I6589751626486279bf24725f22e71da8cd7f0a43
After all of the discussions we had on
"https://review.opendev.org/#/c/670626/2", I studied all projects that
have an "oslo_messaging" section. Afterwards, I applied the same method
that is already used in the "oslo_messaging" section in Nova, Cinder,
and others. This guarantees a consistent method to enable/disable
notifications across projects, based on components (e.g.
Ceilometer) being enabled or disabled.
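As a sketch, the shared pattern looks roughly like this (exact topics
and triggering components vary per service):

  [oslo_messaging_notifications]
  {% if enable_ceilometer | bool %}
  driver = messagingv2
  topics = notifications
  {% else %}
  driver = noop
  {% endif %}

Here follows the list of components, and the respective changes I made.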
* Aodh:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Congress:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Cinder:
It was already properly configured.
* Octavia:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Heat:
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
* Ceilometer:
Ceilometer publishes some messages to RabbitMQ. However, the
default driver is "messagingv2", and not '' (empty) as defined in oslo;
these configurations are defined in ceilometer/publisher/messaging.py.
Therefore, we do not need to do anything for the
"oslo_messaging_notifications" section in Ceilometer.
* Tacker:
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
* Neutron:
It was already properly configured.
* Nova
It was already properly configured. However, we found another issue
with its configuration. Kolla-Ansible does not configure Nova
notifications as it should. If 'searchlight' is not installed (enabled),
the 'notification_format' should be 'unversioned'. The default is
'both', so Nova will send notifications to the versioned_notifications
queue, but that queue has no consumer when 'searchlight' is disabled.
In our case, the queue accumulated 511k messages; the huge amount of
"stuck" messages made the RabbitMQ cluster unstable.
https://bugzilla.redhat.com/show_bug.cgi?id=1478274
https://bugs.launchpad.net/ceilometer/+bug/1665449
* Nova_hyperv:
I added the same configurations as in the Nova project.
* Vitrage
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
* Searchlight
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.
* Ironic
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.
* Glance
It was already properly configured.
* Trove
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
* Blazar
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
* Sahara
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
* Watcher
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.
* Barbican
I created a mechanism similar to what we have in Cinder, Nova,
and others. I also added a configuration to the 'keystone_notifications'
section. Barbican needs its own queue to capture events from Keystone.
Otherwise, it has an impact on Ceilometer and other systems that are
connected to the "notifications" default queue.
* Keystone
Keystone is the system that triggered this work with the discussions
that followed on https://review.opendev.org/#/c/670626/2. After a long
discussion, we agreed to apply the same approach that we have in Nova,
Cinder, and other systems in Keystone. That is what we did. Moreover,
we introduced a new topic, "barbican_notifications", when Barbican is
enabled. We also removed the variable enable_cadf_notifications, as
it is obsolete and the default in Keystone is CADF.
* Mistral:
It was hardcoded "noop" as the driver. However, that does not seem a
good practice. Instead, I applied the same standard of using the driver
and pushing to "notifications" queue if Ceilometer is enabled.
* Cyborg:
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.
* Murano
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
* Senlin
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
* Manila
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
* Zun
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Designate
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
* Magnum
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
Closes-Bug: #1838985
Change-Id: I88bdb004814f37c81c9a9c4e5e491fac69f6f202
Signed-off-by: Rafael Weingärtner <rafael@apache.org>
Controllers lacking compute should not be required to provide a
valid migration_interface, as it is not used there (and prechecks
do not check it either).
Inclusion of libvirt conf section is now conditional on service type.
libvirt conf section has been moved to separate included file to
avoid evaluation of the undefined variable (conditional block did not
prevent it and using 'default' filter may hide future issues).
See https://github.com/ansible/ansible/issues/58835
Additionally this fixes the improper nesting of 'if' blocks for libvirt.
Change-Id: I77af534fbe824cfbe95782ab97838b358c17b928
Closes-Bug: #1835713
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
In a rare event, both kolla-ansible and nova-scheduler try to do
the mapping at the same time, and one of them fails.
Since kolla-ansible runs host discovery on each deployment,
there is no need to change the default of no periodic host discovery.
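For reference, periodic discovery stays at nova's default (disabled):

  [scheduler]
  discover_hosts_in_cells_interval = -1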
I added some notes for the future. They are not critical.
I made the decision explicit in the comments.
I changed the task name to satisfy recommendations.
I removed the variable because it is not used (to avoid future doubts).
Closes-Bug: #1832987
Change-Id: I3128472f028a2dbd7ace02abc179a9629ad74ceb
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
backport: stein, rocky
During startup of nova-compute, we see the following error message:
Error gathering result from cell 00000000-0000-0000-0000-000000000000:
DBNotAllowed: nova-compute
This issue was observed in devstack [1], and fixed [2] by removing
database configuration from the compute service.
This change takes the same approach, removing DB config from nova.conf
in the nova-compute* containers.
[1] https://bugs.launchpad.net/devstack/+bug/1812398
[2] 8253787137
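A sketch of the template guard (condition and variable names are
illustrative):

  {% if service_name != 'nova-compute' %}
  [database]
  connection = mysql+pymysql://nova:{{ nova_database_password }}@{{ database_address }}/nova
  {% endif %}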
Change-Id: I18c99ff4213ce456868e64eab63a4257910b9b8e
Closes-Bug: #1829705
The api_endpoint option was deprecated, and will be removed by
https://review.openstack.org/643483.
Change-Id: Ie56a8ab07ab21d2e7d678e636c1408099d8ab3aa
This allows ironic service endpoints to use custom hostnames, and adds the
following variables:
* ironic_internal_fqdn
* ironic_external_fqdn
* ironic_inspector_internal_fqdn
* ironic_inspector_external_fqdn
These default to the old values of kolla_internal_fqdn or
kolla_external_fqdn.
This also adds ironic_api_listen_port and ironic_inspector_listen_port
options, which default to ironic_api_port and ironic_inspector_port for
backward compatibility.
These options allow the user to differentiate between the port the
service listens on, and the port the service is reachable on. This is
useful for external load balancers which live on the same host as the
service itself.
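For example (values illustrative):

  # globals.yml
  ironic_internal_fqdn: "ironic.internal.example.com"
  ironic_external_fqdn: "ironic.example.com"
  ironic_api_listen_port: 6385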
Change-Id: I45b175e85866b4cfecad8451b202a5a27f888a84
Implements: blueprint service-hostnames
We're duplicating code to build the keystone URLs in nearly every
config, where we've already done it in group_vars. Replace the
redundancy with a variable that does the same thing.
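A sketch of the shared variable (construction shown is illustrative):

  # group_vars/all.yml
  keystone_internal_url: "{{ internal_protocol }}://{{ kolla_internal_fqdn }}:{{ keystone_public_port }}"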
Change-Id: I207d77870e2535c1cdcbc5eaf704f0448ac85a7a
This allows neutron service endpoints to use custom hostnames, and adds the
following variables:
* neutron_internal_fqdn
* neutron_external_fqdn
These default to the old values of kolla_internal_fqdn or
kolla_external_fqdn.
This also adds a neutron_server_listen_port option, which defaults to
neutron_server_port for backward compatibility.
This option allows the user to differentiate between the port the
service listens on, and the port the service is reachable on. This is
useful for external load balancers which live on the same host as the
service itself.
Change-Id: I87d7387326b6eaa6adae1600b48d480319d10676
Implements: blueprint service-hostnames