After all of the discussions we had on
"https://review.opendev.org/#/c/670626/2", I studied all projects that
have an "oslo_messaging" section. Afterwards, I applied the same method
that is already used in the "oslo_messaging" section in Nova, Cinder, and
others. This guarantees that we have a consistent method to
enable/disable notifications across projects based on components (e.g.
Ceilometer) being enabled or disabled. Below is the list of components
and the respective changes made; a configuration sketch follows the
list.
* Aodh:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Congress:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Cinder:
It was already properly configured.
* Octavia:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Heat:
It was already using a similar scheme; I just modified it slightly
to match what we have in all other components.
* Ceilometer:
Ceilometer publishes some messages to RabbitMQ. However, the
default driver is "messagingv2", not '' (empty) as defined in Oslo;
these configurations are defined in ceilometer/publisher/messaging.py.
Therefore, we do not need to do anything for the
"oslo_messaging_notifications" section in Ceilometer.
* Tacker:
It was already using a similar scheme; I just modified it slightly
to match what we have in all other components.
* Neutron:
It was already properly configured.
* Nova:
It was already properly configured. However, we found another issue
with its configuration. Kolla-ansible does not configure Nova
notifications as it should: if Searchlight is not installed (enabled),
'notification_format' should be 'unversioned'. The default is 'both',
so Nova sends notifications to the 'versioned_notifications' queue,
but that queue has no consumer when Searchlight is disabled. In our
case, the queue accumulated 511k messages, and that huge backlog of
"stuck" messages made the RabbitMQ cluster unstable (see the sketch
after this list). References:
https://bugzilla.redhat.com/show_bug.cgi?id=1478274
https://bugs.launchpad.net/ceilometer/+bug/1665449
* Nova_hyperv:
I added the same configurations as in the Nova project.
* Vitrage:
It was already using a similar scheme; I just modified it slightly
to match what we have in all other components.
* Searchlight:
I created a mechanism similar to what we have in Aodh, Cinder, Nova,
and others.
* Ironic:
I created a mechanism similar to what we have in Aodh, Cinder, Nova,
and others.
* Glance:
It was already properly configured.
* Trove:
It was already using a similar scheme; I just modified it slightly
to match what we have in all other components.
* Blazar:
It was already using a similar scheme; I just modified it slightly
to match what we have in all other components.
* Sahara:
It was already using a similar scheme; I just modified it slightly
to match what we have in all other components.
* Watcher:
I created a mechanism similar to what we have in Aodh, Cinder, Nova,
and others.
* Barbican:
I created a mechanism similar to what we have in Cinder, Nova, and
others. I also added a configuration to the 'keystone_notifications'
section. Barbican needs its own queue to capture events from Keystone;
otherwise, it has an impact on Ceilometer and other systems that are
connected to the default "notifications" queue.
* Keystone:
Keystone is the system that triggered this work with the discussions
that followed on https://review.opendev.org/#/c/670626/2. After a long
discussion, we agreed to apply the same approach that we have in Nova,
Cinder, and other systems in Keystone. That is what we did. Moreover,
we introduced a new topic, "barbican_notifications", when Barbican is
enabled. We also removed the variable 'enable_cadf_notifications', as
it is obsolete; the default notification format in Keystone is CADF.
* Mistral:
The driver was hardcoded to "noop". However, that does not seem like
good practice. Instead, I applied the same standard of using the
driver and pushing to the "notifications" queue if Ceilometer is
enabled.
* Cyborg:
I created a mechanism similar to what we have in Aodh, Cinder, Nova,
and others.
* Murano:
It was already using a similar scheme; I just modified it slightly
to match what we have in all other components.
* Senlin:
It was already using a similar scheme; I just modified it slightly
to match what we have in all other components.
* Manila:
It was already using a similar scheme; I just modified it slightly
to match what we have in all other components.
* Zun:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.
* Designate:
It was already using a similar scheme; I just modified it slightly
to match what we have in all other components.
* Magnum:
It was already using a similar scheme; I just modified it slightly
to match what we have in all other components.
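For illustration, the shared pattern renders a section like the
following when Ceilometer (or another consumer) is enabled, and
'driver = noop' otherwise (a sketch only; the transport URL and exact
topics vary per deployment and service):

    [oslo_messaging_notifications]
    transport_url = rabbit://openstack:<password>@rabbitmq:5672/
    driver = messagingv2
    topics = notifications

For Nova, a sketch of the additional fix in nova.conf when Searchlight
is disabled:

    [notifications]
    notification_format = unversioned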
Closes-Bug: #1838985
Change-Id: I88bdb004814f37c81c9a9c4e5e491fac69f6f202
Signed-off-by: Rafael Weingärtner <rafael@apache.org>
Neutron FWaaS v1 was deprecated and has been removed since the Stein
cycle by [0], so remove the related options in kolla.
[0] https://review.opendev.org/616410
Change-Id: Ia03e7979dd48bafb34c11edd08c2a2a87b949e0e
Most other services already gate the DB bootstrap operations with the
'use_preconfigured_databases' variable; Blazar did not.
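A minimal sketch of the gating pattern now applied to Blazar (the task
body is elided; 'use_preconfigured_databases' is the real variable):

    - name: Creating Blazar database
      # ... database creation task body elided ...
      when: not use_preconfigured_databases | bool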
Change-Id: I772b1cb92612c7e6936f052ed9947f93582f264c
Change https://review.opendev.org/#/c/670247/ attempted to fix glance
deployment with the file backend. However, it introduced a new bug by
being stricter about only generating configuration where the container
will be deployed. This means that the current method of running the
glance bootstrap container on any host in the glance-api group could be
broken, since that host needs the container configuration.
This change only runs the bootstrap container on hosts in the
glance_api_hosts list, which in the case of the file backend typically
only contains one host.
This change also fixes up some logic during rolling upgrade, where we
might not generate new configuration for the bootstrap host.
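A minimal sketch of the resulting condition (simplified):

    - include_tasks: bootstrap_service.yml
      when: inventory_hostname in glance_api_hosts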
Change-Id: I83547cd83b06ddefb3a9e1f39844537bdb32bd7f
Related-Bug: #1836151
The keepalived_virtual_router_id should be changed from the default in
the case of a multi-region deployment where the VIP of the different
regions resides on the same subnet.
This is not immediately clear, so this change should make it more
obvious.
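For example, in each region's globals.yml (values illustrative; they
only need to differ between regions whose VIPs share the subnet):

    # Region one
    keepalived_virtual_router_id: "51"

    # Region two
    keepalived_virtual_router_id: "52"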
Change-Id: Ia4899ba407937d9f27832c9d123701729e89987a
* Ubuntu ships with nfs-ganesha 2.6.0, which requires an rpcbind UDP
test on startup (this was fixed in later versions)
* Add rpcbind package to be installed by kolla-ansible bootstrap when
ceph_nfs is enabled
* Update Ceph deployment docs with a note
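A minimal sketch of the bootstrap task (module and variable names
illustrative):

    - name: Install rpcbind
      package:
        name: rpcbind
        state: present
      when: enable_ceph_nfs | bool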
Change-Id: Ic19264191a0ed418fa959fdc122cef543446fbe5
The ironic inspector iPXE configuration includes the following kernel
argument:
initrd=agent.ramdisk
However, the ramdisk is actually called ironic-agent.initramfs, so the
argument should be:
initrd=ironic-agent.initramfs
In BIOS boot mode this does not cause a problem, but compute nodes
with UEFI enabled seem to be stricter about this and fail to boot.
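In iPXE terms, the fixed boot script should look roughly like this
(URLs illustrative, not the exact template); the initrd= argument must
match the name of the loaded ramdisk:

    kernel http://<inspector>/ironic-agent.kernel ... initrd=ironic-agent.initramfs
    initrd http://<inspector>/ironic-agent.initramfs
    boot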
Change-Id: Ic84f3b79fdd3cd1730ca2fb79c11c7a4e4d824de
Closes-Bug: #1836375
Currently, the documentation around configuring regions directs
you to make changes to openstack_region_name and multiple_regions_names
in the globals.yml file.
The defaults weren't represented there, which could potentially cause
confusion. This change adds these defaults with a brief description.
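A sketch of the added globals.yml entries (assuming the role defaults
of RegionOne and a single-region list):

    #openstack_region_name: "RegionOne"
    #multiple_regions_names:
    #  - "{{ openstack_region_name }}"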
TrivialFix
Change-Id: Ie0ff7e3dfb9a9355a9c9dbaf27151d90162806dd
Tweaked some of the language in doc/source/user/multi-regions.rst for
clarity.
TrivialFix
Change-Id: Icdd8da6886d0e39da5da80c37d14d2688431ba8f
A common class of problems goes like this:
* kolla-ansible deploy
* Hit a problem, often in ansible/roles/*/tasks/bootstrap.yml
* Re-run kolla-ansible deploy
* Service fails to start
This happens because the DB is created during the first run, but for some
reason we fail before performing the DB sync. This means that on the second run
we don't include ansible/roles/*/tasks/bootstrap_service.yml because the DB
already exists, and therefore still don't perform the DB sync. However this
time, the command may complete without apparent error.
We should be less careful about when we perform the DB sync, and do it whenever
it is necessary. There is an argument for not doing the sync during a
'reconfigure' command, although we will not change that here.
This change always performs the DB sync during the 'deploy' and
'reconfigure' commands.
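A minimal sketch of the change ('database_created' is a hypothetical
placeholder for the old condition):

    # Before (sketch): the sync only ran when the DB had just been
    # created, so a failure in between left it skipped forever.
    - include_tasks: bootstrap_service.yml
      when: database_created is changed

    # After (sketch): the idempotent sync runs on every deploy and
    # reconfigure.
    - include_tasks: bootstrap_service.yml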
Change-Id: I82d30f3fcf325a3fdff3c59f19a1f88055b566cc
Closes-Bug: #1823766
Closes-Bug: #1797814
Since https://review.opendev.org/647699/, we lost the logic to only
deploy glance-api on a single host when using the file backend.
This code was always a bit custom, and would be better supported by
using the 'host_in_groups' pattern we have in a few other places where a
single group name does not describe the placement of containers for a
service.
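A minimal sketch of the 'host_in_groups' pattern (keys simplified; the
real service dict has more fields):

    glance-api:
      container_name: glance_api
      group: glance-api
      host_in_groups: "{{ inventory_hostname in glance_api_hosts }}"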
Change-Id: I21ce4a3b0beee0009ac69fecd0ce24efebaf158d
Closes-Bug: #1836151
Controllers lacking compute should not be required to provide a valid
migration_interface, as it is not used there (and prechecks do not
check that either).
Inclusion of the libvirt conf section is now conditional on service
type. The libvirt conf section has been moved to a separate included
file to avoid evaluation of the undefined variable (a conditional
block did not prevent it, and using the 'default' filter may hide
future issues).
See https://github.com/ansible/ansible/issues/58835
Additionally, this fixes the improper nesting of 'if' blocks for
libvirt.
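A minimal sketch of the template change (file and variable names
illustrative):

    {% if service_name == 'nova-compute' %}
    {% include 'nova.conf.d/libvirt.conf.j2' %}
    {% endif %}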
Change-Id: I77af534fbe824cfbe95782ab97838b358c17b928
Closes-Bug: #1835713
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
This mimics the behavior of the core 'template' module, allowing
relative includes from the same directory as the merged template, the
base directory of the playbook/role (usually the role for us), and its
'templates' subdir.
Additionally, old unused code was removed.
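For example, a template passed to merge_configs can now include a
sibling fragment (paths illustrative):

    {# templates/nova.conf.j2 #}
    {% include 'nova.conf.d/libvirt.conf.j2' %}

The include is resolved against the merged template's directory, the
role/playbook base directory, and its 'templates' subdirectory.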
Change-Id: I83804d3cf5f17eb2302a2dfe49229c6277b1e25f
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Skip creation by setting ENABLE_EXT_NET to 0.
Since adding errexit, we have been failing in Kayobe CI because we
have a conflicting flat network on physnet1.
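Usage (script path as in the kolla-ansible tree):

    ENABLE_EXT_NET=0 tools/init-runonce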
Change-Id: I88429f30eb81a286f4b8104d5e7a176eefaad667
* Sometimes getting/creating the ceph mds keyring fails, similar to
https://tracker.ceph.com/issues/16255
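A sketch of a retry-based workaround (not necessarily the exact
change; 'ceph_mds_keyring_command' is a hypothetical placeholder):

    - name: Getting ceph mds keyring
      command: "{{ ceph_mds_keyring_command }}"  # real command elided
      register: result
      until: result is succeeded
      retries: 3
      delay: 2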
Change-Id: I47587cbeb8be0e782c13ba7f40367409e2daa8a8
Updated the docs to refer to the openstack client, rather than the (old)
neutron client.
TrivialFix
Change-Id: I82011175f7206f52570a0f7d1c6863ad8fa08fd0
The "backup_driver" option should be configured to
cinder.backup.drivers.ceph.CephBackupDriver instead of
cinder.backup.drivers.ceph.
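That is, in cinder.conf:

    [DEFAULT]
    backup_driver = cinder.backup.drivers.ceph.CephBackupDriver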
Change-Id: I22457023c6ad76b508bcbe05e37517c18f1ffc81
Closes-Bug: #1832878
Missed by me in a recent merge.
TrivialFix
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Change-Id: I83b1e84a43f014ce20be8677868be3f66017e3c2
We have a minimum supported version of Ansible, currently 2.5. We should
test this in addition to the latest version. This change tests the
latest version on Ubuntu, and the minimum version on other distros.
Change-Id: I45a7173139f057177a71e919ad3e718a99d9f87b
Due to a bug in ansible, kolla-ansible deploy currently fails in nova
with the following error when used with ansible earlier than 2.8:
TASK [nova : Waiting for nova-compute services to register themselves]
*********
task path:
/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml:30
fatal: [primary]: FAILED! => {
"failed": true,
"msg": "The field 'vars' has an invalid value, which
includes an undefined variable. The error was:
'nova_compute_services' is undefined\n\nThe error
appears to have been in
'/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml':
line 30, column 3, but may\nbe elsewhere in the file
depending on the exact syntax problem.\n\nThe
offending line appears to be:\n\n\n- name: Waiting
for nova-compute services to register themselves\n ^
here\n"
}
Example:
http://logs.openstack.org/00/669700/1/check/kolla-ansible-centos-source/81b65b9/primary/logs/ansible/deploy
This was caused by
https://review.opendev.org/#/q/I2915e2610e5c0b8d67412e7ec77f7575b8fe9921,
which hits upon an ansible bug described here:
https://github.com/markgoddard/ansible-experiments/tree/master/05-referencing-registered-var-do-until.
We can work around this by not using an intermediary variable.
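A minimal sketch of the workaround ('discover_command' is a
hypothetical placeholder; retries value illustrative):

    # Before (sketch): 'until' referenced an intermediary variable
    # defined in 'vars', which pre-2.8 ansible fails to resolve.
    - name: Waiting for nova-compute services to register themselves
      command: "{{ discover_command }}"
      register: result
      vars:
        nova_compute_services: "{{ result.stdout | from_json }}"
      until: nova_compute_services | length > 0
      retries: 20

    # After (sketch): reference the registered result directly.
    - name: Waiting for nova-compute services to register themselves
      command: "{{ discover_command }}"
      register: result
      until: (result.stdout | from_json) | length > 0
      retries: 20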
Change-Id: I58f8fd0a6e82cb614e02fef6e5b271af1d1ce9af
Closes-Bug: #1835817