Docker has no restart policy named 'never'. It has 'no'.
This has bitten us already (see [1]) and might bite us again whenever
we want to change the restart policy to 'no'.
This patch makes our docker integration honor all valid restart policies
and only valid restart policies.
All relevant docker restart policy usages are patched as well.
I added some FIXMEs in the places relevant to the kolla-ansible docker
integration. They are not fixed here so as not to alter behavior.
[1] https://review.opendev.org/667363
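As a minimal sketch (task and image names are illustrative only), the valid
Docker restart policies are 'no', 'on-failure' (optionally with a retry
limit), 'always' and 'unless-stopped' — there is no 'never':

```yaml
# Illustrative task; names are hypothetical. Note that "no" must be
# quoted in YAML, otherwise it is parsed as the boolean false.
- name: Start a container that should not be restarted automatically
  docker_container:
    name: example
    image: example-image:latest
    restart_policy: "no"
```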
Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
* Ubuntu ships with nfs-ganesha 2.6.0, which requires an rpcbind UDP
  test on startup (this was fixed in later releases)
* Add rpcbind package to be installed by kolla-ansible bootstrap when
ceph_nfs is enabled
* Update Ceph deployment docs with a note
Change-Id: Ic19264191a0ed418fa959fdc122cef543446fbe5
The ironic inspector iPXE configuration includes the following kernel
argument:
initrd=agent.ramdisk
However, the ramdisk is actually called ironic-agent.initramfs, so the
argument should be:
initrd=ironic-agent.initramfs
In BIOS boot mode this does not cause a problem, but compute nodes with
UEFI enabled seem to be stricter about the initrd name, and fail to
boot.
Change-Id: Ic84f3b79fdd3cd1730ca2fb79c11c7a4e4d824de
Closes-Bug: #1836375
Currently, the documentation around configuring regions directs
you to make changes to openstack_region_name and multiple_regions_names
in the globals.yml file.
The defaults weren't represented there, which could cause confusion.
This change adds these defaults with a brief description.
TrivialFix
Change-Id: Ie0ff7e3dfb9a9355a9c9dbaf27151d90162806dd
Tweaked some of the language in doc/source/user/multi-regions.rst for
clarity.
TrivialFix
Change-Id: Icdd8da6886d0e39da5da80c37d14d2688431ba8f
A common class of problems goes like this:
* kolla-ansible deploy
* Hit a problem, often in ansible/roles/*/tasks/bootstrap.yml
* Re-run kolla-ansible deploy
* Service fails to start
This happens because the DB is created during the first run, but for some
reason we fail before performing the DB sync. This means that on the second run
we don't include ansible/roles/*/tasks/bootstrap_service.yml because the DB
already exists, and therefore still don't perform the DB sync. However this
time, the command may complete without apparent error.
We should be less selective about when we perform the DB sync, and do it
whenever it might be necessary. There is an argument for not doing the sync
during a 'reconfigure' command, although we will not change that here.
This change simply always performs the DB sync during the 'deploy' and
'reconfigure' commands.
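A minimal sketch of the new behavior (task file layout and variable names
are assumptions based on the description above):

```yaml
# Run the service bootstrap (including the DB sync) unconditionally for
# these actions, rather than only when the database was just created.
- include_tasks: bootstrap_service.yml
  when: kolla_action in ['deploy', 'reconfigure']
```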
Change-Id: I82d30f3fcf325a3fdff3c59f19a1f88055b566cc
Closes-Bug: #1823766
Closes-Bug: #1797814
Since https://review.opendev.org/647699/, we lost the logic to only
deploy glance-api on a single host when using the file backend.
This code was always a bit custom, and would be better supported by
using the 'host_in_groups' pattern we have in a few other places where a
single group name does not describe the placement of containers for a
service.
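A sketch of the pattern, assuming hypothetical variable names (the real
change may express this differently):

```yaml
# With the file backend, only the first glance-api group member should
# run the container; otherwise all group members do.
- set_fact:
    glance_api_hosts: "{{ [groups['glance-api'] | first] if glance_backend_file | bool else groups['glance-api'] }}"
```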
Change-Id: I21ce4a3b0beee0009ac69fecd0ce24efebaf158d
Closes-Bug: #1836151
Controllers lacking compute should not be required to provide a
valid migration_interface, as it is not used there (and prechecks
do not check it either).
Inclusion of libvirt conf section is now conditional on service type.
The libvirt conf section has been moved to a separate included file to
avoid evaluation of the undefined variable (a conditional block did not
prevent it, and using the 'default' filter may hide future issues).
See https://github.com/ansible/ansible/issues/58835
Additionally this fixes the improper nesting of 'if' blocks for libvirt.
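A sketch of the approach in the merged template (file and variable names
are hypothetical):

```jinja
{# Include the [libvirt] section only for compute services, so that
   migration_interface is never evaluated on controllers. #}
{% if service_name == 'nova-compute' %}
{% include 'nova.conf.d/libvirt.conf.j2' %}
{% endif %}
```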
Change-Id: I77af534fbe824cfbe95782ab97838b358c17b928
Closes-Bug: #1835713
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
This mimics the behavior of the core 'template' module, allowing relative
includes from the same directory as the merged template, from the base
directory of the playbook/role (usually the role for us), and from its
'templates' subdir.
Additionally old unused code was removed.
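For illustration, a merge_configs invocation whose source templates can
now use relative includes (paths are hypothetical):

```yaml
# Each source template is rendered with a Jinja2 search path covering the
# template's own directory, the role base dir, and its 'templates' subdir,
# so '{% include "common.conf.j2" %}' resolves relative to any of them.
- merge_configs:
    sources:
      - "{{ role_path }}/templates/nova.conf.j2"
    dest: /etc/kolla/nova/nova.conf
```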
Change-Id: I83804d3cf5f17eb2302a2dfe49229c6277b1e25f
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Skip creation by setting ENABLE_EXT_NET to 0.
Since adding errexit we have been failing in Kayobe CI, because we have
a conflicting flat network on physnet1.
Change-Id: I88429f30eb81a286f4b8104d5e7a176eefaad667
* Sometimes getting/creating the Ceph MDS keyring fails, similar to https://tracker.ceph.com/issues/16255
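A sketch of the retry approach (task name and command are assumptions,
not the exact change):

```yaml
# Retry the keyring operation a few times, since it can fail
# transiently (cf. https://tracker.ceph.com/issues/16255).
- name: Getting or creating ceph mds keyring
  command: docker exec ceph_mon ceph auth get-or-create mds.{{ ansible_hostname }}
  register: result
  until: result is success
  retries: 3
  delay: 5
```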
Change-Id: I47587cbeb8be0e782c13ba7f40367409e2daa8a8
Updated the docs to refer to the openstack client, rather than the (old)
neutron client.
TrivialFix
Change-Id: I82011175f7206f52570a0f7d1c6863ad8fa08fd0
The "backup_driver" option should be configured to
cinder.backup.drivers.ceph.CephBackupDriver instead of
cinder.backup.drivers.ceph.
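For reference, the corresponding cinder.conf fragment (backup_driver
lives in the [DEFAULT] section):

```ini
[DEFAULT]
backup_driver = cinder.backup.drivers.ceph.CephBackupDriver
```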
Change-Id: I22457023c6ad76b508bcbe05e37517c18f1ffc81
Closes-Bug: #1832878
Missed by me in a recent merge.
TrivialFix
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Change-Id: I83b1e84a43f014ce20be8677868be3f66017e3c2
We have a minimum supported version of Ansible, currently 2.5. We should
test this in addition to the latest version. This change tests latest on
Ubuntu, and minimum on other distros.
Change-Id: I45a7173139f057177a71e919ad3e718a99d9f87b
Due to a bug in ansible, kolla-ansible deploy currently fails in nova
with the following error when used with ansible earlier than 2.8:
TASK [nova : Waiting for nova-compute services to register themselves]
*********
task path:
/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml:30
fatal: [primary]: FAILED! => {
"failed": true,
"msg": "The field 'vars' has an invalid value, which
includes an undefined variable. The error was:
'nova_compute_services' is undefined\n\nThe error
appears to have been in
'/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml':
line 30, column 3, but may\nbe elsewhere in the file
depending on the exact syntax problem.\n\nThe
offending line appears to be:\n\n\n- name: Waiting
for nova-compute services to register themselves\n ^
here\n"
}
Example:
http://logs.openstack.org/00/669700/1/check/kolla-ansible-centos-source/81b65b9/primary/logs/ansible/deploy
This was caused by
https://review.opendev.org/#/q/I2915e2610e5c0b8d67412e7ec77f7575b8fe9921,
which hits upon an ansible bug described here:
https://github.com/markgoddard/ansible-experiments/tree/master/05-referencing-registered-var-do-until.
We can work around this by not using an intermediary variable.
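A sketch of the workaround (command and names are illustrative, not the
exact task): register the result and reference it directly in 'until',
rather than mapping it through an intermediary 'vars' entry, which is
what trips the ansible bug linked above.

```yaml
- name: Waiting for nova-compute services to register themselves
  command: >
    docker exec kolla_toolbox openstack compute service list
    --service nova-compute -f json
  register: nova_compute_services
  until: nova_compute_services.stdout | from_json | length > 0
  retries: 20
  delay: 10
```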
Change-Id: I58f8fd0a6e82cb614e02fef6e5b271af1d1ce9af
Closes-Bug: #1835817
* Fix wsrep sequence number detection. Log message format is
'WSREP: Recovered position: <UUID>:<seqno>' but we were picking out
the UUID rather than the sequence number. This is as good as random.
* Add become: true to log file reading and removal since
I4a5ebcedaccb9261dbc958ec67e8077d7980e496 added become: true to the
'docker cp' command which creates it.
* Don't run handlers during recovery. If the config files change we
would end up restarting the cluster twice.
* Wait for wsrep recovery container completion (don't detach). This
avoids a potential race between wsrep recovery and the subsequent
'stop_container'.
* Finally, we now wait for the bootstrap host to report that it is in
an OPERATIONAL state. Without this we can see errors where the
MariaDB cluster is not ready when used by other services.
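A sketch of the seqno fix (variable names are hypothetical): given a log
line of the form 'WSREP: Recovered position: <UUID>:<seqno>', take the
final colon-separated field rather than the UUID.

```yaml
- name: Extract wsrep sequence number from the recovery log line
  set_fact:
    seqno: "{{ wsrep_log_line.split(':') | last | trim }}"
```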
Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583
Closes-Bug: #1834467