Supports calling custom Kolla Ansible commands directly after a
``kayobe control host bootstrap``.
Change-Id: I19f188cc002f8578618003e90c0a4a154b806e49
Since I3a5a43fef9c3d68d0db02be12b9f892c437e513d started using the to_bool
function in more places, we are now stricter about the result of the
variable dump. If there are no controllers in the inventory, the result
is not a valid boolean and the to_bool function exits non-zero.
This change fixes the issue by running against localhost, which should
always be in the inventory.
Change-Id: Idcfd9d335f11f6c4d676033128d207f62b363ee9
This requires disabling libvirt_vm_trust_guest_rx_filters, which, when
enabled, triggers the following error when booting baremetal instances
with Tenks on Libvirt 9 (and most likely since 8.9.0):
Cannot set interface flags on 'macvtap1': Value too large for defined data type
This is apparently triggered by a Libvirt commit refreshing rx-filters
more often [1].
As explained in I71a2051d8acd63379bd70bc1287a059d4a7f6387, this setting
was added to allow traffic destined for other MAC addresses to reach VMs
when using a macvtap interface.
This will prevent multicast from working, but we don't need it for
baremetal tests in CI.
This setting will be enabled again once the issue is resolved in either
Libvirt or Tenks.
This reverts commit 21c68bbfafe529e1c337ba242c2e501c75bfedaa.
Also increase the timeout of upgrade jobs, which is now too short due to
the delay added by bare metal testing.
[1] 060d4c83ef
Change-Id: I2cfd2667abb1ae8988b7a7fd9761b75c20a0eaa4
Kolla Ansible has enabled RabbitMQ HA queues by default, which require a
manual migration step [1]. This change adds the migration steps to the
Kayobe upgrade CI.
[1] https://review.opendev.org/c/openstack/kolla-ansible/+/882825
Change-Id: I82c286fd17e3a1d7f31952442fa281302cda7ee4
Various functions in the development/testing scripts rely on 'kayobe
configuration dump' to extract the value of flags. If this command fails
for any reason, we should exit the script. Currently, in some places we
continue and return 1, because we only compare the output against the
string 'true'.
The to_bool helper function handles failure by checking for a valid
boolean output, so let's use that everywhere.
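The difference between the two approaches can be illustrated with a
minimal Python sketch. The real to_bool helper is a bash function in the
development scripts; the Python names below are hypothetical, and which
spellings count as valid booleans is an assumption of this sketch.

```python
def to_bool(output: str) -> bool:
    """Convert the output of a flag dump into a boolean.

    Hypothetical Python equivalent of the bash to_bool helper; the accepted
    spellings (case-insensitive true/false) are an assumption.
    """
    value = output.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    # Anything else (empty output, an error message, ...) is treated as a
    # failure instead of silently meaning "false".
    raise ValueError(f"invalid boolean output: {output!r}")


def is_true_by_string_match(output: str) -> bool:
    """The pattern being replaced: compare the output against 'true'.

    A failed dump produces non-boolean output, which this quietly maps to False.
    """
    return output.strip() == "true"


if __name__ == "__main__":
    failed_dump = "ERROR: example failure"  # made-up output of a failed dump
    print(is_true_by_string_match(failed_dump))  # False: the failure is hidden
    try:
        to_bool(failed_dump)
    except ValueError as exc:
        print(f"aborting: {exc}")
```

The string comparison cannot tell "false" apart from "the command
failed", whereas validating the output turns a failed dump into an
explicit error.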
Change-Id: I3a5a43fef9c3d68d0db02be12b9f892c437e513d
Yoga upper constraints were used to keep compatibility with Python 3.6.
This is no longer needed now that all supported operating systems use
Python 3.9 or newer.
This reverts commits d2e0d64eb00d4cea8a4f8ff6a963b1ec0c3660ac and
d190e9e3a33e049267300ef0ce90bc1a4db14061.
Change-Id: I35a07bcc2b7c9cbb49fa60e6802cc6288a34fbd8
CentOS Stream 8 support has been dropped. A migration path will be
provided in the Yoga release as a follow-up change.
MichaelRigart.interfaces does not yet support custom routes with
NetworkManager, so they have been temporarily disabled in CI for Rocky
Linux 9.
The non-voting CentOS Stream 9 overcloud CI job uses Rocky Linux 9
container images, as Kolla CI no longer builds CentOS Stream 9 images.
Change-Id: Idf5ee822b03ba40179803c981500a6bad37594bf
This is required to be able to install Tenks. Otherwise, we try to
install Jinja2 3.1.2, which requires Python 3.7 or newer.
Change-Id: Ie497b191b6de8bc818dc4a2a12f7129a02d0fd00
Requirements upper constraints bumped python-novaclient to version
18.0.0 [1], which requires Python 3.8 [2]. This results in failures when
installing python-openstackclient on CentOS and Rocky with Python 3.6.
ERROR: Cannot install python-openstackclient==5.8.0 because these package versions have conflicting dependencies.
The conflict is caused by:
python-openstackclient 5.8.0 depends on python-novaclient>=17.0.0
The user requested (constraint) python-novaclient===18.0.0
Work around this issue by using yoga upper constraints until we upgrade
to CentOS Stream 9 and Rocky Linux 9.
This also fixes another issue seen on Ubuntu where image uploads to
Glance through Ansible fail with a 400 Bad Request error. This is caused
by the bump of openstacksdk to version 0.99.0 and will be fixed by a new
release of ansible-collections-openstack.
[1] https://review.opendev.org/c/openstack/requirements/+/842808
[2] https://review.opendev.org/c/openstack/python-novaclient/+/838944
Change-Id: I40c6b898963c2218d41d37bd73d40ce8dcf22b87
Previously we were using the zuul user in the TLS jobs. This was due to
a permissions issue when accessing the CA certificate in kayobe-config
in the zuul user's home directory.
This change reverts to the default of using the stack user for the TLS
jobs. In order to make this work, the generated CA cert chain is added
to the trust store.
Change-Id: I875f8976df75dee68ba00842fe624c29cc1b123c
Set the Ironic boot mode to legacy BIOS explicitly in Tenks config in
anticipation of an upcoming change to the default boot mode.
Override the boot mode to UEFI in the overcloud TLS job to improve
coverage. This requires enabling iPXE booting.
Depends-On: https://review.opendev.org/c/openstack/tenks/+/827479/
Change-Id: Id1b4e9775c834b8b97e086241ee8b247977225a2
As a first step towards supporting multiple overcloud disk images, this
change introduces a new command to build a disk image directly with DIB:
`kayobe overcloud host image build`.
It also disables building a root disk image during Bifrost bootstrap if
overcloud_dib_build_host_images is set to true.
Change-Id: I93d242889e225b4e60254f6b9cc5eeb457294ac8
Story: 2002098
Task: 41693
Arguments are passed through to kayobe-env in kayobe-config, which
allows setting the Kayobe environment.
Change-Id: I4c72e32e5379237340284a09874b0c500e41ad0f
By default, Ansible injects a variable for every fact, prefixed with
ansible_. This can result in a large number of variables for each host,
which at scale can incur a performance penalty. Ansible provides a
configuration option [0] that can be set to False to prevent this
injection of facts. In this case, facts should be referenced via
ansible_facts.<fact>.
This change updates all references to Ansible facts within Kayobe
from using individual fact variables to using the items in the
ansible_facts dictionary. This allows users to disable fact variable
injection in their Ansible configuration, which may provide some
performance improvement.
This change disables fact variable injection in the Ansible
configuration used in CI, to catch any remaining uses of the injected
variables.
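With injection disabled (the inject_facts_as_vars = False setting in the
[defaults] section of ansible.cfg, see [0]), a reference such as
ansible_os_family has to be written as ansible_facts.os_family. The
Python sketch below shows that mechanical rewrite. The helper is
hypothetical and not part of this change; it takes an explicit list of
fact names because not every ansible_ variable is a fact.

```python
import re
from typing import Iterable


def use_ansible_facts(text: str, fact_names: Iterable[str]) -> str:
    """Rewrite injected fact variables (ansible_<fact>) as ansible_facts.<fact>.

    Only names in fact_names are rewritten: variables such as ansible_host or
    ansible_user share the prefix but are connection variables, not facts.
    """
    facts = set(fact_names)
    # \b stops matches inside longer names such as my_ansible_os_family.
    pattern = re.compile(r"\bansible_(\w+)")

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        return f"ansible_facts.{name}" if name in facts else match.group(0)

    return pattern.sub(replace, text)
```

For example, rewriting "{{ ansible_os_family }} via {{ ansible_host }}"
with the fact list ["os_family"] changes only the first reference, and
existing ansible_facts.<fact> references are left alone.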
[0] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#inject-facts-as-vars
Story: 2007993
Task: 42464
Depends-On: https://review.opendev.org/c/openstack/kolla-ansible/+/791276
Change-Id: I14db53ed6e57d37bbd28dd5819e432e3fe6628b2
/usr/bin/python may be Python 2 on Ubuntu Focal, which causes problems
with Ansible on the control host. Installing the python-is-python3
package ensures that the correct interpreter is used.
This change updates the installation documentation and development
environment scripts.
Story: 2004960
Task: 42579
Change-Id: Ie94099075bae3c491f9cf830c38e6cfc8af605a6
* add 'bridge_type: linuxbridge' to tenks configuration to avoid
dependency on OVS
* extend seed development environment testing to include overcloud
provisioning and deployment
* remove seed hypervisor and seed VM environments. These are very
stale, and have largely been replaced by a-universe-from-nothing. Add a
link to that workshop on the same page
Change-Id: I9928e5912e6770bdcc1d5d0884d2f101c16ee6a9
All instances of 'kayobe control host bootstrap' in the development
scripts use a helper function, except for the one in
seed_hypervisor_deploy. The helper adds a retry mechanism to combat the
flakiness often seen during Ansible Galaxy installs.
This change makes seed_hypervisor_deploy use the helper as well.
TrivialFix
Change-Id: I954cb604a18874744b3673ebf2e2c29caa18ce8f
A bug has been introduced in the which package in CentOS Stream 8 that
causes it to fail when used with the following bash options:
set -u
set -o pipefail
With these options set, running which produces the following output:
environment: line 1: _declare: unbound variable
As found by Pierre, this seems to be caused by which being implemented
as a bash function that references an unbound variable (_declare). It's
tracked in Fedora by
https://bugzilla.redhat.com/show_bug.cgi?id=1944877.
This change works around the issue by using the /usr/bin/which binary.
Co-Authored-By: Pierre Riteau <pierre@stackhpc.com>
Change-Id: I468d4e0460c13791b9f01d5854ef45472528c6fe
Story: 2008795
Task: 42215
We still see flakiness when downloading content from Ansible Galaxy,
often as HTTP 520 errors. This change increases the number of retries
from 3 to 10, and adds a 5 second delay between attempts.
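The retry policy described above (10 attempts, 5 seconds apart) can be
sketched in Python as follows. The real retry logic lives in the bash
development scripts; the retry name and this implementation are
illustrative only, with a Galaxy install assumed to be wrapped as the
func callable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(func: Callable[[], T], attempts: int = 10, delay: float = 5.0) -> T:
    """Call func until it succeeds, at most `attempts` times.

    Waits `delay` seconds between failed attempts and re-raises the last
    exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            time.sleep(delay)
    raise AssertionError("unreachable")
```

A fixed delay is the simplest choice here; exponential backoff would be
an alternative if the Galaxy errors were caused by sustained overload
rather than transient failures.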
Change-Id: I0c46e5fcc6979027dc6f1bc5cc49e923a205f654
Related: https://github.com/ansible/galaxy/issues/2429
The 'openstack server show <server> -f value -c addresses' command
previously had output like this:
<network name>=<IP>
It now shows Python-formatted output like this:
{'<network name>': ['<IP>']}
This broke the parsing of the command output when determining which IP
address to use to access a bare metal instance via SSH.
This change fixes the issue by querying the server's port in Neutron,
and using the fixed IP address.
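To illustrate why the old parsing broke, here is a Python sketch of a
parser for each output format. The function names, network name and
address are made up. The change itself parses neither format: querying
the Neutron port's fixed IP avoids depending on the CLI's display
format, which, as this shows, can change between releases.

```python
import ast


def ip_from_legacy_output(output: str) -> str:
    """Parse the old '<network name>=<IP>' output by splitting on '='."""
    _network, _sep, ip = output.strip().partition("=")
    return ip


def ip_from_literal_output(output: str) -> str:
    """Parse the newer Python-literal output, e.g. {'net': ['10.0.0.5']}.

    Returns the first address of the first network, an arbitrary choice
    made for this sketch.
    """
    addresses = ast.literal_eval(output.strip())
    first_network_ips = next(iter(addresses.values()))
    return first_network_ips[0]
```

Given 'provision-net=10.0.0.5', the legacy parser returns '10.0.0.5';
given the new output it finds no '=' and returns an empty string, so the
SSH address lookup silently breaks.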
Change-Id: I55b5f185fb7136d3c6fa565aa46598f21c94eb43
* Use source images
* Need to specify bash for &> syntax
Issues worked around:
* Manually configuring bridge via ip commands makes ifup fail to bring
up the link. Adds a kayobe-network-bootstrap Zuul CI role that adds
persistent configuration for the all-in-one network.
* bridge not active after interfaces role bounce. Added a pause, similar
to https://github.com/michaelrigart/ansible-role-interfaces/pull/31
* installing the docker Python module for the kolla user fails:
WARNING: The repository located at mirror-int.ord.rax.opendev.org is
not a trusted or secure host and is being ignored
ERROR: No matching distribution found for docker===4.4.0
Added a trusted host for the PyPI mirror.
* Tenks fails to create block devices - missing qemu-img (in qemu-utils)
* Tenks qemu emulator is different on Ubuntu
Remaining issues:
* Bare metal testing is unreliable on Ubuntu - some jobs see IPMI
failures such as the following:
ipmitool chassis bootdev pxe
Error setting Chassis Boot Parameter 5
Error setting Chassis Boot Parameter 0
Bare metal testing is disabled on Ubuntu for now.
Depends-On: https://review.opendev.org/766984
Depends-On: https://review.opendev.org/766958
Story: 2004960
Task: 29393
Change-Id: I1985efae7c18f55c3ff7c27c17d6242523904f3e
This partially reverts commit 47bbb96b29ab30764d6220cdd43e63a1d2072533
which triggered a retry on vexxhost clouds.
The issue was introduced in Ie8fd965165e8d347d27528a2c16d0647e412ccdc,
which applied some fixes for CentOS 8.3, and inadvertently removed
the Tenks variable that forces the use of qemu for 'bare metal' VMs.
This led to autodetection of KVM, which does not work reliably with
nested virtualisation on all CI cloud providers.
This change fixes the issue by forcing the use of qemu for the overcloud
once more. It also adds a similar option for the seed VM job.
Change-Id: I6bc8da2b8da903e09b97df8cd95c68a562c11db9
This requires stackhpc.os-images v1.10.0 or newer, for compatibility
with CentOS 8 when SELinux is enabled: we disable SELinux, but it stays
enabled until the host is rebooted.
This Ansible role was updated to v1.10.2 in master and stable/victoria
by I5efdbd52556721914fe69d7c6ba454b2c721b643, for another reason.
Remember to bump the requirement when backporting to earlier releases.
It also needs changes in the way we interact with Bifrost to avoid using
the env-vars file which has been removed. This is implemented by change
I25078e69acdf41a4ef9957f99fe5047de54b778d.
Finally, it requires building seed deployment images only after
deploying Bifrost, because the task copying images onto the seed expects
/etc/kolla/bifrost to exist.
We also copy log files to identify issues when the job fails.
Change-Id: I4719b4d397c01b35c78cb84c6d686dd27742d1c0
* Bump stackhpc.libvirt-host to v1.7.1. On seed-hypervisors installed
using CentOS 8.2 or earlier, interaction with libvirt may fail due to
libgcrypt being incompatible. See
https://github.com/stackhpc/ansible-role-libvirt-host/issues/42
* Bump MichaelRigart.interfaces to v1.9.2. The CentOS 8.3 cloud image
includes an ifcfg-ens3-1 file. See
https://github.com/michaelrigart/ansible-role-interfaces/pull/93
* Previously a second libvirt daemon was installed by Tenks on the host,
however changes in libvirt 6.0.0 to separate libvirtd into multiple
daemons do not allow for customisation of the PID files used by the
new daemons. This leads to a conflict between the container and host
daemons. Update the Tenks config to use the containerised Nova libvirt
daemon. This depends on a change to the stackhpc.libvirt-host role:
https://github.com/stackhpc/ansible-role-libvirt-host/pull/44
* Not CentOS 8.3 related, but tox jobs are now failing on Python
dependencies. Remove upper limits from docker and paramiko.
* Not CentOS 8.3 related, but Bifrost has enabled authentication by
default. We are not ready to support this, so override it.
Story: 2008429
Task: 41378
Change-Id: Ie8fd965165e8d347d27528a2c16d0647e412ccdc
While we always test baremetal compute in CI, development environments
may not. Given that Ironic is now disabled by default, we should make
this work out of the box.
Story: 2008207
Task: 41003
Change-Id: Id3128380f5ff74d24265f6b2132c6d7992bf00ba
Adds the kayobe-seed-vm-centos8 CI job to configure the Zuul VM as a
seed hypervisor, and use nested virt to provision a seed VM. This
ensures that the seed hypervisor code paths are tested.
The job uses a Cirros image for the seed VM rather than the usual CentOS
cloud image. This is to reduce bandwidth required to download the image.
It does mean that the resulting seed VM cannot be used as a seed, but
nested virt would make this slow and unreliable anyway. Cirros does not
load cdrom drivers by default, so we add the configdrive as a disk
rather than a cdrom device.
Depends-On: https://review.opendev.org/617161
Change-Id: I2268a1ddf9a2870c713f32a40689e1686365aabd
Story: 2001655
Task: 6683
Switches to using the IPA builder project for building IPA images.
Switches the IPA images used by default to CentOS 8 based images.
Changes the file extension of the IPA kernel image from vmlinuz to
kernel.
Story: 2007070
Task: 37953
Change-Id: I82fc455f41f48dacb453e135870dd776895d7c99
Story: 2006574
Task: 39485
* Always use Python 3
* Drop code paths for CentOS 7
* Drop support for Yum
* Remove support for host NTP daemon, always use chrony
* Switch references from 'yum_install_epel' to 'dnf_install_epel'
* Remove overcloud host image workaround for tagged VLAN admin network
* Remove the kayobe.utils.yum_install function, which is unused
Change-Id: I368f6edafed9779658798fc342116b4c1b3ffd48
Story: 2006574
Task: 39481
Backport: train, stein, rocky
This fixes issues seen with a-universe-from-nothing using stable/train.
Change-Id: Ib477de5f3af2e4c182d0c2999c274dbb5553531c
Story: 2007572
Task: 39469