Investigation of our standalone test job failures, where jobs would
fail, hosts would not get DHCP updates, and ultimately iPXE would
fail before getting a valid or expected response, revealed that
dnsmasq was crashing often when port updates were going through.
This ultimately prevented the multi-scenario test jobs from running,
as the standalone jobs represent a number of different scenarios
which are executed across a pool of test machines.
In this case, the path forward appears to be to downgrade
dnsmasq to stabilize our CI and allow us to otherwise upgrade.
This patch adds the focal updates as a package source,
and installs the dnsmasq package.
Related-Bug: #2026757
Change-Id: Iacfd1ab677c612525601afcaeee5e5b067206ff3
All database migration testing in OpenStack is done through
an opportunistic worker model, where if the database is available
and correctly configured for testing, i.e. openstack-citest user
and access appropriately granted, then the tests will create and
test migrations.
However, this has been problematic with MySQL recently, as we
have seen a long-standing migration issue surface often in
tests.
As a result, we're isolating that test into its own job so we
can limit the blast radius. This also helps us determine whether
the issue affects all of the tests or is solely isolated to the
MySQL test run class, which is an additional data point.
By default, we continue to run Postgres migration tests in the
main jobs, as they haven't been impacted by this issue.
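The opportunistic model described above can be sketched roughly as
follows. This is a minimal illustration and not the actual OpenStack
test code: the class names and the connect() hook are hypothetical,
and sqlite3 stands in for a real MySQL or PostgreSQL backend.

```python
import sqlite3
import unittest


class OpportunisticMigrationTest(unittest.TestCase):
    """Run migration checks only when the backend can be reached."""

    def connect(self):
        # Hypothetical hook: a real backend would connect to MySQL or
        # PostgreSQL with the CI test user instead of in-memory SQLite.
        return sqlite3.connect(":memory:")

    def setUp(self):
        try:
            self.conn = self.connect()
        except Exception as exc:
            # Opportunistic: an unavailable database is a skip, not a failure.
            self.skipTest(f"database not available: {exc}")
        self.addCleanup(self.conn.close)

    def test_migration_creates_table(self):
        # Stand-in for applying a schema migration and verifying the result.
        self.conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY)")
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
        self.assertEqual([("nodes",)], rows)


class UnavailableBackendTest(OpportunisticMigrationTest):
    """Same checks against a backend that is not configured."""

    def connect(self):
        raise ConnectionError("backend not configured")
```

Keeping each backend in its own class is what makes it possible to
move a single backend's run into a separate job.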
Change-Id: Iefc044c31ef029e400a7dad294504175a4462638
Leave the snmp job on focal for the time being as it's failing on jammy
and we need to move forward with the migration.
Change-Id: I0b9b600c3eb10761054abdb9c13d7107269001b9
It appears the push to Cirros 0.6.1 has reoccurred, and we now
have things failing as a result.
Specifically ironic-grenade is trying to run with Cirros 0.5.2,
yet the file is not found later on.
Anyhow, an explicit pin should resolve this.
Change-Id: I97a1403820c8dbe633cf1d529adc79e8af463e80
Disable the performance counters, as we suspect they are causing
database interaction to freeze on the grenade CI job.
Change-Id: Id951815ab9bfd1ca16aa66fa4c87c0e1b3e788f6
Launching test VMs can take a while, and grenade can fail
if the VM's networking is not quite online in under sixty
seconds. As such, it is reasonable to use a larger window
so the failure rate of ironic-grenade will hopefully decline.
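The trade-off above is a plain poll-with-deadline pattern. The sketch
below is illustrative only and is not grenade's code: wait_for, probe
and the injectable clock and sleep parameters are hypothetical names.

```python
import time
from typing import Callable


def wait_for(probe: Callable[[], bool], timeout: float,
             interval: float = 1.0,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll probe() until it returns True or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            # The window closed before the VM's networking came up.
            return False
        sleep(interval)
```

A VM whose networking needs longer than the window fails the check no
matter how healthy it is, so widening the window only costs time on
runs that would have failed anyway.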
Depends-On: https://review.opendev.org/c/openstack/grenade/+/879674
Change-Id: I07aead4b09ccb7e427a0d3d04e7a580cf4b00a95
The anaconda job is failing as we're getting a redirect issued back
upon attempting to validate URLs. The servers are now directing us
to use HTTPS instead.
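As an illustration of the kind of change involved, the sketch below
upgrades an http URL to https before it is validated. prefer_https is
a hypothetical helper and not part of ironic or the job definitions.

```python
from urllib.parse import urlsplit, urlunsplit


def prefer_https(url: str) -> str:
    """Return url with an http scheme upgraded to https; others unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```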
Change-Id: Iac8e6e58653ac616250f4ce3ab3ae7f5164e5b03
This commit partially reverts change set
I0bfef09a5312a17be54ce5c09805f06b7c349026
where the amount of memory for test VMs was
increased to 4GB. This was because excess
junk was getting stuck in the staged ramdisk
images used by CI.
Change-Id: Ia0c74cbeecdb9febf9f7a4e76db84e0f378a97fc
It appears we are getting an opcode error when attempting to boot
CentOS 9-stream utilizing the EFI artifacts from Ubuntu.
Technically this should work, however further artifacts in the boot
chain may be signed with other key credentials that Ubuntu's
grub does not know about, because the chain of trust is:
MSFT -> Vendor shim (slow change rate) -> Vendor GRUB -> Kernel
The one case where mixing vendors should never work is when Secure
Boot is enforcing.
Exception on launch:
X64 Exception Type - 06(#UD - Invalid Opcode) CPU Apic ID - 00000000 !!!!
A Debian bug is open for a very similar issue:
https://groups.google.com/g/linux.debian.bugs.dist/c/BOiLLeROrmo
However, no additional comments or information have been posted in
follow-up to that reported issue. So in the meantime, we're going to
try to do what those smarter than I recommend: use the vendor's
binaries for their distribution.
There is one further, potentially far more depressing possibility,
that CentOS 9's kernel doesn't support the type of hardware
we're getting. This is suggested by the precise opcode error, UD,
https://xem.github.io/minix86/manual/intel-x86-and-64-manual-vol3/o_fe12b1e2a880e0ce-212.html
But again, easiest possibility first.
Change-Id: Id9bd30bc3c2f1076555317e4a3f277725fa7c1f4
- Remove skipsdist, as it was never supported and causes breakage
when used with usedevelop.
- Add script to allowlist for pep8 test
- Disable setuptools autodiscovery
- Increase base VM memory according to new requirements for CS9
based IPA
Change-Id: I0bfef09a5312a17be54ce5c09805f06b7c349026
Introduces additional job configuration to enable automated
integration testing via tempest of the anaconda deployment
interface.
Also configures a private subnet with DNS, which anaconda requires
at execution time in order to process URLs.
Change-Id: I61b5205cf2c9f83dfcabf4314247c76fb6a56acd
Instance network boot (not to be confused with ramdisk, iSCSI or
anaconda deploy methods) is insecure, underused and difficult to
maintain. This change removes a lot of related code from Ironic.
The so-called "netboot fallback" is still supported for legacy boot when
boot device management is not available or is unreliable.
Change-Id: Ia8510e4acac6dec0a1e4f5cb0e07008548a00c52
The netboot option will be removed soon, so this change stops covering it.
Some jobs have been renamed to reflect the new reality.
Change-Id: I7e248c3deb4778fcf59bc64821833987653fbbcd
* Fixes the IPv6 job by utilizing HOST_IPV6 instead of
SERVICE_IPV6, as Devstack now automatically wraps
SERVICE_IPV6 with brackets as if it is for a URL.
* Locks ipv6 job to bios mode. Ubuntu Focal OVMF/EDK2 does not
support IPv6 PXE boot by default.
* Split from Devstack in terms of IP usage, since full explicit
V6 usage is not a thing anymore. 4+6 is the default in devstack
and regardless of what we set on the job we see both now used.
So we delineate our usage for our own sanity.
* Reduce VM Interface count for IPv6 in an attempt to eliminate
in-kernel routing confusion by two interfaces on the same physical
network.
* Set IPv6 mode to dhcpv6-stateless due to fun issues in dhcp clients.
When we move to UEFI, this will need to be changed to stateful as
stateless is not supported in general by OVMF/EDK2.
Once the job has run in normal non-voting for a while, and we
ensure that it seems to be stable, we can make it voting again.
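The bracket issue in the first item above can be sketched as follows.
This only illustrates the difference between a URL-style value and a
bare address, not Devstack's implementation: for_url and bare are
hypothetical helper names, and the addresses are documentation
examples.

```python
import ipaddress


def for_url(host: str) -> str:
    """Wrap a bare IPv6 literal in brackets; leave anything else unchanged."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A hostname, or a literal that is already bracketed.
        return host
    return f"[{addr}]" if addr.version == 6 else host


def bare(host: str) -> str:
    """Strip URL-style brackets so the value can be used as a plain address."""
    if host.startswith("[") and host.endswith("]"):
        return host[1:-1]
    return host
```

A bracketed value is correct inside a URL but is rejected wherever a
plain address is expected, which is why the job needs the unbracketed
variable.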
Change-Id: Ia833bfb64c6c3cc8e48cbe34ed200536652a8adf
Grenade, for some confusing reason, creates a separate network,
and uses that for upgrade testing as opposed to the original network
the VMs were bound to. If Julia's memory is correct, this was for
multinode upgrade testing.
Anyway, when in UEFI mode, it appears that the TFTP packets
don't get tracked nor cross the boundary. We likely need to
explicitly address this, but first, let's get the job working as
it was, and then we can update it.
Also, update requirements because markupsafe removed the soft_unicode
method, which had been deprecated for a while. Jinja2 has used the
new soft_str method since version 3.0.0.
Change-Id: Iaebe966569962b0d3d43774d57b570469479f159
We need configdrives to pass information reliably, and the new cirros
image does not work without them.
Change-Id: I6cafa050d5c1c8289483f968d26c50485fd4528a
Cirros partition images are not compatible with local boot since they
don't ship grub (nor a normal root partition). This change adds a script
that builds a partition image with UEFI artifacts present. It still
cannot be booted in legacy mode, but it is progress.
Set the tempest plugin's partition_netboot option. We need it to inform
the tempest plugin about the ability to do local boot. This option
already exists but is never set.
Also set the new default_boot_option parameter, which will be introduced
and used in Iaba563a2ecbca029889bc6894b2a7f0754d27b88.
Remove netboot from most of the UEFI jobs.
Change-Id: I15189e7f5928126c6b336b1416ce6408a4950062
Not everyone on the team even knows what pxe_ipmitool used to mean :)
Furthermore, we don't need "ipa" in job names: everything has used IPA
for... even longer than pxe_ipmitool has not existed.
While here, one job was clearly meant to use BIOS boot, but it does not.
Change-Id: I8a37efa0f222361f30ddb7fa621548685a40f961
CI is memory intensive, and we realistically don't need 2 or
more API workers running for every single WSGI process which
does not implement its own specific override value.
This should reduce the memory footprint by an average of six processes
which consume 60-90 MB each.
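As a quick check of the figures above, six processes at 60-90 MB each
works out to roughly 360-540 MB. savings_mb is a hypothetical helper
used only for this arithmetic.

```python
from typing import Tuple


def savings_mb(processes_removed: int, low_mb: int, high_mb: int) -> Tuple[int, int]:
    """Return the (low, high) memory saved, in MB, by removing processes."""
    return processes_removed * low_mb, processes_removed * high_mb


# Six fewer worker processes at 60-90 MB each, as estimated above.
low, high = savings_mb(6, 60, 90)
print(f"estimated savings: {low}-{high} MB")
```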
Change-Id: Ia0a986152c2b9fc9c5ff54cf698a351db452fbbd
Changes a neutron call to be project scoped, as system
scoped cannot create a resource, and removes the unset,
which no longer makes sense now that
I86ffa9cd52454f1c1c72d29b3a0e0caa3e44b829
has merged, removing the legacy vars from devstack.
Also renames the internal-use setting of OS_CLOUD to IRONIC_OS_CLOUD,
as some services were still working with system scope, or some sort
of mixed state was occurring previously because some of the environment
variables were still present; they have since been removed from devstack.
This change *does* explicitly set an OS_CLOUD variable as well on
the base ironic job. This is because things like grenade for Xena
will expect the variable to be present.
Depends-On: https://review.opendev.org/c/openstack/devstack/+/818449
Change-Id: I912527d7396a9c6d8ee7e90f0c3fd84461d443c1
Change the default boot mode to UEFI, as discussed during the end
of the Wallaby release cycle and previously agreed a very long time
ago by the Ironic community.
Change-Id: I6d735604d56d1687f42d0573a2eed765cbb08aec
Neutron's firewall initialization with OVS seems
to be the source of our pain with ports not being found
by ironic jobs. This is because firewall startup errors
crash the agent with a RuntimeError while it is deep
in its initial __init__ sequence.
This ultimately seems to be rooted with communication
with OVS itself, but perhaps the easiest solution is
to just disable the firewall....
Related: https://bugs.launchpad.net/neutron/+bug/1944201
Change-Id: I303989a825a7e35f1cb7b401134fd63553f6791c
Observed an OOM incident causing
ironic-tempest-ipa-partition-pxe_ipmitool to fail.
One VM started; the other seemed to try to start twice, but both times
it stopped shortly into the run, and the base OS recorded an OOM
failure.
It appears the actual QEMU memory footprint being consumed when
configured at 3GB is upwards of 4GB, which obviously is too big to
fit in our 8GB VM instance.
Dialing back slightly, in hopes it stabilizes the job.
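The arithmetic behind this can be sketched as below. The 8GB host and
the roughly 4GB observed footprint for each of the two VMs come from
the description above; the 1GB reserved for the host OS and services
is an assumption made purely for illustration, and fits and
max_footprint_mb are hypothetical helpers.

```python
def fits(host_mb: int, vm_count: int, vm_footprint_mb: int,
         host_reserved_mb: int = 0) -> bool:
    """Return True if the VMs plus the host reservation fit in host memory."""
    return vm_count * vm_footprint_mb + host_reserved_mb <= host_mb


def max_footprint_mb(host_mb: int, vm_count: int,
                     host_reserved_mb: int = 0) -> int:
    """Largest per-VM footprint, in MB, that still fits on the host."""
    return (host_mb - host_reserved_mb) // vm_count


# Two test VMs at ~4GB each on an 8GB host.
# ASSUMPTION: 1GB is set aside for the host itself.
print(fits(8192, 2, 4096, host_reserved_mb=1024))
print(max_footprint_mb(8192, 2, host_reserved_mb=1024))
```

Note that the budget applies to the observed QEMU footprint, which the
description above puts well above the configured VM memory.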
Change-Id: Id8cef722ed305e96d89b9960a8f60f751f900221
This is part of the work to add jobs which confirm ironic works with
FIPS enabled, but this change is also appropriate for non-FIPS jobs.
Change-Id: I4af4e811104088d28d7be6df53c26e72db039e08
The devstack default limit enforcement for glance defaults
to 1GB, and unfortunately this is too small for larger images
such as centos, which includes hardware firmware images for
execution on baremetal, where drivers need the vendor
blobs in order to load/run.
Sets ironic-base to 5GB, and updates examples accordingly.
Depends-On: https://review.opendev.org/c/openstack/devstack/+/801309
Change-Id: I41294eb571d07a270a69e5b816cdbad530749a94
Adds support to the ironic devstack plugin to configure
ironic to be used in a scope-enforcing mode in line with
the Secure RBAC effort. This change also defines two new
integration jobs *and* changes one of the existing
integration jobs.
In these cases, we're testing functional CRUD interactions,
integration with nova, and integration with ironic-inspector.
As other services come online with their plugins and
devstack code being able to set the appropriate scope
enforcement configuration, we will be able to change the
overall operating default for all of ironic's jobs and
exclude the differences.
This effort identified issues in ironic-tempest-plugin,
tempest, devstack, and required plugin support in
ironic-inspector as well, and is ultimately required
to ensure we do not break Secure RBAC.
Luckily, it all works.
Change-Id: Ic40e47cb11a6b6e9915efcb12e7912861f25cae7
In the current Zuul jobs in zuul.d/ironic-jobs.yaml, the
required-projects items look like this (without a leading hostname):
required-projects:
- openstack/ironic
- openstack/ABCD
but not like this (with a leading hostname):
required-projects:
- opendev.org/openstack/ironic
- opendev.org/openstack/ABCD
With the first format, if we have two openstack/ironic entries in
Zuul's tenant configuration file (the Zuul tenant config file in a 3rd
party CI environment usually has 2 entries: one to fetch upstream
code, another for the Gerrit event stream to trigger Zuul jobs), we'll
get this warning in zuul-scheduler's log:
Project name 'openstack/ironic' is ambiguous,
please fully qualify the project with a hostname
With the second format, that warning doesn't appear, and Zuul running
in a 3rd party CI environment can reuse the Zuul jobs in
zuul.d/ironic-jobs.yaml in its own Zuul jobs.
This commit modifies all Zuul jobs in zuul.d/ironic-jobs.yaml
to use the second format.
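The rewrite amounts to prefixing each entry with the hostname. A
minimal sketch, assuming opendev.org as the canonical hostname as in
the example above; qualify is a hypothetical helper and not part of
Zuul.

```python
from typing import Iterable, List


def qualify(projects: Iterable[str], hostname: str = "opendev.org") -> List[str]:
    """Prefix each project with hostname unless it is already qualified."""
    prefix = hostname + "/"
    return [p if p.startswith(prefix) else prefix + p for p in projects]


print(qualify(["openstack/ironic", "opendev.org/openstack/ABCD"]))
```

Entries that already carry the hostname are left untouched, so the
helper can be applied to a partially converted list.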
Story: 2008724
Task: 42068
Change-Id: I85adf3c8b3deaf0d1b2d58dcd82724c7e412e2db