The standalone job at present has a high chance of failure
due to two separate issues:
1) The deployed nodes from raid tests can be left in a dirty state
as the raid configuration remains and is chosen as the root
device for the next deployment. If such a node is chosen by any job,
such as rescue or a deployment test that attempts to log in,
then the job fails because it is unable to ssh. The fix for this is
in the ironic-tempest-plugin but we need to get other fixes
in to stabilize the gate first.
https://review.opendev.org/#/c/757141/
2) Long-running scenarios run in cleaning, such as deployment with
RAID in the standalone suite, can encounter conditions where
the conductor tries to send the next command along before the
present configuration command has completed. For example, the
image download is still running when a heartbeat
occurs in the background, and the conductor then seeks
to perform a second action. This then causes the entire
deployment to fail, even though the condition was transitory.
This should be a relatively easy fix.
https://review.opendev.org/759906
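The race in item 2 can be illustrated with a minimal sketch that treats
"agent is busy" as a transient condition and retries instead of failing
the deployment. This is not the code from the review linked above; the
AgentBusy exception, the send_when_idle helper and its parameters are
hypothetical names used only for illustration.

```python
import time


class AgentBusy(Exception):
    """Hypothetical error: the agent is still running a previous command."""


def send_when_idle(send_command, attempts=5, delay=0.0):
    """Call send_command(), retrying while the agent reports it is busy.

    Returns the result of the first successful call. AgentBusy is
    re-raised only once all attempts have been used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send_command()
        except AgentBusy:
            if attempt == attempts:
                raise
            # Assumption: a short pause is enough for the running
            # command (e.g. an image download) to make progress.
            time.sleep(delay)
```

With this shape, a heartbeat that arrives while the image is still
downloading only delays the next command rather than aborting the deploy.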
Change-Id: I6b02be0fa353daac90abf2b1576800c0710f651e
We're seeing cases where cleaning barely manages to finish after
a 2nd PXE retry, failing a job.
Also make the PXE retry timeout consistent between the CI and
local devstack installations.
Change-Id: I6dc7a91d1a482008cf4ec855a60a95ec0a1abe28
As per the Victoria cycle testing runtime and community goal,
we need to migrate upstream CI/CD to Ubuntu Focal (20.04).
Keep a few jobs running on the bionic nodeset until
https://storyboard.openstack.org/#!/story/2008185 is fixed;
otherwise base devstack jobs switching to Focal will block
the gate.
Change-Id: I1106c5c2b400e7db899959550eb1dc92577b319d
Story: #2007865
Task: #40188
It was supposed to be made voting shortly after the split, but we
sort of forgot. It provides coverage for things (like ansible deploy)
that we used to have voting jobs for.
Change-Id: Id99586d5e01b940089d55c133d9181db05bfdc7e
This change marks the iscsi deploy interface as deprecated and
stops enabling it by default.
An online data migration is provided for iscsi->direct, provided that:
1) the direct deploy is enabled,
2) image_download_source!=swift.
The CI coverage for iscsi deploy is left only on standalone jobs.
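The two migration conditions above can be read as a simple predicate. The
sketch below is only an illustration of that rule, not the migration code
itself; the function name and its arguments are hypothetical.

```python
def can_migrate_to_direct(deploy_interface, enabled_deploy_interfaces,
                          image_download_source):
    """Return True if a node may be moved from iscsi to direct deploy.

    Mirrors the two stated conditions: the direct deploy interface is
    enabled, and image_download_source is not swift.
    """
    if deploy_interface != "iscsi":
        # Only nodes that currently use iscsi are candidates.
        return False
    return ("direct" in enabled_deploy_interfaces
            and image_download_source != "swift")
```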
Story: #2008114
Task: #40830
Change-Id: I4a66401b24c49c705861e0745867b7fc706a7509
The minimum amount of disk space on CI test nodes
may be approximately 60GB on /opt with now only 1GB
of available swap space by default.
This means we're constrained on the number of VMs and
their disk storage capacity in some cases.
Change-Id: Ia6dac22081c92bbccc803f233dd53740f6b48abb
Infra's disk/swap availability has been apparently
reduced with the new focal node sets such that we
have ~60GB of disk space and only 1GB of swap.
If we configure more swap, then naturally that means
we take away from available VMs as well.
And as such, we should be able to complete grenade
with only four instances, I hope.
Change-Id: I36f8fc8130ed914e8a2c2a11c9679144d931ad73
Currently ironic-base defaults to 2 and our tests try to introspect
all of them. This puts unnecessary strain on the CI systems, so return
the number back to 1.
Change-Id: I820bba1347954b659fd7469ed542f98ef0a6eaf0
As part of the plan to deprecate the iSCSI deploy interface, changing
this option to a value that will work out-of-box for more deployments.
The standalone CI jobs are switched to http as well; the rest of the jobs
are left with swift. The explicit indirect jobs are removed.
Change-Id: Idc56a70478dfe65e9b936006a5355d6b96e536e1
Story: #2008114
Task: #40831
Removes the deprecated support for token-less agents. This
better secures the ironic-python-agent<->ironic interactions:
it helps ensure that heartbeat operations come from the same
node which originally checked in with Ironic, and that
commands sent to an agent originate from the same
ironic deployment which the agent checked in with to begin
with.
Story: 2007025
Task: 40814
Change-Id: Id7a3f402285c654bc4665dcd45bd0730128bf9b0
Tinyipa is not that tiny anymore and we need to increase the base
memory for VMs in jobs that use it.
Change-Id: Ibd7e87c0b5676eef94512285edaca416635a29ef
Sets the settings to enable the ramdisk iso booting tests
including a bootable ISO image that should boot the machine.
NB: The first depends-on is only for temporary testing of another
change, which modifies the substrate ramdisk interface. Since this change pulls
in tempest testing for iso ramdisk and uses it, we might as well
use it to test whether that change works, as the other two patches
below are known to be in a good state.
Change-Id: I5d4213b0ba4f7884fb542e7d6680f95fc94e112e
The kernel for the UEFI PXE job seems to download
without issue, however the required ramdisk does not
seem to be making it.
As such, changing the job to use TinyCore to see if the smaller
ramdisk helps resolve these issues.
Change-Id: Ie248de2269a63a41b634f7205468edabccc53738
The default dhcp client in tinycore does not automatically trigger
IPv6 address acquisition.
This is a problem when the random spread of nodes and devstack
cause tinycore to get pulled in for the v6 job.
Change-Id: I635a69dfd7450a218474ccb7cecf1c9e29c0a43c
Our ramdisks have swelled, and are taking anywhere from 500-700
seconds to even reach the point where IPA is starting up.
This means that a 900 second build timeout is cutting it close
and intermittent performance degradation in CI means that a job
may fail simply because it is colliding with the timeout.
One example I deconstructed today where a 900 second timeout was
in effect:
* 08:21:41 Tempest job starts
* 08:21:46 Nova instance requested
* Compute service requests ironic to do the thing.
* Ironic downloads IPA and stages it - ~20-30 seconds
* VM boots and loads ipxe ~30 seconds.
* 08:23:22 - ipxe downloads kernel/ramdisk (time should be completion
unless apache has changed logging behavior for requests.)
* 08:26:28 - Kernel at 120 second marker and done decompressing
the ramdisk.
* ~08:34:30 - Kernel itself hit the six hundred second runtime
marker and hasn't even started IPA.
* 08:35:02 - Ironic declares the deploy failed due to wait timeout.
([conductor]deploy_callback_timeout hit at 700 seconds.)
* 08:35:32 - Nova fails the build saying it can't be scheduled.
(Note: I started adding times to figure out the window for myself, so
they are incomplete above.)
The time we can account for in the job is about 14 minutes or 840
seconds. As such, our existing defaults are just not enough to handle
the ramdisk size AND variance in cloud performance.
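The intervals in the timeline above can be checked with a little
arithmetic. This is a small sketch using only the timestamps listed in
this message; the helper name is illustrative.

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Tempest job start to Nova failing the build.
total = elapsed_seconds("08:21:41", "08:35:32")          # 831

# Kernel/ramdisk download logged to ramdisk decompression done.
decompress = elapsed_seconds("08:23:22", "08:26:28")     # 186

# Kernel/ramdisk download logged to Ironic's wait timeout.
until_timeout = elapsed_seconds("08:23:22", "08:35:02")  # 700
```

The 831 seconds from job start to the Nova failure is in line with the
"about 14 minutes" figure above and leaves very little headroom under a
900 second build timeout.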
Change-Id: I4f9db300e792980059c401fce4c37a68c438d7c0
After the recent changes we're running 5 tests already, some of them
using several VMs. This should cover scheduling to different conductors
well enough, the nova test just adds random failures on top.
This allows reducing the number of test VMs to 3 per testing node
(6 in total), which reduces the resource pressure and lets us give
each VM a bit more RAM.
Also adding missing VM_SPECS_DISK to the subnode configuration.
Change-Id: Idde2891b2f15190f327e4298131a6069c58163c0
Since we merged the change to have partition and wholedisk
testing on basic_ops, most of the jobs started requiring 2 VMs
to run the tempest tests.
Let's increase the count in ironic-base so all jobs default to 2.
Removing IRONIC_VM_COUNT=2 from jobs that use ironic-base as parent.
Change-Id: I13da6275c04ffc6237a7f2edf25c03b4ddee936a
Devstack is changing the Neutron default to the OVN backend. This patch
makes sure the Ironic gate will not get broken by this change, as currently
OVN doesn't support baremetal nodes.
Change-Id: I0745e07d32e3455fad2a2249c31f279fd1d38b5b
Signed-off-by: Jakub Libosvar <libosvar@redhat.com>
Aliases the old name of the cross gating job to the new name
so we can change jobs in other projects without breaking the world.
Change-Id: I9e17f48f83444b5e2cab63a2041e77e860ce6df5
At the last PTG the Neutron team discussed and decided to undeprecate
the neutron-legacy module in DevStack because that's the module being
used (almost) everywhere and it works. The lib/neutron module was an attempt
to refactor the old module, but in the last few years it hasn't gained
any traction and, due to the lack of features and people to work on it,
it's going to be removed from DevStack eventually.
Below is a snippet from the PTG summary email [0] about this topic:
<snippet>
In Devstack there are currently 2 modules which can configure
Neutron. Old one called "lib/neutron-legacy" and the new one called
"lib/neutron". It is like that since many cycles that "lib/neutron-legacy"
is deprecated. But it is still used everywhwere. New module isn't still
finished and isn't working fine. This is very confusing for users as
really maintained and recommended is still "lib/neutron-legacy" module.
During the discussion Sean Collins explained us that originally this
new module was created as an attempt to refactor old module, and to
make Neutron in the Devstack better to maintain. But now we see that
this process failed as new module isn't still used and we don't have
any cycles to work on it. So our final conclusion is to "undeprecate"
old "lib/neutron-legacy" and get rid of the new module.
</snippet>
This patch changes the Ironic jobs to use the old Neutron module in
DevStack.
[0] http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015368.html
[1] http://codesearch.openstack.org/?q=neutron-api%3A%20true&i=nope&files=&repos=
Change-Id: Ief043a0a01a800ea2d01a602000f0854df9e629f
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
Let's use the default timeout from ironic-base for all jobs
so we can avoid job timeout in our CI.
Change-Id: I5e753c4bbcb8075a1889754a468d9c3dd8310a08
A large ramdisk image tends to take an undesirable amount
of time performing the initial decompression into memory before
the system is booted and available. This sets the default number of CPU
cores for all jobs to 2, and only sets that back to 1 where
TinyIPA is being used.
Change-Id: I88c57a1345edb1b14c760753638ad927641b34a2
This patch updates devstack to automatically
set the new tempest configuration option `boot_mode`,
using the value of the IRONIC_BOOT_MODE variable.
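As a rough illustration of the mapping described above (the devstack
plugin itself is shell, so this is not its code), the sketch below renders
a tempest config fragment from an environment-like mapping. The
`baremetal` section name and the `bios` fallback are assumptions made for
the example, as is the helper name.

```python
import configparser
import io


def render_tempest_conf(environ, default_mode="bios"):
    """Render a tempest.conf fragment carrying the boot mode.

    environ is any mapping, for example os.environ. The section name
    and the default mode are assumptions for this sketch.
    """
    mode = environ.get("IRONIC_BOOT_MODE", default_mode)
    conf = configparser.ConfigParser()
    conf["baremetal"] = {"boot_mode": mode}
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()
```

Calling render_tempest_conf({"IRONIC_BOOT_MODE": "uefi"}) yields a
fragment containing the line `boot_mode = uefi`.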
Increase the number of VMs in ironic-tempest-ipa-partition-pxe_ipmitool
and ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa
to 2 since they run cleaning and we now run two tempest tests.
Depends-On: https://review.opendev.org/735960
Change-Id: Ic6faf73430e56e2b1ff19a72b1b03f8ef34eff5f
The ovmf package in bionic doesn't really work in our CI.
As a workaround we use the old package from xenial, but we can't keep
using it on Ubuntu Focal as well.
This patch aims to convert the uefi jobs to use Ubuntu Focal as
base operating system and use the native ovmf package.
Story: 2007785
Task: 40025
Change-Id: I653e5da2672b14eae88c6cab923b8617432f1dc1
Adds an ability to generate network boot templates even for nodes that
use local boot via the new ``[pxe]enable_netboot_fallback`` option.
This is required to work around the situation when switching boot devices
does not work reliably.
Depends-On: https://review.opendev.org/#/c/736191/
Change-Id: Id80f2d88f9c92ff102340309a526a9b3992c6038
Story: #2007610
Task: #39600