510 Commits

Author SHA1 Message Date
Arne Wiebalck
e751218059 Burn-in: Add options for named log files
In order to ease logging of the various burn-in steps, this patch
proposes options to define the outpout files for all burn-in steps:
{'agent_burnin_cpu', 'agent_burnin_vm', 'agent_burnin_fio_network',
'agent_burnin_fio_disk'}_outputfile  via a node's driver-info.

Story: #2007523
Task: #44102

Change-Id: I327cae5949d38e738d3c535487b3795d00ad8f1e
2021-12-08 17:47:19 +01:00
Derek Higgins
12f5f30e63 Instruct qemu-img to write image zeros to disk.
Doing this will cause it not to zero out the entire
block device which can be very costly on a slow HDD.

Story: 2009227
Task: 43315

Change-Id: I62ba2afc037d9844387e6b0984fe5008779d95d2
2021-12-08 15:56:05 +00:00
Riccardo Pittau
9b827944b0 Prepare for bugfix release
Change-Id: Ic9437c6d55879db45aa1aa3dc548ea62ed5ca10d
2021-12-07 10:38:58 +01:00
Riccardo Pittau
a35c77e62e [trivial] Fix Xena release notes versions
Change-Id: Iaf511e6aeae59409ac6d1ba4543d297c5bb2ec01
2021-12-06 11:09:03 +01:00
Arne Wiebalck
c6b1cb1c32 Burn-in: Add SMART self test to disk burn-in
Add the option to run a SMART self test right after
the disk burn-in. The disk burn-in step will fail if
the SMART test on any of the disk fails.

Story: #2007523
Task: #43383

Change-Id: I1312d5b71bedd044581a136af0b4c43769d21877
2021-12-06 09:09:35 +01:00
Riccardo Pittau
23e67b5fea Re-read the partition table with partx -a, part 2
Use add instead of update to re-read the partition table with partx.

See [1] for more details.

Co-authored-by: Arne Wiebalck <arne.wiebalck@cern.ch>

[1] https: //opendev.org/openstack/ironic-python-agent/commit/dc8c1f16f9a00e2bff21612d1a9cf0ea0f3addf0

Change-Id: I2336e22dadc790cfbde87904612fcaa3b8c501db
2021-11-09 13:03:14 +01:00
Arne Wiebalck
dc8c1f16f9 Re-read the partition table with partx -a
Re-read the partition table with 'partx -a', rather than 'partx -u'.

This should fix an timing issue where the bootloader installation
fails to mount the EFI partition from a whole disk image since it
is not yet aware of the new partitions (observed with both, the
iscsi and the direct deploy interface).

Change-Id: If5da3075e813ae01df3decf8f0647aba111b0515
2021-11-06 13:43:48 +01:00
Zuul
0b56cca7f0 Merge "Fix UEFI record regex" 2021-11-05 14:59:35 +00:00
Julia Kreger
c5268bbdbb Fix UEFI record regex
I accidently put colons on the test data and remembered taking the
colon character out of the regex I was working on, but apparently
left it in, and accounted for the active entry indicator flag
which appears to have inconsistent support across vendors.

The regex has been fixed, and a test added from a Lenovo SR650
which has some additional string entry data in the UEFI output
which may separate entries.

Change-Id: I1f67b0fb1f645fa82e98bd7c7bba3ffc7755cc74
2021-11-04 09:45:25 -07:00
Zuul
a4b73058ee Merge "Always include the oslo_log log file in ramdisk logs" 2021-11-04 15:14:33 +00:00
Julia Kreger
67eddfa7e3 Delete EFI boot entry duplicate labels first
Some firmware seems to take an objection with EFI nvram
entries being deleted after one is added, resulting in the
entire entry table being reset to the last known good state.

This is problematic, as ultimately deployments can time out
if we previously booted with Networking, and the machine, while
commanded to do other wise, reboots back to networking regardless.

We will now delete entries first, before proceeding.

Additionally, for general use, this pattern may serve the
community better by avoiding cases where we would have
previously just relied upon efibootmgr[0] to warn us of duplicate
entries.

[0]: 103aa22ece/src/efibootmgr.c (L228)

Change-Id: Ib61a7100a059e79a8b0901fd8f46b9bc41d657dc
Story: 2009649
Task: 43808
2021-11-01 06:59:26 -07:00
Dmitry Tantsur
2cedaa53c2 Always include the oslo_log log file in ramdisk logs
Even if journald is present, there is no guarantee that IPA logs there
(this is the case in container-based ramdisks).

Change-Id: Iceeab0010827728711e19e5b031ccac55fe1efde
2021-10-28 18:32:40 +02:00
Dmitry Tantsur
8a66978666 Respect global parameters when downloading a configdrive
* Use the same TLS parameters as everything else
* Respect image_download_connection_timeout
* Do not ignore HTTP errors

Change-Id: I84f8021f731186d82e44ac3d4ef2d12df13f830a
2021-10-20 15:11:16 +02:00
Arne Wiebalck
333ed70c94 Assert EFI part UUID is not None before editing fstab
The EFI partition UUID may be None and this will break
the fstab editing. While this is not necessarily fatal when
instantiating a node, it creates an exception at the end of
bootloader installation, so only attempt to add a line to
fstab when the UUID is not None.

Change-Id: I68799980e67c05afe4ca68ca9733605dd166d54d
2021-10-08 08:35:29 +02:00
Arne Wiebalck
9d707e9f4b Software RAID: Call udev_settle before creation
This patch fixes a race during software RAID creation:
we create the partition with parted, the kernel then
notifies udev, but we need to wait for udevd to create
the device files before calling mdadm to create the
md device.

Credits to jcosmao for finding this.

Change-Id: I642f28acc351cf50263e37dfbc8468bf59de2cc5
2021-10-05 11:42:49 +02:00
1665abca04 Update master for stable/xena
Add file to the reno documentation build to show release notes for
stable/xena.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/xena.

Sem-Ver: feature
Change-Id: If28b1df9c76469062e6d9ce28edcf3026fdbfbaa
2021-09-22 18:38:15 +00:00
Zuul
ab026da95b Merge "Expose BMC MAC address in inventory data" 2021-08-08 17:21:25 +00:00
Jonas Schäfer
61af712fe5 Expose BMC MAC address in inventory data
This exposes the MAC address of the first LAN channel with an assigned
IP address in the inventory data. This is useful for inventory
processes where the asset number is not discoverable from the software
side: the BMC MAC is going to be unique (at least within an
organization).

Change-Id: I8a4bee0c25743befd7f2033e4e0cba26895c8926
2021-08-06 13:14:45 +02:00
Arne Wiebalck
5531d5cee7 Force immediate NTP time sync with chronyd at IPA startup
In order to make sure we have the correct time early, e.g.
by the time we create a TLS certificate, this patch proposes
to force an immediate NTP update when using chronyd. While
the previous approach uses the passed NTP server as well, the
update may happen only after chronyd has performed measurements
(which may be too late).

Story: #2009058
Task: #42843

Change-Id: I6edafe8edeb8549f324959e7a1ec175c3049a515
2021-07-16 10:28:31 +02:00
Arne Wiebalck
cacdd9bab3 Burn-in: Add network step
Add a clean step for network burn-in via fio. Get basic
run parameters from the node's driver_info.

Story: #2007523
Task: #42385

Change-Id: I2861696740b2de9ec38f7e9fc2c5e448c009d0bf
2021-07-13 11:36:31 +02:00
Zuul
20e145e4da Merge "Only mount the ESP if not yet mounted" 2021-06-25 15:25:33 +00:00
Arne Wiebalck
27568204ae Only mount the ESP if not yet mounted
Check if the ESP is already mounted before attempting to mount it
for the bootloader installation.

Change-Id: Ifd738b2c5663f1a211d7e13b5ba386be631d8db1
2021-06-21 12:10:54 +02:00
Dmitry Tantsur
b605943796 Coalesce heartbeats
The IPA sends heartbeats to the conductor periodically and when
requested, e.g. at the end of asynchronous commands. In order
to avoid to send such notifications in too quick succession,
e.g. when two asynchronous commands finish at the same time or
when the periodic heartbeat was just sent right before a command
ended, this patch proposes to coalesce heartbeats which are
close together timewise and send only one for all of them
in a time interval of 5 seconds.

Co-Authored-By: Arne Wiebalck <arne.wiebalck@cern.ch>

Story: #2008983
Task: 42633

Change-Id: Idfbce44065e1e5a8b730b94741b2604c51f0ab14
2021-06-18 17:19:30 +02:00
Julia Kreger
2fab70c36b Utilize CSV file for EFI loader selection
Adds support to identify and utilize a CSV file to signal which
bootloader to utilize, and set it when the OS is running as opposed
to when EFI is running. This works around EFI loader potentially
crashing some vendors hardware types when entry stored in the
image does not match the EFI loader record which was utilzied to
boot.

Grub2+shim specifically specifically needs the CSV file name
and entry label to match what the system was booted with in order
to prevent the machine from potentially crashing.

See https://storyboard.openstack.org/#!/story/2008962
and https://bugzilla.redhat.com/show_bug.cgi?id=1966129#c37
for more information.

Change-Id: Ibf1ef4fe0764c0a6f1a39cb7eebc23ecc0ee177d
Story: 2008962
Task: 42598
Co-Authored-By: Bob Fournier <bfournie@redhat.com>
2021-06-10 11:23:14 -07:00
Zuul
32e3b435bc Merge "Burn-in: Add disk step" 2021-06-07 12:46:15 +00:00
Zuul
434de569e6 Merge "Ignore efi grub2-install failure" 2021-06-07 09:47:12 +00:00
Steve Baker
a057be7dad Ignore efi grub2-install failure
Recent releases of redhat grub2 will always fail when installing to
EFI paths, to encourage a transition to the signed shim bootloader.

Partition image deploys avoid calling grub2-install with the
preserve-efi-assets functions. Deploying whole disk images doesn't
require grub2-install. This leaves whole disk images installed onto
softraid devices, which still attempts to call grub2-install.

This change will still attempt to run grub2-install in this
one remaining case, but will ignore any failure.

A future enhancement can avoid calling grub2-install entirely so that
non-redhat secure-boot capable images can keep their signed
bootloaders.

Story: 2008923
Task: 42521
Change-Id: If432ef795d64d76442d739eb4f7d155ff847041e
2021-06-04 10:03:55 +12:00
Iury Gregory Melo Ferreira
031b5c4b61 Clean-up releasenotes for 8.0.0 release
Change-Id: I6a52b55237225f169ff59ac8854f739d9c0f92c7
2021-06-02 18:59:10 +02:00
Zuul
7fdbcde3de Merge "Stop accepting duplicated configdrive" 2021-06-02 12:36:57 +00:00
Dmitry Tantsur
f657526807 Stop accepting duplicated configdrive
We're currently requiring it twice: in image_info and in a separate
configdrive argument. I think we should eventually settle on separate
arguments for separate entities, so this change makes the value in
image_info optional with a goal to stop accepting it.

We could probably just remove the handling in image_info, but a
deprecation is safer.

The (unused in ironic) cache_image call is updated with an optional
configdrive arguments.

Story: #2008904
Task: #42480
Change-Id: I1e2efa28efa3ea7e389774cb7633d916757bc6ed
2021-06-02 11:19:39 +02:00
Julia Kreger
9e4c7052a2 Limit qemu-img execution arenas
qemu-img attempts to launch multiple threads by default *and*
attempts to have multiple memory allocation arenas to operate
from. While multithreading can be good for performance, this
pattern and the memory footprint for process launch and
dependencies can turn the memory footprint for a cirros image
conversion (16MB) into 1.2GB of memory being asked for by the
qemu-img tool.

In order to limit this impact, as the default number of arenas
is governed by the number of CPUs times the number 8, it seems
reasonable to lower this to a more reasonable number which
also helps keep our possible memory footprint from being exceeded.

Change-Id: I71a28ec59ec31c691205eb34d9fcab63a2ccb682
Story: 2008928
Task: 42528
2021-05-26 13:04:46 -07:00
Zuul
b4dd03168e Merge "Enable out-of-order writes when writing whole disk images" 2021-05-25 13:03:29 +00:00
Arne Wiebalck
20c5894bc2 Burn-in: Add disk step
Add a clean step for disk burn-in via fio. Get basic
run parameters from the node's driver_info.

Story: #2007523
Task: #42384

Change-Id: I5f5e336bd629846b3d779fd0fc7a2060b385b035
2021-05-21 16:33:11 +02:00
Zuul
6fc5a14760 Merge "Do not serialize command_params" 2021-05-18 14:58:42 +00:00
Dmitry Tantsur
d1844c61b1 Enable out-of-order writes when writing whole disk images
Per documentation it improves performance when using -O host_device.

Change-Id: Ic6a97af9f865d07c9cb4257397a320475a28f88b
2021-05-18 14:41:21 +02:00
Dmitry Tantsur
51aa31070a Do not serialize command_params
The command params can be huge when configdrive is used. There is no
point in sending them back, Ironic does not use them anyhow.

Story: #2008904
Task: #42479
Change-Id: I6e3db5db2042ca3fb5dafacfacf036fd7fc2fc4c
2021-05-18 12:59:28 +02:00
Zuul
d6e4fbd827 Merge "Remove the iscsi extension" 2021-05-12 11:08:19 +00:00
Zuul
823e0ed743 Merge "Burn-in: Add memory step" 2021-05-11 09:31:54 +00:00
Zuul
29f3230791 Merge "Software RAID: RAID the ESPs" 2021-05-11 09:31:36 +00:00
Zuul
9837f1c2f0 Merge "Fix NVMe Partition image on UEFI" 2021-05-10 15:00:21 +00:00
Zuul
5c01ec4f6f Merge "Burn-in: Add CPU step" 2021-05-10 15:00:14 +00:00
Dmitry Tantsur
be3882162e Remove the iscsi extension
Change-Id: I2f0e581575112d6c7ba0d211661cab3e0b6caca6
2021-05-10 12:43:44 +02:00
Julia Kreger
fe825fa97e Fix NVMe Partition image on UEFI
The _manage_uefi code has a check where it attempts to just
identify the precise partition number of the device, in order
for configuration to be parsed and passed. However, the same code
did not handle the existence of a `p1` partition instead of just a
partition #1. This is because the device naming format is different
with NVMe and Software RAID.

Likely, this wasn't an issue with software raid due to how complex the
code interaction is, but the docs also indicate to use only whole disk
images in that case.

This patch was pulled down my one RH's professional services folks
who has confirmed it does indeed fix the issue at hand. This is noted
as a public comment on the Red Hat bugzilla.
https://bugzilla.redhat.com/show_bug.cgi?id=1954096

Story: 2008881
Task: 42426
Related: rhbz#1954096
Change-Id: Ie3bd49add9a57fabbcdcbae4b73309066b620d02
2021-05-04 16:44:37 +00:00
Arne Wiebalck
5c222560f0 Burn-in: Add memory step
Add a clean step for memory burn-in via stress-ng. Get basic
run parameters from the node's driver_info.

Story: #2007523
Task: #42383

Change-Id: I33a83968c9f87cf795ec7ec922bce98b52c5181c
2021-05-01 10:36:58 +02:00
Arne Wiebalck
6702fcaa43 Burn-in: Add CPU step
Add a clean step for CPU burn-in via stress-ng. Get basic
run parameters from the node's driver_info.

Story: #2007523
Task: #42382

Change-Id: I14fd4164991fb94263757244f716b6bfe8edf875
2021-05-01 10:36:20 +02:00
Zane Bitter
ed791d9778 Fix getting memory size in some lshw output
Due to a regression in lshw introduced by
https://github.com/lyonel/lshw/pull/60, there are some versions in the
wild that do not return sizes for memory banks <32GiB. In those cases,
work around the problem by looking at the top-level size (if available)
to find the total size. Previously we assumed that we only needed the
top-level size when there was no list of memory banks.

The issue is fixed upstream by https://github.com/lyonel/lshw/pull/65,
but the erroneous patch is still present in the lshw-B.02.19.2-5.el8
package in CentOS 8.4 and 8.5.

Change-Id: I6eb5981d28b9ae368239af0c1d0ec32ff79d95b3
Story: #2008865
Task: 42395
2021-04-29 14:41:11 -04:00
Riccardo Pittau
2057d861a6 [trivial] Add versions to wallaby release notes
Change-Id: I806b5f4b74c087a16d194d36665b1115f6d34184
2021-04-23 10:31:56 +02:00
Zuul
9edb13d891 Merge "Do not fail network interface collection on unsupported interface" 2021-04-22 16:35:25 +00:00
Derek Higgins
9c3fbfd000 Add a call to "udevadm settle" in write_image.sh
After GPT and MBR are destroyed systemd-udevd gets triggered
which may hold /dev/sda open preventing qemu-img from writting
its image.

Story: 2008830
Task: 42312
Change-Id: I6105192a16fcb7f6898910e8d0ab824d731d491d
2021-04-20 17:48:46 +01:00
Arne Wiebalck
c2d04dc156 Software RAID: RAID the ESPs
For software RAID in UEFI mode, we create ESPs on all holder disks
and copy the bootloader there. Since there is no mechanism to keep
the ESPs in sync, e.g. on kernel upgrades or when kernel parameters
are updated, the ESPs will get out of sync eventually. This may lead
to a situation where a node boots with outdated parameters or does
not have any of the installed kernels in the boot menu anymore.
This change proposes to RAID the ESPs. While the UEFI firmware will
find an ESP partition (one leg of the mirror), the node will see
an md device and all subsequent updates will go to all member disks.

Also, remove the source ESP after copying in order to avoid mount
confusion (same UUID!).

Story: #2008745
Task: #42103
Change-Id: I9078ef37f1e94382c645ae98ce724ac9ed87c287
2021-04-16 14:40:28 +02:00