11651 Commits

Author SHA1 Message Date
Jacob Anders
b385d9ae5b Add support for verify steps
This change adds support for verify steps in Ironic. Verify steps
allow executing actions on transition from "verifying" to "manageable"
state and can perform actions such as cleaning BMC job queue or
resetting the BMC on supported platforms. Verify steps are similar
to deploy and clean steps, just simpler.

Story: 2009025
Task: 42751
Change-Id: Iee27199a0315b8609e629bac272998c28274802b
2021-09-30 20:46:17 +10:00
Zuul
62439d5633 Merge "Update python-dracclient version" 2021-09-21 20:08:48 +00:00
Zuul
dd7993af71 Merge "Reno for default_boot_mode change in Yoga" 2021-09-21 16:28:00 +00:00
Zuul
161984dafd Merge "Fix idrac-wsman having Completed with Errors jobs" 2021-09-21 16:27:57 +00:00
Zuul
0aa7f85949 Merge "Facilitate asset copy for bootloader ops" 2021-09-21 14:39:02 +00:00
Julia Kreger
e73757316a Reno for default_boot_mode change in Yoga
Change-Id: I64da534508af30318b0c9dcfce989cfb4f98da0b
2021-09-21 07:12:56 -07:00
Aija Jauntēva
38bba45ae8 Update python-dracclient version
Update python-dracclient version to indicate Xena compatibility with
7.*.* releases.

Version 7.0.0 is available from PyPI [1].

[1] https://pypi.org/project/python-dracclient/7.0.0/

Change-Id: I399f1525a473afc0783a52dabf5f85f820794e24
2021-09-21 09:57:30 -04:00
Zuul
33f09ad4d6 Merge "API endpoints to get node history" 2021-09-21 12:56:17 +00:00
Zuul
d591ed76ad Merge "Support HttpHeaders in create_subscription" 2021-09-21 12:56:12 +00:00
Julia Kreger
8e173b88d1 Disable Neutron firewall
Neutron's firewall initialization with OVS seems
to be the source of our pain with ports not being found
by ironic jobs. This is because firewall startup errors
crash the agent with a RuntimeError while it is deep
in its initial __init__ sequence.

This ultimately seems to be rooted in communication
with OVS itself, but perhaps the easiest solution is
to just disable the firewall.

Related: https://bugs.launchpad.net/neutron/+bug/1944201
Change-Id: I303989a825a7e35f1cb7b401134fd63553f6791c
2021-09-20 14:09:20 +00:00
Aija Jauntēva
6e0c0e7fd0 Fix idrac-wsman having Completed with Errors jobs
iDRAC jobs can finish in 'Completed', 'Failed' and also
'Completed with Errors' state. This fix treats
'Completed with Errors' as a finished, failed job; otherwise the node
stays in the wait state because it does not consider such jobs
finished.

Change-Id: I5018bf8ef6c86c6d303258f1497fa83d33b3cb76
2021-09-17 10:31:27 -04:00
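The job-state handling described in the commit above can be pictured with a minimal sketch. The names `FINISHED_OK`, `FINISHED_FAILED` and `classify_job` are hypothetical and are not taken from the idrac-wsman driver; only the three state strings come from the commit message.

```python
# Hypothetical sketch; these names are illustrative, not the driver's API.
FINISHED_OK = {"Completed"}
# Assumption: 'Completed with Errors' is grouped with 'Failed', as the
# commit message describes.
FINISHED_FAILED = {"Failed", "Completed with Errors"}


def classify_job(state: str) -> str:
    """Return 'succeeded', 'failed', or 'running' for an iDRAC job state."""
    if state in FINISHED_OK:
        return "succeeded"
    if state in FINISHED_FAILED:
        return "failed"
    # Any other state is treated as still in progress, so the node keeps waiting.
    return "running"
```

If 'Completed with Errors' were missing from the failed set, such a job would be classified as running indefinitely, which matches the stuck wait state the commit fixes.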
Zuul
de50ff2df5 Merge "Remove images from the OS profiler docs" 2021-09-16 18:09:49 +00:00
Zuul
f651c05b27 Merge "Expand the driver contributor documentation" 2021-09-16 18:09:41 +00:00
Zuul
edd7565937 Merge "Dial back gate job memory allocation" 2021-09-16 16:08:32 +00:00
Zuul
7f5d52ff6c Merge "Fix clear foreign config in idrac-redfish RAID" 2021-09-16 12:27:17 +00:00
Zuul
1133efc9b5 Merge "Fix RAID steps for non-RAID in idrac-redfish" 2021-09-16 11:31:15 +00:00
Julia Kreger
85b6dc9356 Facilitate asset copy for bootloader ops
Adds capability to copy bootloader assets from the system OS
into the network boot folders on conductor startup.

Change-Id: Ica8f9472d0a2409cf78832166c57f2bb96677833
2021-09-15 16:24:13 -07:00
Julia Kreger
34fd84560a Dial back gate job memory allocation
Observed an OOM incident causing
ironic-tempest-ipa-partition-pxe_ipmitool to fail.

One vm started, the other seemed to try to start twice, but both times
stopped shortly into the run and the base OS had recorded in it an OOM
failure.

It appears the actual QEMU memory footprint being consumed when
configured at 3GB is upwards of 4GB, which obviously is too big to
fit in our 8GB VM instance.

Dialing back slightly, in hopes it stabilizes the job.

Change-Id: Id8cef722ed305e96d89b9960a8f60f751f900221
2021-09-15 13:58:19 -07:00
Julia Kreger
fb9eae7412 API endpoints to get node history
Adds API for retrieving node history events
via a node. Includes pagination and limitation
of the response set.

Story: 2002980
Task: 42961

Change-Id: I22a92fa6c30d721f6a5dd0670b2e0a9cf76ad7b1
2021-09-15 10:54:11 -07:00
Zuul
20503d94e5 Merge "Add support for fields in drivers API" 2021-09-15 13:33:46 +00:00
Zuul
472959c1d7 Merge "Fix iDRAC import configuration missing task handling" 2021-09-15 07:56:22 +00:00
Zuul
ba4cb57ef3 Merge "Set postgresql password encryption for FIPS compliance" 2021-09-15 07:33:57 +00:00
Zuul
db5fb9c356 Merge "Move ramdisk deploy to its own module" 2021-09-15 05:53:36 +00:00
Zuul
ae99d97c4c Merge "Document eject_vmedia for Redfish" 2021-09-14 21:39:50 +00:00
Zuul
2bb84a02fb Merge "Fix idrac-wsman set_power_state to wait on HW" 2021-09-14 20:58:13 +00:00
Zuul
3f8be1471a Merge "Record node history and manage events in db" 2021-09-14 20:04:53 +00:00
Zuul
b650235b28 Merge "Implements node history: database" 2021-09-14 17:16:32 +00:00
Zuul
e377ec72a2 Merge "Fix in-band cleaning for ramdisk and anaconda deploy" 2021-09-14 17:16:26 +00:00
Zuul
227850979c Merge "Fix iDRAC import configuration job with errors" 2021-09-14 17:12:20 +00:00
Aija Jauntēva
2a0fd1d13f Fix idrac-wsman set_power_state to wait on HW
Previously, set_power_state returned to the caller immediately without
confirming the system has reached the requested state. This fixes that
by synchronously waiting until the target state has been read before
returning.

That bug can cause instance workload deployments to fail on Dell EMC
PowerEdge server models on which IPA ramdisk soft power off fails and
ironic employs its OOB fallback strategy. After an otherwise successful
deployment, the node is active, but is powered off. No error is reported
in last_error. If the subsequent instance workflow expects the system to
be powered on into the operating system, it fails.

Story: 2009204
Task: 43261
Change-Id: I3112a22149c07e5508f26c79f33d09aeb905c308
2021-09-14 16:02:54 +00:00
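A rough sketch of the synchronous wait described in the commit above follows. It assumes a caller-supplied `read_state` callable and made-up `timeout` and `interval` defaults; none of these names or values come from the idrac-wsman driver itself.

```python
import time
from typing import Callable


def wait_for_power_state(read_state: Callable[[], str], target: str,
                         timeout: float = 30.0, interval: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until read_state() reports the target state or timeout expires.

    All parameters are hypothetical; a real driver would read the state
    from the BMC and take its timeout from configuration.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_state() == target:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"power state did not reach {target!r}")
        sleep(interval)
```

Returning only after the target state has been read back is what prevents a caller from treating a still-powered-off node as ready.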
Iury Gregory Melo Ferreira
72eac4e748 Support HttpHeaders in create_subscription
This commit adds support to check if request to create
a subscription contains HttpHeaders so we can send in the
payload. [1]

[1] https://redfish.dmtf.org/schemas/v1/EventDestination.v1_0_0.json

Change-Id: I548ae06e22f217bf10ce6a7af57addfa7f9be555
2021-09-14 15:44:49 +02:00
Aija Jauntēva
50c87cf633 Fix clear foreign config in idrac-redfish RAID
After volumes are deleted in Redfish RAID also
clear foreign config if there is any.

Story: 2009160
Task: 43145

Depends-On: https://review.opendev.org/c/x/sushy-oem-idrac/+/806888
Change-Id: Ifde4656b4edd387ce2db2dbfc4c5ede261fafc70
2021-09-14 09:43:07 -04:00
Julia Kreger
4fc1abf91f Fix driver task pattern to reduce periodic db load
Previously, a pattern of periodic tasks was created where
nodes, in many cases all nodes that were neither actively locked
nor in maintenance state, were pulled in by a periodic task.
These periodic tasks would then create tasks which generated
additional database queries in order to populate the task object.

With the task object populated, the driver would then evaluate
if the driver in question was for the driver interface in
question and *then* evaluate if work had to be performed.

However, the field indicating whether work needed to be
performed was often already queried from the database on the
very initial query to generate the list of nodes to evaluate.

In essence, we've moved this up in the sequence so we evaluate
that field in question prior to creating the task, potentially
across every conductor, depending on the query, and ultimately
which drivers are enabled.

This potentially saves hundreds of thousands of needless
database queries per day on a medium-sized deployment,
depending on which drivers and driver interfaces are in use.

Change-Id: I409e87de2808d442d39e4d0ae6e995668230cbba
2021-09-13 10:05:37 -07:00
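The reordering described in the commit above can be illustrated with a small sketch: check the already-fetched field first, and only create a task for nodes that need work. The `NodeRow` type, the `driver_internal_info` field name, the flag name and `acquire_task` are assumptions made for illustration, not the conductor's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class NodeRow:
    """Hypothetical row as returned by the initial node-list query."""
    uuid: str
    driver_internal_info: dict = field(default_factory=dict)


def nodes_needing_work(rows, flag):
    """Yield only rows whose already-fetched info carries the given flag."""
    for row in rows:
        if row.driver_internal_info.get(flag):
            yield row


def run_periodic(rows, flag, acquire_task):
    """Create a task (the expensive step) only for rows that need work."""
    return [acquire_task(row.uuid) for row in nodes_needing_work(rows, flag)]
```

With this ordering, the cost of populating a task object is paid only for nodes where the flag is set, rather than for every node the periodic query returns.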
Aija Jauntēva
c174380db3 Fix iDRAC import configuration missing task handling
Older iDRACs delete the task after 1 minute; since 5.00.00.00
the task is kept for 10 minutes.
If the task is nevertheless missing, handle it and advise the
user to either upgrade iDRAC, if not already done, or decrease
the checking interval.
Prior to this, the node got stuck in wait mode forever if the task was
deleted, because the exception raised by the periodic task did not make
the step fail.

Change-Id: I5d500b7d53e9804aa3b54dc400d8621f40cd5d0c
2021-09-13 11:57:03 -04:00
Zuul
fa1c60cbce Merge "Enable parallel downloads and allow tuning concurrency" 2021-09-13 07:19:23 +00:00
Zuul
c2ebeadc8c Merge "Always update cache for HTTP images if Last Modified is unknown" 2021-09-13 07:19:20 +00:00
Julia Kreger
d17749249c Record node history and manage events in db
* Adds periodic task to purge node_history entries based upon
  provided configuration.
* Adds recording of node history entries for errors in the
  core conductor code.
* Also changes the rescue abort behavior to remove the notice
  from being recorded as an error, as this is a likely bug in
  behavior for any process or service evaluating the node
  last_error field.
* Makes use of a semi-free form event_type field to help
  provide some additional context into what is going on and
  why. For example if deployments are repeatedly failing,
  then perhaps it is a configuration issue, as opposed to
  a general failure. If a conductor has no resources, then
  the failure, in theory would point back to the conductor
  itself.

Story: 2002980
Task: 42960

Change-Id: Ibfa8ac4878cacd98a43dd4424f6d53021ad91166
2021-09-10 14:47:27 -07:00
Dmitry Tantsur
d0963378da Document eject_vmedia for Redfish
Change-Id: Id8a3a6a334ea59f676e279ca8bc7eecf19cebea6
2021-09-10 13:08:41 +02:00
Zuul
b0a8d9df9a Merge "Redfish RAID: Use identity instead of durable_name" 2021-09-10 09:38:35 +00:00
Zuul
651a05707c Merge "Fix typo and add subscription docs" 2021-09-10 09:25:25 +00:00
Zuul
6e8d48d860 Merge "update irmc document" 2021-09-10 07:28:03 +00:00
Zuul
b22c60d988 Merge "Clean step to remove CA certificates from iLO" 2021-09-09 20:30:44 +00:00
Zuul
f8763bae57 Merge "Trivial: shorten the deploy/clean step failure message" 2021-09-09 17:32:53 +00:00
Zuul
0affe4de8d Merge "Use packaged grub efi for network boot" 2021-09-09 17:19:17 +00:00
Kaifeng Wang
fbaad948d8 Implements node history: database
This patch provides the basic data model change to support node history.
Batch removal is not included in this patch.

Change-Id: I5c7cebd585ee84b5b57bd4690d4074baf0d05699
Story: 2002980
Task: 22989
2021-09-09 09:35:09 -07:00
Aija Jauntēva
391543946b Fix iDRAC import configuration job with errors
An iDRAC import configuration task can complete with OK health
while still having some errors, for example, when one disk failed to be
created and another succeeded.
Informational messages are now also excluded from error reporting.

Story: 2009198
Task: 43253

Change-Id: I02b63547566c94ffa1a5d0e84bd1b1f10d28bfc3
2021-09-09 05:13:22 -04:00
Dmitry Tantsur
a9d82bb12b Enable parallel downloads and allow tuning concurrency
Currently we set parallel_image_downloads to False, which means that
all downloads that go through the image cache are serialized.
This change enables it by default and deprecates it in favour of a new
more fine-grained mechanism: the new option image_download_concurrency
specifies how many downloads (and raw conversions) will run in parallel.

Update logging to trace how long each download takes.

Change-Id: I8b85afda295029f85e82143cf7d4bcb2316860f6
2021-09-09 06:00:30 +00:00
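One way to picture a concurrency limit such as image_download_concurrency is a semaphore that caps how many downloads run at once. The sketch below is an assumption about the general technique, not the image cache's implementation; `BoundedDownloader`, `download_all` and the `fetch` callable are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedDownloader:
    """Run at most `concurrency` downloads at once (illustrative only)."""

    def __init__(self, concurrency: int):
        self._slots = threading.BoundedSemaphore(concurrency)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of simultaneous downloads observed

    def download(self, fetch, url):
        with self._slots:  # blocks while all slots are taken
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return fetch(url)
            finally:
                with self._lock:
                    self._active -= 1


def download_all(urls, fetch, concurrency, workers=8):
    """Fetch every URL from a thread pool; return (results, peak)."""
    downloader = BoundedDownloader(concurrency)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda u: downloader.download(fetch, u), urls))
    return results, downloader.peak
```

Setting the limit to 1 reproduces fully serialized downloads, while larger values trade memory and bandwidth for throughput.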
Dmitry Tantsur
94a1560d31 Always update cache for HTTP images if Last Modified is unknown
Currently we default to assuming the cache is up-to-date. This is likely
wrong. Normal web servers provide Last Modified for files they serve.
If it is absent, chances are high the image is served by some sort
of a dynamic service, which may modify the URL on the fly.

In any case, always updating the image is a safer choice.

Change-Id: I0548db14a97638d26ebb687e8f47f1b295d1f774
2021-09-08 16:40:49 +00:00
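The freshness decision described in the commit above reduces to a small rule, sketched here under assumptions: `cache_is_fresh` is a hypothetical helper, and the timestamps are taken to be already-parsed, timezone-aware datetimes.

```python
from datetime import datetime
from typing import Optional


def cache_is_fresh(cached_mtime: datetime,
                   last_modified: Optional[datetime]) -> bool:
    """Decide whether a cached HTTP image can be reused.

    Hypothetical helper: when the server gives no Last-Modified value,
    the cache is treated as stale so the image is always re-downloaded.
    """
    if last_modified is None:
        return False
    # Fresh only if the server copy is not newer than the cached copy.
    return last_modified <= cached_mtime
```

Treating an unknown modification time as stale costs an extra download but avoids deploying an outdated image.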
Zuul
743f206d07 Merge "Add release note upgrade version check handling change" 2021-09-08 14:39:52 +00:00
vinay50muddu
df4778605d Clean step to remove CA certificates from iLO
Implements clean step "clear_ca_certificates" to remove any 3rd party
expired/revoked CA certificates from iLO.

Change-Id: I0a3c1da9b94e4037a53ade100354ac51ca08db35
Story: #2008784
Task: #42175
2021-09-08 04:09:01 -07:00