11628 Commits

Author SHA1 Message Date
Julia Kreger
fb9eae7412 API endpoints to get node history
Adds API for retrieving node history events
via a node. Includes pagination and limitation
of the response set.
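
A minimal sketch of what limit-and-marker pagination over a node's history events could look like. The `paginate` helper, the `uuid` key, and the sample events are illustrative assumptions and are not taken from ironic's API code.

```python
from typing import Dict, List, Optional


def paginate(events: List[Dict], limit: int,
             marker: Optional[str] = None) -> List[Dict]:
    """Return at most `limit` events following the event whose uuid is `marker`."""
    start = 0
    if marker is not None:
        # Resume right after the last event the client has already seen.
        for index, event in enumerate(events):
            if event["uuid"] == marker:
                start = index + 1
                break
        else:
            raise ValueError(f"unknown marker: {marker}")
    return events[start:start + limit]


# Hypothetical history events for one node.
events = [{"uuid": f"event-{n}", "event": f"message {n}"} for n in range(5)]
first_page = paginate(events, limit=2)
second_page = paginate(events, limit=2, marker=first_page[-1]["uuid"])
```

The client passes the last identifier it received as the marker for the next request, so the server does not need to keep per-client state.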

Story: 2002980
Task: 42961

Change-Id: I22a92fa6c30d721f6a5dd0670b2e0a9cf76ad7b1
2021-09-15 10:54:11 -07:00
Zuul
20503d94e5 Merge "Add support for fields in drivers API" 2021-09-15 13:33:46 +00:00
Zuul
472959c1d7 Merge "Fix iDRAC import configuration missing task handling" 2021-09-15 07:56:22 +00:00
Zuul
ba4cb57ef3 Merge "Set postgresql password encryption for FIPS compliance" 2021-09-15 07:33:57 +00:00
Zuul
db5fb9c356 Merge "Move ramdisk deploy to its own module" 2021-09-15 05:53:36 +00:00
Zuul
ae99d97c4c Merge "Document eject_vmedia for Redfish" 2021-09-14 21:39:50 +00:00
Zuul
2bb84a02fb Merge "Fix idrac-wsman set_power_state to wait on HW" 2021-09-14 20:58:13 +00:00
Zuul
3f8be1471a Merge "Record node history and manage events in db" 2021-09-14 20:04:53 +00:00
Zuul
b650235b28 Merge "Implements node history: database" 2021-09-14 17:16:32 +00:00
Zuul
e377ec72a2 Merge "Fix in-band cleaning for ramdisk and anaconda deploy" 2021-09-14 17:16:26 +00:00
Zuul
227850979c Merge "Fix iDRAC import configuration job with errors" 2021-09-14 17:12:20 +00:00
Aija Jauntēva
2a0fd1d13f Fix idrac-wsman set_power_state to wait on HW
set_power_state has returned to the caller immediately without
confirming the system has reached the requested state. This fixes that
by synchronously waiting until the target state has been read before
returning.

That bug can cause instance workload deployments to fail on Dell EMC
PowerEdge server models on which IPA ramdisk soft power off fails and
ironic employs its OOB fallback strategy. After an otherwise successful
deployment, the node is active, but is powered off. No error is reported
in last_error. If the subsequent instance workflow expects the system to
be powered on into the operating system, it fails.
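
The synchronous wait described above can be sketched roughly as a polling loop. This is an illustration only: `wait_for_power_state`, its parameters, and the state strings are assumptions, not the idrac-wsman driver's actual code.

```python
import time
from typing import Callable


def wait_for_power_state(get_state: Callable[[], str], target: str,
                         timeout: float = 30.0, interval: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state() until it returns `target` or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == target:
            # Only return once the hardware reports the requested state.
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"power state is {state!r}, expected {target!r}")
        sleep(interval)


# Simulated hardware that needs two polls before it reports "power on".
readings = iter(["power off", "power off", "power on"])
final_state = wait_for_power_state(lambda: next(readings), "power on",
                                   sleep=lambda seconds: None)
```

Raising on timeout instead of returning silently means the caller sees an error rather than an active node that is unexpectedly powered off.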

Story: 2009204
Task: 43261
Change-Id: I3112a22149c07e5508f26c79f33d09aeb905c308
2021-09-14 16:02:54 +00:00
Julia Kreger
4fc1abf91f Fix driver task pattern to reduce periodic db load
Previously, a pattern of periodic tasks was created where
nodes, and in many cases, all nodes not actively locked nor
those in maintenance state, were pulled in by a periodic task.
These periodic tasks would then create tasks which generated
additional database queries in order to populate the task object.

With the task object populated, the driver would then evaluate
if the driver in question was for the driver interface in
question and *then* evaluate if work had to be performed.

However, the field indicating whether work needed to be
performed was often already queried from the database in the
very first query that generates the list of nodes to evaluate.

In essence, we've moved this up in the sequence so we evaluate
that field in question prior to creating the task, potentially
across every conductor, depending on the query, and ultimately
which drivers are enabled.

This potentially saves hundreds of thousands of needless
database queries on a medium size deployment per single day,
depending on which drivers and driver interfaces are in use.
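
The reordering described above can be illustrated with a small sketch: check the field that was already returned by the initial node query, and only create a task for nodes that pass. The `work_pending` field, the `TaskFactory` class, and the sample nodes are hypothetical stand-ins, not ironic's conductor code.

```python
def nodes_needing_work(nodes, field):
    """Yield only nodes whose already-fetched `field` says work is pending."""
    for node in nodes:
        # The field came back with the initial list query, so checking it
        # here costs no additional database round trip.
        if node.get(field):
            yield node


class TaskFactory:
    """Hypothetical task creation that counts the queries it would trigger."""

    def __init__(self):
        self.queries = 0

    def acquire(self, node):
        self.queries += 1  # stands in for populating the task from the db
        return {"node": node}


nodes = [
    {"uuid": "node-1", "work_pending": False},
    {"uuid": "node-2", "work_pending": True},
    {"uuid": "node-3", "work_pending": False},
]

factory = TaskFactory()
tasks = [factory.acquire(node)
         for node in nodes_needing_work(nodes, "work_pending")]
```

In this toy run only one of the three nodes results in a task, so the simulated query count drops from three to one.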

Change-Id: I409e87de2808d442d39e4d0ae6e995668230cbba
2021-09-13 10:05:37 -07:00
Aija Jauntēva
c174380db3 Fix iDRAC import configuration missing task handling
Older iDRACs delete the task after 1 minute; since 5.00.00.00
the task is kept for 10 minutes.
If the issue is nevertheless encountered, handle it and advise
the user to either upgrade iDRAC, if not already done, or decrease
the checking interval.
Prior to this, the node got stuck in wait mode forever if the task was
deleted, as the exception raised by the periodic task didn't make the
step fail.

Change-Id: I5d500b7d53e9804aa3b54dc400d8621f40cd5d0c
2021-09-13 11:57:03 -04:00
Zuul
fa1c60cbce Merge "Enable parallel downloads and allow tuning concurrency" 2021-09-13 07:19:23 +00:00
Zuul
c2ebeadc8c Merge "Always update cache for HTTP images if Last Modified is unknown" 2021-09-13 07:19:20 +00:00
Julia Kreger
d17749249c Record node history and manage events in db
* Adds periodic task to purge node_history entries based upon
  provided configuration.
* Adds recording of node history entries for errors in the
  core conductor code.
* Also changes the rescue abort behavior to remove the notice
  from being recorded as an error, as this is a likely bug in
  behavior for any process or service evaluating the node
  last_error field.
* Makes use of a semi-free form event_type field to help
  provide some additional context into what is going on and
  why. For example if deployments are repeatedly failing,
  then perhaps it is a configuration issue, as opposed to
  a general failure. If a conductor has no resources, then
  the failure, in theory would point back to the conductor
  itself.
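
As a rough illustration of the purge described in the first bullet, the sketch below drops history entries older than a configured age. The `purge_history` helper, the `max_age_days` setting, and the entry layout (including the `event_type` values) are assumptions made for this example, not ironic's actual configuration or schema.

```python
from datetime import datetime, timedelta, timezone


def purge_history(entries, max_age_days, now=None):
    """Return the entries that are no older than `max_age_days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [entry for entry in entries if entry["created_at"] >= cutoff]


now = datetime(2021, 9, 10, tzinfo=timezone.utc)
entries = [
    {"event_type": "deploying", "event": "deploy failed",
     "created_at": now - timedelta(days=40)},
    {"event_type": "cleaning", "event": "clean step failed",
     "created_at": now - timedelta(days=3)},
]
kept = purge_history(entries, max_age_days=30, now=now)
```

Passing `now` explicitly keeps the helper deterministic, which makes the retention rule easy to test.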

Story: 2002980
Task: 42960

Change-Id: Ibfa8ac4878cacd98a43dd4424f6d53021ad91166
2021-09-10 14:47:27 -07:00
Dmitry Tantsur
d0963378da Document eject_vmedia for Redfish
Change-Id: Id8a3a6a334ea59f676e279ca8bc7eecf19cebea6
2021-09-10 13:08:41 +02:00
Zuul
b0a8d9df9a Merge "Redfish RAID: Use identity instead of durable_name" 2021-09-10 09:38:35 +00:00
Zuul
651a05707c Merge "Fix typo and add subscription docs" 2021-09-10 09:25:25 +00:00
Zuul
6e8d48d860 Merge "update irmc document" 2021-09-10 07:28:03 +00:00
Zuul
b22c60d988 Merge "Clean step to remove CA certificates from iLO" 2021-09-09 20:30:44 +00:00
Zuul
f8763bae57 Merge "Trivial: shorten the deploy/clean step failure message" 2021-09-09 17:32:53 +00:00
Zuul
0affe4de8d Merge "Use packaged grub efi for network boot" 2021-09-09 17:19:17 +00:00
Kaifeng Wang
fbaad948d8 Implements node history: database
This patch provides the basic data model change to support node history.
Batch removal is not included in this patch.

Change-Id: I5c7cebd585ee84b5b57bd4690d4074baf0d05699
Story: 2002980
Task: 22989
2021-09-09 09:35:09 -07:00
Aija Jauntēva
391543946b Fix iDRAC import configuration job with errors
In iDRAC, an import configuration task can be completed with OK health
while still having some errors, for example, when one disk failed to be
created and another succeeded.
Also changed error reporting to exclude informational messages.

Story: 2009198
Task: 43253

Change-Id: I02b63547566c94ffa1a5d0e84bd1b1f10d28bfc3
2021-09-09 05:13:22 -04:00
Dmitry Tantsur
a9d82bb12b Enable parallel downloads and allow tuning concurrency
Currently we set parallel_image_downloads to False, which means that
all downloads that go through the image cache are serialized.
This change enables it by default and deprecates it in favour of a new
more fine-grained mechanism: the new option image_download_concurrency
specifies how many downloads (and raw conversions) will run in parallel.

Update logging to trace how long each download takes.
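
A minimal sketch of bounding download concurrency and timing each download, assuming a thread pool is an acceptable stand-in for the real mechanism. The `download_all` helper, the `fake_fetch` function, and the URLs are hypothetical; only the idea of a concurrency limit comes from the commit message.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, concurrency):
    """Run fetch(url) for every URL, at most `concurrency` at a time.

    Returns a dict mapping each URL to its download duration in seconds.
    """
    def timed_fetch(url):
        started = time.monotonic()
        fetch(url)
        return url, time.monotonic() - started

    # The pool size is the concurrency limit: extra downloads wait in queue.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return dict(pool.map(timed_fetch, urls))


# Fake download that records the highest number of simultaneous fetches.
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def fake_fetch(url):
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # stand-in for network transfer time
    with stats_lock:
        stats["active"] -= 1


urls = [f"http://example.invalid/image-{n}.qcow2" for n in range(5)]
durations = download_all(urls, fake_fetch, concurrency=2)
for url, seconds in durations.items():
    print(f"{url} took {seconds:.3f}s")
```

With a limit of 2, the recorded peak never exceeds two simultaneous fetches, while all five downloads still complete.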

Change-Id: I8b85afda295029f85e82143cf7d4bcb2316860f6
2021-09-09 06:00:30 +00:00
Dmitry Tantsur
94a1560d31 Always update cache for HTTP images if Last Modified is unknown
Currently we default to assuming the cache is up-to-date. This is likely
wrong. Normal web servers provide Last Modified for files they serve.
If it is absent, chances are high the image is served by some sort
of a dynamic service, which may modify the URL on the fly.

In any case, always updating the image is a safer choice.
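
The decision rule described above could be sketched as follows. The `cache_is_up_to_date` helper and its arguments are hypothetical names for this illustration, not the image cache's real interface.

```python
from datetime import datetime, timezone
from typing import Optional


def cache_is_up_to_date(cached_at: datetime,
                        last_modified: Optional[datetime]) -> bool:
    """Decide whether a cached HTTP image can be reused."""
    if last_modified is None:
        # No Last-Modified header: assume the image may have changed
        # and refresh the cache, which is the safer choice.
        return False
    return cached_at >= last_modified


cached_at = datetime(2021, 9, 1, tzinfo=timezone.utc)
unknown = cache_is_up_to_date(cached_at, None)
older_source = cache_is_up_to_date(
    cached_at, datetime(2021, 8, 1, tzinfo=timezone.utc))
newer_source = cache_is_up_to_date(
    cached_at, datetime(2021, 9, 5, tzinfo=timezone.utc))
```

Treating a missing header as "stale" costs an extra download at worst, whereas treating it as "fresh" risks deploying an outdated image.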

Change-Id: I0548db14a97638d26ebb687e8f47f1b295d1f774
2021-09-08 16:40:49 +00:00
Zuul
743f206d07 Merge "Add release note upgrade version check handling change" 2021-09-08 14:39:52 +00:00
vinay50muddu
df4778605d Clean step to remove CA certificates from iLO
Implements clean step "clear_ca_certificates" to remove any 3rd party
expired/revoked CA certificates from iLO.

Change-Id: I0a3c1da9b94e4037a53ade100354ac51ca08db35
Story: #2008784
Task: #42175
2021-09-08 04:09:01 -07:00
Zuul
a71ed5de26 Merge "Fix to unblock oslodb 11.0.0" 2021-09-08 04:31:11 +00:00
Steve Baker
fc8601cd02 Use packaged grub efi for network boot
Instead of using the efi written by grub-mknetdir, use the packaged
signed binary. The core.efi generated by grub-mknetdir is not signed
so it does not help with end-to-end secure-boot.

Also, the successful run of
ironic-tempest-ipa-partition-uefi-pxe-grub2[1] demonstrates that grub
continues to boot even when the grub-mknetdir generated
grub/x86_64-efi/*.lst are missing. Avoiding using grub-mknetdir makes
for a much simpler setup of /tftpboot for grub network boot.

[1] https://zuul.opendev.org/t/openstack/build/bab62f6bf032474cb80af3cb5a999117/log/tftpd-journal.txt

Change-Id: Ide0aa416391c20371bbb8d1a18288b262872e313
2021-09-08 13:35:45 +12:00
Dmitry Tantsur
a76fc6f54e Trivial: shorten the deploy/clean step failure message
It looks like this now:

Agent returned error for deploy step {'step': 'write_image', 'priority': 80,
'argsinfo': None, 'interface': 'deploy'} on node
878c3113-0035-5033-9f99-46520b89b56d : Error performing deploy_step
write_image: <next level error message>

Change-Id: Iabfb802cbfb96a9a02d6811f450f151623d5ca1f
2021-09-07 14:57:41 +02:00
Iury Gregory Melo Ferreira
b7b0dfb14e Fix to unblock oslodb 11.0.0
Change-Id: If21760f21ad241611e2f6c95879cb44d8df90c94
2021-09-06 17:14:56 +02:00
Dmitry Tantsur
88d6b99cc7 Move ramdisk deploy to its own module
It does not depend on PXE and is actually often used with virtual media.

Change-Id: Ida6edf819dbb3d1a51c465b4e109eafd977fd66c
2021-09-06 16:30:53 +02:00
Dmitry Tantsur
75304deefb Fix in-band cleaning for ramdisk and anaconda deploy
We have implemented the cleaning prepare/tear_down, but haven't
implemented fetching/running in-band clean steps. This change moves
the cleaning logic from AgentDeployMixin to AgentBaseMixin, where it
arguably belongs.

In a follow-up patch I'm planning to reduce the number of mix-ins we
currently have, but that won't be backportable.

Change-Id: Ibc5610b14cea487d26191249e5c0333fdcd4b914
2021-09-06 16:01:10 +02:00
Julia Kreger
3dc887fa78 Add release note upgrade version check handling change
Change-Id: I0ade8663e4f3fcfb69ecc2ec0cd32df44fda57c3
2021-09-03 13:55:27 -07:00
Zuul
8ea1a438d3 Merge "Allow initial versions to not be created yet" 2021-09-03 17:26:11 +00:00
Zuul
5d79347f97 Merge "Fix upgrade logic to allow for bundled changes" 2021-09-03 13:20:09 +00:00
Zuul
a7e0ef8ee9 Merge "Improve edge-case debugging for deployment and cleaning" 2021-09-03 11:10:54 +00:00
Iury Gregory Melo Ferreira
b7ad3f51a1 Fix typo and add subscription docs
This commit fixes a typo in the destination parameter
in tests and in docs, and also adds documentation
about the subscription methods available via
vendor passthru.

Change-Id: Ifa82562d2ce8f34bd90dc5897a3b83fe9e8eb88b
2021-09-02 22:02:09 +02:00
Dmitry Tantsur
f6781aadfc Improve edge-case debugging for deployment and cleaning
* Log traceback in fail_on_error
  This is a last-resort error handler, so it needs to log the traceback.
* Use an assertion when we expect a present list of steps
* Log the freshly built list of steps.

Change-Id: I8cd4cd330551b7bc9a44957e0d15c8b75c09c299
2021-09-02 15:54:45 +02:00
Zuul
df92f8089e Merge "Revert "Allow reboot to hard disk following iso ramdisk deploy."" 2021-09-02 12:33:57 +00:00
Zhou Hao
28be9684cb update irmc document
Update the documentation of IRMC according to the changes of python-scciclient v0.10.0

Signed-off-by: Zhou Hao <zhouhao@fujitsu.com>
Change-Id: Ic4c359a1b489fb1e6f37759013b66f7eadaf0765
2021-09-01 14:14:21 +08:00
Aija Jauntēva
c4a538c761 Redfish RAID: Use identity instead of durable_name
- map disks to `Storage`, not `StorageControllersListField`
to have access to `Storage` fields, including
`identity` and possibly `storage_controllers`
- durable_name is not present on all controllers,
for example, BOSS-S1
- in known supported BMC, iDRAC, in target RAID config
user would specify identity such as 'RAID.Integrated.1-1'
instead of durable name such as '54cd98f0c3648800'
- data structure of controller.identifiers makes it harder
to get Storage resource by durable_name

This change is not user visible and does not impact
current code as values set are not used for anything yet.

Change-Id: I5fa47c5ee4691a70974caafbc262d62790c03382
2021-08-31 13:35:56 +00:00
Zuul
c57daacf62 Merge "Add iDRAC configuration mold docs" 2021-08-31 12:26:21 +00:00
Zuul
d767f8fe9c Merge "Remove manager param for iDRAC OEM calls" 2021-08-31 11:08:17 +00:00
Jacob Anders
c694c76d7f Split node verification code out of manager.py
Splitting code specific to node verification from manager.py into
verify.py (as well as test_verify.py for tests). This is done in
preparation for adding support for verify steps.

Story: 2009025
Task: 43137
Change-Id: I22a9bd7ceac3dfd65f20e52cbacff4b9d3998c64
2021-08-31 15:37:12 +10:00
Zuul
a3b50a9863 Merge "Minor formatting and doc changes to change boot mode feature commit." 2021-08-30 16:51:54 +00:00
Julia Kreger
0fe0122466 Revert "Allow reboot to hard disk following iso ramdisk deploy."
This reverts commit 3bad548ce3d34207aa70fde8342082fe243bee31
based upon the comments on https://review.opendev.org/c/openstack/ironic/+/801343 which
was the stable/wallaby backport for this change
where Derek points out that the change was redundant and
Dmitry asks if the patch should be reverted.

This change is only a partial revert, because the documentation
changed in the original commit was actually correct all
along.

Change-Id: I4bc9dbeb334f176b7c8ed4e4b5ec36affc17e9cd
2021-08-30 06:20:57 -07:00