Adds API for retrieving node history events
via a node. Includes pagination and limitation
of the response set.
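
As an illustration only, the pagination and limiting described above
could look roughly like the sketch below. The function name, the
"uuid" key, the marker semantics and the max_limit default are
assumptions made for the example, not the actual API code.

```python
from typing import Dict, List, Optional


def paginate(events: List[Dict], limit: int, marker: Optional[str] = None,
             max_limit: int = 1000) -> List[Dict]:
    """Return at most `limit` events that follow the event with uuid `marker`.

    Hypothetical helper: `max_limit` caps the size of the response set.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    limit = min(limit, max_limit)
    start = 0
    if marker is not None:
        for index, event in enumerate(events):
            if event["uuid"] == marker:
                # Resume right after the marker entry.
                start = index + 1
                break
        else:
            raise ValueError("marker %r not found" % marker)
    return events[start:start + limit]
```

A client would pass the uuid of the last event it received as the
marker to fetch the next page.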
Story: 2002980
Task: 42961
Change-Id: I22a92fa6c30d721f6a5dd0670b2e0a9cf76ad7b1
set_power_state has returned to the caller immediately without
confirming the system has reached the requested state. This fixes that
by synchronously waiting until the target state has been read before
returning.
That bug can cause instance workload deployments to fail on Dell EMC
PowerEdge server models on which IPA ramdisk soft power off fails and
ironic employs its OOB fallback strategy. After an otherwise successful
deployment, the node is active, but is powered off. No error is reported
in last_error. If the subsequent instance workflow expects the system to
be powered on into the operating system, it fails.
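
A minimal sketch of the synchronous wait described above, assuming a
caller-supplied get_power_state callable. The helper name, the timeout
and the polling interval are hypothetical and are not taken from the
driver code.

```python
import time
from typing import Callable


def wait_for_power_state(get_power_state: Callable[[], str], target: str,
                         timeout: float = 30.0, interval: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the target power state is read, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_power_state()
        if state == target:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "power state is %r, expected %r" % (state, target))
        # Wait before reading the state again.
        sleep(interval)
```

Returning only after the target state has been read means a caller
never sees success while the system is still in the old state.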
Story: 2009204
Task: 43261
Change-Id: I3112a22149c07e5508f26c79f33d09aeb905c308
Previously, a pattern of periodic tasks was created where
nodes, in many cases all nodes that were neither actively locked
nor in maintenance state, were pulled in by a periodic task.
These periodic tasks would then create tasks, which generated
additional database queries in order to populate the task object.
With the task object populated, the code would then evaluate
whether the node's driver was the one for the driver interface in
question, and only *then* evaluate whether work had to be performed.
However, the field indicating whether work needed to be performed
was often already returned by the very first database query used
to generate the list of nodes to evaluate.
In essence, we've moved this check up in the sequence so that we
evaluate the field in question prior to creating the task,
potentially across every conductor, depending on the query and,
ultimately, on which drivers are enabled.
This potentially saves hundreds of thousands of needless
database queries per day on a medium-sized deployment,
depending on which drivers and driver interfaces are in use.
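
The reordering can be sketched as follows. This is an illustration
under assumptions: run_periodic, needs_work and create_task are
hypothetical names, and each row stands in for a record already
returned by the initial node query.

```python
from typing import Callable, Dict, Iterable, List


def run_periodic(rows: Iterable[Dict],
                 needs_work: Callable[[Dict], bool],
                 create_task: Callable[[str], None]) -> List[str]:
    """Create tasks only for rows whose already-fetched field needs work."""
    handled = []
    for row in rows:
        # Check the field from the initial query first, so that nodes
        # with nothing to do never trigger task creation (and the
        # extra database queries that come with it).
        if not needs_work(row):
            continue
        create_task(row["uuid"])
        handled.append(row["uuid"])
    return handled
```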
Change-Id: I409e87de2808d442d39e4d0ae6e995668230cbba
Older iDRACs delete the task after 1 minute; since 5.00.00.00
the task is kept for 10 minutes.
If the issue is nevertheless encountered, handle it and advise
the user to either upgrade iDRAC, if not already done, or decrease
the checking interval.
Prior to this, the node got stuck in wait mode forever if the task
was deleted, as the exception raised by the periodic task didn't
make the step fail.
Change-Id: I5d500b7d53e9804aa3b54dc400d8621f40cd5d0c
* Adds periodic task to purge node_history entries based upon
provided configuration.
* Adds recording of node history entries for errors in the
core conductor code.
* Also changes the rescue abort behavior to remove the notice
from being recorded as an error, as this is a likely bug in
behavior for any process or service evaluating the node
last_error field.
* Makes use of a semi-free form event_type field to help
provide some additional context into what is going on and
why. For example, if deployments are repeatedly failing,
then perhaps it is a configuration issue, as opposed to
a general failure. If a conductor has no resources, then
the failure, in theory, would point back to the conductor
itself.
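
As a rough sketch of the purge described in the first item above,
the selection of entries to remove might look like this. The function
name, the "created_at" key and the age-based option are assumptions
for illustration. The actual configuration options are not shown here.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def entries_to_purge(entries: List[Dict], now: datetime,
                     max_age_days: int) -> List[str]:
    """Return uuids of history entries older than the configured age."""
    if max_age_days <= 0:
        # Assumption for this sketch: a non-positive value disables purging.
        return []
    cutoff = now - timedelta(days=max_age_days)
    return [entry["uuid"] for entry in entries
            if entry["created_at"] < cutoff]
```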
Story: 2002980
Task: 42960
Change-Id: Ibfa8ac4878cacd98a43dd4424f6d53021ad91166
This patch provides the basic data model change to support node history.
Batch removal is not included in this patch.
Change-Id: I5c7cebd585ee84b5b57bd4690d4074baf0d05699
Story: 2002980
Task: 22989
In iDRAC, an import configuration task can complete with OK health
while still having some errors, for example, when one disk failed to
be created and another succeeded.
Also changed error reporting to exclude informational messages.
Story: 2009198
Task: 43253
Change-Id: I02b63547566c94ffa1a5d0e84bd1b1f10d28bfc3
Currently we set parallel_image_downloads to False, which means that
all downloads that go through the image cache are serialized.
This change enables it by default and deprecates it in favour of a new
more fine-grained mechanism: the new option image_download_concurrency
specifies how many downloads (and raw conversions) will run in parallel.
Update logging to trace how long each download takes.
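
For illustration, bounding the number of parallel downloads and timing
each one could be sketched with a fixed-size thread pool. The
run_downloads helper and its concurrency argument are hypothetical
stand-ins, not the image cache implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def run_downloads(jobs: Sequence[Callable[[], T]],
                  concurrency: int) -> List[Tuple[T, float]]:
    """Run jobs with at most `concurrency` in flight; time each one."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")

    def timed(job: Callable[[], T]) -> Tuple[T, float]:
        started = time.monotonic()
        result = job()
        # The elapsed time could be logged to trace slow downloads.
        return result, time.monotonic() - started

    # The pool size is what limits how many jobs run at the same time.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, jobs))
```

Results come back in the order the jobs were submitted, each paired
with its duration in seconds.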
Change-Id: I8b85afda295029f85e82143cf7d4bcb2316860f6
Currently we default to assuming the cache is up-to-date. This is likely
wrong. Normal web servers provide Last-Modified for files they serve.
If it is absent, chances are high the image is served by some sort
of a dynamic service, which may modify the URL on the fly.
In any case, always updating the image is a safer choice.
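
The new default can be summarised with a small sketch. The function
and parameter names are hypothetical, and both timestamps are assumed
to be seconds since the epoch.

```python
from typing import Optional


def cached_image_is_fresh(cached_mtime: Optional[float],
                          last_modified: Optional[float]) -> bool:
    """Decide whether a cached image can be reused without re-downloading."""
    if cached_mtime is None:
        # Nothing is cached yet.
        return False
    if last_modified is None:
        # No Last-Modified from the server: treat the cache as stale.
        return False
    return last_modified <= cached_mtime
```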
Change-Id: I0548db14a97638d26ebb687e8f47f1b295d1f774
Implements clean step "clear_ca_certificates" to remove any 3rd party
expired/revoked CA certificates from iLO.
Change-Id: I0a3c1da9b94e4037a53ade100354ac51ca08db35
Story: #2008784
Task: #42175
Instead of using the efi written by grub-mknetdir, use the packaged
signed binary. The core.efi generated by grub-mknetdir is not signed
so it does not help with end-to-end secure-boot.
Also, the successful run of
ironic-tempest-ipa-partition-uefi-pxe-grub2[1] demonstrates that grub
continues to boot even when the grub-mknetdir generated
grub/x86_64-efi/*.lst are missing. Avoiding using grub-mknetdir makes
for a much simpler setup of /tftpboot for grub network boot.
[1] https://zuul.opendev.org/t/openstack/build/bab62f6bf032474cb80af3cb5a999117/log/tftpd-journal.txt
Change-Id: Ide0aa416391c20371bbb8d1a18288b262872e313
We have implemented the cleaning prepare/tear_down, but haven't
implemented fetching/running in-band clean steps. This change moves
the cleaning logic from AgentDeployMixin to AgentBaseMixin, where it
arguably belongs.
In a follow-up patch I'm planning to reduce the number of mix-ins we
currently have, but that won't be backportable.
Change-Id: Ibc5610b14cea487d26191249e5c0333fdcd4b914
This commit fixes a typo in the destination parameter
in tests and in docs, and also adds documentation
about the subscription methods available via
vendor passthru.
Change-Id: Ifa82562d2ce8f34bd90dc5897a3b83fe9e8eb88b
* Log traceback in fail_on_error
This is a last-resort error handler, so it needs to log the traceback.
* Use an assertion when we expect a present list of steps
* Log the freshly built list of steps.
Change-Id: I8cd4cd330551b7bc9a44957e0d15c8b75c09c299
Update the documentation of iRMC according to the changes of python-scciclient v0.10.0
Signed-off-by: Zhou Hao <zhouhao@fujitsu.com>
Change-Id: Ic4c359a1b489fb1e6f37759013b66f7eadaf0765
- map disks to `Storage`, not `StorageControllersListField`
to have access to `Storage` fields, including
`identity` and possibly `storage_controllers`
- durable_name is not present on all controllers,
for example, BOSS-S1
- in the known supported BMC, iDRAC, the user would specify an
identity such as 'RAID.Integrated.1-1' in the target RAID config
instead of a durable name such as '54cd98f0c3648800'
- the data structure of controller.identifiers makes it harder
to get the Storage resource by durable_name
This change is not user visible and does not impact
current code as values set are not used for anything yet.
Change-Id: I5fa47c5ee4691a70974caafbc262d62790c03382
Splitting code specific to node verification from manager.py into
verify.py (as well as test_verify.py for tests). This is done in
preparation for adding support for verify steps.
Story: 2009025
Task: 43137
Change-Id: I22a9bd7ceac3dfd65f20e52cbacff4b9d3998c64
This reverts commit 3bad548ce3d34207aa70fde8342082fe243bee31
based upon the comments on https://review.opendev.org/c/openstack/ironic/+/801343 which
was the stable/wallaby backport for this change
where Derek points out that the change was redundant and
Dmitry asks if the patch should be reverted.
This change is a partial revert where the documentation
changed in the original commit was actually correct all
along.
Change-Id: I4bc9dbeb334f176b7c8ed4e4b5ec36affc17e9cd