ironic

Author	SHA1	Message	Date
Dmitry Tantsur	219bf0c373	Prepare release 16.1 Change-Id: Ia37075d4aa2f39ca0862b03ca02c85bac17400e5	2020-12-14 13:29:36 +01:00
Zuul	eeda309bbd	Merge "Add secure boot support to ilo-uefi-https"	2020-12-14 04:49:49 +00:00
vmud213	681940c8f0	Add secure boot support to ilo-uefi-https Adds secure boot support to the ilo-uefi-https boot interface. Change-Id: I1d08b88496764bbee5cf0a1d306eb7be31d0d373 Story: #2008258 Task: #41114	2020-11-26 08:46:01 +00:00
Zuul	a08da8551a	Merge "Add vendor_passthru method for virtual media"	2020-11-25 17:02:34 +00:00
Zuul	4cc375a747	Merge "Allow disabling automated_clean per node"	2020-11-25 12:42:49 +00:00
Zuul	0ac45a8d9d	Merge "Always retry locking when performing task handoff"	2020-11-25 09:16:12 +00:00
Feruzjon Muyassarov	ee6119e774	Allow disabling automated_clean per node This allows users to disable automated cleaning at the node level. Story: #2008113 Task: #40829 Change-Id: If583bae4108b9bfa99cc460509af84696c7003c5	2020-11-24 17:23:13 +00:00
Jason Anderson	bfc2ad56d5	Always retry locking when performing task handoff There are some Ironic execution workflows where there is not an easy way to retry, such as when attempting to hand off the processing of an async task to a conductor. Task handoff can require releasing a lock on the node, so the next entity processing the task can acquire the lock itself. However, this is vulnerable to race conditions, as there is no uniform retry mechanism built into such handoffs. Consider the continue_node_deploy/clean logic, which does this: method = 'continue_node_%s' % operation # Need to release the lock to let the conductor take it task.release_resources() getattr(rpc, method)(task.context, uuid, topic=topic) If another process obtains a lock between the releasing of resources and the acquiring of the lock during the continue_node_* operation, and holds the lock longer than the max attempt * interval window (which defaults to 3 seconds), then the handoff will never complete. Beyond that, because there is no proper queue for processes waiting on the lock, there is no fairness, so it's also possible that instead of one long lock being held, the lock is obtained and held for a short window several times by other competing processes. This manifests as nodes occasionally getting stuck in the "DEPLOYING" state during a deploy. For example, a user may attempt to open or access the serial console before the deploy is complete; the serial console process obtains a lock and starves the conductor of the lock, so the conductor cannot finish the deploy. It's also possible a long heartbeat or badly-timed sequence of heartbeats could do the same. To fix this, this commit introduces the concept of a "patient" lock, which will retry indefinitely until it doesn't encounter the NodeLocked exception. This overrides any retry behavior. .. note:: There may be other cases where such a lock is desired. Story: #2008323 Change-Id: I9937fab18a50111ec56a3fd023cdb9d510a1e990	2020-11-24 09:41:38 -06:00
Bob Fournier	98958cd0a4	Add vendor_passthru method for virtual media Add an eject_vmedia vendor_passthru method for Redfish and idrac. Story: 2008363 Task: 41271 Change-Id: Ib5ae16bacfd79f479a9aa8fbf69edc5cfdf73ce3	2020-11-24 09:25:44 -05:00
Zuul	3db362e5aa	Merge "Fix disk label to account for UEFI"	2020-11-19 23:34:27 +00:00
Julia Kreger	ed4abbd519	Fix disk label to account for UEFI Previously, disk labels were not populated unless explicitly set by an API user. This led to a dangerous case that sometimes worked but was ultimately wrong: setting up a UEFI-booting machine with a BIOS MBR partition table. Not all systems support this, but UEFI systems are supposed to support GPT partition tables. If no explicit override is set, we now fall back to assuming GPT when the machine is set to UEFI mode. Change-Id: I001d8c6ee3b1d6c466c71ea5179bdbca9bdd692d	2020-11-18 03:10:27 +00:00
Zuul	385aeb0143	Merge "Limit the default value of [api]api_workers to 4"	2020-11-18 03:06:29 +00:00
Zuul	0f3c63fde2	Merge "Simplify injecting network data into an ISO image"	2020-11-13 09:46:17 +00:00
Zuul	1cdf582d83	Merge "Fix incorrect network_data.json location"	2020-11-13 07:09:34 +00:00
Zuul	44027a1175	Merge "Remove root device hint after delete_configuration"	2020-11-12 23:41:19 +00:00
Zuul	029b875be1	Merge "Fixes the issue that instance bond port can't get IP address"	2020-11-12 23:41:14 +00:00
Zuul	b5932bc6bf	Merge "Fix idrac-wsman RAID step async error handling"	2020-11-12 19:08:59 +00:00
Zuul	887f263c26	Merge "Retrieve BIOS configuration when moving node to ``manageable``"	2020-11-12 18:29:56 +00:00
Zuul	e85c04ba3e	Merge "Fix DHCP-less operations with the noop network interface"	2020-11-11 19:46:09 +00:00
Dmitry Tantsur	d48479b52d	Simplify injecting network data into an ISO image Currently we're building a VFAT image with the network data just to unpack it back on the next step. Just pass the file directly. This fixes a permission denied problem on Bifrost on Fedora (at least). As a nice side effect, the change reduces the amount of IO done for virtual media quite substantially. Change-Id: I5499fa42c1d82a1a29099fbbba6f45d440448b72	2020-11-11 12:20:20 +01:00
Dmitry Tantsur	3fd513ee1c	Fix incorrect network_data.json location There is no metadata subdirectory, it goes right into openstack/latest. Change-Id: I576c3c85515970262b5e7480913ff7daefa1b539	2020-11-11 12:17:36 +01:00
Zuul	d16daa3b52	Merge "Fix redfish BIOS apply config error handling"	2020-11-10 21:27:40 +00:00
Bob Fournier	9b18336f76	Retrieve BIOS configuration when moving node to ``manageable`` When moving the node to ``manageable``, in addition to ``cleaning``, retrieve the BIOS configuration settings. In the case of ``manageable``, this may allow the settings to be used when choosing which node to deploy. Change-Id: Ic2b162f31d4a1465fcb61671e7f48b3d31de788c Story: 2008326 Task: 41224	2020-11-10 14:57:20 -05:00
Dmitry Tantsur	2e5d01d48d	Fix DHCP-less operations with the noop network interface The base implementation of get_node_network_data returns {} and is not overridden in the noop network. Update the base implementation to use task.node.network_data and remove the excessive logging. Change-Id: Ie50dcd1c2a151f5dd09794467792527032249809	2020-11-10 18:53:32 +01:00
Kaifeng Wang	fe01ddb2bc	Fixes the issue that instance bond port can't get IP address The issue is that when a port group doesn't have a MAC address assigned by operators, during provisioning we unbind/bind the tenant port with None, which causes the MAC address to be regenerated twice so that it differs from the original one allocated by nova or users, which was packed into the config drive. The end result is that the bond port has a different MAC address configured and can't get the IP address from neutron. Change-Id: I92ed5d17239216324d6a69e0ed8771fd6948d6ec Story: 2008300 Task: 41185	2020-11-10 21:11:18 +08:00
Zuul	08bf8dee65	Merge "Add node name to ironic-conductor ramdisk log filename"	2020-11-03 15:57:31 +00:00
Dmitry Tantsur	c9c492725e	Limit the default value of [api]api_workers to 4 Each ironic-api process consumes a non-negligible amount of RAM; defaulting to the CPU core count may result in many hundreds of megabytes occupied by ironic-api processes. Limit the default value to 4 and let people who actually need more than that pick their value. Change-Id: I5aefa8c6c7aadc56aea151647e1c0a5af54ada4c	2020-11-03 16:33:14 +01:00
Aija Jauntēva	23951f4b44	Fix idrac-wsman RAID step async error handling Instead of using process_event('fail'), use error_handlers. Otherwise, on failure the node gets stuck and eventually fails due to a timeout instead of failing early due to the step failure. Story: 2008307 Task: 41194 Change-Id: Ieec0173f57367587985d2baad77205bb83e8b69a	2020-11-02 12:56:29 -05:00
Aija Jauntēva	70b7ca345f	Fix redfish BIOS apply config error handling Instead of using process_event('fail'), use error_handlers. Otherwise, on failure the node gets stuck and eventually fails due to a timeout instead of failing early due to the step failure. Besides adding new unit tests, also update related unit tests to test for success correctly and have realistic data. Story: 2008307 Task: 41196 Change-Id: If28ccb252a87610e3fd3dc78e1ed75bb8ca1cdcf	2020-11-02 12:55:26 -05:00
Zuul	7ea6e41b26	Merge "Prevent timeouts when using fast-track with redfish-virtual-media"	2020-11-02 13:59:05 +00:00
Zuul	31277f2c95	Merge "Handle agent still doing the prior command"	2020-10-30 16:10:10 +00:00
Dmitry Tantsur	551ca9c8f7	Prevent timeouts when using fast-track with redfish-virtual-media Calling prepare_ramdisk may break fast-track, as is the case with redfish-virtual-media (it powers nodes off unconditionally). To avoid timeouts, check fast-track status again after prepare_ramdisk. Change-Id: Iad2d6f4827bd7e8b2a02005fe18d31ec8d37db97	2020-10-30 16:41:01 +01:00
Julia Kreger	545dc2106b	Handle agent still doing the prior command The agent command exec model is based upon an incoming heartbeat; however, heartbeats are independent and commands can take a long time. For example, software RAID setup in CI can encounter this. From an IPA log: [-] Picked root device /dev/md0 for node c6ca0af2-baec-40d6-879d-cbb5c751aafb based on root device hints {'name': '/dev/md0'} [-] Attempting to download image from http://199.204.45.248:3928/agent_images/c6ca0af2-baec-40d6-879d-cbb5c751aafb [-] Executing command: standby.get_partition_uuids with args: {} execute_command /usr/local/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py:255 [-] Tried to execute standby.get_partition_uuids, agent is still executing Command name: execute_deploy_step, params: {'step': {'interface': 'deploy', 'step': 'write_image', 'args': {'image_info': {'id': 'cb9e199a-af1b-4a6f-b00e-f284008b8046', 'urls': ['http://199.204.45.248:3928/agent_images/c6ca0af2-baec-40d6-879d-cbb5c751aafb'], 'disk_format': 'raw', 'container_format': 'bare', 'stream_raw_images': True, 'os_hash_algo': 'sha512', 'os_hash_value':<trimmed> This was with code built on master, using master images. The conductor log notes that it is likely an out-of-date agent because only AgentAPIError is evaluated; however, any API error is evaluated this way. In reality, we need to explicitly flag when an error occurs because we've tried too soon while something is already being worked on. The result is to evaluate and return an exception indicating work is already in flight. Update: It looks like the original busy-agent recognition fix did not detect all cases, as getting steps is a command which can be skipped by accident with a busy agent under certain circumstances. Change I5d86878b5ed6142ed2630adee78c0867c49b663f in ironic-python-agent also changed the string that was being checked for the previous handling, where we really should have just made the string we were checking lower case in ironic. Oh well! This should fix things right up. Story: 2008167 Task: 41175 Change-Id: Ia169640b7084d17d26f22e457c7af512db6d21d6	2020-10-29 14:58:34 -07:00
Dmitry Tantsur	2dfb3f5eca	Make redfish-virtual-media respect default_boot_mode Change-Id: I46c865ba1cc05a60aa9703f0b35247b62ad4235a	2020-10-29 17:44:00 +01:00
Zuul	7155ec7ef9	Merge "Add timeout to image operations in the direct deploy"	2020-10-29 12:19:59 +00:00
Zuul	90536bdbbc	Merge "json-rpc: surround IPv6 address with [] in conductor URL"	2020-10-29 07:49:52 +00:00
Zuul	dc9cd24bc6	Merge "Sync boot mode when changing the boot device via Redfish"	2020-10-28 10:11:16 +00:00
Zane Bitter	85b4892886	json-rpc: surround IPv6 address with [] in conductor URL If the conductor's host option is configured to be an IPv6 address, we need to surround it with [] when incorporating it in a URL to contact the conductor over json-rpc. Change-Id: Ib3bc4c570ec0f2e5c73e3ce15b05684b8e4c1ff9 Story: 2008288 Task: 41166	2020-10-27 11:22:12 -04:00
Kaifeng Wang	91d6426b06	Fixes empty physical_network is not guarded An empty physical_network can be set on a port, making the port unusable. Change-Id: I58cf04839f40922cf0c7ddffc08b843cb3c50e06 Story: 2008279 Task: 41153	2020-10-24 22:05:36 +08:00
dujinxiu	da4c583ea9	Add node name to ironic-conductor ramdisk log filename Change-Id: Ide28c16806909f1bbf93bf7c72b5cec6f8ddc260 Story: #2008281 Task: #41155	2020-10-24 16:01:52 +08:00
Dmitry Tantsur	fe37fb6d5d	Add timeout to image operations in the direct deploy Currently they may hang when the remote server is not responding. Change-Id: I1de17fed3b43a3d16795dc614ce76e2cfe1faca0	2020-10-22 13:16:47 +02:00
Dmitry Tantsur	0a68622187	Allow passing rootfs_uuid for the standalone case Using software RAID with whole disk images requires specifying a root partition UUID, but it is only possible through Glance. This change adds an explicit field for that. Change-Id: I55e3727aab3960ef472ec2db1f23c25db405e801	2020-10-20 18:22:25 +02:00
Bob Fournier	685131fd36	Sync boot mode when changing the boot device via Redfish After changing the boot device via Redfish, check that the boot mode being reported matches what is configured and, if not, set it to the configured value. Some BMCs change the boot mode when the device is set via Redfish; this ensures the mode is set properly. Change-Id: Ib077f7f32de029833e6bd936853c382305bce36e Story: 2008252 Task: 41103	2020-10-19 14:34:44 -04:00
Zuul	a29417f46f	Merge "Fix ipmitool timing argument calculation"	2020-10-19 07:17:02 +00:00
Steve Baker	1de3db3b16	Fix ipmitool timing argument calculation Calculating the ipmitool `-N` and `-R` arguments from ironic.conf [ipmi] `command_retry_timeout` and `min_command_interval` now takes into account the 1 second interval increment that ipmitool adds on each retry event. Failure-path ipmitool run duration will now be just less than `command_retry_timeout` instead of much longer. Change-Id: Ia3d8d85497651290c62341ac121e2aa438b4ac50	2020-10-14 19:33:50 +00:00
Dmitry Tantsur	7a89ddcf0c	Do not pass BOOTIF=None if no BOOTIF can be guessed It breaks inspection with the default add_ports=pxe. Change-Id: I730b4bbd48e7188148669670fdb742b88a62f820	2020-10-13 15:16:43 +02:00
Kaifeng Wang	6a34d47829	Remove root device hint after delete_configuration In most cases, the root device hint is not guaranteed to be valid after RAID configuration; this can result in no matching device being found, failing the deployment. AgentRAID implementations can return the correct root device hint from create_configuration. Change-Id: Iab97a16ef8ccea8186f0cc7a14b77d508804fc8d	2020-10-11 21:52:43 +08:00
Dmitry Tantsur	e39858dd8c	Wiping agent tokens on reboot via API - take 2 Because of using an incorrect variable, reboot was treated as power on, and the token was not wiped. Change-Id: I656450c2bedc3dc0d20a70de78cc29bf64d5fe85 Story: #2008097 Task: #40799	2020-10-05 17:36:45 +02:00
Iury Gregory Melo Ferreira	db55700384	Fix inspection for idrac This commit fixes an issue when inspecting nodes using idrac with redfish virtual media. We are using the redfish configuration so it can be backported. Change-Id: I478c25fac13b49867349c2d9fc8d206c9994c398 Story: #2008221 Task: #41010	2020-10-02 17:49:16 +02:00
Mudit	101fc29686	Add GPU reporting to idrac-wsman inspect interface This patch implements reporting the number of NVIDIA Tesla T4 devices connected to a system by discovering such devices and reporting them through the capability 'pci_gpu_devices'. Change-Id: If713895f05f08a9827c4c085108abb3e388b2a2e Story: 2008118 Task: 40839 Depends-On: https://review.opendev.org/#/c/750364/	2020-09-30 18:33:53 -04:00
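
The "patient" lock introduced in bfc2ad56d5 can be pictured as an ordinary bounded retry loop whose attempt limit is switched off. This is an illustrative sketch, not Ironic's code: `acquire_lock`, the `try_lock` callable, and the local `NodeLocked` class are hypothetical stand-ins, and the defaults of 3 attempts at 1-second intervals are assumptions chosen to match the 3-second window mentioned in that commit message.

```python
import time


class NodeLocked(Exception):
    """Hypothetical stand-in for Ironic's NodeLocked exception."""


def acquire_lock(try_lock, *, patient=False, max_attempts=3, interval=1.0,
                 sleep=time.sleep):
    """Call ``try_lock`` until it succeeds (hypothetical helper).

    ``try_lock`` raises NodeLocked while another process holds the lock.
    A normal acquisition gives up after ``max_attempts`` tries; a
    "patient" acquisition ignores that limit and retries indefinitely.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return try_lock()
        except NodeLocked:
            # Bounded mode: re-raise once the retry budget is spent.
            if not patient and attempt >= max_attempts:
                raise
            # Both modes back off before the next attempt.
            sleep(interval)
```

Passing `sleep` as a parameter keeps the loop testable without real delays; with `patient=True`, a competitor that keeps re-taking the lock only delays the handoff instead of making it fail.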
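
The disk label fallback described in ed4abbd519 reduces to a small decision rule. This is a sketch under stated assumptions: `choose_disk_label` is a hypothetical helper, not an Ironic function, and returning `None` to mean "keep the previous default" is a choice made only for this illustration.

```python
def choose_disk_label(explicit_label, boot_mode):
    """Pick a partition table type for a node (hypothetical helper).

    An explicit label set by the API user always wins. Otherwise UEFI
    nodes get GPT; other nodes return None, meaning "keep the previous
    default" in this sketch.
    """
    if explicit_label:
        return explicit_label
    if boot_mode == "uefi":
        return "gpt"
    return None
```

For example, `choose_disk_label(None, "uefi")` returns `"gpt"`, while an explicit `"msdos"` is still honoured on a UEFI node.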
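
The json-rpc fix in 85b4892886 follows the URL syntax rule (RFC 3986) that an IPv6 literal in the host part must be enclosed in brackets, since its colons would otherwise be confused with the port separator. A minimal sketch using the standard `ipaddress` module; `conductor_url` is a hypothetical helper and port 8089 in the usage below is only illustrative.

```python
import ipaddress


def conductor_url(host, port, scheme="http"):
    """Build a URL for contacting a conductor (hypothetical helper)."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal (e.g. a hostname): use it unchanged.
        is_ipv6 = False
    if is_ipv6:
        # Brackets keep the address colons apart from the ":port" part.
        host = f"[{host}]"
    return f"{scheme}://{host}:{port}"
```

With these definitions, `conductor_url("2001:db8::1", 8089)` returns `"http://[2001:db8::1]:8089"`, while IPv4 addresses and hostnames pass through unchanged.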
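
The ipmitool timing change in 1de3db3b16 can be approximated as follows. Assuming the wait before retry i (counting from 0) is N + i seconds, where N is `min_command_interval`, R retries take R·N + R(R-1)/2 seconds in total, so the sketch picks the largest R that keeps this strictly below `command_retry_timeout`. The function name and loop are hypothetical; Ironic's actual calculation may differ in details such as edge-case handling.

```python
def ipmitool_timing_args(command_retry_timeout, min_command_interval):
    """Return ipmitool -R/-N arguments (hypothetical helper).

    Assumes ipmitool waits min_command_interval + i seconds before
    retry i (0-based), i.e. it adds one second per retry, and picks the
    largest retry count whose total wait stays strictly below
    command_retry_timeout.
    """
    retries = 0
    total_wait = 0
    while True:
        next_wait = min_command_interval + retries
        if total_wait + next_wait >= command_retry_timeout:
            break
        total_wait += next_wait
        retries += 1
    return ["-R", str(retries), "-N", str(min_command_interval)]
```

With illustrative values of a 60-second timeout and a 5-second interval this yields `-R 7` (5+6+...+11 = 56 seconds), whereas ignoring the increment and using 60 / 5 = 12 retries would wait 5·12 + 66 = 126 seconds, roughly double the configured timeout.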

1779 Commits