diff --git a/doc/source/deploy/cleaning.rst b/doc/source/deploy/cleaning.rst index ad61168d60..0c99affd5d 100644 --- a/doc/source/deploy/cleaning.rst +++ b/doc/source/deploy/cleaning.rst @@ -6,33 +6,181 @@ Node cleaning Overview ======== +Ironic provides two modes for node cleaning: ``automated`` and ``manual``. + +``Automated cleaning`` is automatically performed before the first +workload has been assigned to a node and when hardware is recycled from +one workload to another. + +``Manual cleaning`` must be invoked by the operator. + + +.. _automated_cleaning: + +Automated cleaning +================== + When hardware is recycled from one workload to another, ironic performs -cleaning on the node to ensure it's ready for another workload. This ensures -the tenant will get a consistent bare metal node deployed every time. +automated cleaning on the node to ensure it's ready for another workload. This +ensures the tenant will get a consistent bare metal node deployed every time. -Ironic implements cleaning by collecting a list of steps to perform on a node -from each Power, Deploy, and Management driver assigned to the node. These -steps are then arranged by priority and executed on the node when it is moved -to cleaning state, if cleaning is enabled. +Ironic implements automated cleaning by collecting a list of cleaning steps +to perform on a node from the Power, Deploy, Management, and RAID interfaces +of the driver assigned to the node. These steps are then ordered by priority +and executed on the node when the node is moved +to ``cleaning`` state, if automated cleaning is enabled. -Typically, nodes move to cleaning state when moving from active -> available. -Nodes also traverse cleaning when going from manageable -> available. For a -full understanding of all state transitions into cleaning, please see -:ref:`states`. 
+With automated cleaning, nodes move to ``cleaning`` state when moving from +``active`` -> ``available`` state (when the hardware is recycled from one +workload to another). Nodes also traverse cleaning when going from +``manageable`` -> ``available`` state (before the first workload is +assigned to the nodes). For a full understanding of all state transitions +into cleaning, please see :ref:`states`. -Ironic added support for cleaning nodes in the Kilo release. +Ironic added support for automated cleaning in the Kilo release. +.. _enabling-cleaning: -Enabling cleaning -================= -To enable cleaning, ensure your ironic.conf is set as follows: :: +Enabling automated cleaning +--------------------------- +To enable automated cleaning, ensure that your ironic.conf is set as follows. +(Prior to Mitaka, this option was named 'clean_nodes'.):: [conductor] automated_clean=true -This will enable the default set of steps, based on your hardware and ironic -drivers. If you're using an agent_* driver, this includes, by default, erasing -all of the previous tenant's data. +This will enable the default set of cleaning steps, based on your hardware and +ironic drivers. If you're using an agent_* driver, this includes, by default, +erasing all of the previous tenant's data. + +You may also need to configure a `Cleaning Network`_. + +Cleaning steps +-------------- + +Cleaning steps used for automated cleaning are ordered from higher to lower +priority, where a larger integer is a higher priority. In case of a conflict +between priorities across drivers, the following resolution order is used: +Power, Management, Deploy, and RAID interfaces. + +You can skip a cleaning step by setting the priority for that cleaning step +to zero or 'None'. + +You can reorder the cleaning steps by modifying the integer priorities of the +cleaning steps. + +See `How do I change the priority of a cleaning step?`_ for more information. 
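To make the ordering rules concrete, here is a small Python sketch of the documented behavior: steps with priority 0 or None are skipped, remaining steps run from highest to lowest integer priority, and ties across drivers resolve in the order Power, Management, Deploy, RAID. This is illustrative only; the step names and priorities below are hypothetical and ironic's real implementation lives in its conductor.

```python
# Tie-break order for steps with equal priority, per the documentation.
INTERFACE_ORDER = ['power', 'management', 'deploy', 'raid']

def order_clean_steps(steps):
    """Drop disabled steps (priority 0 or None), then sort by priority
    descending, breaking ties by interface resolution order."""
    enabled = [s for s in steps if s.get('priority')]  # 0 and None are falsy
    return sorted(
        enabled,
        key=lambda s: (-s['priority'], INTERFACE_ORDER.index(s['interface'])))

# Hypothetical steps collected from a node's driver interfaces.
steps = [
    {'interface': 'deploy', 'step': 'erase_devices', 'priority': 10},
    {'interface': 'raid', 'step': 'delete_configuration', 'priority': 0},
    {'interface': 'management', 'step': 'reset_bios', 'priority': 10},
    {'interface': 'power', 'step': 'power_cycle', 'priority': 20},
]

for s in order_clean_steps(steps):
    print(s['step'])
# power_cycle runs first; reset_bios beats erase_devices on the priority
# tie because Management resolves before Deploy; delete_configuration
# (priority 0) is skipped entirely.
```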
+
+Manual cleaning
+===============
+
+``Manual cleaning`` is typically used to handle long running, manual, or
+destructive tasks that an operator wishes to perform either before the first
+workload has been assigned to a node or between workloads. When initiating a
+manual clean, the operator specifies the cleaning steps to be performed.
+Manual cleaning can only be performed when a node is in the ``manageable``
+state. Once the manual cleaning is finished, the node will be put in the
+``manageable`` state again.
+
+Ironic added support for manual cleaning in the 4.4 (Mitaka series)
+release.
+
+Setup
+-----
+
+In order for manual cleaning to work, you may need to configure a
+`Cleaning Network`_.
+
+Starting manual cleaning via API
+--------------------------------
+
+Manual cleaning can only be performed when a node is in the ``manageable``
+state. The REST API request to initiate it is available in API version 1.15 and
+higher::
+
+    PUT /v1/nodes/<node_ident>/states/provision
+
+(Additional information is available `here `_.)
+
+This API will allow operators to put a node directly into ``cleaning``
+provision state from ``manageable`` state via 'target': 'clean'.
+The PUT will also require the argument 'clean_steps' to be specified. This
+is an ordered list of cleaning steps. A cleaning step is represented by a
+dictionary (JSON), in the form::
+
+    {
+        'interface': <interface>,
+        'step': <name of cleaning step>,
+        'args': {<keyword>: <value>, ..., <keyword>: <value>}
+    }
+
+The 'interface' and 'step' keys are required for all steps. If a cleaning step
+method takes keyword arguments, the 'args' key may be specified. It
+is a dictionary of keyword variable arguments, with each keyword-argument entry
+being <keyword>: <value>.
+
+If any step is missing a required keyword argument, manual cleaning will not be
+performed and the node will be put in ``clean failed`` provision state with an
+appropriate error message.
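As a sketch of client-side payload construction, the following illustrative Python builds and validates a manual-cleaning request body according to the rules above. The helper name and the validation logic are this example's own, not part of ironic or python-ironicclient; the specific RAID and deploy steps shown are only examples.

```python
import json

def build_clean_request(clean_steps):
    """Validate clean steps per the documented rules and return the JSON
    body for the provision-state PUT request with target 'clean'."""
    for step in clean_steps:
        # 'interface' and 'step' are required for every cleaning step.
        for required in ('interface', 'step'):
            if required not in step:
                raise ValueError('clean step missing required key: %s'
                                 % required)
        # 'args', when present, must be a dict of keyword arguments.
        if 'args' in step and not isinstance(step['args'], dict):
            raise ValueError("'args' must be a dictionary")
    return json.dumps({'target': 'clean', 'clean_steps': clean_steps})

body = build_clean_request([
    {'interface': 'raid', 'step': 'create_configuration',
     'args': {'create_nonroot_volumes': 'False'}},
    {'interface': 'deploy', 'step': 'erase_devices'},
])
print(body)
```

A body like this could then be sent as the request payload of the provision-state PUT described above (steps are executed in list order).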
+ +If, during the cleaning process, a cleaning step determines that it has +incorrect keyword arguments, all earlier steps will be performed and then the +node will be put in ``clean failed`` provision state with an appropriate error +message. + +An example of the request body for this API:: + + { + "target":"clean", + "clean_steps": [{ + "interface": "raid", + "step": "create_configuration", + "args": {"create_nonroot_volumes": "False"} + }, + { + "interface": "deploy", + "step": "erase_devices" + }] + } + +In the above example, the driver's RAID interface would configure hardware +RAID without non-root volumes, and then all devices would be erased +(in that order). + +Starting manual cleaning via ``ironic`` CLI +------------------------------------------- + +Manual cleaning is supported in the ``ironic node-set-provision-state`` +command, starting with python-ironicclient 1.2. + +The target/verb is 'clean' and the argument 'clean-steps' must be specified. +Its value is one of: + +- a JSON string +- path to a JSON file whose contents are passed to the API +- '-', to read from stdin. This allows piping in the clean steps. + Using '-' to signify stdin is common in Unix utilities. + +Keep in mind that manual cleaning is only supported in API version 1.15 and +higher. + +An example of doing this with a JSON string:: + + ironic --ironic-api-version 1.15 node-set-provision-state \ + clean --clean-steps '{"clean_steps": [...]}' + +Or with a file:: + + ironic --ironic-api-version 1.15 node-set-provision-state \ + clean --clean-steps my-clean-steps.txt + +Or with stdin:: + + cat my-clean-steps.txt | ironic --ironic-api-version 1.15 \ + node-set-provision-state clean --clean-steps - + +Cleaning Network +================ If you are using the Neutron DHCP provider (the default) you will also need to ensure you have configured a cleaning network. This network will be used to @@ -73,38 +221,48 @@ FAQ How are cleaning steps ordered? 
------------------------------- -Cleaning steps are ordered by integer priority, where a larger integer is a -higher priority. In case of a conflict between priorities across drivers, -the following resolution order is used: Power, Management, Deploy. +For automated cleaning, cleaning steps are ordered by integer priority, where +a larger integer is a higher priority. In case of a conflict between priorities +across drivers, the following resolution order is used: Power, Management, +Deploy, and RAID interfaces. + +For manual cleaning, the cleaning steps should be specified in the desired +order. How do I skip a cleaning step? ------------------------------ -Cleaning steps with a priority of 0 or None are skipped. +For automated cleaning, cleaning steps with a priority of 0 or None are skipped. + How do I change the priority of a cleaning step? ------------------------------------------------ +For manual cleaning, specify the cleaning steps in the desired order. + +For automated cleaning, it depends on whether the cleaning steps are +out-of-band or in-band. + Most out-of-band cleaning steps have an explicit configuration option for priority. Changing the priority of an in-band (ironic-python-agent) cleaning step -currently requires use of a custom HardwareManager. The only exception is -erase_devices, which can have its priority set in ironic.conf. For instance, -to disable erase_devices, you'd use the following config:: +requires use of a custom HardwareManager. The only exception is +``erase_devices``, which can have its priority set in ironic.conf. 
For instance, +to disable erase_devices, you'd set the following configuration option:: [deploy] erase_devices_priority=0 To enable/disable the in-band disk erase using ``agent_ilo`` driver, use the -following config:: +following configuration option:: [ilo] clean_priority_erase_devices=0 -Generic hardware manager first tries to perform ATA disk erase by using +The generic hardware manager first tries to perform ATA disk erase by using ``hdparm`` utility. If ATA disk erase is not supported, it performs software based disk erase using ``shred`` utility. By default, the number of iterations performed by ``shred`` for software based disk erase is 1. To configure -the number of iterations, use the following config:: +the number of iterations, use the following configuration option:: [deploy] erase_devices_iterations=1 @@ -115,14 +273,14 @@ What cleaning step is running? To check what cleaning step the node is performing or attempted to perform and failed, either query the node endpoint for the node or run ``ironic node-show $node_ident`` and look in the `internal_driver_info` field. The `clean_steps` -field will contain a list of all remaining steps with their priority, and the +field will contain a list of all remaining steps with their priorities, and the first one listed is the step currently in progress or that the node failed -before going into cleanfail state. +before going into ``clean failed`` state. -Should I disable cleaning? --------------------------- -Cleaning is recommended for ironic deployments, however, there are some -tradeoffs to having it enabled. For instance, ironic cannot deploy a new +Should I disable automated cleaning? +------------------------------------ +Automated cleaning is recommended for ironic deployments, however, there are +some tradeoffs to having it enabled. For instance, ironic cannot deploy a new instance to a node that is currently cleaning, and cleaning can be a time consuming process. 
To mitigate this, we suggest using disks with support for cryptographic ATA Security Erase, as typically the erase_devices step in the @@ -138,17 +296,18 @@ cleaning. Troubleshooting =============== -If cleaning fails on a node, the node will be put into cleanfail state and -placed in maintenance mode, to prevent ironic from taking actions on the +If cleaning fails on a node, the node will be put into ``clean failed`` state +and placed in maintenance mode, to prevent ironic from taking actions on the node. -Nodes in cleanfail will not be powered off, as the node might be in a state -such that powering it off could damage the node or remove useful information -about the nature of the cleaning failure. +Nodes in ``clean failed`` will not be powered off, as the node might be in a +state such that powering it off could damage the node or remove useful +information about the nature of the cleaning failure. -A cleanfail node can be moved to manageable state, where they cannot be -scheduled by nova and you can safely attempt to fix the node. To move a node -from cleanfail to manageable: ``ironic node-set-provision-state manage``. +A ``clean failed`` node can be moved to ``manageable`` state, where it cannot +be scheduled by nova and you can safely attempt to fix the node. To move a node +from ``clean failed`` to ``manageable``: +``ironic node-set-provision-state manage``. You can now take actions on the node, such as replacing a bad disk drive. Strategies for determining why a cleaning step failed include checking the @@ -156,8 +315,8 @@ ironic conductor logs, viewing logs on the still-running ironic-python-agent (if an in-band step failed), or performing general hardware troubleshooting on the node. -When the node is repaired, you can move the node back to available state, to -allow it to be scheduled by nova. +When the node is repaired, you can move the node back to ``available`` state, +to allow it to be scheduled by nova. 
:: @@ -167,5 +326,5 @@ allow it to be scheduled by nova. # Now, make the node available for scheduling by nova ironic node-set-provision-state $node_ident provide -The node will begin cleaning from the start, and move to available state -when complete. +The node will begin automated cleaning from the start, and move to +``available`` state when complete. diff --git a/doc/source/deploy/install-guide.rst b/doc/source/deploy/install-guide.rst index 85fa51fdf2..b1e5cc9b25 100644 --- a/doc/source/deploy/install-guide.rst +++ b/doc/source/deploy/install-guide.rst @@ -658,9 +658,8 @@ Configure the Bare Metal service for cleaning [neutron] ... - # UUID of the network to create Neutron ports on when booting - # to a ramdisk for cleaning/zapping using Neutron DHCP (string - # value) + # UUID of the network to create Neutron ports on, when booting + # to a ramdisk for cleaning using Neutron DHCP. (string value) #cleaning_network_uuid= cleaning_network_uuid = NETWORK_UUID @@ -1731,7 +1730,7 @@ To move a node from ``enroll`` to ``manageable`` provision state:: +------------------------+--------------------------------------------------------------------+ When a node is moved from the ``manageable`` to ``available`` provision -state, the node will be cleaned if configured to do so (see +state, the node will go through automated cleaning if configured to do so (see :ref:`CleaningNetworkSetup`). To move a node from ``manageable`` to ``available`` provision state:: diff --git a/doc/source/deploy/upgrade-guide.rst b/doc/source/deploy/upgrade-guide.rst index f114a058eb..532cbadf68 100644 --- a/doc/source/deploy/upgrade-guide.rst +++ b/doc/source/deploy/upgrade-guide.rst @@ -84,13 +84,13 @@ upgrade has completed. Cleaning -------- -A new feature in Kilo is support for the cleaning of nodes between workloads to -ensure the node is ready for another workload. This can include erasing the -hard drives, updating firmware, and other steps. For more information, see -:ref:`cleaning`. 
+A new feature in Kilo is support for the automated cleaning of nodes between +workloads to ensure the node is ready for another workload. This can include +erasing the hard drives, updating firmware, and other steps. For more +information, see :ref:`automated_cleaning`. -If Ironic is configured with cleaning enabled (defaults to True) and to use -Neutron as the DHCP provider (also the default), you will need to set the +If Ironic is configured with automated cleaning enabled (defaults to True) and +to use Neutron as the DHCP provider (also the default), you will need to set the `cleaning_network_uuid` option in the Ironic configuration file before starting the Kilo Ironic service. See :ref:`CleaningNetworkSetup` for information on how to set up the cleaning network for Ironic. diff --git a/doc/source/drivers/wol.rst b/doc/source/drivers/wol.rst index 2ea9701d3a..6e3c67ef62 100644 --- a/doc/source/drivers/wol.rst +++ b/doc/source/drivers/wol.rst @@ -114,7 +114,7 @@ Additional requirements * BIOS must try next boot device if PXE boot failed -* Cleaning should be disabled, see :ref:`cleaning` +* Automated cleaning should be disabled, see :ref:`automated_cleaning` * Node should be powered off before start of deploy diff --git a/doc/source/webapi/v1.rst b/doc/source/webapi/v1.rst index 6be4810145..6a3d13c6c9 100644 --- a/doc/source/webapi/v1.rst +++ b/doc/source/webapi/v1.rst @@ -61,10 +61,10 @@ API Versions History Newly registered nodes begin in the ``enroll`` provision state by default, instead of ``available``. To get them to the ``available`` state, - the ``manage`` action must first be ran, to verify basic hardware control. - On success the node moves to ``manageable`` provision state, then the - ``provide`` action must be run, which will clean the node and - make it available. + the ``manage`` action must first be run to verify basic hardware control. + On success the node moves to ``manageable`` provision state. Then the + ``provide`` action must be run. 
Automated cleaning of the node is done and + the node is made ``available``. **1.10** diff --git a/releasenotes/notes/manual-clean-4cc2437be1aea69a.yaml b/releasenotes/notes/manual-clean-4cc2437be1aea69a.yaml index 370c62ecc1..d14e1920dd 100644 --- a/releasenotes/notes/manual-clean-4cc2437be1aea69a.yaml +++ b/releasenotes/notes/manual-clean-4cc2437be1aea69a.yaml @@ -2,4 +2,4 @@ features: - Adds support for manual cleaning. This is available with API version 1.15. For more information, see - http://specs.openstack.org/openstack/ironic-specs/specs/approved/manual-cleaning.html + http://docs.openstack.org/developer/ironic/deploy/cleaning.html#manual-cleaning