Add manual cleaning to documentation

This updates the documentation to include manual cleaning.

Change-Id: I8f91214911e8916c329c20a140e1d0957b1cc137
Partial-Bug: #1526290

parent a70b5365d3
commit f2d9886f99

@@ -6,33 +6,181 @@ Node cleaning

Overview
========
Ironic provides two modes for node cleaning: ``automated`` and ``manual``.

``Automated cleaning`` is automatically performed before the first
workload has been assigned to a node and when hardware is recycled from
one workload to another.

``Manual cleaning`` must be invoked by the operator.


.. _automated_cleaning:

Automated cleaning
==================

When hardware is recycled from one workload to another, ironic performs
cleaning on the node to ensure it's ready for another workload. This ensures
the tenant will get a consistent bare metal node deployed every time.
automated cleaning on the node to ensure it's ready for another workload. This
ensures the tenant will get a consistent bare metal node deployed every time.

Ironic implements cleaning by collecting a list of steps to perform on a node
from each Power, Deploy, and Management driver assigned to the node. These
steps are then arranged by priority and executed on the node when it is moved
to cleaning state, if cleaning is enabled.
Ironic implements automated cleaning by collecting a list of cleaning steps
to perform on a node from the Power, Deploy, Management, and RAID interfaces
of the driver assigned to the node. These steps are then ordered by priority
and executed on the node when the node is moved
to ``cleaning`` state, if automated cleaning is enabled.

Typically, nodes move to cleaning state when moving from active -> available.
Nodes also traverse cleaning when going from manageable -> available. For a
full understanding of all state transitions into cleaning, please see
:ref:`states`.
With automated cleaning, nodes move to ``cleaning`` state when moving from
``active`` -> ``available`` state (when the hardware is recycled from one
workload to another). Nodes also traverse cleaning when going from
``manageable`` -> ``available`` state (before the first workload is
assigned to the nodes). For a full understanding of all state transitions
into cleaning, please see :ref:`states`.

Ironic added support for cleaning nodes in the Kilo release.
Ironic added support for automated cleaning in the Kilo release.

.. _enabling-cleaning:

Enabling cleaning
=================
To enable cleaning, ensure your ironic.conf is set as follows: ::
Enabling automated cleaning
---------------------------
To enable automated cleaning, ensure that your ironic.conf is set as follows.
(Prior to Mitaka, this option was named 'clean_nodes'.)::

    [conductor]
    automated_clean=true

This will enable the default set of steps, based on your hardware and ironic
drivers. If you're using an agent_* driver, this includes, by default, erasing
all of the previous tenant's data.
This will enable the default set of cleaning steps, based on your hardware and
ironic drivers. If you're using an agent_* driver, this includes, by default,
erasing all of the previous tenant's data.

You may also need to configure a `Cleaning Network`_.

Cleaning steps
--------------

Cleaning steps used for automated cleaning are ordered from higher to lower
priority, where a larger integer is a higher priority. In case of a conflict
between priorities across drivers, the following resolution order is used:
Power, Management, Deploy, and RAID interfaces.

You can skip a cleaning step by setting the priority for that cleaning step
to zero or 'None'.

You can reorder the cleaning steps by modifying the integer priorities of the
cleaning steps.

See `How do I change the priority of a cleaning step?`_ for more information.
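
The ordering rules above can be sketched in Python. This is only an illustration under stated assumptions, not ironic's implementation: the ``order_steps`` helper, the shape of the step dictionaries, the ``example_*`` step names, and all priorities are hypothetical. Only the rules themselves (a larger integer runs first, a priority of zero or None skips the step, ties are resolved in the order Power, Management, Deploy, RAID) come from the text.

```python
# Hypothetical sketch of the ordering rules described above; not ironic's code.

# Tie-break order between interfaces, as stated in the text.
INTERFACE_ORDER = {"power": 0, "management": 1, "deploy": 2, "raid": 3}


def order_steps(steps):
    """Return the enabled steps, highest priority first."""
    # A priority of 0 or None means the step is skipped.
    enabled = [s for s in steps if s.get("priority")]
    # Larger integers run first; equal priorities fall back to interface order.
    return sorted(
        enabled,
        key=lambda s: (-s["priority"], INTERFACE_ORDER[s["interface"]]),
    )


# Sample steps with made-up priorities, for illustration only.
steps = [
    {"interface": "deploy", "step": "erase_devices", "priority": 10},
    {"interface": "raid", "step": "delete_configuration", "priority": 0},
    {"interface": "management", "step": "example_management_step", "priority": 10},
    {"interface": "power", "step": "example_power_step", "priority": 20},
]

ordered_names = [s["step"] for s in order_steps(steps)]
print(ordered_names)
```

In this sample the RAID step is dropped because its priority is zero, and the two steps that share priority 10 are ordered Management before Deploy.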

Manual cleaning
===============

``Manual cleaning`` is typically used to handle long running, manual, or
destructive tasks that an operator wishes to perform either before the first
workload has been assigned to a node or between workloads. When initiating a
manual clean, the operator specifies the cleaning steps to be performed.
Manual cleaning can only be performed when a node is in the ``manageable``
state. Once the manual cleaning is finished, the node will be put in the
``manageable`` state again.

Ironic added support for manual cleaning in the 4.4 (Mitaka series)
release.

Setup
-----

In order for manual cleaning to work, you may need to configure a
`Cleaning Network`_.

Starting manual cleaning via API
--------------------------------

Manual cleaning can only be performed when a node is in the ``manageable``
state. The REST API request to initiate it is available in API version 1.15 and
higher::

    PUT /v1/nodes/<node_ident>/states/provision

(Additional information is available `here <http://docs.openstack.org/developer/ironic/webapi/v1.html#nodes>`_.)

This API will allow operators to put a node directly into ``cleaning``
provision state from ``manageable`` state via 'target': 'clean'.
The PUT will also require the argument 'clean_steps' to be specified. This
is an ordered list of cleaning steps. A cleaning step is represented by a
dictionary (JSON), in the form::

    {
        'interface': <interface>,
        'step': <name of cleaning step>,
        'args': {<arg1>: <value1>, ..., <argn>: <valuen>}
    }

The 'interface' and 'step' keys are required for all steps. If a cleaning step
method takes keyword arguments, the 'args' key may be specified. It
is a dictionary of keyword variable arguments, with each keyword-argument entry
being <name>: <value>.
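
A client could check this shape before submitting the steps. The sketch below is a hypothetical helper, not part of ironic or its client; it checks only what the text states (``interface`` and ``step`` are required, ``args`` is an optional dictionary) and cannot tell whether a given step actually accepts the supplied arguments.

```python
# Hypothetical client-side check of the step format described above.
REQUIRED_KEYS = {"interface", "step"}


def validate_clean_steps(clean_steps):
    """Return a list of problems found; an empty list means the shape is valid."""
    problems = []
    for index, step in enumerate(clean_steps):
        # 'interface' and 'step' are required for every cleaning step.
        missing = REQUIRED_KEYS - step.keys()
        if missing:
            problems.append(f"step {index}: missing {sorted(missing)}")
        # 'args' is optional, but when present it must be a dictionary.
        if "args" in step and not isinstance(step["args"], dict):
            problems.append(f"step {index}: 'args' must be a dictionary")
    return problems


good = [{"interface": "deploy", "step": "erase_devices"}]
bad = [{"interface": "raid", "args": ["not", "a", "dict"]}]

print(validate_clean_steps(good))
print(validate_clean_steps(bad))
```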

If any step is missing a required keyword argument, manual cleaning will not be
performed and the node will be put in ``clean failed`` provision state with an
appropriate error message.

If, during the cleaning process, a cleaning step determines that it has
incorrect keyword arguments, all earlier steps will be performed and then the
node will be put in ``clean failed`` provision state with an appropriate error
message.

An example of the request body for this API::

    {
        "target": "clean",
        "clean_steps": [{
            "interface": "raid",
            "step": "create_configuration",
            "args": {"create_nonroot_volumes": "False"}
        },
        {
            "interface": "deploy",
            "step": "erase_devices"
        }]
    }

In the above example, the driver's RAID interface would configure hardware
RAID without non-root volumes, and then all devices would be erased
(in that order).
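
As a rough sketch, the request above could be assembled with Python's standard library as follows. The endpoint URL and node name are placeholders, and the sketch assumes the API version is selected with the ``X-OpenStack-Ironic-API-Version`` header. Authentication is omitted, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumed values for illustration; replace with your own endpoint and node.
IRONIC_URL = "http://ironic.example.com:6385"
NODE_IDENT = "example-node"

# Same request body as the example above.
body = {
    "target": "clean",
    "clean_steps": [
        {
            "interface": "raid",
            "step": "create_configuration",
            "args": {"create_nonroot_volumes": "False"},
        },
        {"interface": "deploy", "step": "erase_devices"},
    ],
}

request = urllib.request.Request(
    url=f"{IRONIC_URL}/v1/nodes/{NODE_IDENT}/states/provision",
    data=json.dumps(body).encode("utf-8"),
    method="PUT",
    headers={
        "Content-Type": "application/json",
        # Manual cleaning needs API version 1.15 or higher.
        "X-OpenStack-Ironic-API-Version": "1.15",
    },
)

# Sending is left commented out so the sketch runs without a live service.
# A real deployment would normally also require an authentication token.
# with urllib.request.urlopen(request) as response:
#     print(response.status)
print(request.get_method(), request.full_url)
```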

Starting manual cleaning via ``ironic`` CLI
-------------------------------------------

Manual cleaning is supported in the ``ironic node-set-provision-state``
command, starting with python-ironicclient 1.2.

The target/verb is 'clean' and the argument 'clean-steps' must be specified.
Its value is one of:

- a JSON string
- path to a JSON file whose contents are passed to the API
- '-', to read from stdin. This allows piping in the clean steps.
  Using '-' to signify stdin is common in Unix utilities.

Keep in mind that manual cleaning is only supported in API version 1.15 and
higher.

An example of doing this with a JSON string::

    ironic --ironic-api-version 1.15 node-set-provision-state $node_ident \
        clean --clean-steps '{"clean_steps": [...]}'

Or with a file::

    ironic --ironic-api-version 1.15 node-set-provision-state $node_ident \
        clean --clean-steps my-clean-steps.txt

Or with stdin::

    cat my-clean-steps.txt | ironic --ironic-api-version 1.15 \
        node-set-provision-state $node_ident clean --clean-steps -

Cleaning Network
================

If you are using the Neutron DHCP provider (the default) you will also need to
ensure you have configured a cleaning network. This network will be used to
@@ -73,38 +221,48 @@ FAQ

How are cleaning steps ordered?
-------------------------------
Cleaning steps are ordered by integer priority, where a larger integer is a
higher priority. In case of a conflict between priorities across drivers,
the following resolution order is used: Power, Management, Deploy.
For automated cleaning, cleaning steps are ordered by integer priority, where
a larger integer is a higher priority. In case of a conflict between priorities
across drivers, the following resolution order is used: Power, Management,
Deploy, and RAID interfaces.

For manual cleaning, the cleaning steps should be specified in the desired
order.

How do I skip a cleaning step?
------------------------------
Cleaning steps with a priority of 0 or None are skipped.
For automated cleaning, cleaning steps with a priority of 0 or None are skipped.


How do I change the priority of a cleaning step?
------------------------------------------------
For manual cleaning, specify the cleaning steps in the desired order.

For automated cleaning, it depends on whether the cleaning steps are
out-of-band or in-band.

Most out-of-band cleaning steps have an explicit configuration option for
priority.

Changing the priority of an in-band (ironic-python-agent) cleaning step
currently requires use of a custom HardwareManager. The only exception is
erase_devices, which can have its priority set in ironic.conf. For instance,
to disable erase_devices, you'd use the following config::
requires use of a custom HardwareManager. The only exception is
``erase_devices``, which can have its priority set in ironic.conf. For instance,
to disable erase_devices, you'd set the following configuration option::

    [deploy]
    erase_devices_priority=0

To enable/disable the in-band disk erase using ``agent_ilo`` driver, use the
following config::
following configuration option::

    [ilo]
    clean_priority_erase_devices=0

Generic hardware manager first tries to perform ATA disk erase by using
The generic hardware manager first tries to perform ATA disk erase by using
``hdparm`` utility. If ATA disk erase is not supported, it performs software
based disk erase using ``shred`` utility. By default, the number of iterations
performed by ``shred`` for software based disk erase is 1. To configure
the number of iterations, use the following config::
the number of iterations, use the following configuration option::

    [deploy]
    erase_devices_iterations=1
@@ -115,14 +273,14 @@ What cleaning step is running?
To check what cleaning step the node is performing or attempted to perform and
failed, either query the node endpoint for the node or run ``ironic node-show
$node_ident`` and look in the `driver_internal_info` field. The `clean_steps`
field will contain a list of all remaining steps with their priority, and the
field will contain a list of all remaining steps with their priorities, and the
first one listed is the step currently in progress or that the node failed
before going into cleanfail state.
before going into ``clean failed`` state.

Should I disable cleaning?
--------------------------
Cleaning is recommended for ironic deployments, however, there are some
tradeoffs to having it enabled. For instance, ironic cannot deploy a new
Should I disable automated cleaning?
------------------------------------
Automated cleaning is recommended for ironic deployments, however, there are
some tradeoffs to having it enabled. For instance, ironic cannot deploy a new
instance to a node that is currently cleaning, and cleaning can be a time
consuming process. To mitigate this, we suggest using disks with support for
cryptographic ATA Security Erase, as typically the erase_devices step in the
@@ -138,17 +296,18 @@ cleaning.

Troubleshooting
===============
If cleaning fails on a node, the node will be put into cleanfail state and
placed in maintenance mode, to prevent ironic from taking actions on the
If cleaning fails on a node, the node will be put into ``clean failed`` state
and placed in maintenance mode, to prevent ironic from taking actions on the
node.

Nodes in cleanfail will not be powered off, as the node might be in a state
such that powering it off could damage the node or remove useful information
about the nature of the cleaning failure.
Nodes in ``clean failed`` will not be powered off, as the node might be in a
state such that powering it off could damage the node or remove useful
information about the nature of the cleaning failure.

A cleanfail node can be moved to manageable state, where they cannot be
scheduled by nova and you can safely attempt to fix the node. To move a node
from cleanfail to manageable: ``ironic node-set-provision-state $node_ident manage``.
A ``clean failed`` node can be moved to ``manageable`` state, where it cannot
be scheduled by nova and you can safely attempt to fix the node. To move a node
from ``clean failed`` to ``manageable``:
``ironic node-set-provision-state $node_ident manage``.
You can now take actions on the node, such as replacing a bad disk drive.

Strategies for determining why a cleaning step failed include checking the
@@ -156,8 +315,8 @@ ironic conductor logs, viewing logs on the still-running ironic-python-agent
(if an in-band step failed), or performing general hardware troubleshooting on
the node.

When the node is repaired, you can move the node back to available state, to
allow it to be scheduled by nova.
When the node is repaired, you can move the node back to ``available`` state,
to allow it to be scheduled by nova.

::

@@ -167,5 +326,5 @@ allow it to be scheduled by nova.
    # Now, make the node available for scheduling by nova
    ironic node-set-provision-state $node_ident provide

The node will begin cleaning from the start, and move to available state
when complete.
The node will begin automated cleaning from the start, and move to
``available`` state when complete.

@@ -658,9 +658,8 @@ Configure the Bare Metal service for cleaning
    [neutron]
    ...

    # UUID of the network to create Neutron ports on when booting
    # to a ramdisk for cleaning/zapping using Neutron DHCP (string
    # value)
    # UUID of the network to create Neutron ports on, when booting
    # to a ramdisk for cleaning using Neutron DHCP. (string value)
    #cleaning_network_uuid=<None>
    cleaning_network_uuid = NETWORK_UUID

@@ -1731,7 +1730,7 @@ To move a node from ``enroll`` to ``manageable`` provision state::
+------------------------+--------------------------------------------------------------------+

When a node is moved from the ``manageable`` to ``available`` provision
state, the node will be cleaned if configured to do so (see
state, the node will go through automated cleaning if configured to do so (see
:ref:`CleaningNetworkSetup`).
To move a node from ``manageable`` to ``available`` provision state::

@@ -84,13 +84,13 @@ upgrade has completed.

Cleaning
--------
A new feature in Kilo is support for the cleaning of nodes between workloads to
ensure the node is ready for another workload. This can include erasing the
hard drives, updating firmware, and other steps. For more information, see
:ref:`cleaning`.
A new feature in Kilo is support for the automated cleaning of nodes between
workloads to ensure the node is ready for another workload. This can include
erasing the hard drives, updating firmware, and other steps. For more
information, see :ref:`automated_cleaning`.

If Ironic is configured with cleaning enabled (defaults to True) and to use
Neutron as the DHCP provider (also the default), you will need to set the
If Ironic is configured with automated cleaning enabled (defaults to True) and
to use Neutron as the DHCP provider (also the default), you will need to set the
`cleaning_network_uuid` option in the Ironic configuration file before starting
the Kilo Ironic service. See :ref:`CleaningNetworkSetup` for information on
how to set up the cleaning network for Ironic.

@@ -114,7 +114,7 @@ Additional requirements

* BIOS must try next boot device if PXE boot failed

* Cleaning should be disabled, see :ref:`cleaning`
* Automated cleaning should be disabled, see :ref:`automated_cleaning`

* Node should be powered off before start of deploy

@@ -61,10 +61,10 @@ API Versions History

Newly registered nodes begin in the ``enroll`` provision state by default,
instead of ``available``. To get them to the ``available`` state,
the ``manage`` action must first be ran, to verify basic hardware control.
On success the node moves to ``manageable`` provision state, then the
``provide`` action must be run, which will clean the node and
make it available.
the ``manage`` action must first be run to verify basic hardware control.
On success the node moves to ``manageable`` provision state. Then the
``provide`` action must be run. Automated cleaning of the node is done and
the node is made ``available``.

**1.10**

@@ -2,4 +2,4 @@
features:
  - Adds support for manual cleaning. This is available with API
    version 1.15. For more information, see
    http://specs.openstack.org/openstack/ironic-specs/specs/approved/manual-cleaning.html
    http://docs.openstack.org/developer/ironic/deploy/cleaning.html#manual-cleaning