.. _raid:

==================
RAID Configuration
==================

Overview
========

Ironic supports RAID configuration for bare metal nodes. It allows operators
to specify the desired RAID configuration via the OpenStackClient CLI or REST
API. The desired RAID configuration is applied on the bare metal node during
manual cleaning.

The examples described here use the OpenStackClient CLI; please see the
`REST API reference <https://docs.openstack.org/api-ref/baremetal/>`_
for the corresponding REST API requests.

Prerequisites
=============

The bare metal node needs to use a hardware type that supports RAID
configuration. RAID interfaces may implement RAID configuration either
in-band or out-of-band. Software RAID is supported on all hardware,
although with some caveats - see `Software RAID`_ for details.

In-band RAID configuration (including software RAID) is done using the
Ironic Python Agent ramdisk. For in-band hardware RAID configuration,
a hardware manager which supports RAID should be bundled with the ramdisk.

Whether a node supports RAID configuration can be checked with the CLI
command ``baremetal node validate <node>``. In-band RAID is
usually implemented by the ``agent`` RAID interface.
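
For example, on a node whose RAID interface passes validation, the ``raid``
row of the output reports ``True`` (a sketch; the node name is illustrative
and the table is abridged, as the exact contents depend on the driver)::

  $ baremetal node validate node-0
  +------------+--------+--------+
  | Interface  | Result | Reason |
  +------------+--------+--------+
  | ...        | ...    | ...    |
  | raid       | True   |        |
  | ...        | ...    | ...    |
  +------------+--------+--------+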

Build agent ramdisk which supports RAID configuration
=====================================================

For doing in-band hardware RAID configuration, Ironic needs an agent ramdisk
bundled with a hardware manager which supports RAID configuration for your
hardware. For example, the :ref:`DIB_raid_support` should be used for HPE
ProLiant servers.

.. note::
    For in-band software RAID, the agent ramdisk does not need to be bundled
    with a hardware manager as the generic hardware manager in the Ironic
    Python Agent already provides (basic) support for software RAID.

RAID configuration JSON format
==============================

The desired RAID configuration and the current RAID configuration are
represented in JSON format.

Target RAID configuration
-------------------------

This is the desired RAID configuration on the bare metal node. Using the
OpenStackClient CLI (or REST API), the operator sets the
``target_raid_config`` field of the node. The target RAID configuration will
be applied during manual cleaning.

Target RAID configuration is a dictionary having ``logical_disks``
as the key. The value for ``logical_disks`` is a list of JSON
dictionaries. It looks like::

  {
    "logical_disks": [
      {<desired properties of logical disk 1>},
      {<desired properties of logical disk 2>},
      ...
    ]
  }

If the ``target_raid_config`` is an empty dictionary, it unsets the value of
``target_raid_config`` if a value was set by a previous RAID configuration
of the node.
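
For instance, assuming a bash shell, an existing configuration can be unset
by passing an empty dictionary on standard input (the node name ``node-0``
is illustrative)::

  echo '{}' | baremetal node set node-0 --target-raid-config -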

Each logical disk dictionary contains the desired properties of the logical
disk supported by the hardware type. These properties are discoverable by::

  baremetal driver raid property list <driver name>
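
For example, for nodes using the ``idrac`` hardware type (substitute the
driver your nodes actually use)::

  baremetal driver raid property list idrac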

Mandatory properties
^^^^^^^^^^^^^^^^^^^^

These properties must be specified for each logical
disk and have no default values:

- ``size_gb`` - Size (Integer) of the logical disk to be created in GiB.
  ``MAX`` may be specified if the logical disk should use all of the
  remaining space available. This can be used only when backing physical
  disks are specified (see below).

- ``raid_level`` - RAID level for the logical disk. Ironic supports the
  following RAID levels: 0, 1, 2, 5, 6, 1+0, 5+0, 6+0.

Optional properties
^^^^^^^^^^^^^^^^^^^

These properties have default values and may be overridden in the
specification of any logical disk. None of these options are supported for
software RAID.

- ``volume_name`` - Name of the volume. Should be unique within the node.
  If not specified, the volume name will be auto-generated.

- ``is_root_volume`` - Set to ``true`` if this is the root volume. At
  most one logical disk can have this set to ``true``; the other
  logical disks must have this set to ``false``. The
  ``root device hint`` will be saved, if the RAID interface is capable of
  retrieving it. This is ``false`` by default.

Backing physical disk hints
^^^^^^^^^^^^^^^^^^^^^^^^^^^

These hints are specified for each logical disk to let Ironic find the desired
disks for RAID configuration. This is machine-independent information. It
serves the use case where the operator doesn't want to provide individual
details for each bare metal node. None of these options are supported for
software RAID.

- ``share_physical_disks`` - Set to ``true`` if this logical disk can
  share physical disks with other logical disks. The default value is
  ``false``, except for software RAID, which always shares disks.

- ``disk_type`` - ``hdd`` or ``ssd``. If this is not specified, disk type
  will not be a criterion to find backing physical disks.

- ``interface_type`` - ``sata``, ``scsi`` or ``sas``. If this is not
  specified, interface type will not be a criterion to
  find backing physical disks.

- ``number_of_physical_disks`` - Integer, number of disks to use for the
  logical disk. Defaults to the minimum number of disks required for the
  particular RAID level, except for software RAID, which always spans all
  disks.

Backing physical disks
^^^^^^^^^^^^^^^^^^^^^^

This is the actual machine-dependent information. It is suitable for
environments where the operator wants to automate the selection of physical
disks with a 3rd-party tool based on a wider range of attributes
(e.g. S.M.A.R.T. status, physical location). The values for these properties
are hardware dependent.

- ``controller`` - The name of the controller as read by the RAID interface.
  In order to trigger the setup of a Software RAID via the Ironic Python
  Agent, the value of this property needs to be set to ``software``.

- ``physical_disks`` - A list of physical disks to use as read by the
  RAID interface.

For software RAID ``physical_disks`` is a list of device hints in the same
format as used for :ref:`root-device-hints`. The number of provided hints
must match the expected number of backing devices (repeat the same hint if
necessary).

.. note::
    If properties from both "Backing physical disk hints" and
    "Backing physical disks" are specified, they should be consistent with
    each other. If they are not consistent, the RAID configuration
    will fail (because the appropriate backing physical disks could
    not be found).

.. _raid-config-examples:

Examples for ``target_raid_config``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*Example 1*. Single RAID disk of RAID level 5 with all of the space
available. Make this the root volume to which Ironic deploys the image:

.. code-block:: json

    {
      "logical_disks": [
        {
          "size_gb": "MAX",
          "raid_level": "5",
          "is_root_volume": true
        }
      ]
    }

*Example 2*. Two RAID disks: one of RAID level 5 and 100 GiB, used as the
root volume and backed by SSDs; the other of RAID level 1 and 500 GiB,
backed by HDDs:

.. code-block:: json

    {
      "logical_disks": [
        {
          "size_gb": 100,
          "raid_level": "5",
          "is_root_volume": true,
          "disk_type": "ssd"
        },
        {
          "size_gb": 500,
          "raid_level": "1",
          "disk_type": "hdd"
        }
      ]
    }

*Example 3*. Single RAID disk, where the controller and the backing physical
disks are known in advance:

.. code-block:: json

    {
      "logical_disks": [
        {
          "size_gb": 100,
          "raid_level": "5",
          "controller": "Smart Array P822 in Slot 3",
          "physical_disks": ["6I:1:5", "6I:1:6", "6I:1:7"],
          "is_root_volume": true
        }
      ]
    }

*Example 4*. Using backing physical disks:

.. code-block:: json

    {
      "logical_disks": [
        {
          "size_gb": 50,
          "raid_level": "1+0",
          "controller": "RAID.Integrated.1-1",
          "volume_name": "root_volume",
          "is_root_volume": true,
          "physical_disks": [
            "Disk.Bay.0:Encl.Int.0-1:RAID.Integrated.1-1",
            "Disk.Bay.1:Encl.Int.0-1:RAID.Integrated.1-1"
          ]
        },
        {
          "size_gb": 100,
          "raid_level": "5",
          "controller": "RAID.Integrated.1-1",
          "volume_name": "data_volume",
          "physical_disks": [
            "Disk.Bay.2:Encl.Int.0-1:RAID.Integrated.1-1",
            "Disk.Bay.3:Encl.Int.0-1:RAID.Integrated.1-1",
            "Disk.Bay.4:Encl.Int.0-1:RAID.Integrated.1-1"
          ]
        }
      ]
    }

*Example 5*. Software RAID with two RAID devices:

.. code-block:: json

    {
      "logical_disks": [
        {
          "size_gb": 100,
          "raid_level": "1",
          "controller": "software"
        },
        {
          "size_gb": "MAX",
          "raid_level": "0",
          "controller": "software"
        }
      ]
    }

*Example 6*. Software RAID, limiting the backing block devices to exactly two
devices, each with a size exceeding 100 GiB:

.. code-block:: json

    {
      "logical_disks": [
        {
          "size_gb": "MAX",
          "raid_level": "0",
          "controller": "software",
          "physical_disks": [
            {"size": "> 100"},
            {"size": "> 100"}
          ]
        }
      ]
    }

Current RAID configuration
--------------------------

After the target RAID configuration is applied on the bare metal node, Ironic
populates the current RAID configuration. This is stored in the
``raid_config`` field of the Ironic node. It contains the details of every
logical disk after it was created on the bare metal node, such as the RAID
controller used, the backing physical disks used and the WWN of each logical
disk. It also contains information about each physical disk found on the
bare metal node.

To get the current RAID configuration::

  baremetal node show <node-uuid-or-name>
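
The ``--fields`` option of ``baremetal node show`` can be used to limit the
output to the RAID-related fields::

  baremetal node show <node-uuid-or-name> --fields raid_config target_raid_config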

Workflow
========

* Operator configures the bare metal node with a hardware type that has
  a ``RAIDInterface`` other than ``no-raid``. For instance, for Software RAID,
  this would be ``agent``.

* For in-band RAID configuration, the operator builds an agent ramdisk which
  supports RAID configuration by bundling the hardware manager with the
  ramdisk. See `Build agent ramdisk which supports RAID configuration`_ for
  more information.

* Operator prepares the desired target RAID configuration as mentioned in
  `Target RAID configuration`_. The target RAID configuration is set on
  the Ironic node::

    baremetal node set <node-uuid-or-name> \
        --target-raid-config <JSON file containing target RAID configuration>

  The CLI command can also accept the input from standard input::

    baremetal node set <node-uuid-or-name> \
        --target-raid-config -
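
  For example, assuming a bash shell (the node name and the configuration
  are illustrative)::

    echo '{"logical_disks": [{"size_gb": "MAX", "raid_level": "1", "controller": "software"}]}' | \
        baremetal node set node-0 --target-raid-config -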

* Create a JSON file with the RAID clean steps for manual cleaning. Add other
  clean steps as desired::

    [{
      "interface": "raid",
      "step": "delete_configuration"
    },
    {
      "interface": "raid",
      "step": "create_configuration"
    }]

  .. note::
      ``create_configuration`` doesn't remove existing disks. It is
      recommended to add ``delete_configuration`` before
      ``create_configuration`` to make sure that only the desired logical
      disks exist in the system after manual cleaning.

* Bring the node to the ``manageable`` state and issue a ``clean`` action to
  start cleaning on the node::

    baremetal node clean <node-uuid-or-name> \
        --clean-steps <JSON file containing clean steps created above>
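
  If the node is in the ``available`` state, it can be moved to
  ``manageable`` first with the standard provision state command::

    baremetal node manage <node-uuid-or-name>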

* After manual cleaning is complete, the current RAID configuration is
  reported in the ``raid_config`` field when running::

    baremetal node show <node-uuid-or-name>

Software RAID
=============

Building Linux software RAID in-band (via the Ironic Python Agent ramdisk)
is supported starting with the Train release. It is requested by using the
``agent`` RAID interface and a RAID configuration with all controllers set
to ``software``. You can find a software RAID configuration example in
:ref:`raid-config-examples`.

There are certain limitations to be aware of:

* Only the mandatory properties (plus the required ``controller`` property)
  from `Target RAID configuration`_ are currently supported.

* The number of created Software RAID devices must be 1 or 2. If there is only
  one Software RAID device, it has to be a RAID-1. If there are two, the first
  one has to be a RAID-1, while the RAID level for the second one can be
  0, 1, 1+0, 5, or 6. As the first RAID device will be the deployment device,
  enforcing a RAID-1 reduces the risk of ending up with a non-booting node
  in case of a disk failure.

* Building RAID will fail if the target disks are already partitioned. Wipe the
  disks using e.g. the ``erase_devices_metadata`` clean step before building
  RAID::

    [{
      "interface": "raid",
      "step": "delete_configuration"
    },
    {
      "interface": "deploy",
      "step": "erase_devices_metadata"
    },
    {
      "interface": "raid",
      "step": "create_configuration"
    }]

* The final instance image must have the ``mdadm`` utility installed
  and needs to be able to detect software RAID devices at boot time
  (which is usually done by having the RAID drivers embedded in the
  image's initrd).

* Regular cleaning will not remove the RAID configuration (similarly to
  hardware RAID). To destroy RAID, run the ``delete_configuration`` manual
  clean step.

* There is no support for partition images, only whole-disk images are
  supported with Software RAID. See :doc:`/install/configure-glance-images`.
  This includes flavors requesting dynamic creation of swap filesystems.
  Swap should be pre-allocated inside of a disk image partition layout.

* Images utilizing LVM for their root filesystem are not supported. Patches
  are welcome to explicitly support such functionality.

* If the root filesystem UUID is not known to Ironic via metadata, then the
  disk image layout **MUST** have the first partition contain the root
  filesystem. Ironic is agnostic as to whether the partition table is a DOS
  MBR or a GPT partition table.

  Starting in Ironic 14.0.0 (Ussuri), the root filesystem UUID can be set
  and passed through to Ironic via the Glance Image Service ``properties``
  sub-field ``rootfs_uuid`` for the image to be deployed.

  Starting in Ironic 16.1.0 (Wallaby), similar functionality is available
  via the baremetal node ``instance_info`` field value ``image_rootfs_uuid``.
  See :doc:`/install/standalone` for more details on standalone usage,
  including an example command.
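
  For instance, a minimal sketch of setting this field on a node (the UUID
  value is illustrative)::

    baremetal node set <node-uuid-or-name> \
        --instance-info image_rootfs_uuid=4f6c49e8-2b0b-4b3d-b4c1-e26dcb4f2f3e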

* In UEFI mode, the Ironic Python Agent creates EFI system partitions (ESPs)
  for the bootloader and the boot configuration (grub.cfg or grubenv) on all
  holder devices. The content of these partitions is populated upon deployment
  from the deployed user image. Depending on how the partitions are mounted,
  the content of the partitions may get out of sync, e.g. when new kernels
  are installed or the bootloader is updated, so measures to keep these
  partitions in sync need to be taken. Note that starting with the Victoria
  release, the Ironic Python Agent configures a RAID-1 mirror for the ESPs,
  so no additional measures to ensure consistency of the ESPs should be
  required any longer.

* In BIOS mode, the Ironic Python Agent installs the bootloader onto all
  disks. While nothing is required for kernel or grub package updates,
  re-installing the bootloader on one disk, e.g. during a disk replacement,
  may require re-installing the bootloader on all disks. Otherwise, there
  is a risk of an incompatibility of the grub components stored on the device
  (i.e. stage1/boot.img in the MBR and stage1.5/core.img in the MBR gap) with
  the ones stored in /boot (stage2). This incompatibility can render the node
  unbootable if the wrong disk is selected for booting.

* Linux kernel device naming is not consistent across reboots for RAID devices
  and may follow a distribution-specific numbering pattern. Operators need
  to be mindful of this if a root device hint is utilized.
  A particular example of this is that the first RAID device on an Ubuntu
  based ramdisk may start as device "md0", whereas on a CentOS or Red Hat
  Enterprise Linux based ramdisk it may start at device "md127". After a
  reboot, these device names may change entirely.

.. note::
    :ref:`Root device hints <root-device-hints>` should not be explicitly
    required to utilize software RAID. Candidate devices are chosen by
    sorting the usable device list, looking for the smallest usable
    device, which is then sorted by name. The secondary sort by name
    improves the odds of matching the first initialized block device.
    Software RAID devices are always a little smaller than the underlying
    block devices due to metadata overhead, which helps make them the most
    likely candidate devices.

Image requirements
------------------

Since Ironic needs to perform additional steps when deploying nodes
with software RAID, there are some requirements the deployed images need
to fulfill. Up to and including the Train release, the image needs to
have its root file system on the first partition. Starting with Ussuri,
the image can also have additional metadata to point Ironic to the
partition with the root file system: for this, the image needs to set
the ``rootfs_uuid`` property to the file system UUID of the root file
system. One way to extract this UUID from an existing image is to
download the image, mount it as a loopback device, and use ``blkid``:

.. code-block:: bash

    $ sudo losetup -f
    $ sudo losetup /dev/loop0 /tmp/myimage.raw
    $ sudo kpartx -a /dev/loop0
    $ blkid
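
The UUID can then be attached to the image, for instance via the Image
service (the UUID value and image name below are illustrative):

.. code-block:: bash

    $ openstack image set --property rootfs_uuid=4f6c49e8-2b0b-4b3d-b4c1-e26dcb4f2f3e myimage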

The pre-Ussuri approach, i.e. having the root file system on
the first partition, is kept as a fallback and hence allows software
RAID deployments where Ironic does not have access to any image metadata
(e.g. Ironic stand-alone).

Using RAID in nova flavor for scheduling
========================================

The operator can specify the ``raid_level`` capability in the nova flavor
for a node to be selected for scheduling::

  openstack flavor set my-baremetal-flavor --property capabilities:raid_level="1+0"
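
For the flavor to match, the node needs to expose the same capability in its
properties (a sketch; merge with any ``capabilities`` value already set on
the node)::

  baremetal node set <node-uuid-or-name> --property capabilities="raid_level:1+0"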

Developer documentation
=======================

In-band RAID configuration is done using the IPA ramdisk. The IPA ramdisk
supports pluggable hardware managers, which can be used to extend the
functionality offered by the IPA ramdisk using stevedore plugins. For more
information, see the Ironic Python Agent
:ironic-python-agent-doc:`Hardware Manager <install/index.html#hardware-managers>`
documentation.

The hardware manager that supports RAID configuration should do the following:

#. Implement a method named ``create_configuration``. This method creates
   the RAID configuration as given in ``target_raid_config``. After successful
   RAID configuration, it returns the current RAID configuration information,
   which Ironic uses to set ``node.raid_config``.

#. Implement a method named ``delete_configuration``. This method deletes
   all the RAID disks on the bare metal node.

#. Return these two clean steps from the ``get_clean_steps`` method with a
   priority of 0. Example::

       return [{'step': 'create_configuration',
                'interface': 'raid',
                'priority': 0},
               {'step': 'delete_configuration',
                'interface': 'raid',
                'priority': 0}]
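
Putting these together, a minimal, illustrative sketch of such a hardware
manager might look like the following (the class name is hypothetical; the
method signatures follow the generic hardware manager in the Ironic Python
Agent):

.. code-block:: python

    from ironic_python_agent import hardware


    class ExampleRAIDHardwareManager(hardware.HardwareManager):
        """Illustrative hardware manager exposing the two RAID clean steps."""

        HARDWARE_MANAGER_NAME = 'ExampleRAIDHardwareManager'
        HARDWARE_MANAGER_VERSION = '1.0'

        def evaluate_hardware_support(self):
            # A real manager would detect its hardware here; claiming
            # SERVICE_PROVIDER support makes this manager take precedence
            # over the generic one.
            return hardware.HardwareSupport.SERVICE_PROVIDER

        def get_clean_steps(self, node, ports):
            # Both steps use priority 0 so they run only when explicitly
            # requested during manual cleaning.
            return [{'step': 'create_configuration',
                     'interface': 'raid',
                     'priority': 0},
                    {'step': 'delete_configuration',
                     'interface': 'raid',
                     'priority': 0}]

        def create_configuration(self, node, ports):
            # Apply node['target_raid_config'] to the hardware and return
            # the resulting configuration; Ironic stores the return value
            # in ``node.raid_config``.
            raise NotImplementedError()

        def delete_configuration(self, node, ports):
            # Remove all logical disks from the RAID controller(s).
            raise NotImplementedError()

The manager is then registered as a stevedore plugin under the
``ironic_python_agent.hardware_managers`` entry point so that the IPA
ramdisk can discover it.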