docs: Add context around asynchronous device initialization
CentOS Stream, and ultimately RHEL, have switched to asynchronous device initialization. This affects root device hints and their usability on those systems, in large part because assumptions operators have traditionally relied on no longer hold true on those newer kernels.

This doc update attempts to provide the necessary context to guide operators to the best possible outcome given the distribution changes.

Change-Id: I541086cfe235b10f1f1dba95fad95022a22f9ce7
parent 462f86889b
commit ac31720ac1
@@ -1283,3 +1283,62 @@ related to image files.

Image safety checks are generally performed as the deployment process begins
and stages artifacts, however a late stage check is performed when
needed by the ironic-python-agent.

Using /dev/sda does not write to the first disk
===============================================

Alternative name: I chose /dev/sda but I found it as /dev/sdb after rebooting.

Historically, Linux users have grown accustomed to a context where /dev/sda is
the first device in a physical machine. That is, if you look at the device's
by-path information, HCTL, or LUN, the identifier ends with a zero.

For example, a machine with three disks and two controllers, with a single
disk on the second controller, would look something like this:

* /dev/sda maps to a device with LUN 0, HCTL 0:0:0:0
* /dev/sdb maps to a device with LUN 1, HCTL 0:0:1:0
* /dev/sdc maps to a device with LUN 2, HCTL 0:1:0:0

However, this was a pattern we grew accustomed to only because the order of
device discovery was sequential *and* synchronous. In other words, the kernel
stepped through all possible devices one at a time. This breaks when the
kernel operates in a mode where device initialization is asynchronous, as
some distributions have decided to adopt.

The move to asynchronous initialization exposes that /dev/sda has always been
the *first* device to initialize, *not* the first device in the system. With
synchronous discovery the two coincided; with asynchronous initialization they
may not. As a result, we can end up with something looking like:

* /dev/sda maps to a device with LUN 1, HCTL 0:0:1:0
* /dev/sdb maps to a device with LUN 2, HCTL 0:1:0:0
* /dev/sdc maps to a device with LUN 0, HCTL 0:0:0:0

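The two orderings described above can be reproduced with a small simulation.
This is only an illustrative sketch, not how the kernel is implemented: the
``Disk`` class, the ``assign_names`` function, and the ``ready_after_ms``
timings are hypothetical and chosen so the output matches the two example
lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    hctl: str            # Host:Channel:Target:LUN address
    ready_after_ms: int  # hypothetical time the device needs to initialize


def assign_names(disks, asynchronous):
    """Return a {device name: HCTL} mapping in the order names are claimed."""
    if asynchronous:
        # Names are claimed in the order devices finish initializing.
        order = sorted(disks, key=lambda d: d.ready_after_ms)
    else:
        # Devices are stepped through one at a time, in address order.
        order = sorted(
            disks, key=lambda d: tuple(int(p) for p in d.hctl.split(":"))
        )
    return {f"/dev/sd{chr(ord('a') + i)}": d.hctl for i, d in enumerate(order)}


disks = [
    Disk("0:0:0:0", ready_after_ms=30),
    Disk("0:0:1:0", ready_after_ms=10),
    Disk("0:1:0:0", ready_after_ms=20),
]

print(assign_names(disks, asynchronous=False))
print(assign_names(disks, asynchronous=True))
```

With these assumed timings, the synchronous case gives /dev/sda the device at
0:0:0:0, while the asynchronous case gives /dev/sda the device at 0:0:1:0,
simply because that device became ready first.
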
Generally, most operators might then consider referencing the
/dev/disk/by-path structure to match disk devices because that seems to imply
a static order. *However*, a kernel operating with asynchronous device
initialization will order *everything*, including PCI devices, the same way,
meaning by-path can also be unreliable. Furthermore, if your server hardware
is using multipath IO, you should be operating with multipath enabled so
that the multipath device is used.

The net result is that the best criteria to match on are:

* Serial Number
* World Wide Name
* Device HCTL, which *does* appear to be static in these cases, but is not
  applicable for hosts using multipathing. It may, ultimately, not be static
  enough, depending on the hardware in use.

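As a rough sketch of matching on these criteria instead of on the device name,
the following parses ``lsblk`` JSON output and looks a disk up by serial
number, WWN, or HCTL. The ``find_device`` helper and every value in ``SAMPLE``
are hypothetical; this is not how ironic-python-agent evaluates hints, only an
illustration of the idea.

```python
import json

# Hypothetical output of:
#   lsblk --json --nodeps --paths -o NAME,SERIAL,WWN,HCTL
# Two devices deliberately share a WWN to show an ambiguous match.
SAMPLE = """
{"blockdevices": [
  {"name": "/dev/sda", "serial": "SER-B",
   "wwn": "0x5000000000000001", "hctl": "0:0:1:0"},
  {"name": "/dev/sdb", "serial": "SER-C",
   "wwn": "0x5000000000000001", "hctl": "0:1:0:0"},
  {"name": "/dev/sdc", "serial": "SER-A",
   "wwn": "0x5000000000000002", "hctl": "0:0:0:0"}
]}
"""


def find_device(lsblk_json, **criteria):
    """Return the name of the single device matching all given criteria."""
    devices = json.loads(lsblk_json)["blockdevices"]
    matches = [
        dev for dev in devices
        if all(dev.get(key) == value for key, value in criteria.items())
    ]
    if len(matches) != 1:
        # Zero or several matches: refuse to guess which disk was meant.
        raise LookupError(f"{len(matches)} devices match {criteria}")
    return matches[0]["name"]


# The serial number identifies the disk regardless of which name it claimed.
print(find_device(SAMPLE, serial="SER-A"))
# A shared WWN is only usable when combined with another criterion.
print(find_device(SAMPLE, wwn="0x5000000000000001", hctl="0:1:0:0"))
```

Raising on an ambiguous match is a deliberate choice in this sketch: picking
the first of several candidates would reintroduce the ordering dependence the
criteria are meant to avoid.
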
.. note::
   Some RAID controllers will generate fake WWN and Serial numbers for
   "disks" being supplied by the RAID controller. Some may also use the same
   WWN for *all* devices, which is a valid approach as the device Logical Unit
   Numbers or Device identifier number would be different. Ultimately this
   means labels on disks may not be able to be matched to volumes through a
   RAID controller, and operators will need to simply "know their hardware"
   to navigate the best path depending on the configuration and behavior of
   their hardware.

.. note::
   CentOS Stream 9 appears to have a probe_type="sync" option which
   reverts this behavior. For more information please see
   this `centos stream-9 changeset <https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2819/diffs?commit_id=a93f405246083f0c2e81d0e6c37ba31c6c1b29c3>`_.

@@ -9,6 +9,12 @@ which disk it should pick for the deployment. The list of supported hints is:

* model (STRING): device identifier
* vendor (STRING): device vendor
* serial (STRING): disk serial number

  .. note::
     Some RAID controllers will generate serial numbers to represent volumes
     provided to the operating system which do not match or align to physical
     disks in a system.

* size (INT): size of the device in GiB

  .. note::
@@ -18,7 +24,9 @@ which disk it should pick for the deployment. The list of supported hints is:

     should be the actual size. For example, for a 128 GiB disk ``local_gb``
     will be 127, but size hint will be 128.

* wwn (STRING): unique storage identifier, typically mapping to a device.
  This can be a single device, a SAN storage controller,
  or a RAID controller.
* wwn_with_extension (STRING): unique storage identifier with the vendor extension appended
* wwn_vendor_extension (STRING): unique vendor storage identifier
* rotational (BOOLEAN): whether it's a rotational device or not. This
@@ -28,6 +36,11 @@ which disk it should pick for the deployment. The list of supported hints is:

  e.g '1:0:0:0'
* by_path (STRING): the alternate device name corresponding to a particular
  PCI or iSCSI path, e.g /dev/disk/by-path/pci-0000:00

  .. note::
     Device identification by-path may not be reliable on Linux kernels using
     asynchronous device initialization.

* name (STRING): the device name, e.g /dev/md0

@@ -39,6 +52,13 @@ which disk it should pick for the deployment. The list of supported hints is:

     devices like /dev/sda and /dev/sdb `switching around at boot time
     <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/persistent_naming.html>`_.

  .. warning::
     Furthermore, the recent move to asynchronous device initialization among
     some Linux distribution kernels means that the actual device name string
     is entirely unreliable when multiple devices are present in the host, as
     the device name is claimed by the device which responded first, as opposed
     to the previous pattern where it was the first initialized device in
     a synchronous process.

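One way to act on these warnings is to check a set of hints for keys whose
meaning depends on device ordering before applying them to a node. The sketch
below is hypothetical: ``ORDER_DEPENDENT_HINTS`` and ``order_dependent`` are
illustrative names, not part of Ironic, and the set reflects only the hints
this document flags (``name`` and ``by_path``).

```python
# Hint keys flagged above as unreliable under asynchronous initialization.
ORDER_DEPENDENT_HINTS = {"name", "by_path"}


def order_dependent(hints):
    """Return the sorted hint keys that depend on device ordering."""
    return sorted(ORDER_DEPENDENT_HINTS.intersection(hints))


# Hypothetical hint sets for illustration only.
risky = {"name": "/dev/sda", "size": 128}
stable = {"serial": "SER-A", "wwn": "0x5000000000000002"}

print(order_dependent(risky))
print(order_dependent(stable))
```

A non-empty result suggests replacing the flagged keys with ``serial`` or
``wwn`` where the hardware reports trustworthy values.
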
To associate one or more hints with a node, update the node's properties
with a ``root_device`` key, for example::
