Update troubleshooting docs on no valid host found error

* Add examples of correct data
* Regroup bullet points into more logical flow
* Use OSC-based commands for Nova
* Add short information on using nova scheduler logs

Based on real world debugging experience.

Change-Id: I0f2441764a1b434fca6a1589c72ec95b083d19ab
Dmitry Tantsur 2017-01-12 19:33:33 +01:00
parent 713a440884
commit 98aa3bbf47

@@ -20,48 +20,98 @@ to find and resources that Ironic advertised to Nova.
 
 A few things should be checked in this case:
 
+#. Make sure that enough nodes are in ``available`` state, not in
+   maintenance mode and not already used by an existing instance.
+   Check with the following command::
+
+       ironic node-list --provision-state available --maintenance false --associated false
+
+   If this command does not show enough nodes, use generic ``ironic
+   node-list`` to check other nodes. For example, nodes in ``manageable`` state
+   should be made available::
+
+       ironic node-set-provision-state <IRONIC NODE> provide
+
+   The Bare metal service automatically puts a node in maintenance mode if
+   there are issues with accessing its management interface. Check the power
+   credentials (e.g. ``ipmi_address``, ``ipmi_username`` and ``ipmi_password``)
+   and then move the node out of maintenance mode::
+
+       ironic node-set-maintenance <IRONIC NODE> off
+
+   The ``node-validate`` command can be used to verify that all required fields
+   are present. The following command should not return anything::
+
+       ironic node-validate baremetal-0 | grep -E '(power|management)\W*False'
+
+   Maintenance mode will be also set on a node if automated cleaning has
+   failed for it previously.
+
 #. Inspection should have succeeded for you before, or you should have
    entered the required Ironic node properties manually. For each node with
-   available state in ``ironic node-list --provision-state available`` use
-   ::
+   ``available`` state make sure that the ``properties`` JSON field has valid
+   values for the keys ``cpus``, ``cpu_arch``, ``memory_mb`` and ``local_gb``.
+   Example of valid properties::
 
-       ironic node-show <IRONIC-NODE-UUID>
+       $ ironic node-show <IRONIC NODE> --fields properties
+       +------------+------------------------------------------------------------------------------------+
+       | Property   | Value                                                                              |
+       +------------+------------------------------------------------------------------------------------+
+       | properties | {u'memory_mb': u'8192', u'cpu_arch': u'x86_64', u'local_gb': u'41', u'cpus': u'4'} |
+       +------------+------------------------------------------------------------------------------------+
 
-   and make sure that ``properties`` JSON field has valid values for keys
-   ``cpus``, ``cpu_arch``, ``memory_mb`` and ``local_gb``.
+   .. warning::
+      If you're using exact match filters in the Nova Scheduler, make sure
+      the flavor and the node properties match exactly.
 
 #. The Nova flavor that you are using does not match any properties of the
    available Ironic nodes. Use
    ::
 
-       nova flavor-show <FLAVOR NAME>
+       openstack flavor show <FLAVOR NAME>
 
-   to compare. If you're using exact match filters in Nova Scheduler, please
-   make sure the flavor and the node properties match exactly. Regarding
-   the extra specs in flavor, you should make sure they map to
-   ``node.properties['capabilities']``.
+   to compare. The extra specs in your flavor starting with ``capability:``
+   should match ones in ``node.properties['capabilities']``.
 
-#. Make sure that enough nodes are in ``available`` state according to
-   ``ironic node-list --provision-state available``.
+   .. note::
+      The format of capabilities is different in Nova and Ironic.
+      E.g. in Nova flavor::
 
-#. Make sure nodes you're going to deploy to are not in maintenance mode.
-   Again, use ``ironic node-list`` to check. A node automatically going to
-   maintenance mode usually means wrong power credentials for this node. Check
-   them and then remove maintenance mode::
+        $ openstack flavor show <FLAVOR NAME> -c properties
+        +------------+----------------------------------+
+        | Field      | Value                            |
+        +------------+----------------------------------+
+        | properties | capabilities:boot_option='local' |
+        +------------+----------------------------------+
 
-       ironic node-set-maintenance <IRONIC-NODE-UUID> off
+      But in Ironic node::
+
+        $ ironic node-show <IRONIC NODE> --fields properties
+        +------------+-----------------------------------------+
+        | Property   | Value                                   |
+        +------------+-----------------------------------------+
+        | properties | {u'capabilities': u'boot_option:local'} |
+        +------------+-----------------------------------------+
 
 #. After making changes to nodes in Ironic, it takes time for those changes
-   to propagate from Ironic to Nova.
-   Check that
+   to propagate from Ironic to Nova. Check that
    ::
 
-       nova hypervisor-stats
+       openstack hypervisor stats show
 
    correctly shows total amount of resources in your system. You can also
-   check ``nova hypervisor-list`` to see the status of individual Ironic
-   nodes as reported to Nova. And you can correlate the Nova "hypervisor
-   hostname" to the Ironic node UUID.
+   check ``openstack hypervisor show <IRONIC NODE>`` to see the status of
+   individual Ironic nodes as reported to Nova.
+
+#. Figure out which Nova Scheduler filter ruled out your nodes. Check the
+   ``nova-scheduler`` logs for lines containing something like::
+
+       Filter ComputeCapabilitiesFilter returned 0 hosts
+
+   The name of the filter that removed the last hosts may give some hints on
+   what exactly was not matched. See `Nova filters documentation
+   <http://docs.openstack.org/developer/nova/filter_scheduler.html>`_ for more
+   details.
 
 #. If none of the above helped, check Ironic conductor log carefully to see
    if there are any conductor-related errors which are the root cause for
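
Reviewer note: two of the checks added in this change can be scripted. The block below is a minimal sketch, not part of the commit. It assumes the flavor extra specs are already available as a dict of the form ``{"capabilities:boot_option": "local"}`` and the node capabilities as the comma-separated ``key:value`` string shown in the Ironic example. The function names are hypothetical, and the comparison is a plain string equality check; it does not implement any of the operators that the real ``ComputeCapabilitiesFilter`` may accept.

```python
import re

# Hypothetical helpers for illustration only; none of these names exist
# in Nova or Ironic.

FLAVOR_PREFIX = "capabilities:"

# Matches the log line quoted in the documentation change, e.g.
# "Filter ComputeCapabilitiesFilter returned 0 hosts".
NO_HOSTS_RE = re.compile(r"Filter (\w+) returned 0 hosts")


def flavor_capabilities(extra_specs):
    """Return the capabilities requested by a flavor, without the prefix."""
    return {
        key[len(FLAVOR_PREFIX):]: value
        for key, value in extra_specs.items()
        if key.startswith(FLAVOR_PREFIX)
    }


def node_capabilities(capabilities):
    """Parse an Ironic 'key1:value1,key2:value2' capabilities string."""
    parsed = {}
    for item in capabilities.split(","):
        key, sep, value = item.partition(":")
        if sep:  # skip empty or malformed entries
            parsed[key.strip()] = value.strip()
    return parsed


def mismatched_capabilities(extra_specs, capabilities):
    """Map each unmatched capability to (flavor value, node value or None)."""
    wanted = flavor_capabilities(extra_specs)
    offered = node_capabilities(capabilities)
    return {
        key: (value, offered.get(key))
        for key, value in wanted.items()
        if offered.get(key) != value
    }


def filters_returning_no_hosts(log_lines):
    """Return the names of scheduler filters that left zero hosts."""
    names = []
    for line in log_lines:
        match = NO_HOSTS_RE.search(line)
        if match:
            names.append(match.group(1))
    return names
```

An empty result from ``mismatched_capabilities`` means every ``capabilities:`` extra spec in the flavor has an identical value on the node. The first name returned by ``filters_returning_no_hosts`` is the filter to look up in the Nova filters documentation.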