README: More updates for Troubleshooting section

* mention DHCP logs and tcpdump
* clarify mention Red Hat systems
* small tweaks

Change-Id: I68713f278d0a132588edcf09d338e7c7d9891fb6
This commit is contained in:
Dmitry Tantsur 2015-04-09 16:21:18 +02:00
parent 77412ad05b
commit 43530a5330

View File

@ -306,8 +306,8 @@ Node States
ironic node-update <UUID> replace maintenance=false
.. note::
Due to how Ironic Nova driver works, you should wait up to 1 minute before
Nova becomes aware of available nodes after issuing these commands.
Due to how Nova interacts with Ironic driver, you should wait 1 minute
before Nova becomes aware of available nodes after issuing these commands.
Setting IPMI Credentials
~~~~~~~~~~~~~~~~~~~~~~~~
@ -384,21 +384,21 @@ Troubleshooting
Errors when starting introspection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``Refusing to introspect node <UUID> with provision state "available"
and maintenance mode off``
* *Refusing to introspect node <UUID> with provision state "available"
and maintenance mode off*
In Kilo release with *python-ironicclient* 0.5.0 or newer Ironic
defaults to reporting provision state ``AVAILABLE`` for newly enrolled
nodes. **ironic-discoverd** will refuse to conduct introspection in
this state, as such nodes are supposed to be used by Nova for scheduling.
See `Node States`_ for instructions on how to put nodes into
the correct state.
In Kilo release with *python-ironicclient* 0.5.0 or newer Ironic
defaults to reporting provision state ``AVAILABLE`` for newly enrolled
nodes. **ironic-discoverd** will refuse to conduct introspection in
this state, as such nodes are supposed to be used by Nova for scheduling.
See `Node States`_ for instructions on how to put nodes into
the correct state.
Introspection times out
~~~~~~~~~~~~~~~~~~~~~~~
There may be 3 reasons why introspection can time out after some time
(defaulting to 30 minutes):
(defaulting to 60 minutes, altered by ``timeout`` configuration option):
#. Fatal failure in processing chain before node was found in the local cache.
See `Troubleshooting data processing`_ for the hints.
@ -412,10 +412,14 @@ There may be 3 reasons why introspection can time out after some time
Troubleshooting data processing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In this case **ironic-discoverd** logs should give a good idea what went wrong.
E.g. for Red Hat systems the following command will output the full log::
E.g. for RDO or Fedora the following command will output the full log::
sudo journalctl -u openstack-ironic-discoverd
.. note::
Service name and specific command might be different for other Linux
distributions.
If ``ramdisk_error`` plugin is enabled and ``ramdisk_logs_dir`` configuration
option is set, **ironic-discoverd** will store logs received from the ramdisk
to the ``ramdisk_logs_dir`` directory. This depends, however, on the ramdisk
@ -429,6 +433,20 @@ several physical networks. If the hardware vendor provides a remote console
(e.g. iDRAC for DELL), use it to connect to the machine and see what is going
on. You may need to restart introspection.
Another source of information is DHCP and TFTP server logs. Their location
depends on how the servers were installed and run. For RDO or Fedora use::
$ sudo journalctl -u openstack-ironic-discoverd-dnsmasq
The last resort is ``tcpdump`` utility. Use something like
::
$ sudo tcpdump -i any port 67 or port 68 or port 69
to watch both DHCP and TFTP traffic going through your machine. Replace
``any`` with a specific network interface to check that DHCP and TFTP
requests really reach it.
If you see node not attempting PXE boot or attempting PXE boot on the wrong
network, reboot the machine into BIOS settings and make sure that only one
relevant NIC is allowed to PXE boot.
@ -439,7 +457,8 @@ sure that:
#. network switches configuration does not prevent PXE boot requests from
propagating,
#. there is no additional firewall rules preventing access to port 67.
#. there is no additional firewall rules preventing access to port 67 on the
machine where *ironic-discoverd* and its DHCP server are installed.
If you see node receiving DHCP address and then failing to get kernel and/or
ramdisk or to boot them, make sure that: