.. _baremetal:

==========
Bare Metal
==========

The Bare Metal service provides physical hardware management features.

Introduction
~~~~~~~~~~~~
The Bare Metal service provides physical hardware as opposed to
virtual machines. It also provides several reference drivers, which
leverage common technologies like PXE and IPMI, to cover a wide range
of hardware. The pluggable driver architecture also allows
vendor-specific drivers to be added for improved performance or
functionality not provided by reference drivers. The Bare Metal
service makes physical servers as easy to provision as virtual
machines in a cloud, which in turn will open up new avenues for
enterprises and service providers.

System architecture
~~~~~~~~~~~~~~~~~~~

The Bare Metal service is composed of the following components:

#. An admin-only RESTful API service, by which privileged users, such
   as operators and other services within the cloud control
   plane, may interact with the managed bare-metal servers.

#. A conductor service, which conducts all activity related to
   bare-metal deployments. Functionality is exposed via the API
   service. The Bare Metal service conductor and API service
   communicate via RPC.

#. Various drivers that support heterogeneous hardware, which enable
   features specific to unique hardware platforms and leverage
   divergent capabilities via a common API.

#. A message queue, which is a central hub for passing messages, such
   as RabbitMQ. It should use the same implementation as that of the
   Compute service.

#. A database for storing information about the resources. Among other
   things, this includes the state of the conductors, nodes (physical
   servers), and drivers.

When a user requests to boot an instance, the request is passed to the
Compute service via the Compute service API and scheduler. The Compute
service hands over this request to the Bare Metal service, where the
request passes from the Bare Metal service API to the conductor, which
invokes a driver to provision a physical server for the user.
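
Because each conductor registers the drivers it can run, you can see which
conductors and drivers are active by querying the Bare Metal service. A
minimal sketch, assuming the ``ironic`` command-line client is installed;
the host name and driver names in the output are illustrative only:

.. code-block:: console

   $ ironic driver-list
   +---------------------+----------------+
   | Supported driver(s) | Active host(s) |
   +---------------------+----------------+
   | agent_ipmitool      | controller-1   |
   | pxe_ipmitool        | controller-1   |
   +---------------------+----------------+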

Bare Metal deployment
~~~~~~~~~~~~~~~~~~~~~

#. PXE deploy process

#. Agent deploy process

.. TODO Add the detail about the process of Bare Metal deployment.

Use Bare Metal
~~~~~~~~~~~~~~

#. Install the Bare Metal service.

#. Set up the Bare Metal driver in the compute node's ``nova.conf`` file.

#. Set up the TFTP folder and prepare the PXE boot loader file.

#. Prepare the bare metal flavor.

#. Register the nodes with the correct drivers.

#. Configure the driver information.

#. Register the port information.

#. Use the :command:`nova boot` command to kick off the bare metal
   provisioning.

#. Check the nodes' provision state and power state.

.. TODO Add the detail command line later on.
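
The exact commands vary by release and driver, but a minimal sketch of the
enrollment and boot steps could look like the following. The driver name,
IPMI credentials, MAC address, image, and flavor values are placeholders,
and drivers such as ``pxe_ipmitool`` may require additional ``driver_info``,
for example a deploy kernel and ramdisk:

.. code-block:: console

   $ nova flavor-create my-baremetal-flavor auto 1024 10 1
   $ nova flavor-key my-baremetal-flavor set cpu_arch=x86_64
   $ ironic node-create -d pxe_ipmitool \
     -i ipmi_address=IPMI_ADDRESS \
     -i ipmi_username=IPMI_USER \
     -i ipmi_password=IPMI_PASSWORD
   $ ironic node-update <IRONIC-NODE-UUID> add properties/cpus=1 \
     properties/memory_mb=1024 properties/local_gb=10 \
     properties/cpu_arch=x86_64
   $ ironic port-create -n <IRONIC-NODE-UUID> -a MAC_ADDRESS
   $ nova boot --flavor my-baremetal-flavor --image IMAGE_UUID my-instance
   $ ironic node-list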

Troubleshooting
~~~~~~~~~~~~~~~

No valid host found error
-------------------------

Problem
-------

Sometimes ``/var/log/nova/nova-conductor.log`` contains the following error:

.. code-block:: console

   NoValidHost: No valid host was found. There are not enough hosts available.

The message ``No valid host was found`` means that the Compute service
scheduler could not find a bare metal node suitable for booting the new
instance.

This usually means a mismatch between the resources that the Compute
service expects to find and the resources that the Bare Metal service
advertised to the Compute service.

Solution
--------

If you get this message, check the following:

#. Introspection should have succeeded for you before, or you should have
   entered the required bare-metal node properties manually.
   For each node in :command:`ironic node-list`, use:

   .. code-block:: console

      $ ironic node-show <IRONIC-NODE-UUID>

   and make sure that the ``properties`` JSON field has valid values for the
   keys ``cpus``, ``cpu_arch``, ``memory_mb``, and ``local_gb``. If any are
   missing, they can be set manually, as shown in the sketch after this list.

#. Make sure that the flavor you are using in the Compute service does not
   exceed the bare-metal node properties above for the required number of
   nodes. Use:

   .. code-block:: console

      $ nova flavor-show <FLAVOR NAME>

#. Make sure that enough nodes are in the ``available`` state according to
   :command:`ironic node-list`. Nodes in the ``manageable`` state usually
   mean that introspection has failed for them.

#. Make sure the nodes you are going to deploy to are not in maintenance
   mode. Use :command:`ironic node-list` to check. A node automatically
   going to maintenance mode usually means that the credentials for this
   node are incorrect. Check them and then remove maintenance mode:

   .. code-block:: console

      $ ironic node-set-maintenance <IRONIC-NODE-UUID> off

#. It takes some time for node information to propagate from the Bare Metal
   service to the Compute service after introspection. Our tooling usually
   accounts for it, but if you did some steps manually, there may be a period
   of time when nodes are not available to the Compute service yet. Check
   that :command:`nova hypervisor-stats` correctly shows the total amount of
   resources in your system.
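
If the ``properties`` field in step 1 is missing values, the properties can
be set by hand. A minimal sketch, assuming a node with one CPU, 1024 MB of
RAM, and a 10 GB disk; the values are placeholders, not recommendations:

.. code-block:: console

   $ ironic node-update <IRONIC-NODE-UUID> add properties/cpus=1 \
     properties/cpu_arch=x86_64 properties/memory_mb=1024 \
     properties/local_gb=10

After updating the properties, verify them with :command:`ironic node-show`
and allow some time for the change to reach the Compute service, as noted in
the last step above.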