Reference architecture: small cloud with trusted tenants
This document describes the design of a small bare metal cloud with flat
networking and with conductors running on the (HA) controller nodes. It also
adds the missing information about the rescuing network to the common
reference architecture guide.

Change-Id: Ifd3bfcc89263cd9810cd5cfb459ffeeaf146caf7
Story: 2001745
Task: 12108
commit 1ffa7571d3 (parent cf0b64c4c5)
@@ -17,6 +17,8 @@ not support trunk ports belonging to multiple networks.
 Concepts
 ========
 
+.. _network-interfaces:
+
 Network interfaces
 ------------------
 
@@ -328,6 +328,8 @@ distribution that uses ``systemd``:
   ip_addr
   iptables
 
+.. _troubleshooting-stp:
+
 DHCP during PXE or iPXE is inconsistent or unreliable
 =====================================================
 
@@ -7,6 +7,8 @@ architectures.
 .. contents::
    :local:
 
+.. _refarch-common-components:
+
 Components
 ----------
 
@@ -46,7 +48,7 @@ components.
 .. warning::
    The ``ironic-python-agent`` service is not intended to be used or executed
-   anywhere other than a provisioning/cleaning ramdisk.
+   anywhere other than a provisioning/cleaning/rescue ramdisk.
 
 Hardware and drivers
 --------------------
@@ -82,6 +84,8 @@ also supports :doc:`/admin/drivers/redfish`. Several vendors
 provide more specific drivers that usually provide additional capabilities.
 Check :doc:`/admin/drivers` to find the most suitable one.
 
+.. _refarch-common-boot:
+
 Boot interface
 ~~~~~~~~~~~~~~
 
@@ -182,6 +186,8 @@ node. See :ref:`local-boot-partition-images` for details.
 Currently, network boot is used by default. However, we plan on changing it
 in the future, so it's safer to set the ``default_boot_option`` explicitly.
 
+.. _refarch-common-networking:
+
 Networking
 ----------
 
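For illustration only, a minimal sketch of setting the ``default_boot_option``
mentioned above explicitly in the conductor configuration (assuming local boot
is desired; ``/etc/ironic/ironic.conf`` is the usual location of this file):

.. code-block:: ini

   [deploy]
   # Boot deployed instances from the local disk rather than over the network.
   default_boot_option = local
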
@@ -194,13 +200,20 @@ documentation. However, several considerations are common for all of them:
   not be accessible by end users, and should not have access to the internet.
 
 * There has to be a *cleaning* network, which is used by nodes during
-  the cleaning process. In the majority of cases, the same network should
-  be used for cleaning and provisioning for simplicity.
+  the cleaning process.
 
-Unless noted otherwise, everything in these sections apply to both networks.
+* There should be a *rescuing* network, which is used by nodes during
+  the rescue process. It can be skipped if the rescue process is not supported.
+
+.. note::
+   In the majority of cases, the same network should be used for cleaning,
+   provisioning and rescue for simplicity.
+
+Unless noted otherwise, everything in these sections applies to all three
+networks.
 
 * The baremetal nodes must have access to the Bare Metal API while connected
-  to the provisioning/cleaning network.
+  to the provisioning/cleaning/rescuing network.
 
   .. note::
      Only two endpoints need to be exposed there::
@@ -213,25 +226,32 @@ Unless noted otherwise, everything in these sections apply to both networks.
 * If the ``pxe`` boot interface (or any boot interface based on it) is used,
   then the baremetal nodes should have untagged (access mode) connectivity
-  to the provisioning/cleaning networks. It allows PXE firmware, which does not
-  support VLANs, to communicate with the services required for provisioning.
+  to the provisioning/cleaning/rescuing networks. It allows PXE firmware, which
+  does not support VLANs, to communicate with the services required
+  for provisioning.
 
   .. note::
      It depends on the *network interface* whether the Bare Metal service will
      handle it automatically. Check the networking documentation for the
     specific architecture.
 
   Sometimes it may be necessary to disable the spanning tree protocol delay on
   the switch - see :ref:`troubleshooting-stp`.
 
 * The Baremetal nodes need to have access to any services required for
-  provisioning/cleaning, while connected to the provisioning/cleaning network.
-  This may include:
+  provisioning/cleaning/rescue, while connected to the
+  provisioning/cleaning/rescuing network. This may include:
 
   * a TFTP server for PXE boot and also an HTTP server when iPXE is enabled
   * either an HTTP server or the Object Storage service in case of the
     ``direct`` deploy interface and some virtual media boot interfaces
 
 * The Baremetal Conductors need to have access to the booted baremetal nodes
-  during provisioning/cleaning. A conductor communicates with an internal
-  API, provided by **ironic-python-agent**, to conduct actions on nodes.
+  during provisioning/cleaning/rescue. A conductor communicates with
+  an internal API, provided by **ironic-python-agent**, to conduct actions
+  on nodes.
 
+.. _refarch-common-ha:
+
 HA and Scalability
 ------------------
 
@@ -10,3 +10,11 @@ to get better familiar with the concepts used in this guide.
    :maxdepth: 2
 
    common
+
+Scenarios
+---------
+
+.. toctree::
+   :maxdepth: 2
+
+   small-cloud-trusted-tenants
doc/source/install/refarch/small-cloud-trusted-tenants.rst (new file, 248 lines)
@@ -0,0 +1,248 @@
Small cloud with trusted tenants
================================

Story
-----

As an operator I would like to build a small cloud with both virtual and bare
metal instances, or add bare metal provisioning to my existing small or
medium-scale single-site OpenStack cloud. The expected number of bare metal
machines is less than 100, and the rate of provisioning and unprovisioning is
expected to be low. All users of my cloud are trusted by me not to conduct
malicious actions towards each other or the cloud infrastructure itself.

As a user I would like to occasionally provision bare metal instances through
the Compute API by selecting an appropriate Compute flavor. I would like
to be able to boot them from images provided by the Image service or from
volumes provided by the Volume service.

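For illustration only, a minimal sketch of how such a bare metal flavor is
typically created for the Compute service (the flavor name ``bm.large``, the
sizes and the resource class ``CUSTOM_BAREMETAL`` are assumptions, not part
of this guide):

.. code-block:: console

   $ openstack flavor create --ram 32768 --vcpus 8 --disk 100 bm.large
   $ openstack flavor set bm.large \
       --property resources:CUSTOM_BAREMETAL=1 \
       --property resources:VCPU=0 \
       --property resources:MEMORY_MB=0 \
       --property resources:DISK_GB=0
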
Components
----------

This architecture assumes `an OpenStack installation`_ with the following
components participating in the bare metal provisioning:

* The `Compute service`_ manages bare metal instances.

* The `Networking service`_ provides DHCP for bare metal instances.

* The `Image service`_ provides images for bare metal instances.

The following services can be optionally used by the Bare Metal service:

* The `Volume service`_ provides volumes to boot bare metal instances from.

* The `Bare Metal Introspection service`_ simplifies enrolling new bare metal
  machines by conducting in-band introspection.

Node roles
----------

An OpenStack installation in this guide has at least these three types of
nodes:

* A *controller* node hosts the control plane services.

* A *compute* node runs the virtual machines and hosts a subset of Compute
  and Networking components.

* A *block storage* node provides persistent storage space for both virtual
  and bare metal nodes.

The *compute* and *block storage* nodes are configured as described in the
installation guides of the `Compute service`_ and the `Volume service`_
respectively. The *controller* nodes host the Bare Metal service components.

Networking
----------

The networking architecture will depend heavily on the exact operating
requirements. This guide expects the following existing networks:
*control plane*, *storage* and *public*. Additionally, two more networks
will be needed specifically for bare metal provisioning: *bare metal* and
*management*.

.. TODO(dtantsur): describe the storage network?

.. TODO(dtantsur): a nice picture to illustrate the layout

Control plane network
~~~~~~~~~~~~~~~~~~~~~

The *control plane network* is the network where OpenStack control plane
services provide their public API.

The Bare Metal API will be served to the operators and to the Compute service
through this network.

Public network
~~~~~~~~~~~~~~

The *public network* is used in a typical OpenStack deployment to create
floating IPs for outside access to instances. Its role is the same for a bare
metal deployment.

.. note::
   Since, as explained below, bare metal nodes will be put on a flat provider
   network, it is also possible to organize direct access to them without
   floating IPs, bypassing the Networking service completely.

Bare metal network
~~~~~~~~~~~~~~~~~~

The *bare metal network* is a dedicated network for bare metal nodes managed
by the Bare Metal service.

This architecture uses :ref:`flat bare metal networking <network-interfaces>`,
in which both tenant traffic and technical traffic related to the Bare Metal
service operation flow through this one network. Specifically, this network
will serve as the *provisioning*, *cleaning* and *rescuing* network. It will
also be used for introspection via the Bare Metal Introspection service.
See :ref:`common networking considerations <refarch-common-networking>` for
an in-depth explanation of the networks used by the Bare Metal service.

DHCP and boot parameters will be provided on this network by the Networking
service's DHCP agents.

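For illustration only, a minimal sketch of how such a flat provider network
with DHCP might be created in the Networking service (the network name
``baremetal``, the physical network ``physnet1`` and the subnet range are
assumptions and have to match your deployment):

.. code-block:: console

   $ openstack network create --share --provider-network-type flat \
         --provider-physical-network physnet1 baremetal
   $ openstack subnet create --network baremetal --ip-version 4 \
         --subnet-range 10.0.2.0/24 \
         --allocation-pool start=10.0.2.10,end=10.0.2.200 \
         --dhcp baremetal-subnet
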
For booting from volumes this network has to have a route to
the *storage network*.

Management network
~~~~~~~~~~~~~~~~~~

The *management network* is an independent network on which the BMCs of the
bare metal nodes are located.

The ``ironic-conductor`` process needs access to this network. The tenants
of the bare metal nodes must not have access to it.

.. note::
   The :ref:`direct deploy interface <direct-deploy>` and certain
   :doc:`/admin/drivers` require the *management network* to have access
   to the Object Storage service backend.

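For illustration only, a minimal sketch of how a BMC reachable on this network
is typically registered with its node (the ``ipmi`` hardware type, the node
name, the address and the credentials are assumptions):

.. code-block:: console

   $ openstack baremetal node set node-0 \
         --driver-info ipmi_address=10.1.0.11 \
         --driver-info ipmi_username=admin \
         --driver-info ipmi_password=secret
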
Controllers
-----------

A *controller* hosts the OpenStack control plane services as described in the
`control plane design guide`_. While this architecture allows using
*controllers* in a non-HA configuration, it is recommended to have at least
three of them for HA. See :ref:`refarch-common-ha` for more details.

Bare Metal services
~~~~~~~~~~~~~~~~~~~

The following components of the Bare Metal service are installed on a
*controller* (see :ref:`components of the Bare Metal service
<refarch-common-components>`):

* The Bare Metal API service, either as a WSGI application or as the
  ``ironic-api`` process. Typically, a load balancer, such as HAProxy, spreads
  the load between the API instances on the *controllers*.

  The API has to be served on the *control plane network*. Additionally,
  it has to be exposed to the *bare metal network* for the ramdisk callback
  API.

* The ``ironic-conductor`` process. These processes work in active/active HA
  mode as explained in :ref:`refarch-common-ha`, thus they can be installed on
  all *controllers*. Each will handle a subset of bare metal nodes.

  The ``ironic-conductor`` processes have to have access to the following
  networks:

  * *control plane* for interacting with other services
  * *management* for contacting nodes' BMCs
  * *bare metal* for contacting deployment, cleaning or rescue ramdisks

* TFTP and HTTP service for booting the nodes. Each ``ironic-conductor``
  process has to have a matching TFTP and HTTP service. They should be exposed
  only to the *bare metal network* and must not be behind a load balancer
  (see the configuration sketch at the end of this section).

* The ``nova-compute`` process (from the Compute service). These processes
  work in active/active HA mode when dealing with bare metal nodes, thus they
  can be installed on all *controllers*. Each will handle a subset of bare
  metal nodes.

  .. note::
     There is no 1-1 mapping between ``ironic-conductor`` and ``nova-compute``
     processes, as they communicate only through the Bare Metal API service.

* The networking-baremetal_ ML2 plugin should be loaded into the Networking
  service to assist with binding bare metal ports.

  The ironic-neutron-agent_ service should be started as well.

* If Bare Metal Introspection is used, its ``ironic-inspector`` process
  has to be installed on all *controllers*. Each such process works as both
  the Bare Metal Introspection API and a conductor service. A load balancer
  should be used to spread the API load between *controllers*.

  The API has to be served on the *control plane network*. Additionally,
  it has to be exposed to the *bare metal network* for the ramdisk callback
  API.

.. TODO(dtantsur): a nice picture to illustrate the above

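For illustration only, a minimal per-conductor configuration sketch tying each
conductor to its co-located TFTP and HTTP services (the address ``192.0.2.11``
and the ``/tftpboot`` and ``/httpboot`` paths are assumptions, not part of
this guide):

.. code-block:: ini

   [DEFAULT]
   # Address of this conductor on the bare metal network (assumed value).
   my_ip = 192.0.2.11

   [pxe]
   # TFTP service running next to this conductor.
   tftp_server = 192.0.2.11
   tftp_root = /tftpboot

   [deploy]
   # HTTP service running next to this conductor (used with iPXE and
   # the direct deploy interface).
   http_url = http://192.0.2.11:8088
   http_root = /httpboot
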
Shared services
~~~~~~~~~~~~~~~

A *controller* also hosts two services required for the normal operation
of OpenStack:

* Database service (MySQL/MariaDB is typically used, but other
  enterprise-grade database solutions can be used as well).

  All Bare Metal service components need access to the database service.

* Message queue service (RabbitMQ is typically used, but other
  enterprise-grade message queue brokers can be used as well).

  Both the Bare Metal API (WSGI application or ``ironic-api`` process) and
  the ``ironic-conductor`` processes need access to the message queue service.
  The Bare Metal Introspection service does not need it.

.. note::
   These services are required for all OpenStack services. If you're adding
   the Bare Metal service to your cloud, you may reuse the existing
   database and message queue services.

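For illustration only, a minimal sketch of how the Bare Metal service is
typically pointed at these shared services (host names, credentials and the
database name are assumptions):

.. code-block:: ini

   [DEFAULT]
   # Shared message queue instance used by the other OpenStack services.
   transport_url = rabbit://ironic:IRONIC_MQ_PASSWORD@controller:5672/

   [database]
   # Shared MariaDB/MySQL instance; the ironic database has to exist.
   connection = mysql+pymysql://ironic:IRONIC_DB_PASSWORD@controller/ironic
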
Bare metal nodes
----------------

Each bare metal node must be capable of booting from network, virtual media
or another boot technology supported by the Bare Metal service as explained
in :ref:`refarch-common-boot`. Each node must have one NIC on the *bare metal
network*, and this NIC (and **only** it) must be configured to be able to boot
from network. This is usually done in the *BIOS setup* or a similar firmware
configuration utility. There is no need to alter the boot order, as it is
managed by the Bare Metal service. Other NICs, if present, will not be managed
by OpenStack.

The NIC on the *bare metal network* should have untagged connectivity to it,
since PXE firmware usually does not support VLANs - see
:ref:`refarch-common-networking` for details.

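For illustration only, a minimal sketch of how such a node and its NIC on the
*bare metal network* are typically enrolled (the ``ipmi`` hardware type, the
node name and the MAC address are assumptions):

.. code-block:: console

   $ openstack baremetal node create --driver ipmi --name node-0
   $ openstack baremetal port create --node <node-uuid> 52:54:00:aa:bb:cc
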
Storage
-------

If your hardware **and** its bare metal :doc:`driver </admin/drivers>` support
booting from remote volumes, please check the driver documentation for
information on how to enable it. It may include routing the *management*
and/or *bare metal* networks to the *storage network*.

In the case of the standard :ref:`pxe-boot`, booting from remote volumes is
done via iPXE. In that case, the Volume storage backend must support the
iSCSI_ protocol, and the *bare metal network* has to have a route to the
*storage network*. See :doc:`/admin/boot-from-volume` for more details.

.. _an OpenStack installation: https://docs.openstack.org/arch-design/use-cases/use-case-general-compute.html
.. _Compute service: https://docs.openstack.org/nova/latest/
.. _Networking service: https://docs.openstack.org/neutron/latest/
.. _Image service: https://docs.openstack.org/glance/latest/
.. _Volume service: https://docs.openstack.org/cinder/latest/
.. _Bare Metal Introspection service: https://docs.openstack.org/ironic-inspector/latest/
.. _control plane design guide: https://docs.openstack.org/arch-design/design-control-plane.html
.. _networking-baremetal: https://docs.openstack.org/networking-baremetal/latest/
.. _ironic-neutron-agent: https://docs.openstack.org/networking-baremetal/latest/install/index.html#configure-ironic-neutron-agent
.. _iSCSI: https://en.wikipedia.org/wiki/ISCSI