diff --git a/doc/source/index.rst b/doc/source/index.rst index 33307b2d56..26105e1925 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -38,7 +38,7 @@ User Guide ========== .. toctree:: - :maxdepth: 2 + :maxdepth: 3 user/index diff --git a/doc/source/install/creating-images.rst b/doc/source/install/creating-images.rst index 155dfa2c4a..3fd15e01d6 100644 --- a/doc/source/install/creating-images.rst +++ b/doc/source/install/creating-images.rst @@ -1,109 +1,4 @@ Create user images for the Bare Metal service ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Bare Metal provisioning requires two sets of images: the deploy images -and the user images. The :ref:`deploy images ` are used by the -Bare Metal service to prepare the bare metal server for actual OS deployment. -Whereas the user images are installed on the bare metal server to be used by -the end user. There are two types of user images: - -*partition images* - contain only the contents of the root partition. Additionally, two more - images are used together with them: an image with a kernel and with - an initramfs. - - .. warning:: - To use partition images with local boot, Grub2 must be installed on - them. - -*whole disk images* - contain a complete partition table with one or more partitions. - - .. warning:: - The kernel/initramfs pair must not be used with whole disk images, - otherwise they'll be mistaken for partition images. - -Many distributions publish their own cloud images. These are usually whole disk -images that are built for legacy boot mode (not UEFI), with Ubuntu being an -exception (they publish images that work in both modes). - -Building user images -^^^^^^^^^^^^^^^^^^^^ - -disk-image-builder ------------------- - -The `disk-image-builder`_ can be used to create user images required for -deployment and the actual OS which the user is going to run. - -- Install diskimage-builder package (use virtualenv, if you don't - want to install anything globally): - - .. code-block:: console - - # pip install diskimage-builder - -- Build the image your users will run (Ubuntu image has been taken as - an example): - - - Partition images - - .. code-block:: console - - $ disk-image-create ubuntu baremetal dhcp-all-interfaces grub2 -o my-image - - - Whole disk images - - .. code-block:: console - - $ disk-image-create ubuntu vm dhcp-all-interfaces -o my-image - - … with an EFI partition: - - .. code-block:: console - - $ disk-image-create ubuntu vm block-device-efi dhcp-all-interfaces -o my-image - -The partition image command creates ``my-image.qcow2``, -``my-image.vmlinuz`` and ``my-image.initrd`` files. The ``grub2`` element -in the partition image creation command is only needed if local boot will -be used to deploy ``my-image.qcow2``, otherwise the images -``my-image.vmlinuz`` and ``my-image.initrd`` will be used for PXE booting -after deploying the bare metal with ``my-image.qcow2``. For whole disk images -only the main image is used. - -If you want to use Fedora image, replace ``ubuntu`` with ``fedora`` in the -chosen command. - -.. _disk-image-builder: https://docs.openstack.org/diskimage-builder/latest/ - -Virtual machine ---------------- - -Virtual machine software can also be used to build user images. There are -different software options available, qemu-kvm is usually a good choice on -linux platform, it supports emulating many devices and even building images -for architectures other than the host machine by software emulation. -VirtualBox is another good choice for non-linux host. 
- -The procedure varies depending on the software used, but the steps for -building an image are similar, the user creates a virtual machine, and -installs the target system just like what is done for a real hardware. The -system can be highly customized like partition layout, drivers or software -shipped, etc. - -Usually libvirt and its management tools are used to make interaction with -qemu-kvm easier, for example, to create a virtual machine with -``virt-install``:: - - $ virt-install --name centos8 --ram 4096 --vcpus=2 -f centos8.qcow2 \ - > --cdrom CentOS-8-x86_64-1905-dvd1.iso - -Graphic frontend like ``virt-manager`` can also be utilized. - -The disk file can be used as user image after the system is set up and powered -off. The path of the disk file varies depending on the software used, usually -it's stored in a user-selected part of the local file system. For qemu-kvm or -GUI frontend building upon it, it's typically stored at -``/var/lib/libvirt/images``. - +The content has been migrated, please see :doc:`/user/creating-images`. diff --git a/doc/source/install/index.rst b/doc/source/install/index.rst index 0aec40d320..373aef2796 100644 --- a/doc/source/install/index.rst +++ b/doc/source/install/index.rst @@ -15,7 +15,6 @@ It contains the following sections: get_started.rst refarch/index install.rst - creating-images.rst deploy-ramdisk.rst configure-integration.rst setup-drivers.rst @@ -25,3 +24,8 @@ It contains the following sections: advanced.rst troubleshooting.rst next-steps.rst + +.. toctree:: + :hidden: + + creating-images.rst diff --git a/doc/source/install/standalone.rst b/doc/source/install/standalone.rst index 66e4879335..1dcf66e790 100644 --- a/doc/source/install/standalone.rst +++ b/doc/source/install/standalone.rst @@ -10,4 +10,11 @@ the bare metal API directly, not though OpenStack Compute. standalone/configure standalone/enrollment + +Once the installation is done, please see :doc:`/user/deploy` for information +on how to deploy bare metal machines. + +.. toctree:: + :hidden: + standalone/deploy diff --git a/doc/source/install/standalone/deploy.rst b/doc/source/install/standalone/deploy.rst index d909e4162d..6e9900c93a 100644 --- a/doc/source/install/standalone/deploy.rst +++ b/doc/source/install/standalone/deploy.rst @@ -1,212 +1,4 @@ Deploying ========= -Populating instance_info ------------------------- - -Image information -~~~~~~~~~~~~~~~~~ - -You need to specify image information in the node's ``instance_info`` -(see :doc:`../creating-images`): - -* ``image_source`` - URL of the whole disk or root partition image, - mandatory. - -* ``root_gb`` - size of the root partition, required for partition images. - - .. note:: - Older versions of the Bare Metal service used to require a positive - integer for ``root_gb`` even for whole-disk images. You may want to set - it for compatibility. - -* ``image_checksum`` - MD5 checksum of the image specified by - ``image_source``, only required for ``http://`` images when using - :ref:`direct-deploy`. - - .. note:: - Additional checksum support exists via the ``image_os_hash_algo`` and - ``image_os_hash_value`` fields. They may be used instead of the - ``image_checksum`` field. - - .. warning:: - If your operating system is running in FIPS 140-2 mode, MD5 will not be - available, and you **must** use SHA256 or another modern algorithm. - - Starting with the Stein release of ironic-python-agent can also be a URL - to a checksums file, e.g. one generated with: - - .. 
code-block:: shell - - cd /path/to/http/root - md5sum *.img > checksums - -* ``kernel``, ``ramdisk`` - HTTP(s) or file URLs of the kernel and - initramfs of the target OS. Must be added **only** for partition images. - -For example: - -.. code-block:: shell - - baremetal node set $NODE_UUID \ - --instance-info image_source=$IMG \ - --instance-info image_checksum=$MD5HASH \ - --instance-info kernel=$KERNEL \ - --instance-info ramdisk=$RAMDISK \ - --instance-info root_gb=10 - -With a SHA256 hash: - -.. code-block:: shell - - baremetal node set $NODE_UUID \ - --instance-info image_source=$IMG \ - --instance-info image_os_hash_algo=sha256 \ - --instance-info image_os_hash_value=$SHA256HASH \ - --instance-info kernel=$KERNEL \ - --instance-info ramdisk=$RAMDISK \ - --instance-info root_gb=10 - -With a whole disk image: - -.. code-block:: shell - - baremetal node set $NODE_UUID \ - --instance-info image_source=$IMG \ - --instance-info image_checksum=$MD5HASH - -.. note:: - For iLO drivers, fields that should be provided are: - - * ``ilo_deploy_iso`` under ``driver_info``; - - * ``ilo_boot_iso``, ``image_source``, ``root_gb`` under ``instance_info``. - -When using low RAM nodes with ``http://`` images that are not in the RAW -format, you may want them cached locally, converted to raw and served from -the conductor's HTTP server: - -.. code-block:: shell - - baremetal node set $NODE_UUID --instance-info image_download_source=local - -For software RAID with whole-disk images, the root UUID of the root -partition has to be provided so that the bootloader can be correctly -installed: - -.. code-block:: shell - - baremetal node set $NODE_UUID --instance-info image_rootfs_uuid= - -Capabilities -~~~~~~~~~~~~ - -* :ref:`Boot mode ` can be specified per instance: - - .. code-block:: shell - - baremetal node set $NODE_UUID \ - --instance-info capabilities='{"boot_mode": "uefi"}' - - Otherwise, the ``boot_mode`` capability from the node's ``properties`` will - be used. - - .. warning:: - The two settings must not contradict each other. - - .. note:: - This capability was introduced in the Wallaby release series, - previously ironic used a separate ``instance_info/deploy_boot_mode`` - field instead. - -* To override the :ref:`boot option ` used for - this instance, set the ``boot_option`` capability: - - .. code-block:: shell - - baremetal node set $NODE_UUID \ - --instance-info capabilities='{"boot_option": "local"}' - -* Starting with the Ussuri release, you can set :ref:`root device hints - ` per instance: - - .. code-block:: shell - - baremetal node set $NODE_UUID \ - --instance-info root_device='{"wwn": "0x4000cca77fc4dba1"}' - - This setting overrides any previous setting in ``properties`` and will be - removed on undeployment. - -Overriding a hardware interface -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Non-admins with temporary access to a node, may wish to specify different node -interfaces. However, allowing them to set these interface values directly on -the node is problematic, as there is no automated way to ensure that the -original interface values are restored. - -In order to temporarily override a hardware interface, simply set the -appropriate value in ``instance_info``. For example, if you'd like to -override a node's storage interface, run the following: - -.. code-block:: shell - - baremetal node set $NODE_UUID --instance-info storage_interface=cinder - -``instance_info`` values persist until after a node is cleaned. - -.. note:: - This feature is available starting with the Wallaby release. 
-
-Deployment
-----------
-
-#. Validate that all parameters are correct:
-
-   .. code-block:: console
-
-    $ baremetal node validate $NODE_UUID
-    +------------+--------+----------------------------------------------------------------+
-    | Interface  | Result | Reason                                                         |
-    +------------+--------+----------------------------------------------------------------+
-    | boot       | True   |                                                                |
-    | console    | False  | Missing 'ipmi_terminal_port' parameter in node's driver_info.  |
-    | deploy     | True   |                                                                |
-    | inspect    | True   |                                                                |
-    | management | True   |                                                                |
-    | network    | True   |                                                                |
-    | power      | True   |                                                                |
-    | raid       | True   |                                                                |
-    | storage    | True   |                                                                |
-    +------------+--------+----------------------------------------------------------------+
-
-#. Now you can start the deployment, run:
-
-   .. code-block:: shell
-
-      baremetal node deploy $NODE_UUID
-
-#. You can provide a configdrive as a JSON or as an ISO image, e.g.:
-
-   .. code-block:: shell
-
-      baremetal node deploy $NODE_UUID \
-          --config-drive '{"meta_data": {"public_keys": {"0": "ssh key contents"}}}'
-
-   See :doc:`/install/configdrive` for details.
-
-#. Starting with the Wallaby release you can also request custom deploy steps,
-   see :ref:`standalone-deploy-steps` for details.
-
-Ramdisk booting
----------------
-
-Advanced operators, specifically ones working with ephemeral workloads,
-may find it more useful to explicitly treat a node as one that would always
-boot from a Ramdisk. See :doc:`/admin/ramdisk-boot` for details.
-
-Other references
-----------------
-
-* :ref:`local-boot-without-compute`
+The content has been migrated, please see :doc:`/user/deploy`.
diff --git a/doc/source/user/architecture.rst b/doc/source/user/architecture.rst
new file mode 100644
index 0000000000..714c9e16e4
--- /dev/null
+++ b/doc/source/user/architecture.rst
@@ -0,0 +1,303 @@
+================================
+Understanding Bare Metal service
+================================
+
+.. TODO: this file needs to be cleaned up
+
+Why Provision Bare Metal
+========================
+
+Here are a few use cases for bare metal (physical server) provisioning in
+a cloud; there are doubtless many more interesting ones:
+
+- High-performance computing clusters
+- Computing tasks that require access to hardware devices which can't be
+  virtualized
+- Database hosting (some databases run poorly in a hypervisor)
+- Single tenant, dedicated hardware for performance, security, dependability
+  and other regulatory requirements
+- Rapidly deploying a cloud infrastructure
+
+Conceptual Architecture
+=======================
+
+The following diagram shows the relationships and how all services come into
+play during the provisioning of a physical server. (Note that Ceilometer and
+Swift can be used with Ironic, but are missing from this diagram.)
+
+
+.. figure:: ../images/conceptual_architecture.png
+   :alt: ConceptualArchitecture
+
+
+Key Technologies for Bare Metal Hosting
+=======================================
+
+Preboot Execution Environment (PXE)
+-----------------------------------
+PXE is part of the Wired for Management (WfM) specification developed by Intel
+and Microsoft. PXE enables a system's BIOS and network interface card (NIC)
+to bootstrap a computer from the network in place of a disk. Bootstrapping is
+the process by which a system loads the OS into local memory so that it can be
+executed by the processor. This ability to boot over a network simplifies
+server deployment and server management for administrators.
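+
+For illustration only, here is a minimal sketch of a dnsmasq configuration
+that PXE-boots machines by combining the DHCP and TFTP roles described in
+the following sections. This is not ironic's own configuration (ironic
+manages these services for you); the address range, boot file name and TFTP
+root are placeholder values:
+
+.. code-block:: ini
+
+    # Hand out IP addresses to PXE clients on the provisioning network.
+    dhcp-range=192.168.100.100,192.168.100.200,12h
+    # Name of the network bootstrap program (NBP) the client downloads.
+    dhcp-boot=pxelinux.0
+    # Serve the NBP from this host over TFTP.
+    enable-tftp
+    tftp-root=/tftpboot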
+
+Dynamic Host Configuration Protocol (DHCP)
+------------------------------------------
+DHCP is a standardized networking protocol used on Internet Protocol (IP)
+networks for dynamically distributing network configuration parameters, such
+as IP addresses for interfaces and services. When booting via PXE, the BIOS
+uses DHCP to obtain an IP address for the network interface and to locate the
+server that stores the network bootstrap program (NBP).
+
+Network Bootstrap Program (NBP)
+-------------------------------
+The NBP is equivalent to GRUB (GRand Unified Bootloader) or LILO (LInux
+LOader), the loaders traditionally used in local booting. Like the boot
+program in a hard drive environment, the NBP is responsible for loading the
+OS kernel into memory so that the OS can be bootstrapped over a network.
+
+Trivial File Transfer Protocol (TFTP)
+-------------------------------------
+TFTP is a simple file transfer protocol that is generally used for automated
+transfer of configuration or boot files between machines in a local
+environment. In a PXE environment, TFTP is used to download the NBP over the
+network using information from the DHCP server.
+
+Intelligent Platform Management Interface (IPMI)
+------------------------------------------------
+IPMI is a standardized computer system interface used by system administrators
+for out-of-band management of computer systems and monitoring of their
+operation. It is a method to manage systems that may be unresponsive or
+powered off by using only a network connection to the hardware rather than to
+an operating system.
+
+.. _understanding-deployment:
+
+Understanding Bare Metal Deployment
+===================================
+
+What happens when a boot instance request comes in? The diagram below walks
+through the steps involved during the provisioning of a bare metal instance.
+
+These prerequisites must be met before the deployment process:
+
+* Dependent packages, such as tftp-server, ipmi and syslinux, must be
+  configured on the Bare Metal service node(s) where ironic-conductor is
+  running for bare metal provisioning.
+* Nova must be configured to make use of the bare metal service endpoint,
+  and the compute driver on the Nova compute node(s) must be configured to
+  use the ironic driver.
+* Flavors must be created for the available hardware. Nova must know the
+  flavor to boot from.
+* Images must be made available in Glance. Listed below are some image types
+  required for a successful bare metal deployment:
+
+  - bm-deploy-kernel
+  - bm-deploy-ramdisk
+  - user-image
+  - user-image-vmlinuz
+  - user-image-initrd
+
+* Hardware must be enrolled via the bare metal API service.
+
+Deploy Process
+--------------
+
+This describes a typical bare metal node deployment within OpenStack using
+PXE to boot the ramdisk. Depending on the ironic driver interfaces used, some
+of the steps might be marginally different; however, the majority of them
+will remain the same.
+
+#. A boot instance request comes in via the Nova API, through the message
+   queue to the Nova scheduler.
+
+#. The Nova scheduler applies filters and finds the eligible hypervisor. The
+   scheduler also uses the flavor's ``extra_specs``, such as ``cpu_arch``, to
+   match the target physical node.
+
+#. The Nova compute manager claims the resources of the selected hypervisor.
+
+#. The Nova compute manager creates (unbound) tenant virtual interfaces
+   (VIFs) in the Networking service according to the network interfaces
+   requested in the nova boot request. A caveat here is that the MACs of the
+   ports are randomly generated at first, and will be updated when the VIF is
+   attached to some node to correspond to the node's network interface card's
+   (or bond's) MAC.
+
+#. Nova compute creates a spawn task that contains all the information, such
+   as which image to boot from. It invokes ``driver.spawn`` from the virt
+   layer of Nova compute. During the spawn process, the virt driver does the
+   following:
+
+   #. Updates the target ironic node with the information about the deploy
+      image, instance UUID, requested capabilities and various flavor
+      properties.
+
+   #. Validates the node's power and deploy interfaces by calling the ironic
+      API.
+
+   #. Attaches the previously created VIFs to the node. Each neutron port can
+      be attached to any ironic port or port group, with port groups having
+      higher priority than ports. On the ironic side, this work is done by
+      the network interface. Attachment here means saving the VIF identifier
+      into the ironic port or port group and updating the VIF MAC to match
+      the port's or port group's MAC, as described in bullet point 4.
+
+   #. Generates the config drive, if requested.
+
+#. Nova's ironic virt driver issues a deploy request via the Ironic API to
+   the Ironic conductor servicing the bare metal node.
+
+#. Virtual interfaces are plugged in and the Neutron API updates the DHCP
+   port to set PXE/TFTP options. When the ``neutron`` network interface is
+   used, ironic creates separate provisioning ports in the Networking
+   service, while with the ``flat`` network interface the ports created by
+   nova are used both for provisioning and for deployed instance networking.
+
+#. The ironic node's boot interface prepares the (i)PXE configuration and
+   caches the deploy kernel and ramdisk.
+
+#. The ironic node's management interface issues commands to enable network
+   boot of the node.
+
+#. The ironic node's deploy interface caches the instance image, kernel and
+   ramdisk if needed (for example, in the case of netboot).
+
+#. The ironic node's power interface instructs the node to power on.
+
+#. The node boots the deploy ramdisk.
+
+#. Depending on the exact driver used, the deploy ramdisk downloads the image
+   from a URL (:ref:`direct-deploy`) or the conductor uses SSH to execute
+   commands (:ref:`ansible-deploy`). The URL can be generated by Swift
+   API-compatible object stores, for example Swift itself or RadosGW, or
+   provided by a user.
+
+   The image deployment is done.
+
+#. The node's boot interface switches the PXE configuration to refer to the
+   instance images (or, in case of local boot, sets the boot device to disk),
+   and asks the ramdisk agent to soft power off the node. If the soft power
+   off by the ramdisk agent fails, the bare metal node is powered off via an
+   IPMI/BMC call.
+
+#. The deploy interface triggers the network interface to remove provisioning
+   ports if they were created, and binds the tenant ports to the node if not
+   already bound. Then the node is powered on.
+
+   .. note:: There are two power cycles during bare metal deployment: the
+             first time the node is powered on when the ramdisk is booted,
+             the second time after the image is deployed.
+
+#. The bare metal node's provisioning state is updated to ``active``.
+
+Below is the diagram that describes the above process.
+
+.. graphviz::
+
+   digraph "Deployment Steps" {
+
+      node [shape=box, style=rounded, fontsize=10];
+      edge [fontsize=10];
+
+      /* cylinder shape works only in graphviz 2.39+ */
+      { rank=same; node [shape=cylinder]; "Nova DB"; "Ironic DB"; }
+      { rank=same; "Nova API"; "Ironic API"; }
+      { rank=same; "Nova Message Queue"; "Ironic Message Queue"; }
+      { rank=same; "Ironic Conductor"; "TFTP Server"; }
+      { rank=same; "Deploy Interface"; "Boot Interface"; "Power Interface";
+        "Management Interface"; }
+      { rank=same; "Glance"; "Neutron"; }
+      "Bare Metal Nodes" [shape=box3d];
+
+      "Nova API" -> "Nova Message Queue" [label=" 1"];
+      "Nova Message Queue" -> "Nova Conductor" [dir=both];
+      "Nova Message Queue" -> "Nova Scheduler" [label=" 2"];
+      "Nova Conductor" -> "Nova DB" [dir=both, label=" 3"];
+      "Nova Message Queue" -> "Nova Compute" [dir=both];
+      "Nova Compute" -> "Neutron" [label=" 4"];
+      "Nova Compute" -> "Nova Ironic Virt Driver" [label=5];
+      "Nova Ironic Virt Driver" -> "Ironic API" [label=6];
+      "Ironic API" -> "Ironic Message Queue";
+      "Ironic Message Queue" -> "Ironic Conductor" [dir=both];
+      "Ironic API" -> "Ironic DB" [dir=both];
+      "Ironic Conductor" -> "Ironic DB" [dir=both, label=16];
+      "Ironic Conductor" -> "Boot Interface" [label="8, 14"];
+      "Ironic Conductor" -> "Management Interface" [label=" 9"];
+      "Ironic Conductor" -> "Deploy Interface" [label=10];
+      "Deploy Interface" -> "Network Interface" [label="7, 15"];
+      "Ironic Conductor" -> "Power Interface" [label=11];
+      "Ironic Conductor" -> "Glance";
+      "Network Interface" -> "Neutron";
+      "Power Interface" -> "Bare Metal Nodes";
+      "Management Interface" -> "Bare Metal Nodes";
+      "TFTP Server" -> "Bare Metal Nodes" [label=12];
+      "Ironic Conductor" -> "Bare Metal Nodes" [style=dotted, label=13];
+      "Boot Interface" -> "TFTP Server";
+
+   }
+
+The following example describes what ironic is doing in more detail, leaving
+out the actions performed by nova and some of the more advanced options.
+
+.. _direct-deploy-example:
+
+Example: PXE Boot and Direct Deploy Process
+-------------------------------------------
+
+This process is how :ref:`direct-deploy` works.
+
+.. seqdiag::
+   :scale: 75
+
+   diagram {
+      Nova; API; Conductor; Neutron; HTTPStore; "TFTP/HTTPd"; Node;
+      activation = none;
+      edge_length = 250;
+      span_height = 1;
+      default_note_color = white;
+      default_fontsize = 14;
+
+      Nova -> API [label = "Set instance_info\n(image_source,\nroot_gb, etc.)"];
+      Nova -> API [label = "Validate power and deploy\ninterfaces"];
+      Nova -> API [label = "Plug VIFs to the node"];
+      Nova -> API [label = "Set provision_state,\noptionally pass configdrive"];
+      API -> Conductor [label = "do_node_deploy()"];
+      Conductor -> Conductor [label = "Validate power and deploy interfaces"];
+      Conductor -> HTTPStore [label = "Store configdrive if configdrive_use_swift \noption is set"];
+      Conductor -> Node [label = "POWER OFF"];
+      Conductor -> Neutron [label = "Attach provisioning network to port(s)"];
+      Conductor -> Neutron [label = "Update DHCP boot options"];
+      Conductor -> Conductor [label = "Prepare PXE\nenvironment for\ndeployment"];
+      Conductor -> Node [label = "Set PXE boot device \nthrough the BMC"];
+      Conductor -> Conductor [label = "Cache deploy\nand instance\nkernel and ramdisk"];
+      Conductor -> Node [label = "REBOOT"];
+      Node -> Neutron [label = "DHCP request"];
+      Neutron -> Node [label = "next-server = Conductor"];
+      Node -> Node [label = "Runs agent\nramdisk"];
+      Node -> API [label = "lookup()"];
+      API -> Node [label = "Pass UUID"];
+      Node -> API [label = "Heartbeat (UUID)"];
+      API -> Conductor [label = "Heartbeat"];
+      Conductor -> Node [label = "Continue deploy asynchronously: Pass image, disk info"];
+      Node -> HTTPStore [label = "Downloads image, writes to disk, \nwrites configdrive if present"];
+      === Heartbeat periodically ===
+      Conductor -> Node [label = "Is deploy done?"];
+      Node -> Conductor [label = "Still working..."];
+      === ... ===
+      Node -> Conductor [label = "Deploy is done"];
+      Conductor -> Node [label = "Install boot loader, if requested"];
+      Conductor -> Neutron [label = "Update DHCP boot options"];
+      Conductor -> Conductor [label = "Prepare PXE\nenvironment for\ninstance image\nif needed"];
+      Conductor -> Node [label = "Set boot device either to PXE or to disk"];
+      Conductor -> Node [label = "Collect ramdisk logs"];
+      Conductor -> Node [label = "POWER OFF"];
+      Conductor -> Neutron [label = "Detach provisioning network\nfrom port(s)"];
+      Conductor -> Neutron [label = "Bind tenant port"];
+      Conductor -> Node [label = "POWER ON"];
+      Conductor -> Conductor [label = "Mark node as\nACTIVE"];
+   }
+
+(From a `talk`_ and `slides`_)
+
+.. _talk: https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/isn-and-039t-it-ironic-the-bare-metal-cloud
+.. _slides: http://www.slideshare.net/devananda1/isnt-it-ironic-managing-a-bare-metal-cloud-osl-tes-2015
+
diff --git a/doc/source/user/creating-images.rst b/doc/source/user/creating-images.rst
new file mode 100644
index 0000000000..6aeaaf529c
--- /dev/null
+++ b/doc/source/user/creating-images.rst
@@ -0,0 +1,106 @@
+Creating instance images
+========================
+
+Bare Metal provisioning requires two sets of images: the deploy images
+and the user images. The :ref:`deploy images ` are used by the
+Bare Metal service to prepare the bare metal server for actual OS deployment,
+whereas the user images are installed on the bare metal server to be used by
+the end user. There are two types of user images:
+
+*partition images*
+   contain only the contents of the root partition. Additionally, two more
+   images are used together with them: one with a kernel and one with
+   an initramfs.
+
+   .. warning::
+      To use partition images with local boot, Grub2 must be installed on
+      them.
+
+*whole disk images*
+   contain a complete partition table with one or more partitions.
+
+   .. warning::
+      The kernel/initramfs pair must not be used with whole disk images,
+      otherwise they'll be mistaken for partition images.
+
+Many distributions publish their own cloud images. These are usually whole
+disk images that are built for legacy boot mode (not UEFI), with Ubuntu being
+an exception (they publish images that work in both modes).
+
+disk-image-builder
+------------------
+
+The `disk-image-builder`_ can be used to create the user images required for
+deployment, containing the actual OS which the user is going to run.
+
+- Install the diskimage-builder package (use virtualenv, if you don't
+  want to install anything globally):
+
+  .. code-block:: console
+
+     # pip install diskimage-builder
+
+- Build the image your users will run (an Ubuntu image is taken as
+  an example):
+
+  - Partition images
+
+    .. code-block:: console
+
+       $ disk-image-create ubuntu baremetal dhcp-all-interfaces grub2 -o my-image
+
+  - Whole disk images
+
+    .. code-block:: console
+
+       $ disk-image-create ubuntu vm dhcp-all-interfaces -o my-image
+
+    … with an EFI partition:
+
+    .. code-block:: console
+
+       $ disk-image-create ubuntu vm block-device-efi dhcp-all-interfaces -o my-image
+
+The partition image command creates ``my-image.qcow2``,
+``my-image.vmlinuz`` and ``my-image.initrd`` files. The ``grub2`` element
+in the partition image creation command is only needed if local boot will
+be used to deploy ``my-image.qcow2``, otherwise the images
+``my-image.vmlinuz`` and ``my-image.initrd`` will be used for PXE booting
+after deploying the bare metal with ``my-image.qcow2``. For whole disk images
+only the main image is used.
+
+If you want to use a Fedora image, replace ``ubuntu`` with ``fedora`` in the
+chosen command.
+
+.. _disk-image-builder: https://docs.openstack.org/diskimage-builder/latest/
+
+Virtual machine
+---------------
+
+Virtual machine software can also be used to build user images. There are
+different software options available; qemu-kvm is usually a good choice on
+Linux platforms, as it supports emulating many devices and even building
+images for architectures other than the host machine by software emulation.
+VirtualBox is another good choice for non-Linux hosts.
+
+The procedure varies depending on the software used, but the steps for
+building an image are similar: the user creates a virtual machine and
+installs the target system just like what is done for real hardware. The
+system can be highly customized in terms of partition layout, drivers,
+shipped software, etc.
+
+Usually libvirt and its management tools are used to make interaction with
+qemu-kvm easier, for example, to create a virtual machine with
+``virt-install``::
+
+    $ virt-install --name centos8 --ram 4096 --vcpus=2 -f centos8.qcow2 \
+    > --cdrom CentOS-8-x86_64-1905-dvd1.iso
+
+A graphical frontend like ``virt-manager`` can also be utilized.
+
+The disk file can be used as a user image after the system is set up and
+powered off. The path of the disk file varies depending on the software used;
+usually it's stored in a user-selected part of the local file system. For
+qemu-kvm or GUI frontends building upon it, it's typically stored at
+``/var/lib/libvirt/images``.
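+
+For example (a sketch assuming the libvirt default path and the ``centos8``
+machine created above), the resulting disk can be converted and compacted
+with ``qemu-img`` before being used as a user image:
+
+.. code-block:: console
+
+    $ qemu-img convert -O qcow2 -c \
+    >   /var/lib/libvirt/images/centos8.qcow2 my-image.qcow2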
+
diff --git a/doc/source/user/deploy.rst b/doc/source/user/deploy.rst
new file mode 100644
index 0000000000..11911a90b9
--- /dev/null
+++ b/doc/source/user/deploy.rst
@@ -0,0 +1,217 @@
+Deploying with Bare Metal service
+=================================
+
+This guide explains how to use Ironic to deploy nodes without any front-end
+service, such as OpenStack Compute (nova) or Metal3_.
+
+.. _Metal3: http://metal3.io/
+
+Populating instance_info
+------------------------
+
+Image information
+~~~~~~~~~~~~~~~~~
+
+You need to specify image information in the node's ``instance_info``
+(see :doc:`/user/creating-images`):
+
+* ``image_source`` - URL of the whole disk or root partition image,
+  mandatory.
+
+* ``root_gb`` - size of the root partition, required for partition images.
+
+  .. note::
+     Older versions of the Bare Metal service used to require a positive
+     integer for ``root_gb`` even for whole-disk images. You may want to set
+     it for compatibility.
+
+* ``image_checksum`` - MD5 checksum of the image specified by
+  ``image_source``, only required for ``http://`` images when using
+  :ref:`direct-deploy`.
+
+  .. note::
+     Additional checksum support exists via the ``image_os_hash_algo`` and
+     ``image_os_hash_value`` fields. They may be used instead of the
+     ``image_checksum`` field.
+
+  .. warning::
+     If your operating system is running in FIPS 140-2 mode, MD5 will not be
+     available, and you **must** use SHA256 or another modern algorithm.
+
+  Starting with the Stein release of ironic-python-agent, this value can
+  also be a URL to a checksums file, e.g. one generated with:
+
+  .. code-block:: shell
+
+     cd /path/to/http/root
+     md5sum *.img > checksums
+
+* ``kernel``, ``ramdisk`` - HTTP(S) or file URLs of the kernel and
+  initramfs of the target OS. Must be added **only** for partition images.
+
+For example:
+
+.. code-block:: shell
+
+   baremetal node set $NODE_UUID \
+       --instance-info image_source=$IMG \
+       --instance-info image_checksum=$MD5HASH \
+       --instance-info kernel=$KERNEL \
+       --instance-info ramdisk=$RAMDISK \
+       --instance-info root_gb=10
+
+With a SHA256 hash:
+
+.. code-block:: shell
+
+   baremetal node set $NODE_UUID \
+       --instance-info image_source=$IMG \
+       --instance-info image_os_hash_algo=sha256 \
+       --instance-info image_os_hash_value=$SHA256HASH \
+       --instance-info kernel=$KERNEL \
+       --instance-info ramdisk=$RAMDISK \
+       --instance-info root_gb=10
+
+With a whole disk image:
+
+.. code-block:: shell
+
+   baremetal node set $NODE_UUID \
+       --instance-info image_source=$IMG \
+       --instance-info image_checksum=$MD5HASH
+
+.. note::
+   For iLO drivers, fields that should be provided are:
+
+   * ``ilo_deploy_iso`` under ``driver_info``;
+
+   * ``ilo_boot_iso``, ``image_source``, ``root_gb`` under ``instance_info``.
+
+When using low RAM nodes with ``http://`` images that are not in the RAW
+format, you may want them cached locally, converted to raw and served from
+the conductor's HTTP server:
+
+.. code-block:: shell
+
+   baremetal node set $NODE_UUID --instance-info image_download_source=local
+
+For software RAID with whole-disk images, the root UUID of the root
+partition has to be provided so that the bootloader can be correctly
+installed:
+
+.. code-block:: shell
+
+   baremetal node set $NODE_UUID --instance-info image_rootfs_uuid=
+
+Capabilities
+~~~~~~~~~~~~
+
+* :ref:`Boot mode ` can be specified per instance:
+
+  .. code-block:: shell
+
+     baremetal node set $NODE_UUID \
+         --instance-info capabilities='{"boot_mode": "uefi"}'
+
+  Otherwise, the ``boot_mode`` capability from the node's ``properties`` will
+  be used.
+
+  .. warning::
+     The two settings must not contradict each other.
+
+  .. note::
+     This capability was introduced in the Wallaby release series;
+     previously ironic used a separate ``instance_info/deploy_boot_mode``
+     field instead.
+
+* To override the :ref:`boot option ` used for
+  this instance, set the ``boot_option`` capability:
+
+  .. code-block:: shell
+
+     baremetal node set $NODE_UUID \
+         --instance-info capabilities='{"boot_option": "local"}'
+
+* Starting with the Ussuri release, you can set :ref:`root device hints
+  ` per instance:
+
+  .. code-block:: shell
+
+     baremetal node set $NODE_UUID \
+         --instance-info root_device='{"wwn": "0x4000cca77fc4dba1"}'
+
+  This setting overrides any previous setting in ``properties`` and will be
+  removed on undeployment.
+
+Overriding a hardware interface
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Non-admins with temporary access to a node may wish to specify different
+node interfaces. However, allowing them to set these interface values
+directly on the node is problematic, as there is no automated way to ensure
+that the original interface values are restored.
+
+In order to temporarily override a hardware interface, simply set the
+appropriate value in ``instance_info``. For example, if you'd like to
+override a node's storage interface, run the following:
+
+.. code-block:: shell
+
+   baremetal node set $NODE_UUID --instance-info storage_interface=cinder
+
+``instance_info`` values persist until after a node is cleaned.
+
+.. note::
+   This feature is available starting with the Wallaby release.
+
+Deployment
+----------
+
+#. Validate that all parameters are correct:
+
+   .. code-block:: console
+
+      $ baremetal node validate $NODE_UUID
+      +------------+--------+----------------------------------------------------------------+
+      | Interface  | Result | Reason                                                         |
+      +------------+--------+----------------------------------------------------------------+
+      | boot       | True   |                                                                |
+      | console    | False  | Missing 'ipmi_terminal_port' parameter in node's driver_info.  |
+      | deploy     | True   |                                                                |
+      | inspect    | True   |                                                                |
+      | management | True   |                                                                |
+      | network    | True   |                                                                |
+      | power      | True   |                                                                |
+      | raid       | True   |                                                                |
+      | storage    | True   |                                                                |
+      +------------+--------+----------------------------------------------------------------+
+
+#. Now you can start the deployment:
+
+   .. code-block:: shell
+
+      baremetal node deploy $NODE_UUID
+
+#. You can provide a configdrive as JSON or as an ISO image, e.g.:
+
+   .. code-block:: shell
+
+      baremetal node deploy $NODE_UUID \
+          --config-drive '{"meta_data": {"public_keys": {"0": "ssh key contents"}}}'
+
+   See :doc:`/install/configdrive` for details.
+
+#. Starting with the Wallaby release you can also request custom deploy
+   steps; see :ref:`standalone-deploy-steps` for details.
+
+Ramdisk booting
+---------------
+
+Advanced operators, specifically ones working with ephemeral workloads,
+may find it more useful to explicitly treat a node as one that would always
+boot from a ramdisk. See :doc:`/admin/ramdisk-boot` for details.
+
+Other references
+----------------
+
+* :ref:`local-boot-without-compute`
diff --git a/doc/source/user/index.rst b/doc/source/user/index.rst
index 41eba082a6..806919a00c 100644
--- a/doc/source/user/index.rst
+++ b/doc/source/user/index.rst
@@ -1,5 +1,3 @@
-.. 
_user-guide: - ============================= Bare Metal Service User Guide ============================= @@ -22,301 +20,9 @@ pluggable driver architecture also allows hardware vendors to write and contribute drivers that may improve performance or add functionality not provided by the community drivers. -.. TODO: the remainder of this file needs to be cleaned up still +.. toctree:: + :maxdepth: 2 -Why Provision Bare Metal -======================== - -Here are a few use-cases for bare metal (physical server) provisioning in -cloud; there are doubtless many more interesting ones: - -- High-performance computing clusters -- Computing tasks that require access to hardware devices which can't be - virtualized -- Database hosting (some databases run poorly in a hypervisor) -- Single tenant, dedicated hardware for performance, security, dependability - and other regulatory requirements -- Or, rapidly deploying a cloud infrastructure - -Conceptual Architecture -======================= - -The following diagram shows the relationships and how all services come into -play during the provisioning of a physical server. (Note that Ceilometer and -Swift can be used with Ironic, but are missing from this diagram.) - - -.. figure:: ../images/conceptual_architecture.png - :alt: ConceptualArchitecture - - -Key Technologies for Bare Metal Hosting -======================================= - -Preboot Execution Environment (PXE) ------------------------------------ -PXE is part of the Wired for Management (WfM) specification developed by Intel -and Microsoft. The PXE enables system's BIOS and network interface card (NIC) -to bootstrap a computer from the network in place of a disk. Bootstrapping is -the process by which a system loads the OS into local memory so that it can be -executed by the processor. This capability of allowing a system to boot over a -network simplifies server deployment and server management for administrators. - -Dynamic Host Configuration Protocol (DHCP) ------------------------------------------- -DHCP is a standardized networking protocol used on Internet Protocol (IP) -networks for dynamically distributing network configuration parameters, such -as IP addresses for interfaces and services. Using PXE, the BIOS uses DHCP to -obtain an IP address for the network interface and to locate the server that -stores the network bootstrap program (NBP). - -Network Bootstrap Program (NBP) -------------------------------- -NBP is equivalent to GRUB (GRand Unified Bootloader) or LILO (LInux LOader) - -loaders which are traditionally used in local booting. Like the boot program -in a hard drive environment, the NBP is responsible for loading the OS kernel -into memory so that the OS can be bootstrapped over a network. - -Trivial File Transfer Protocol (TFTP) -------------------------------------- -TFTP is a simple file transfer protocol that is generally used for automated -transfer of configuration or boot files between machines in a local -environment. In a PXE environment, TFTP is used to download NBP over the -network using information from the DHCP server. - -Intelligent Platform Management Interface (IPMI) ------------------------------------------------- -IPMI is a standardized computer system interface used by system administrators -for out-of-band management of computer systems and monitoring of their -operation. It is a method to manage systems that may be unresponsive or powered -off by using only a network connection to the hardware rather than to an -operating system. - -.. 
_understanding-deployment: - -Understanding Bare Metal Deployment -=================================== - -What happens when a boot instance request comes in? The below diagram walks -through the steps involved during the provisioning of a bare metal instance. - -These pre-requisites must be met before the deployment process: - -* Dependent packages to be configured on the Bare Metal service node(s) - where ironic-conductor is running like tftp-server, ipmi, syslinux etc for - bare metal provisioning. -* Nova must be configured to make use of the bare metal service endpoint - and compute driver should be configured to use ironic driver on the Nova - compute node(s). -* Flavors to be created for the available hardware. Nova must know the flavor - to boot from. -* Images to be made available in Glance. Listed below are some image types - required for successful bare metal deployment: - - - bm-deploy-kernel - - bm-deploy-ramdisk - - user-image - - user-image-vmlinuz - - user-image-initrd - -* Hardware to be enrolled via the bare metal API service. - -Deploy Process --------------- - -This describes a typical bare metal node deployment within OpenStack using PXE -to boot the ramdisk. Depending on the ironic driver interfaces used, some of -the steps might be marginally different, however the majority of them will -remain the same. - -#. A boot instance request comes in via the Nova API, through the message - queue to the Nova scheduler. - -#. Nova scheduler applies filters and finds the eligible hypervisor. The nova - scheduler also uses the flavor's ``extra_specs``, such as ``cpu_arch``, to - match the target physical node. - -#. Nova compute manager claims the resources of the selected hypervisor. - -#. Nova compute manager creates (unbound) tenant virtual interfaces (VIFs) in - the Networking service according to the network interfaces requested in the - nova boot request. A caveat here is, the MACs of the ports are going to be - randomly generated, and will be updated when the VIF is attached to some - node to correspond to the node network interface card's (or bond's) MAC. - -#. A spawn task is created by the nova compute which contains all - the information such as which image to boot from etc. It invokes the - ``driver.spawn`` from the virt layer of Nova compute. During the spawn - process, the virt driver does the following: - - #. Updates the target ironic node with the information about deploy image, - instance UUID, requested capabilities and various flavor properties. - - #. Validates node's power and deploy interfaces, by calling the ironic API. - - #. Attaches the previously created VIFs to the node. Each neutron port can - be attached to any ironic port or port group, with port groups having - higher priority than ports. On ironic side, this work is done by the - network interface. Attachment here means saving the VIF identifier - into ironic port or port group and updating VIF MAC to match the port's - or port group's MAC, as described in bullet point 4. - - #. Generates config drive, if requested. - -#. Nova's ironic virt driver issues a deploy request via the Ironic API to the - Ironic conductor servicing the bare metal node. - -#. Virtual interfaces are plugged in and Neutron API updates DHCP port to - set PXE/TFTP options. 
In case of using ``neutron`` network interface, - ironic creates separate provisioning ports in the Networking service, while - in case of ``flat`` network interface, the ports created by nova are used - both for provisioning and for deployed instance networking. - -#. The ironic node's boot interface prepares (i)PXE configuration and caches - deploy kernel and ramdisk. - -#. The ironic node's management interface issues commands to enable network - boot of a node. - -#. The ironic node's deploy interface caches the instance image, kernel and - ramdisk if needed (it is needed in case of netboot for example). - -#. The ironic node's power interface instructs the node to power on. - -#. The node boots the deploy ramdisk. - -#. Depending on the exact driver used, the deploy ramdisk downloads the image - from a URL (:ref:`direct-deploy`) or the conductor uses SSH to execute - commands (:ref:`ansible-deploy`). The URL can be generated by Swift - API-compatible object stores, for example Swift itself or RadosGW, or - provided by a user. - - The image deployment is done. - -#. The node's boot interface switches pxe config to refer to instance images - (or, in case of local boot, sets boot device to disk), and asks the ramdisk - agent to soft power off the node. If the soft power off by the ramdisk agent - fails, the bare metal node is powered off via IPMI/BMC call. - -#. The deploy interface triggers the network interface to remove provisioning - ports if they were created, and binds the tenant ports to the node if not - already bound. Then the node is powered on. - - .. note:: There are 2 power cycles during bare metal deployment; the - first time the node is powered-on when ramdisk is booted, the - second time after the image is deployed. - -#. The bare metal node's provisioning state is updated to ``active``. - -Below is the diagram that describes the above process. - -.. 
graphviz:: - - digraph "Deployment Steps" { - - node [shape=box, style=rounded, fontsize=10]; - edge [fontsize=10]; - - /* cylinder shape works only in graphviz 2.39+ */ - { rank=same; node [shape=cylinder]; "Nova DB"; "Ironic DB"; } - { rank=same; "Nova API"; "Ironic API"; } - { rank=same; "Nova Message Queue"; "Ironic Message Queue"; } - { rank=same; "Ironic Conductor"; "TFTP Server"; } - { rank=same; "Deploy Interface"; "Boot Interface"; "Power Interface"; - "Management Interface"; } - { rank=same; "Glance"; "Neutron"; } - "Bare Metal Nodes" [shape=box3d]; - - "Nova API" -> "Nova Message Queue" [label=" 1"]; - "Nova Message Queue" -> "Nova Conductor" [dir=both]; - "Nova Message Queue" -> "Nova Scheduler" [label=" 2"]; - "Nova Conductor" -> "Nova DB" [dir=both, label=" 3"]; - "Nova Message Queue" -> "Nova Compute" [dir=both]; - "Nova Compute" -> "Neutron" [label=" 4"]; - "Nova Compute" -> "Nova Ironic Virt Driver" [label=5]; - "Nova Ironic Virt Driver" -> "Ironic API" [label=6]; - "Ironic API" -> "Ironic Message Queue"; - "Ironic Message Queue" -> "Ironic Conductor" [dir=both]; - "Ironic API" -> "Ironic DB" [dir=both]; - "Ironic Conductor" -> "Ironic DB" [dir=both, label=16]; - "Ironic Conductor" -> "Boot Interface" [label="8, 14"]; - "Ironic Conductor" -> "Management Interface" [label=" 9"]; - "Ironic Conductor" -> "Deploy Interface" [label=10]; - "Deploy Interface" -> "Network Interface" [label="7, 15"]; - "Ironic Conductor" -> "Power Interface" [label=11]; - "Ironic Conductor" -> "Glance"; - "Network Interface" -> "Neutron"; - "Power Interface" -> "Bare Metal Nodes"; - "Management Interface" -> "Bare Metal Nodes"; - "TFTP Server" -> "Bare Metal Nodes" [label=12]; - "Ironic Conductor" -> "Bare Metal Nodes" [style=dotted, label=13]; - "Boot Interface" -> "TFTP Server"; - - } - -The following two examples describe what ironic is doing in more detail, -leaving out the actions performed by nova and some of the more advanced -options. - -.. _direct-deploy-example: - -Example: PXE Boot and Direct Deploy Process ---------------------------------------------- - -This process is how :ref:`direct-deploy` works. - -.. 
seqdiag:: - :scale: 75 - - diagram { - Nova; API; Conductor; Neutron; HTTPStore; "TFTP/HTTPd"; Node; - activation = none; - edge_length = 250; - span_height = 1; - default_note_color = white; - default_fontsize = 14; - - Nova -> API [label = "Set instance_info\n(image_source,\nroot_gb, etc.)"]; - Nova -> API [label = "Validate power and deploy\ninterfaces"]; - Nova -> API [label = "Plug VIFs to the node"]; - Nova -> API [label = "Set provision_state,\noptionally pass configdrive"]; - API -> Conductor [label = "do_node_deploy()"]; - Conductor -> Conductor [label = "Validate power and deploy interfaces"]; - Conductor -> HTTPStore [label = "Store configdrive if configdrive_use_swift \noption is set"]; - Conductor -> Node [label = "POWER OFF"]; - Conductor -> Neutron [label = "Attach provisioning network to port(s)"]; - Conductor -> Neutron [label = "Update DHCP boot options"]; - Conductor -> Conductor [label = "Prepare PXE\nenvironment for\ndeployment"]; - Conductor -> Node [label = "Set PXE boot device \nthrough the BMC"]; - Conductor -> Conductor [label = "Cache deploy\nand instance\nkernel and ramdisk"]; - Conductor -> Node [label = "REBOOT"]; - Node -> Neutron [label = "DHCP request"]; - Neutron -> Node [label = "next-server = Conductor"]; - Node -> Node [label = "Runs agent\nramdisk"]; - Node -> API [label = "lookup()"]; - API -> Node [label = "Pass UUID"]; - Node -> API [label = "Heartbeat (UUID)"]; - API -> Conductor [label = "Heartbeat"]; - Conductor -> Node [label = "Continue deploy asynchronously: Pass image, disk info"]; - Node -> HTTPStore [label = "Downloads image, writes to disk, \nwrites configdrive if present"]; - === Heartbeat periodically === - Conductor -> Node [label = "Is deploy done?"]; - Node -> Conductor [label = "Still working..."]; - === ... === - Node -> Conductor [label = "Deploy is done"]; - Conductor -> Node [label = "Install boot loader, if requested"]; - Conductor -> Neutron [label = "Update DHCP boot options"]; - Conductor -> Conductor [label = "Prepare PXE\nenvironment for\ninstance image\nif needed"]; - Conductor -> Node [label = "Set boot device either to PXE or to disk"]; - Conductor -> Node [label = "Collect ramdisk logs"]; - Conductor -> Node [label = "POWER OFF"]; - Conductor -> Neutron [label = "Detach provisioning network\nfrom port(s)"]; - Conductor -> Neutron [label = "Bind tenant port"]; - Conductor -> Node [label = "POWER ON"]; - Conductor -> Conductor [label = "Mark node as\nACTIVE"]; - } - -(From a `talk`_ and `slides`_) - -.. _talk: https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/isn-and-039t-it-ironic-the-bare-metal-cloud -.. _slides: http://www.slideshare.net/devananda1/isnt-it-ironic-managing-a-bare-metal-cloud-osl-tes-2015 + architecture + creating-images + deploy