From b164e18f080d6497e3bfffd48dab07998ab1bb13 Mon Sep 17 00:00:00 2001
From: Dmitry Tantsur
Date: Mon, 23 Sep 2019 16:02:35 +0200
Subject: [PATCH] Document PXE retries

Change-Id: I5937fa190e780269ffa677aa01efaa1048fa20b0
---
 doc/source/install/configure-pxe.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/doc/source/install/configure-pxe.rst b/doc/source/install/configure-pxe.rst
index f44f8ea2ea..88b8c1b7d0 100644
--- a/doc/source/install/configure-pxe.rst
+++ b/doc/source/install/configure-pxe.rst
@@ -490,3 +490,29 @@ nodes will be deployed by 'grubaa64.efi', and ppc64 nodes by 'bootppc64'::
     # configuration per node architecture. For example:
     # aarch64:/opt/share/grubaa64_pxe_config.template (dict value)
     pxe_config_template_by_arch=aarch64:pxe_grubaa64_config.template,ppc64:pxe_ppc64_config.template
+
+PXE timeouts tuning
+-------------------
+
+Because of its reliance on UDP-based protocols (DHCP and TFTP), PXE is
+particularly vulnerable to random failures during the booting stage. If the
+deployment ramdisk never calls back to the bare metal conductor, the build will
+be aborted, and the node will be moved to the ``deploy failed`` state, after
+the deploy callback timeout. This timeout can be changed via the
+:oslo.config:option:`conductor.deploy_callback_timeout` configuration option.
+
+Starting with the Train release, the Bare Metal service can retry PXE boot if
+it takes too long. The timeout is defined via
+:oslo.config:option:`pxe.boot_retry_timeout` and must be smaller than the
+``deploy_callback_timeout``, otherwise it will have no effect.
+
+For example, the following configuration sets the overall timeout to 60
+minutes, allowing two retries after 20 minutes:
+
+.. code-block:: ini
+
+    [conductor]
+    deploy_callback_timeout = 3600
+
+    [pxe]
+    boot_retry_timeout = 1200
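
Review note (not part of the patch): the relationship between the two options in the documented example can be checked with a small script. The sketch below is illustrative only. The helper name `estimate_pxe_retries` is hypothetical and not part of the Bare Metal service, and it assumes a retry fires at every exact multiple of `boot_retry_timeout` that falls strictly before `deploy_callback_timeout`, ignoring how often the conductor actually checks for timeouts.

```python
import configparser

# The configuration from the documented example.
EXAMPLE_CONF = """
[conductor]
deploy_callback_timeout = 3600

[pxe]
boot_retry_timeout = 1200
"""


def estimate_pxe_retries(conf_text):
    """Estimate how many PXE retries fit before the deploy callback timeout.

    Hypothetical helper for illustration; it only does arithmetic on the
    two option values and does not talk to any service.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    callback = parser.getint("conductor", "deploy_callback_timeout")
    retry = parser.getint("pxe", "boot_retry_timeout")
    if callback <= 0 or retry <= 0:
        raise ValueError("both timeouts must be positive in this sketch")
    if retry >= callback:
        # The deployment fails before the first retry could happen,
        # so the retry timeout has no effect.
        return 0
    # Assumed model: retries at retry, 2 * retry, ... strictly before callback.
    return (callback - 1) // retry


print(estimate_pxe_retries(EXAMPLE_CONF))
```

Under these assumptions the example values give retries at 20 and 40 minutes, matching the "two retries" stated in the added text, while a `boot_retry_timeout` equal to or larger than `deploy_callback_timeout` gives zero.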