From 28b97db915b0773e368889f97aa121588b588cda Mon Sep 17 00:00:00 2001
From: Arne Wiebalck <Arne.Wiebalck@cern.ch>
Date: Thu, 19 Aug 2021 10:30:07 +0200
Subject: [PATCH] [doc] Update power sync documentation

Add some notes on potential UDP packet loss during conductor/BMC
power sync with IPMI, the corresponding increase in retries, and
how to mitigate it.

Change-Id: I4bc9a8f6f7f4da7f719a65f76ae97b1244701ee9
---
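Note: the mitigation described in the new note could be expressed in
ironic.conf roughly as follows. This is only a sketch: the values are the
examples given in the note (300 seconds, 2 workers), not tested
recommendations, and suitable settings will depend on the deployment.

```ini
[conductor]
# Default is 60 seconds; a longer interval lowers the BMC request rate.
sync_power_state_interval = 300
# Default is 8 workers; fewer workers reduce the parallelism.
sync_power_state_workers = 2
```
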
 doc/source/admin/index.rst      |  2 +-
 doc/source/admin/power-sync.rst | 26 +++++++++++++++++++++-----
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/doc/source/admin/index.rst b/doc/source/admin/index.rst
index 28f55d021c..3e11d35818 100644
--- a/doc/source/admin/index.rst
+++ b/doc/source/admin/index.rst
@@ -26,7 +26,7 @@ the services.
    Upgrade Guide <upgrade-guide>
    Security <security>
    Troubleshooting FAQ <troubleshooting>
-   Power Sync with the Compute Service <power-sync>
+   Power Synchronization <power-sync>
    Node Multi-Tenancy <node-multitenancy>
    Fast-Track Deployment <fast-track>
    Booting a Ramdisk or an ISO <ramdisk-boot>
diff --git a/doc/source/admin/power-sync.rst b/doc/source/admin/power-sync.rst
index f19ff6c1b3..f4d10aa3c9 100644
--- a/doc/source/admin/power-sync.rst
+++ b/doc/source/admin/power-sync.rst
@@ -1,6 +1,6 @@
-===================================
-Power Sync with the Compute Service
-===================================
+=====================
+Power Synchronization
+=====================
 
 Baremetal Power Sync
 ====================
@@ -10,8 +10,24 @@ value of the :oslo.config:option:`conductor.force_power_state_during_sync`
 option is set to ``true`` the power state in the database will be forced on
 the hardware and if it is set to ``false`` the hardware state will be forced
 on the database. If this periodic task is enabled, it runs at an interval
-defined by the :oslo.config:option:`conductor.sync_power_state_interval` config
-option for those nodes which are not in maintenance.
+defined by the :oslo.config:option:`conductor.sync_power_state_interval`
+config option for those nodes which are not in maintenance. Requests to
+Baseboard Management Controllers (BMCs) are sent in parallel, with the number
+of workers set by :oslo.config:option:`conductor.sync_power_state_workers`.
+Sending the requests in parallel keeps misbehaving BMCs from delaying or
+even blocking the synchronization.
+
+.. note::
+    In deployments with many nodes and IPMI as the configured BMC protocol,
+    the default values of a 60-second power sync interval and 8 worker
+    threads may lead to a high rate of required retries due to client-side UDP
+    packet loss (visible via the corresponding warnings in the conductor
+    logs). While Ironic automatically retries to get the power status
+    for the affected nodes, the failure rate may be reduced by increasing
+    the power sync interval, e.g. to 300 seconds, and/or by reducing the number
+    of power sync workers, e.g. to 2. Please keep in mind, however, that
+    depending on the concrete setup increasing the power sync interval may
+    have an impact on other components relying on up-to-date power states.
 
 Compute-Baremetal Power Sync
 ============================