2ab8364649
Currently, if a heartbeat fails, we reschedule it after 5 seconds. This is fine for the first retry, but it can cause a thundering herd problem when many nodes fail to heartbeat at once. This change adds jitter on top of the minimum wait of 5 seconds. Jitter is not applied to forced heartbeats: their minimum wait after the last heartbeat remains exactly 5 seconds.

The code is reordered to move the interval calculation into one place. Bonus: the next interval is now logged correctly.

The unit tests have been rewritten to test the heartbeat process step by step rather than relying on the exact sequence of calls.

Closes-Bug: #2038438
Change-Id: I4c4207b15fb3d48b55e340b7b3b54af833f92cb5
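A minimal Python sketch of the retry policy described in the commit message, assuming the jitter is drawn uniformly from 0 to 5 extra seconds. The constants and function names (`MIN_HEARTBEAT_INTERVAL`, `MAX_RETRY_JITTER`, `next_retry_delay`, `forced_heartbeat_allowed`) are hypothetical and are not taken from `agent.py`; the jitter range in particular is an assumption.

```python
import random

# Hypothetical constants for illustration; not taken from agent.py.
MIN_HEARTBEAT_INTERVAL = 5.0  # seconds; the fixed minimum named in the commit message
MAX_RETRY_JITTER = 5.0        # seconds; ASSUMED upper bound on the added jitter


def next_retry_delay(rng=random):
    """Delay before retrying a failed heartbeat: 5 s plus random jitter.

    Each caller draws its own delay in [5, 5 + MAX_RETRY_JITTER] seconds,
    so nodes that failed at the same moment do not all retry in lockstep.
    """
    return MIN_HEARTBEAT_INTERVAL + rng.uniform(0, MAX_RETRY_JITTER)


def forced_heartbeat_allowed(now, previous_heartbeat):
    """Forced heartbeats get no jitter: exactly 5 s must pass since the last one."""
    return now - previous_heartbeat >= MIN_HEARTBEAT_INTERVAL


if __name__ == "__main__":
    # Seeded generator only to make the demo repeatable.
    rng = random.Random(42)
    print([round(next_retry_delay(rng), 2) for _ in range(5)])
    print(forced_heartbeat_allowed(now=107.0, previous_heartbeat=100.0))
```

Because each node draws its own delay, a burst of simultaneous failures turns into retries spread across the jitter window instead of a spike exactly 5 seconds later, while forced heartbeats keep their deterministic 5-second floor.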
api
cmd
extensions
hardware_managers
tests
__init__.py
agent.py
burnin.py
config.py
dmi_inspector.py
efi_utils.py
encoding.py
errors.py
hardware.py
inject_files.py
inspect.py
inspector.py
ironic_api_client.py
netutils.py
numa_inspector.py
partition_utils.py
raid_utils.py
tls_utils.py
utils.py
version.py