
Currently, subcloud status alarms are updated strictly when the
subcloud availability changes. An audit which doesn't update the
subcloud availability will not update the status alarm.

Typically, this is fine; however, it can be problematic when alarming
services are unavailable (i.e., FM failure) during an availability
update. In this rare case, the subcloud status alarm will not match the
availability status. Ultimately, this can result in an inconsistent
stale alarm status, with an offline alarm raised indefinitely for an
available subcloud.

Overall, the subcloud state manager should not assume that the subcloud
availability status and the subcloud status alarms are aligned. This
change ensures that the subcloud alarm status is eventually aligned
with the actual availability by forcing alarm updates when the
availability remains unchanged (during audit's
update_subcloud_availability).

Test Plan:
1. PASS: Ensure subcloud offline (280.001) alarm is cleared for
   subcloud restarts interleaved with a host-swact.
   - Power off subcloud, confirm subcloud offline alarm raised,
     power-on subcloud and initiate host-swact
2. PASS: Induce FM failure during an availability update and ensure
   that the subcloud offline (280.001) alarm status is eventually
   cleared:
   - Power-off subcloud
   - Wait for availability status of subcloud to show offline
     (dcmanager subcloud list)
     Subcloud offline alarm should be raised
   - unmanage FM-mgr service, ps kill FM and power-on subcloud
   - Check alarm list, subcloud offline should remain raised
     It should FAIL to CLEAR at this point
   - Manage FM-mgr (ensure FM is connected) and wait for next
     "Handling update_subcloud_availability request" in state.log
   - Check offline alarm has been cleared

Closes-Bug: 2040204

Change-Id: I8c3dd10ca0b3cdfadf7672adfb6165b3194f64aa
Signed-off-by: Salman Rana <salman.rana@windriver.com>

Service
- DC Manager State Service has responsibility for:
  - Subcloud state updates coming from dcmanager-manager service
- service.py:
  - Run DC Manager State Service in multi-worker mode, and establish RPC server
- subcloud_state_manager.py:
  - Provide subcloud state updates
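
The behaviour described in the commit message can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual dcmanager code: `FakeAlarmService`, `StateManagerSketch`, `AlarmServiceUnavailable` and `demo` are hypothetical names, and the real FM client API is not used. Only the method name `update_subcloud_availability` and the idea of re-applying the alarm when the availability is unchanged come from the commit message.

```python
from dataclasses import dataclass, field

ONLINE = "online"
OFFLINE = "offline"


class AlarmServiceUnavailable(Exception):
    """Raised by the fake alarm service while it is down (hypothetical)."""


@dataclass
class FakeAlarmService:
    """Hypothetical in-memory stand-in for the alarming service (FM)."""
    available: bool = True
    raised: set = field(default_factory=set)

    def _check(self):
        if not self.available:
            raise AlarmServiceUnavailable()

    def raise_offline_alarm(self, subcloud):
        self._check()
        self.raised.add(subcloud)

    def clear_offline_alarm(self, subcloud):
        self._check()
        self.raised.discard(subcloud)


class StateManagerSketch:
    """Assumed shape of a state manager; not the dcmanager implementation."""

    def __init__(self, alarms):
        self.alarms = alarms
        self.availability = {}

    def _sync_alarm(self, subcloud, status):
        # Raise or clear the offline alarm so it matches the given status.
        try:
            if status == OFFLINE:
                self.alarms.raise_offline_alarm(subcloud)
            else:
                self.alarms.clear_offline_alarm(subcloud)
        except AlarmServiceUnavailable:
            # Best effort: a later audit call gets another chance.
            pass

    def update_subcloud_availability(self, subcloud, status):
        # Record the availability only when it actually changes.
        if self.availability.get(subcloud) != status:
            self.availability[subcloud] = status
        # Re-apply the alarm on every call, even when the availability is
        # unchanged, so a missed update is eventually corrected.
        self._sync_alarm(subcloud, status)


def demo():
    alarms = FakeAlarmService()
    manager = StateManagerSketch(alarms)

    # Subcloud goes offline: the offline alarm is raised.
    manager.update_subcloud_availability("subcloud1", OFFLINE)

    # Alarm service fails while the subcloud comes back online:
    # availability is updated, but the alarm cannot be cleared.
    alarms.available = False
    manager.update_subcloud_availability("subcloud1", ONLINE)
    stale_after_failure = "subcloud1" in alarms.raised

    # Alarm service recovers; the next audit reports the same availability
    # and the alarm is cleared anyway.
    alarms.available = True
    manager.update_subcloud_availability("subcloud1", ONLINE)
    raised_after_recovery = "subcloud1" in alarms.raised

    return stale_after_failure, raised_after_recovery
```

Calling `demo()` walks through the second test-plan scenario in miniature: the alarm stays raised while the alarm service is down, and it is cleared on the next update once the service is reachable again, even though the availability itself did not change.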