Fix: Clear Stale Subcloud Offline Alarm
Currently, subcloud status alarms are updated strictly when the
subcloud availability changes. An audit which doesn't update the
subcloud availability will not update the status alarm. Typically,
this is fine; however, it can be problematic when alarming services
are unavailable (i.e., FM failure) during an availability update. In
this rare case, the subcloud status alarm will not match the
availability status. Ultimately, this can result in an inconsistent
stale alarm status, with an offline alarm raised indefinitely for an
available subcloud.

Overall, the subcloud state manager should not assume that the
subcloud availability status and the subcloud status alarms are
aligned. This change ensures that the subcloud alarm status is
eventually aligned with the actual availability by forcing alarm
updates when the availability remains unchanged (during audit's
update_subcloud_availability).

Test Plan:
1. PASS: Ensure subcloud offline (280.001) alarm is cleared for
   subcloud restarts interleaved with a host-swact.
   - Power off subcloud, confirm subcloud offline alarm raised,
     power-on subcloud and initiate host-swact
2. PASS: Induce FM failure during an availability update and ensure
   that the subcloud offline (280.001) alarm status is eventually
   cleared:
   - Power-off subcloud
   - Wait for availability status of subcloud to show offline
     (dcmanager subcloud list)
     Subcloud offline alarm should be raised
   - unmanage FM-mgr service, ps kill FM and power-on subcloud
   - Check alarm list, subcloud offline should remain raised
     It should FAIL to CLEAR at this point
   - Manage FM-mgr (ensure FM is connected) and wait for next
     "Handling update_subcloud_availability request" in state.log
   - Check offline alarm has been cleared

Closes-Bug: 2040204

Change-Id: I8c3dd10ca0b3cdfadf7672adfb6165b3194f64aa
Signed-off-by: Salman Rana <salman.rana@windriver.com>
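The failure mode and the fix described in the commit message can be reproduced in a small standalone model. This is only a sketch under stated assumptions, not the dcmanager implementation: `FakeAlarmService`, `ToyStateManager`, and all of their methods are hypothetical names invented for illustration, and the model assumes that a failed alarm update is swallowed rather than propagated. Only the idea of re-asserting the alarm when availability is unchanged is taken from the commit.

```python
class FakeAlarmService:
    """Hypothetical stand-in for FM that can be made unavailable."""

    def __init__(self):
        self.available = True
        self.offline_alarms = set()

    def set_offline_alarm(self, name, raised):
        if not self.available:
            raise ConnectionError("alarm service unavailable")
        if raised:
            self.offline_alarms.add(name)
        else:
            self.offline_alarms.discard(name)


class ToyStateManager:
    """Hypothetical, heavily simplified state manager."""

    def __init__(self, alarms):
        self.alarms = alarms
        self.availability = {}

    def _sync_alarm(self, name, status):
        # Assumption: alarm failures are tolerated, so the alarm can go stale.
        try:
            self.alarms.set_offline_alarm(name, raised=(status == "offline"))
        except ConnectionError:
            pass

    def update_availability(self, name, status, update_state_only=False):
        if update_state_only:
            # Availability is unchanged: re-assert the alarm anyway so that
            # an earlier failed alarm update is eventually corrected.
            self._sync_alarm(name, status)
            return
        self.availability[name] = status
        self._sync_alarm(name, status)


def demo():
    alarms = FakeAlarmService()
    mgr = ToyStateManager(alarms)

    mgr.update_availability("subcloud1", "offline")  # alarm raised
    alarms.available = False                         # alarm service goes down
    mgr.update_availability("subcloud1", "online")   # clear attempt fails
    stale = "subcloud1" in alarms.offline_alarms

    alarms.available = True                          # alarm service recovers
    mgr.update_availability("subcloud1", "online", update_state_only=True)
    healed = "subcloud1" not in alarms.offline_alarms
    return stale, healed
```

In this model the alarm is stale after the failed clear and is corrected by the next state-only update, which mirrors step 2 of the test plan.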
@@ -429,6 +429,13 @@ class SubcloudStateManager(manager.Manager):
             raise
 
         if update_state_only:
+            # Ensure that the status alarm is consistent with the
+            # subcloud's availability. This is required to compensate
+            # for rare alarm update failures, which may occur during
+            # availability updates.
+            self._raise_or_clear_subcloud_status_alarm(subcloud.name,
+                                                       availability_status)
+
             # Nothing has changed, but we want to send a state update for this
             # subcloud as an audit. Get the most up-to-date data.
             self._update_subcloud_state(context, subcloud.name,
@@ -1484,6 +1484,37 @@ class TestSubcloudManager(base.DCManagerTestCase):
         fake_dcmanager_cermon_api.subcloud_online.\
             assert_called_once_with(self.ctx, subcloud.region_name)
 
+    @mock.patch.object(subcloud_state_manager.SubcloudStateManager,
+                       '_raise_or_clear_subcloud_status_alarm')
+    def test_update_state_only(self, mock_update_status_alarm):
+        subcloud = self.create_subcloud_static(self.ctx, name='subcloud1')
+        self.assertIsNotNone(subcloud)
+
+        # Set the subcloud to online/unmanaged
+        db_api.subcloud_update(self.ctx, subcloud.id,
+                               management_state=dccommon_consts.MANAGEMENT_UNMANAGED,
+                               availability_status=dccommon_consts.AVAILABILITY_ONLINE)
+
+        ssm = subcloud_state_manager.SubcloudStateManager()
+
+        with mock.patch.object(db_api, "subcloud_update") as subcloud_update_mock:
+            ssm.update_subcloud_availability(self.ctx, subcloud.region_name,
+                                             availability_status=dccommon_consts.AVAILABILITY_ONLINE,
+                                             update_state_only=True)
+            # Verify that the subcloud was not updated
+            subcloud_update_mock.assert_not_called()
+
+        # Verify alarm status update was attempted
+        mock_update_status_alarm.assert_called_once()
+
+        # Verify dcorch was notified
+        self.fake_dcorch_api.update_subcloud_states.assert_called_once_with(
+            self.ctx, subcloud.region_name, subcloud.management_state,
+            dccommon_consts.AVAILABILITY_ONLINE)
+
+        # Verify audits were not triggered
+        self.fake_dcmanager_audit_api.trigger_subcloud_audits.assert_not_called()
+
     def test_update_subcloud_availability_go_online_unmanaged(self):
         # create a subcloud
         subcloud = self.create_subcloud_static(self.ctx, name='subcloud1')
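The mocking pattern used by the new test can be tried outside the dcmanager tree. The following is a sketch with a hypothetical `Manager` class and hypothetical method names; the only real APIs it relies on are `unittest.mock.patch.object`, `assert_called_once`, and `assert_not_called`. It patches the alarm and persistence methods on the class, performs a state-only update, and checks that the alarm update was attempted while nothing was persisted.

```python
from unittest import mock


class Manager:
    """Hypothetical class standing in for the real state manager."""

    def _update_alarm(self, name, status):
        raise RuntimeError("would contact the real alarm service")

    def _save(self, name, status):
        raise RuntimeError("would write to the real database")

    def update(self, name, status, update_state_only=False):
        if update_state_only:
            # Unchanged availability: only the alarm is re-asserted.
            self._update_alarm(name, status)
            return
        self._save(name, status)
        self._update_alarm(name, status)


def run_check():
    # Patch both methods on the class so no real side effects occur.
    with mock.patch.object(Manager, "_update_alarm") as alarm_mock, \
            mock.patch.object(Manager, "_save") as save_mock:
        Manager().update("subcloud1", "online", update_state_only=True)
        save_mock.assert_not_called()    # nothing persisted
        alarm_mock.assert_called_once()  # alarm update attempted
        return alarm_mock.call_args
```

Because the patched attributes are plain `MagicMock` objects rather than bound methods, the recorded call arguments do not include `self`.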