utilities/tools/collector/debian-scripts/collect_ceph.sh
Eric MacDonald f87df98a96 Use sm-query of service-group to determine controller activity state
Several collect plugins used 'sm-query service' with 'management-ip'
as the target service to determine whether the host that collect is
running on is the active controller.

    sm-query service management-ip

Unfortunately, a recent update changed the name of the 'management-ip'
service. As a result, that query always returned a 'disabled' status.

This update switches to 'sm-query service-group controller-services'
instead of querying a specific service whose name might change again in
the future. It also moves the duplicated is_service_active function into
the collect_utils file, making it common and reusable by all parts of
collect. The function was renamed to 'is_controller_active' because it
no longer represents the activity state of a particular service.
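A shared helper built on the service-group query could look roughly like the sketch below. It is hypothetical, not the collect_utils implementation: the name `sketch_is_active_controller` is made up, and it assumes that `sm-query service-group controller-services` prints a line whose first field is the group name and second field is its state (for example `controller-services active`), which should be checked against real output. The return convention follows the ceph plugin's call site, which exits when the helper returns 0.

```shell
#!/bin/bash
# HYPOTHETICAL sketch, not the collect_utils implementation.
# ASSUMPTION: 'sm-query service-group controller-services' prints a line whose
# first field is the group name and second field is its state, for example
# "controller-services active". Check this against real sm-query output.
# Return convention matches the ceph plugin's call site, which exits when the
# helper returns 0: 1 means active controller, 0 means not active.

sketch_is_active_controller() {
    local state
    # stderr is discarded so a missing sm-query simply reads as "not active".
    state="$(sm-query service-group controller-services 2>/dev/null |
             awk '$1 == "controller-services" { print $2; exit }')"
    if [ "${state}" = "active" ] ; then
        return 1
    fi
    return 0
}

# Usage: gate active-controller-only commands on the helper.
if sketch_is_active_controller ; then
    echo "Not the active controller; skipping active-only commands"
else
    echo "Active controller; collecting active-only output"
fi
```

Querying the service group rather than one member service means a future rename of an individual service no longer changes the answer, which is the motivation given above.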

Test Plan:

PASS: Verify the 'sysinv' plugin creates the var/extra/inventory.info
      file on the active controller with sysinv CLI command output.
PASS: Verify the 'fm' plugin creates the var/extra/alarm.info file
      on the active controller with FM alarm and event listings.
PASS: Verify the 'ceph' plugin creates the var/extra/ceph.info file
      on the active controller with 'ceph df' output content.
PASS: Verify the 'dc' plugin creates the var/extra/distributed_cloud.info
      file on the active controller of a system controller or subcloud.

Closes-Bug: 2070496

Change-Id: I9989708f1d87a5ef312129cfe3ede8c862764cb0
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2024-06-28 15:35:20 +00:00


#! /bin/bash
#
# Copyright (c) 2013-2014 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
# Loads Up Utilities and Commands Variables
source /usr/local/sbin/collect_parms
source /usr/local/sbin/collect_utils
SERVICE="ceph"
LOGFILE="${extradir}/ceph.info"
echo "${hostname}: Ceph Info .........: ${LOGFILE}"
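# timeout(1) exits with status 124 when the command it runs times out.
# Must be called immediately after the timed command so "$?" is its status.
function exit_if_timeout {
    if [ "$?" = "124" ] ; then
        echo "Exiting due to ceph command timeout" >> "${LOGFILE}"
        exit 0
    fi
}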
###############################################################################
# Only Controller
###############################################################################
if [ "$nodetype" = "controller" ] ; then
    # Use timeout with all ceph commands because they can hang for
    # minutes if the ceph cluster is down. If ceph is not configured, the
    # commands return immediately.
    delimiter "${LOGFILE}" "ceph status"
    timeout 30 ceph status >> "${LOGFILE}" 2>>"${COLLECT_ERROR_LOG}"
    exit_if_timeout

    delimiter "${LOGFILE}" "ceph mon dump"
    timeout 30 ceph mon dump >> "${LOGFILE}" 2>>"${COLLECT_ERROR_LOG}"
    exit_if_timeout

    delimiter "${LOGFILE}" "ceph osd dump"
    timeout 30 ceph osd dump >> "${LOGFILE}" 2>>"${COLLECT_ERROR_LOG}"
    exit_if_timeout

    delimiter "${LOGFILE}" "ceph osd tree"
    timeout 30 ceph osd tree >> "${LOGFILE}" 2>>"${COLLECT_ERROR_LOG}"
    exit_if_timeout

    delimiter "${LOGFILE}" "ceph osd crush dump"
    timeout 30 ceph osd crush dump >> "${LOGFILE}" 2>>"${COLLECT_ERROR_LOG}"
    exit_if_timeout

    # is_active_controller (from collect_utils) returns 0 when this host is
    # not the active controller; in that case skip the remaining commands,
    # which only need to run on the active controller.
    is_active_controller
    if [ "$?" = "0" ] ; then
        exit 0
    fi

    delimiter "${LOGFILE}" "ceph df"
    timeout 30 ceph df >> "${LOGFILE}" 2>>"${COLLECT_ERROR_LOG}"
    exit_if_timeout

    delimiter "${LOGFILE}" "ceph osd df tree"
    timeout 30 ceph osd df tree >> "${LOGFILE}" 2>>"${COLLECT_ERROR_LOG}"
    exit_if_timeout

    delimiter "${LOGFILE}" "ceph health detail"
    timeout 30 ceph health detail >> "${LOGFILE}" 2>>"${COLLECT_ERROR_LOG}"
    exit_if_timeout
fi
exit 0
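Every ceph command in the plugin is wrapped in `timeout 30` and followed by `exit_if_timeout`, which relies on GNU `timeout` exiting with status 124 when the time limit expires. The sketch below is one hypothetical, table-driven way to express the same "run, log, stop at the first timeout" pattern. `run_with_timeout` is an illustrative name that is not part of collect, and `echo`/`sleep` stand in for ceph so the example runs anywhere.

```shell
#!/bin/bash
# HYPOTHETICAL sketch: run several commands, each bounded by timeout(1),
# append their output to a log, and stop at the first timeout (status 124),
# mirroring the plugin's "timeout 30 ...; exit_if_timeout" pattern.
# run_with_timeout and the echo/sleep commands are illustrative stand-ins.

run_with_timeout() {
    local logfile="$1" secs="$2"
    shift 2
    local cmd rc
    for cmd in "$@" ; do
        echo "---- ${cmd} ----" >> "${logfile}"
        rc=0
        # ${cmd} is left unquoted on purpose so "ceph osd tree" splits into
        # separate arguments; this assumes no argument contains whitespace.
        timeout "${secs}" ${cmd} >> "${logfile}" 2>&1 || rc=$?
        if [ "${rc}" -eq 124 ] ; then
            echo "Exiting due to command timeout" >> "${logfile}"
            return 124
        fi
    done
    return 0
}

demo_log="$(mktemp)"
rc=0
run_with_timeout "${demo_log}" 1 "echo first" "sleep 5" "echo never-reached" || rc=$?
echo "rc=${rc}"
cat "${demo_log}"
rm -f "${demo_log}"
```

As in the plugin, only status 124 stops the sequence; other failures are logged and the next command still runs. Passing each command as a single string keeps call sites short, but it cannot express arguments that contain spaces; a separate array per command would avoid that at the cost of more verbose calls.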