169a0c0ee3
This introduces the k8s-container-cleanup script, which is called when containerd.service is stopped. The script detects whether the systemd state is 'stopping' due to shutdown/reboot and, if so, stops all running containers before the service shuts down.

During shutdown/reboot, some containers do not receive the SIGTERM signal. This leads to unexpected behaviour such as the generation of huge coredumps. There is an upstream issue regarding this: https://github.com/kubernetes/kubernetes/issues/107158

The problem appears to be systemd-related, but this commit addresses it with a workaround.

This reverts commit f3c18b0f79e3b145d378474b24d861926dd61a13. The k8s-container-cleanup script is moved from kubelet.service to containerd.service. The ExecStopPost that called this script is removed and replaced with an ExecStop in containerd.service that calls the script (in the config-files repo). The k8s-container-cleanup script requires containerd to be running in order to use the crictl utility. The shutdown timing of kubelet and containerd is unpredictable, so the cleanup must be done in containerd.

Test Plan:
On AIO-SX
PASS: Verify k8s-container-cleanup logs to daemon.log during 'stopping'.
PASS: Manually change containerd/kubelet shutdown timing and verify that k8s-container-cleanup runs to completion before containerd is stopped.
PASS: Reboot and verify k8s-container-cleanup runs to completion.
PASS: Lock/unlock and verify k8s-container-cleanup runs to completion.
PASS: Manually run spellintian tool against k8s-container-cleanup.sh.
PASS: Manually run shellcheck tool against k8s-container-cleanup.sh.
PASS: Zuul tox bashate tool against k8s-container-cleanup.sh.

Partial-Bug: 1964111
Change-Id: Ic8a9e257f861ae218a8520205eced3eaa580dd20
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
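The commit message says the script is invoked from an ExecStop= line in containerd.service, but that unit change lives in the config-files repo and is not shown here. The following is a hedged sketch of how such wiring could look, assuming a systemd drop-in file and a hypothetical install path of /usr/local/sbin/k8s-container-cleanup.sh; the drop-in file name and the helper function write_cleanup_dropin are also hypothetical. The point it illustrates is that ExecStop= commands run while containerd is still up, which is what lets the script use crictl.

```bash
#!/bin/bash
# Sketch only: write a systemd drop-in that runs the cleanup script as an
# ExecStop= command of containerd.service. The script path, the drop-in
# file name and this helper are hypothetical; the real change is in the
# config-files repo.

# Hypothetical install location of the cleanup script.
CLEANUP_SCRIPT="${CLEANUP_SCRIPT:-/usr/local/sbin/k8s-container-cleanup.sh}"
# Drop-in directory for containerd.service; overridable for testing.
DROPIN_DIR="${DROPIN_DIR:-/etc/systemd/system/containerd.service.d}"

# write_cleanup_dropin DIR SCRIPT
# ExecStop= commands run before systemd signals the remaining service
# processes, so containerd is still serving crictl requests. ExecStopPost=
# commands, by contrast, run only after the service has been stopped.
write_cleanup_dropin() {
    local dir="$1" script="$2"
    mkdir -p "${dir}" || return 1
    cat > "${dir}/k8s-container-cleanup.conf" <<EOF
[Service]
ExecStop=${script}
EOF
}
```

Usage, as root: `write_cleanup_dropin "${DROPIN_DIR}" "${CLEANUP_SCRIPT}" && systemctl daemon-reload`. Because the script stops containers sequentially with up to 5 seconds each, a node with many containers could approach systemd's per-command stop timeout (TimeoutStopSec=, which defaults to 90 seconds); raising it in the same drop-in is one option if that becomes an issue.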
54 lines
1.6 KiB
Bash
Executable File
#!/bin/bash
# Copyright (c) 2022 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
# This script runs as the containerd.service ExecStop command.
# It detects whether the systemd state is 'stopping' due to
# shutdown/reboot and, if so, stops all running containers before the
# service shuts down.
#
# Running containers are stopped one container at a time.
# The internal implementation of 'crictl stop --timeout <n>'
# sends SIGTERM to the container and uses SIGKILL only
# if the timeout is reached.
#

NAME=$(basename "${0}")

# Log info message to /var/log/daemon.log
function LOG {
    logger -p daemon.info -t "${NAME}($$): " "${@}"
}

# Log error message to /var/log/daemon.log
function ERROR {
    logger -p daemon.error -t "${NAME}($$): " "${@}"
}

state=$(timeout 10 systemctl is-system-running)
RC=$?
LOG "System state is: ${state}, RC = ${RC}."
case $RC in
    124)
        # systemctl hung.
        ERROR "systemctl timed out. System state unknown."
        ;;

    [01])
        # 0 - running; 1 - initializing, starting, degraded, maintenance, stopping
        if [ "$state" = "stopping" ]; then
            LOG "Stopping all containers."
            # Use crictl to gracefully stop each container. If specified timeout is
            # reached, it forcibly kills the container. There is no need to check
            # return code since there is nothing more we can do, and crictl already
            # logs to daemon.log.
            crictl ps -q | xargs -r -I {} crictl stop --timeout 5 {}
            LOG "Stopping all containers completed."
            exit 0
        fi
        ;;
esac

exit 0
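The script's decision logic (act only when systemctl exited with 0 or 1 and reported 'stopping') and its one-at-a-time stop can be pulled into functions so they can be exercised without a real shutdown. This is a hypothetical restructuring, not the script as committed: should_cleanup and stop_all_containers are illustrative names, and the while-read loop stands in for the xargs pipeline only so that each container ID can be reported; it still calls `crictl stop --timeout <n>` sequentially.

```bash
#!/bin/bash
# Hypothetical refactor of the decision and stop logic above.
# Function names are illustrative, not part of the original script.

# should_cleanup STATE RC
# Succeeds only when 'systemctl is-system-running' exited with 0 or 1 and
# printed 'stopping'. RC 124 (timeout(1) expired) or any other code: skip.
should_cleanup() {
    local state="$1" rc="$2"
    case "${rc}" in
        0|1) [ "${state}" = "stopping" ] ;;
        *) return 1 ;;
    esac
}

# stop_all_containers [TIMEOUT]
# Stops running containers one at a time. Per the script's comments,
# 'crictl stop --timeout N' sends SIGTERM and uses SIGKILL only if the
# timeout is reached. Prints one line per container for the caller to log.
stop_all_containers() {
    local timeout="${1:-5}"
    crictl ps -q | while read -r id; do
        [ -n "${id}" ] || continue
        echo "stopping ${id}"
        crictl stop --timeout "${timeout}" "${id}"
    done
}
```

Both forms are sequential: `xargs -I` runs one command per input line and, without `-P`, never in parallel. The loop form simply makes per-container logging easier, for example by piping stop_all_containers into a `while read -r line; do LOG "${line}"; done` loop.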