integ/kubernetes/containerd/centos/files/k8s-container-cleanup.sh
Jim Gauld 169a0c0ee3 Move k8s container cleanup to containerd service
This introduces k8s-container-cleanup script that will be called
when containerd.service is stopped. The script detects whether systemd
state is 'stopping' due to shutdown/reboot, then stops all running
containers before the service shuts down.

During shutdown/reboot, some containers are not receiving the
SIGTERM signal. This leads to unexpected behaviour such as
generating huge coredumps.

There is an upstream issue regarding this:
https://github.com/kubernetes/kubernetes/issues/107158
The problem seems to be systemd related but this commit
addresses the problem with a workaround.

This reverts commit f3c18b0f79e3b145d378474b24d861926dd61a13.
The k8s-container-cleanup script is moved from kubelet.service
to containerd.service. The ExecStopPost that calls this script
is removed, and replaced with ExecStop in containerd.service
to call the script (in config-files repo).

The k8s-container-cleanup script requires containerd is running
in order to use crictl utility. The shutdown of kubelet and
containerd have unpredictable timing, so the cleanup must be done
in containerd.

Test Plan: On AIO-SX
PASS: Verify k8s-container-cleanup logs to daemon.log during 'stopping.
PASS: Manual change containerd/kubelet shutdown timing and verify.
k8s-container-cleanup running to completion before containerd stopped.
PASS: Reboot and verify k8s-container-cleanup running to completion.
PASS: Lock/unlock and verify k8s-container-cleanup running to completion.
PASS: Manually run spellintian tool against k8s-container-cleanup.sh.
PASS: Manually run shellcheck tool against k8s-container-cleanup.sh.
PASS: Zuul tox bashate tool against k8s-container-cleanup.sh.

Partial-Bug: 1964111
Change-Id: Ic8a9e257f861ae218a8520205eced3eaa580dd20
Signed-off-by: Jim Gauld <james.gauld@windriver.com>
2022-04-12 13:52:40 -04:00

54 lines
1.6 KiB
Bash
Executable File

#!/bin/bash
# Copyright (c) 2022 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
# The script will run during containerd.service ExecStop.
# This script detects whether systemd state is 'stopping' due to
# shutdown/reboot, then will stop all running containers before the
# service shuts down.
#
# All running containers are stopped one container at a time.
# The internal implementation of 'crictl stop --timeout <n>'
# sends a SIGTERM to the container, and will use SIGKILL only
# if the timeout is reached.
#
NAME=$(basename "${0}")
# Log info message to /var/log/daemon.log
function LOG {
logger -p daemon.info -t "${NAME}($$): " "${@}"
}
# Log error message to /var/log/daemon.log
function ERROR {
logger -p daemon.error -t "${NAME}($$): " "${@}"
}
state=$(timeout 10 systemctl is-system-running)
RC=$?
LOG "System state is: ${state}, RC = ${RC}."
case $RC in
124)
# systemctl hung.
ERROR "systemctl timed out. System state unknown."
;;
[01])
# 0 - running; 1 - initializing, starting, degraded, maintenance, stopping
if [ "$state" = "stopping" ]; then
LOG "Stopping all containers."
# Use crictl to gracefully stop each container. If specified timeout is
# reached, it forcibly kills the container. There is no need to check
# return code since there is nothing more we can do, and crictl already
# logs to daemon.log.
crictl ps -q | xargs -r -I {} crictl stop --timeout 5 {}
LOG "Stopping all containers completed."
exit 0
fi
;;
esac
exit 0