A bug in STX-O prevented the libvirt container from being
killed while a server was running on a system where
only one worker is available. It was found that the VM
processes run by QEMU were placed in sub-cgroups under
the libvirt pod’s cgroup, creating a parent/child
relationship in the cgroup hierarchy.
Some of the cgroup resource controllers (defined in [1])
used by the VM could not be moved back to the host
root cgroup when deleting the libvirt pod, which
prevented the kernel from fully terminating the container.
When the pod was deleted, its preStop hook attempted to kill
all libvirt processes, but because some of the VM’s cgroup
resource controllers remained attached to the pod’s own
cgroup hierarchy, the kernel could not fully release those
resources back to the host. As a result, the libvirt
container could not transition to a clean terminated state
and stayed stuck in a terminating condition.
The libvirt cgroup initialization in the Caracal version
uses a small hard-coded list of controllers that is set
in the libvirt bash script.
In addition, a cgroup_controllers list was added to the
libvirt static Helm overrides so that the set of controllers
used can be configured explicitly from the chart values.
This ensures that any future changes in the available
cgroup controllers can be handled through the override
file without requiring further changes to the libvirt
initialization script.
This review creates a patch to update the libvirt bash script
to its latest version, which compares the list of controllers
set in the values file with the controllers available on
the host [2] and uses the resulting list to initialize the
controllers for the libvirt process. The patch also removes a
hugepage validation that existed in the bash script, since the
validation is no longer necessary now that libvirt does not run
in the pod cgroup [3].
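The comparison described above can be sketched as follows. This is a
minimal illustration only, not the upstream script: the function name
filter_controllers and the way the two lists are passed in are
assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical helper (not taken from the upstream script): print the
# comma-separated subset of the wanted controllers that also appear in
# the list of controllers available on the host.
#   $1: space-separated controllers requested in the values file
#   $2: space-separated controllers available on the host
filter_controllers() {
  local wanted="$1" available="$2" result="" c
  for c in $wanted; do
    # Pad with spaces so that "cpu" does not match "cpuset".
    case " $available " in
      *" $c "*) result="${result:+$result,}$c" ;;
    esac
  done
  printf '%s\n' "$result"
}

# Usage sketch, assuming a cgroup v2 host with the unified hierarchy
# mounted at /sys/fs/cgroup:
#   filter_controllers "cpu memory hugetlb pids" \
#     "$(cat /sys/fs/cgroup/cgroup.controllers)"
```

Controllers that are requested but not present on the host are simply
skipped, so the same values list can be used across hosts that expose
different controller sets.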
Reference for the cgroup resource controllers:
[1] - https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
Commits that added the changes to the upstream OSH
repository:
[2] - 3903f54d0c
[3] - ea3c04a7d9
This patch can be dropped after the Epoxy upversion.
Test-Plan:
PASS - Build all STX-O packages and tarball
PASS - Patch is present on OSH-I package
PASS - Upload and apply STX-Openstack
PASS - Launch 3 VMs and delete libvirt pod
PASS - Upgrade k8s version from 1.29 to 1.31
PASS - Libvirt pod always comes back running
PASS - All VMs are accessible after pod restart
Closes-Bug: #2125753
Change-Id: Ie7dcac64a55834d670a3a2e0b689b22f25e01ce0
Signed-off-by: Daniel Caires <DanielMarques.Caires@windriver.com>