
A bug in STX-O prevented the libvirt container from being killed if a server was running on a system where only 1 worker is available. It was found that the VM processes run by QEMU were placed in sub-cgroups under the libvirt pod's cgroup, creating a parent/child relationship in the cgroup hierarchy. Some of the cgroup resource controllers (defined in [1]) used by the VM could not be moved back to the host root cgroup when deleting the libvirt pod, which prevented the kernel from fully terminating the container.

When the pod was deleted, its preStop hook attempted to kill all libvirt processes, but because some of the VM's cgroup resource controllers remained attached to the pod's own cgroup hierarchy, the kernel could not fully release those resources back to the host. As a result, the libvirt container could not transition to a clean terminated state and stayed stuck in a terminating condition.

The libvirt cgroup initialization in the caracal version uses a small hard-coded list of controllers that is set in the libvirt bash file. This review creates a patch to update the .sh to its latest version, which compares a list of controllers set in the values file with the controllers available on the host [2] and uses the resulting list to initialize the controllers for the libvirt process.

In addition, a cgroup_controllers list was added to the libvirt static Helm overrides so that the set of controllers used can be configured explicitly from the chart values. This ensures that any future changes in the available cgroup controllers can be handled through the override file without requiring further changes to the libvirt initialization script.

The patch also removes a hugepage validation that existed in the bash file, since the validation is no longer necessary now that libvirt is not running in the pod cgroup anymore [3].

References ([2] and [3] are the commits that added the changes to the upstream OSH repository):
[1] - https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
[2] - 3903f54d0c
[3] - ea3c04a7d9

This patch can be dropped after the epoxy upversion.

Test-Plan:
PASS - Build all STX-O packages and tarball
PASS - Patch is present on OSH-I package
PASS - Upload and apply STX-Openstack
PASS - Launch 3 VMs and delete libvirt pod
PASS - Upgrade k8s version from 1.29 to 1.31
PASS - Libvirt pod always comes back running
PASS - All VMs are accessible after pod restart

Closes-Bug: #2125753
Change-Id: Ie7dcac64a55834d670a3a2e0b689b22f25e01ce0
Signed-off-by: Daniel Caires <DanielMarques.Caires@windriver.com>
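
The controller selection described in the commit message (compare the list from the values file with what the host offers, then initialize only the matching controllers) can be sketched as follows. This is an illustrative Python sketch, not the actual libvirt bash script: the function names are hypothetical, the sample controller names are made up for the example, and it assumes the host list comes from a cgroup v2 style space-separated line such as the one in a `cgroup.controllers` file. The real script may obtain the host list differently.

```python
def parse_v2_controllers(text):
    """Split a cgroup v2 style controller line into a list of names.

    Assumption: the input looks like the content of a cgroup.controllers
    file, i.e. controller names separated by whitespace.
    """
    return text.split()


def select_controllers(requested, available):
    """Keep only the requested controllers that the host actually offers.

    The order of the requested list (e.g. a cgroup_controllers override)
    is preserved; names the host does not provide are dropped.
    """
    available_set = set(available)
    return [name for name in requested if name in available_set]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    requested = ["cpu", "memory", "hugetlb", "rdma"]
    host_line = "cpuset cpu io memory hugetlb pids\n"
    print(select_controllers(requested, parse_v2_controllers(host_line)))
```

Filtering the configured list against the host list, rather than the other way around, keeps the override file as the single place that decides which controllers libvirt may use, while still tolerating hosts that lack some of them.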
0001-Add-imagePullSecrets-in-service-account.patch
0002-Partial-revert-of-31e3469d28858d7b5eb6355e88b6f49fd6.patch
0003-Fix-pod-restarts-on-all-workers-when-worker-added.patch
0004-Add-io_thread_pool-for-rabbitmq.patch
0005-Enable-override-of-mariadb-server-probe-parameters.patch
0006-Add-mariadb-database-config-override-to-support-ipv6.patch
0007-Allow-set-public-endpoint-url-for-all-openstack-types.patch
0008-Add-GaleraDB-Secure-Replica-Traffic.patch
0009-Fix-tls-in-openstack-helm-infra.patch
0010-Remove-mariadb-tls.patch
0011-Remove-rabbitmq-tls.patch
0012-Update-openstack-Ingress-for-networking-api-v1.patch
0013-Update-libvirt-configuration-script-for-Debian.patch
0014-Add-app.starlingx.io-component-label-to-pods.patch
0015-Add-pre-apply-cleanup-Job-to-STX-O-Helm-charts.patch
0016-Add-Kubernetes-name-label-to-helm-toolkit-template.patch
0017-Add-support-for-multiple-hosts-in-a-daemonset.patch
0018-Fix-upversion-breaking-changes.patch
0019-removed-section-to-add-default-daemonset-to-global-l.patch
0020-Bring-necessary-upstream-commits.patch
0021-Add-custom-pod-annotations-to-libvirt.patch
0022-Update-ipFamilyPolicy-to-support-DualStack.patch
0023-Update-libvirt-cgroup-controllers-initiation.patch