integ/kubernetes/k8s-pod-recovery/centos/files
Andre Fernando Zanella Kantek e3705e6046 Execute one extra attempt to restore SRIOV device plugin
The k8s-pod-recovery service failed to restore the SRIOV device
plugin, which pods that use SRIOV interfaces need in order to create
the resource. Those pods must carry the label 'restart-on-reboot=true'
to be restarted during boot. The failure was observed during an
upgrade and, although rare, it forced the operator to restart the
pods manually afterwards.

This change waits for the pod to stabilize (it is considered stable
once its state stops changing) and, if the pod is still failing,
makes 2 attempts to restore the plugin. Logs were added to better
record the pod state in case of an error.
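
The wait-then-retry flow described above can be sketched roughly as
follows. This is an illustration only, not the code of the service
itself: the function names, the "Running" healthy state, and the
"three identical polls" stability rule are all assumptions made for
the example, and the state query and restart action are passed in as
plain callables rather than real Kubernetes calls.

```python
import time


def wait_until_stable(get_state, polls_required=3, interval=1.0,
                      timeout=60.0, sleep=time.sleep):
    """Poll get_state() until it returns the same value polls_required
    times in a row, or until the timeout expires. Returns the last state.

    The stability rule (N identical polls) is an assumption for this sketch.
    """
    deadline = time.monotonic() + timeout
    last = get_state()
    unchanged = 1
    while unchanged < polls_required and time.monotonic() < deadline:
        sleep(interval)
        current = get_state()
        if current == last:
            unchanged += 1
        else:
            # A state transition happened; start counting again.
            last, unchanged = current, 1
    return last


def recover_plugin(get_state, restart, attempts=2, healthy="Running",
                   **wait_kwargs):
    """Wait for the pod to stabilize, then restart it up to `attempts`
    times while it is still not healthy. Returns True on success.

    `get_state` and `restart` are hypothetical callables supplied by the
    caller; "Running" as the healthy state is also an assumption.
    """
    state = wait_until_stable(get_state, **wait_kwargs)
    for attempt in range(1, attempts + 1):
        if state == healthy:
            return True
        # Log the observed state so a failure can be diagnosed later.
        print(f"attempt {attempt}/{attempts}: pod state is {state!r}, restarting")
        restart()
        state = wait_until_stable(get_state, **wait_kwargs)
    return state == healthy
```

Injecting `get_state` and `restart` keeps the sketch independent of
any particular client or CLI, so the retry logic can be exercised
with simulated state sequences.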

Test Plan:
[PASS]  execute 7 upgrades in an AIO-SX lab

Closes-Bug: 1999074

Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Change-Id: I838c35d3e0a3557c71344945a8e00f22ccb50eb4
2022-12-09 05:59:33 -05:00
k8s-pod-recovery Execute one extra attempt to restore SRIOV device plugin 2022-12-09 05:59:33 -05:00
k8s-pod-recovery.service Introduce k8s pod recovery service 2020-09-03 23:38:41 -04:00