This upgrade is needed to support the A100 GPU, the kernel
upgrade, and bug 1948050. It eliminates the requirement to
create an nvidia-specific RuntimeClass prior to installing
the charts, by pre-installing the toolkit through the
toolkit-installer subchart.
This commit has been tested with the following:
driver: 470.57.02
toolkit: 1.7.1-ubi8
defaultRuntime: containerd
Test Plan:
PASS: Verify gpu-operator starts and adds nvidia.com/gpu
to the node.
PASS: Verify nvidia-toolkit is removed with helm override
of global.toolkit_force_clean=true.
PASS: Verify pods can access the GPU device and the nvidia
tools to monitor the GPU.
PASS: Verify a pod can build and execute CUDA sample code.
PASS: Verify the driver pod prints a warning when building
on a Low Latency kernel with a helm override of:
--set driver.env[0].name=IGNORE_PREEMPT_RT_PRESENCE
(see the override sketch after the test plan).
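
For reference, a sketch of how the overrides above could be
passed on the helm command line; the release name and chart
path are illustrative placeholders, not part of this change:

# Illustrative: force removal of the pre-installed nvidia-toolkit
helm upgrade gpu-operator <chart-path> --reuse-values \
--set global.toolkit_force_clean=true

# Illustrative: allow the driver build on a Low Latency (PREEMPT_RT) kernel
helm upgrade gpu-operator <chart-path> --reuse-values \
--set driver.env[0].name=IGNORE_PREEMPT_RT_PRESENCE
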
Closes-Bug: 1948050
Signed-off-by: Babak Sarashki <babak.sarashki@windriver.com>
Change-Id: I18dd2a0ab1adc6f9364314a22373aadc93cad27f

This commit adds the nvidia gpu-operator helm charts as a use
case for the custom container runtime feature. To load
nvidia-gpu-operator on StarlingX, register the runtime with:

system service-parameter-add platform container_runtime \
custom_container_runtime=\
nvidia:/usr/local/nvidia/toolkit/nvidia-container-runtime
And define a RuntimeClass for nvidia GPU pods:
kind: RuntimeClass
apiVersion: node.k8s.io/v1beta1
metadata:
  name: nvidia
handler: nvidia
The above directs containerd to use nvidia-container-runtime
when creating pods with the nvidia runtimeClass; the
nvidia-container-runtime itself is installed by the operator
onto a host mount.
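
As an illustration only (the pod name, image and resource
request below are assumptions, not part of this change), a pod
opts into this runtime as follows:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-test                # illustrative name
spec:
  runtimeClassName: nvidia       # selects the nvidia handler in containerd
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:11.4.2-base-ubi8   # illustrative image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1        # resource advertised by the operator
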
Story: 2008434
Task: 41978
Signed-off-by: Babak Sarashki <babak.sarashki@windriver.com>
Change-Id: Ifea8cdf6eb89a159f446c53566279e72fcf0e45e

This reverts commit 41bdf53f65684b54abaa3098a5fe3acf568cdf2a.
Reason for revert: the gpu-operator patch breaks the stx-master
build, e.g.:
08:06:44 Failed to build packages: gpu-operator-1.6.0-0.tis.1.src.rpm; problem with:
Patch #2 (enablement-support-on-starlingx-cloud-platform.patch):
. .
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file deployments/gpu-operator/templates/operator.yaml.rej
patching file deployments/gpu-operator/values.yaml
error: Bad exit status from /var/tmp/rpm-tmp.VQuqLh (%prep)
Change-Id: Id7a05987586582c940d605874d1e0f813333f2c3

This commit adds the nvidia gpu-operator helm charts as a use
case for the custom container runtime feature. To load
nvidia-gpu-operator on StarlingX, register the runtime with:

system service-parameter-add platform container_runtime \
custom_container_runtime=\
nvidia:/usr/local/nvidia/toolkit/nvidia-container-runtime
And define a RuntimeClass for nvidia GPU pods:
kind: RuntimeClass
apiVersion: node.k8s.io/v1beta1
metadata:
  name: nvidia
handler: nvidia
The above directs containerd to use nvidia-container-runtime
when creating pods with the nvidia runtimeClass; the
nvidia-container-runtime itself is installed by the operator
onto a host mount.
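
As a quick, illustrative check that the handler and the GPU
resource are in place (the node name is a placeholder):

kubectl get runtimeclass nvidia
kubectl describe node <node-name> | grep nvidia.com/gpu
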
Story: 2008434
Task: 41978
Signed-off-by: Babak Sarashki <babak.sarashki@windriver.com>
Change-Id: I999804d4697349bc0966d0a6e653d7bce15e18fc