4 Commits

Author SHA1 Message Date
Babak Sarashki
fabc6822a0 integ: gpu-operator chart upgrade 1.6.0 -> 1.8.1
This upgrade is needed to support the A100 GPU, the kernel
upgrade, and bug 1948050. It eliminates the requirement to
create an nvidia-specific RuntimeClass before installing the
charts by pre-installing the toolkit through the
toolkit-installer subchart.

This commit has been tested with the following:

driver: 470.57.02
toolkit: 1.7.1-ubi8
defaultRuntime: containerd

Test Plan:
PASS: Verify gpu-operator starts and adds nvidia.com/gpu
      to the node.
PASS: Verify nvidia-toolkit is removed with helm override
      of global.toolkit_force_clean=true.
PASS: Verify pods can access gpu device and nvidia tools
      to monitor the GPU.
PASS: Verify pod can build and execute cuda sample code.
PASS: Verify driver pod prints out warning when building
      on Low Latency kernel with helm override of:
      --set driver.env[0].name=IGNORE_PREEMPT_RT_PRESENCE

Closes-Bug: 1948050
Signed-off-by: Babak Sarashki <babak.sarashki@windriver.com>
Change-Id: I18dd2a0ab1adc6f9364314a22373aadc93cad27f
2021-11-23 00:56:53 +00:00
Babak Sarashki
7badc1dad1 integ: add nvidia gpu-operator helm charts
This commit adds the nvidia gpu-operator helm charts as a use case
for the custom container runtime feature. To load
nvidia-gpu-operator on StarlingX:

system service-parameter-add platform container_runtime \
custom_container_runtime=\
nvidia:/usr/local/nvidia/toolkit/nvidia-container-runtime

And define a RuntimeClass for nvidia GPU pods:

kind: RuntimeClass
apiVersion: node.k8s.io/v1beta1
metadata:
  name: nvidia
handler: nvidia

The above directs containerd to create every pod that uses the
nvidia runtimeClass through nvidia-container-runtime, which the
operator installs onto a hostMount.
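
As an illustration only, a pod could select the RuntimeClass defined
above through spec.runtimeClassName. This is a minimal sketch and not
part of the commit: the pod name, container name, and image are
hypothetical placeholders, and it assumes the operator has already
advertised the nvidia.com/gpu resource on the node.

```yaml
# Hypothetical pod manifest; names and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  # Must match metadata.name of the RuntimeClass defined above.
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
  - name: gpu-example
    # Placeholder: substitute an image that ships the nvidia tools.
    image: registry.example.invalid/cuda-sample:latest
    command: ["nvidia-smi"]
    resources:
      limits:
        # Assumes the node advertises this extended resource.
        nvidia.com/gpu: 1
```

Pods that omit runtimeClassName keep using the default runtime
handler, so only workloads that opt in are routed to
nvidia-container-runtime.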

Story: 2008434
Task: 41978

Signed-off-by: Babak Sarashki <babak.sarashki@windriver.com>
Change-Id: Ifea8cdf6eb89a159f446c53566279e72fcf0e45e
2021-03-31 17:33:41 +00:00
Jim Gauld
f161f7f18e Revert "integ: gpu-operator helm charts"
This reverts commit 41bdf53f65684b54abaa3098a5fe3acf568cdf2a.

Reason for revert: gpu operator patch is breaking stx-master build.

e.g.,
08:06:44 Failed to build packages:  gpu-operator-1.6.0-0.tis.1.src.rpm; problem with:
Patch #2 (enablement-support-on-starlingx-cloud-platform.patch):
. .
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file deployments/gpu-operator/templates/operator.yaml.rej
patching file deployments/gpu-operator/values.yaml
error: Bad exit status from /var/tmp/rpm-tmp.VQuqLh (%prep)

Change-Id: Id7a05987586582c940d605874d1e0f813333f2c3
2021-03-29 12:31:25 +00:00
Babak Sarashki
41bdf53f65 integ: gpu-operator helm charts
This commit adds the nvidia gpu-operator helm charts as a use case
for the custom container runtime feature. To load
nvidia-gpu-operator on StarlingX:

system service-parameter-add platform container_runtime \
custom_container_runtime=\
nvidia:/usr/local/nvidia/toolkit/nvidia-container-runtime

And define a RuntimeClass for nvidia GPU pods:

kind: RuntimeClass
apiVersion: node.k8s.io/v1beta1
metadata:
  name: nvidia
handler: nvidia

The above directs containerd to create every pod that uses the
nvidia runtimeClass through nvidia-container-runtime, which the
operator installs onto a hostMount.

Story: 2008434
Task: 41978

Signed-off-by: Babak Sarashki <babak.sarashki@windriver.com>
Change-Id: I999804d4697349bc0966d0a6e653d7bce15e18fc
2021-03-25 01:10:04 +00:00