Add documentation

This commit is contained in:
Matt Pryor 2021-09-03 16:21:55 +01:00
parent 63e2610196
commit 2849e9496b
5 changed files with 468 additions and 6 deletions

View File

@ -1,3 +1,24 @@
# capi-helm-charts
This repository contains [Helm charts](https://helm.sh/) for deploying [Kubernetes](https://kubernetes.io/)
clusters using [Cluster API](https://cluster-api.sigs.k8s.io/).
The charts are available from the `stackhpc.github.io/capi-helm-charts` repository:
```sh
helm repo add capi https://stackhpc.github.io/capi-helm-charts
helm install my-release capi/<chartname> [...options]
```
To list the available versions for the charts:
```sh
helm search repo capi --devel --versions
```
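Because the charts are published as development versions, it can be useful to pin an exact chart version when installing. A minimal sketch (the version shown is illustrative, not a real release - substitute one reported by `helm search repo`):
```sh
# Sketch: pin an exact chart version for reproducible installs
# NOTE: the version below is illustrative - use one listed by "helm search repo"
helm install my-release capi/cluster-addons --devel --version 0.1.0
```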
Currently, the following charts are available:
| Chart | Description |
| --- | --- |
| [cluster-addons](./charts/cluster-addons) | Deploys addons into a Kubernetes cluster, e.g. CNI. |
| [openstack-cluster](./charts/openstack-cluster) | Deploys a Kubernetes cluster on an OpenStack cloud. |

View File

@ -0,0 +1,229 @@
# cluster-addons chart
This [Helm chart](https://helm.sh/) manages the deployment of addons for a
[Kubernetes](https://kubernetes.io) cluster. It is primarily intended to be used with
the cluster management charts from this repository, e.g.
[openstack-cluster](../openstack-cluster), but should work for any Kubernetes cluster.
The addons are deployed by launching
[Kubernetes jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job/) on the
target cluster, each of which is responsible for installing or updating a single addon.
The jobs use the [utils image](../../utils) from this repository, which bundles some
useful tools like [jq](https://stedolan.github.io/jq/),
[kubectl](https://kubernetes.io/docs/reference/kubectl/overview/),
[kustomize](https://kustomize.io/) and [helm](https://helm.sh), and the jobs execute
with full permissions on the cluster using the `cluster-admin` cluster role. This is
used rather than a more restrictive role for a few reasons:
1. This chart provides a mechanism to apply custom addons, and there is no way to
know in advance what resources those custom addons may need to manage.
1. Addons may need to manage
[CRD](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/)
instances that are not covered by a more restrictive role.
1. Several addons need to create
[RBAC](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) resources,
and so could elevate their permissions anyway by creating new roles.
There are two patterns used in this chart for managing addons:
1. Manifests are pulled from a URL and run through `kustomize` before being applied
using `kubectl apply`. The manifests are **not** present in this repository. In
this case, the URL and kustomize configuration can be changed using the Helm values
if required, e.g. to change images from Docker Hub to another repository or to
point to an internal source if an air-gapped installation is required.
1. Using a Helm chart. The chart to use is configured using Helm values rather
than Helm dependencies, which allows full control via configuration over which
repository is used (e.g. a mirror for an air-gapped installation) and which version
is installed. The Helm values for the addon are also exposed, and can be customised,
via the values for this chart. This chart sets sensible defaults.
This chart also allows custom addons to be managed using the Helm values, either by
specifying manifest content inline, or by specifying a Helm chart to install with the
corresponding values.
## Container Network Interface (CNI) plugins
This chart can install either
[Calico](https://docs.projectcalico.org/about/about-calico) or
[Weave](https://www.weave.works/docs/net/latest/kubernetes/kube-addon/) as a
[CNI plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/)
to provide the pod networking in a Kubernetes cluster. By default, the Calico CNI will be
installed.
To switch the CNI to Weave, use the following in your Helm values:
```yaml
cni:
  type: weave
```
And to disable the installation of a CNI completely:
```yaml
cni:
  enabled: false
```
Additional configuration options are available for each - see [values.yaml](./values.yaml).
## Cloud Controller Managers (CCMs)
In Kubernetes, a
[Cloud Controller Manager (CCM)](https://kubernetes.io/docs/concepts/architecture/cloud-controller/)
provides integration between a Kubernetes cluster and the cloud platform that it is running on.
This enables things like the automatic labelling of nodes with cloud-specific information,
automatic configuration of hostnames and IP addresses, and managed load balancers for services.
This chart can install the
[OpenStack CCM](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/openstack-cloud-controller-manager/using-openstack-cloud-controller-manager.md)
to provide this integration for clusters running on an OpenStack cloud.
By default, this chart does not deploy a CCM. To enable the OpenStack CCM on the target cluster,
use the following in your Helm values:
```yaml
ccm:
  enabled: true
  type: openstack
```
To configure options for the `[Networking]`, `[LoadBalancer]` and `[Metadata]` sections of the
[cloud-config](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/openstack-cloud-controller-manager/using-openstack-cloud-controller-manager.md#config-openstack-cloud-controller-manager)
file you can use the Helm values, e.g.:
```yaml
ccm:
  openstack:
    cloudConfig:
      networking:
        public-network-name: public-internet
      loadBalancer:
        lb-method: LEAST_CONNECTIONS
        create-monitor: "true"
      metadata:
        search-order: metadataService
```
The `[Globals]` section is populated using the given `clouds.yaml` (see "OpenStack credentials" below).
Additional configuration options are available for CCMs - see [values.yaml](./values.yaml).
### OpenStack credentials
OpenStack credentials are required for the Kubernetes OpenStack integrations to query and
manage OpenStack resources on behalf of the cluster. The recommended way to do this is using an
[Application Credential](https://docs.openstack.org/keystone/latest/user/application_credentials.html)
to avoid your password being stored on the cluster. Application credentials are project-scoped,
and ideally you should use a separate application credential for each cluster in a project.
For ease of use, this chart is written so that a `clouds.yaml` file can be given directly
to the chart as a configuration file. When an application credential is created in Horizon,
the corresponding `clouds.yaml` file can be downloaded, and should look something like this:
```yaml
clouds:
  openstack:
    auth:
      auth_url: https://my.cloud:5000
      application_credential_id: "<app cred id>"
      application_credential_secret: "<app cred secret>"
    region_name: "RegionOne"
    interface: "public"
    identity_api_version: 3
    auth_type: "v3applicationcredential"
```
This file can then be passed to the chart using the `-f|--values` option, e.g.:
```sh
helm install cluster-addons capi/cluster-addons --values ./clouds.yaml [...options]
```
## NVIDIA GPU operator
This chart is able to install the
[NVIDIA GPU operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/overview.html)
to provide access to NVIDIA GPUs from Kubernetes pods using the
[device plugin framework](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/).
When deployed, the GPU operator will detect nodes with NVIDIA GPUs and automatically install the
NVIDIA software components required to make the GPUs available to Kubernetes. This does not
require any special modifications to the image used to deploy the nodes.
The GPU operator is not enabled by default. To enable it, use the following Helm values:
```yaml
nvidiaGPUOperator:
  enabled: true
```
Because of the automatic detection of nodes with GPUs, there is no need to manually label
nodes with GPUs. In the case where some nodes have GPUs and some do not, the GPU operator
will do the right thing without the need for manual intervention.
Additional configuration options are available for the NVIDIA GPU operator - see
[values.yaml](./values.yaml).
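Once the operator has finished installing its components, nodes with GPUs should advertise an `nvidia.com/gpu` resource. A quick sketch of how to check, assuming `kubectl` is configured for the workload cluster:
```sh
# Check whether any nodes are advertising NVIDIA GPUs to Kubernetes
kubectl describe nodes | grep -i "nvidia.com/gpu"
```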
## Custom manifests
This chart is able to manage the application of custom user-specified manifests to the
cluster using `kubectl apply`. This can be useful to install cluster-specific resources
such as additional
[storage classes](https://kubernetes.io/docs/concepts/storage/storage-classes/)
or [RBAC rules](https://kubernetes.io/docs/reference/access-authn-authz/rbac/).
To apply custom manifests to the cluster as part of the addon installation, use something
similar to the following in your Helm values:
```yaml
# This should be a mapping of filenames to manifest content
customManifests:
  storageclass-standard.yaml: |
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: standard
    provisioner: my-storage-provisioner
  pod-reader.yaml: |
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: pod-reader
    rules:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["get", "watch", "list"]
```
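One way to sanity-check that custom manifests were applied is to query the target cluster for the objects they define. A sketch, assuming `kubectl` is configured for the target cluster and using the names from the example above:
```sh
# Confirm the objects from the example manifests exist
kubectl get storageclass standard
kubectl get clusterrole pod-reader
```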
## Custom Helm charts
In addition to simple custom manifests, this chart is also able to manage additional
cluster-specific Helm releases.
To deploy a custom Helm release as part of the addon installation, use something similar
to the following in your Helm values:
```yaml
customHelmReleases:
  # This is the name of the release
  my-wordpress:
    chart:
      # The repository that the chart is in
      repo: https://charts.bitnami.com/bitnami
      # The name of the chart
      name: wordpress
      # The version of the chart to use
      # NOTE: THIS IS REQUIRED
      version: 12.1.6
    # The namespace for the release
    # If not given, this defaults to the release name
    namespace: wordpress
    # The amount of time to wait for the chart to deploy before rolling back
    timeout: 5m
    # The values for the chart
    values:
      wordpressUsername: jbloggs
      wordpressPassword: supersecretpassword
      wordpressBlogName: JBloggs Awesome Blog!
```
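After the addon job has run, the release can be inspected like any other Helm release on the target cluster. A sketch, assuming `helm` is pointed at the target cluster's kubeconfig:
```sh
# Inspect the release created from the example above
helm status my-wordpress --namespace wordpress
```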

View File

```diff
@@ -66,7 +66,7 @@ cni:
 # Settings for the cloud controller manager for external cloud providers
 ccm:
   # Indicates if an external cloud controller manager should be deployed
-  enabled: true
+  enabled: false
   # The type of the external CCM to deploy - currently only OpenStack is supported
   type: openstack
   # Settings for the OpenStack cloud controller manager
@@ -102,7 +102,7 @@ ccm:
 # Settings for the NVIDIA GPU operator
 nvidiaGPUOperator:
   # Indicates if the NVIDIA GPU operator should be enabled
-  enabled: true
+  enabled: false
   chart:
     repo: https://nvidia.github.io/gpu-operator
     name: gpu-operator
```

View File

@ -0,0 +1,208 @@
# openstack-cluster chart
This [Helm chart](https://helm.sh/) manages the lifecycle of a [Kubernetes](https://kubernetes.io)
cluster on an [OpenStack](https://www.openstack.org/) cloud using
[Cluster API](https://cluster-api.sigs.k8s.io/).
As well as managing the Cluster API resources for the cluster, this chart optionally
manages addons for the cluster using Kubernetes jobs. Some of these are required for
a functional cluster, e.g. a
[Container Network Interface (CNI) plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/)
and the
[OpenStack Cloud Controller Manager (CCM)](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/openstack-cloud-controller-manager/using-openstack-cloud-controller-manager.md), and
others are optional.
> See the [cluster-addons chart](../cluster-addons) for more details about the addons
> that can be installed.
This README describes some of the basic options; however, many more options are
available. Check out the commented [values.yaml](./values.yaml) and the chart
templates for more details.
## Prerequisites
First, you must set up a
[Cluster API management cluster](https://cluster-api.sigs.k8s.io/user/concepts.html#management-cluster)
with the [OpenStack Infrastructure Provider](https://github.com/kubernetes-sigs/cluster-api-provider-openstack)
installed.
In addition, Helm must be installed and configured to access your management cluster,
and the chart repository containing this chart must be configured:
```sh
helm repo add capi https://stackhpc.github.io/capi-helm-charts
```
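Setting up the management cluster is typically done with the `clusterctl` CLI. A minimal sketch, run against an existing Kubernetes cluster (see the Cluster API book for the full procedure):
```sh
# Install the core Cluster API components and the OpenStack infrastructure provider
clusterctl init --infrastructure openstack
```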
## OpenStack images
Cluster API uses an
[immutable infrastructure](https://www.hashicorp.com/resources/what-is-mutable-vs-immutable-infrastructure)
pattern where images are built with specific versions of the required
software installed (e.g.
[kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/),
[kubeadm](https://kubernetes.io/docs/reference/setup-tools/kubeadm/)).
Using this pattern, particularly with pre-built images, has some significant advantages, e.g.:
* Creating, upgrading and (auto-)scaling of clusters is fast as the required software
is already available in the image.
* New images for operating system updates or new Kubernetes versions can be
built and tested before being rolled out onto a production cluster with confidence
that nothing has changed.
* Images can be built and tested once and shared by multiple clusters.
* Zero-downtime upgrades can be performed by replacing machines one at a time,
with rollback if the upgrade fails.
Your cloud provider may use a centralised process to build, test and share suitable
images with all projects. If you need to build a suitable image, the
[Kubernetes Image Builder](https://image-builder.sigs.k8s.io/) project from the Cluster
Lifecycle SIG provides a tool for building images for use with Cluster API using
[QEMU](https://www.qemu.org/), [Packer](https://www.packer.io/) and [Ansible](https://www.ansible.com/).
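As a rough sketch of the image-builder workflow (the make target, image name and output path follow the image-builder documentation and are illustrative - adjust for your OS and Kubernetes version):
```sh
# Build a Cluster API image for QEMU/OpenStack using the Kubernetes Image Builder
git clone https://github.com/kubernetes-sigs/image-builder.git
cd image-builder/images/capi
make build-qemu-ubuntu-2004
# Upload the resulting qcow2 to Glance (name and path are illustrative)
openstack image create ubuntu-2004-kube-v1.22.1 \
  --disk-format qcow2 --container-format bare \
  --file ./output/ubuntu-2004-kube-v1.22.1/ubuntu-2004-kube-v1.22.1
```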
## OpenStack credentials
OpenStack credentials are required for two purposes:
1. For Cluster API to manage OpenStack resources for the workload cluster, e.g. networks, machines.
2. For OpenStack integrations on the workload cluster, e.g. OpenStack CCM, Cinder CSI.
By default, this chart uses the same credentials for both.
The recommended way to do this is using an
[Application Credential](https://docs.openstack.org/keystone/latest/user/application_credentials.html)
to avoid your password being stored on both the management and workload clusters.
Application credentials are project-scoped, and ideally you should use a separate
application credential for each cluster in a project.
For ease of use, this chart is written so that a `clouds.yaml` file can be given directly
to the chart as a configuration file. When an application credential is created in Horizon,
the corresponding `clouds.yaml` file can be downloaded, and should look something like this:
> WARNING
>
> The Cluster API OpenStack provider currently requires that the `project_id` is present,
> which you will need to add manually.
```yaml
clouds:
  openstack:
    auth:
      auth_url: https://my.cloud:5000
      project_id: "<project id>"
      application_credential_id: "<app cred id>"
      application_credential_secret: "<app cred secret>"
    region_name: "RegionOne"
    interface: "public"
    identity_api_version: 3
    auth_type: "v3applicationcredential"
```
This file can then be passed to the chart using the `-f|--values` option, e.g.:
```sh
helm install my-cluster capi/openstack-cluster --values ./clouds.yaml [...options]
```
## Managing a workload cluster
In addition to the `clouds.yaml`, the following is a minimal configuration to deploy a
working cluster:
```yaml
# The target Kubernetes version
kubernetesVersion: 1.22.1
# An image with the required software installed at the target version
machineImage: ubuntu-2004-kube-v{{ .Values.kubernetesVersion }}
# The name of the SSH keypair to inject into cluster machines
machineSSHKeyName: jbloggs-keypair
controlPlane:
  # The flavor to use for control plane machines
  # It is recommended to use a flavor with at least 2 CPU, 4GB RAM
  machineFlavor: vm.small
# A list of worker node groups for the cluster
nodeGroups:
  - # The name of the node group
    name: md-0
    # The flavor to use for the node group machines
    machineFlavor: vm.xlarge
    # The number of machines in the group
    machineCount: 3
```
To install or upgrade a cluster, use the following Helm command:
```sh
helm upgrade my-cluster capi/openstack-cluster --devel --install -f ./clouds.yaml -f ./cluster-configuration.yaml
```
This will create a cluster on its own network with a three-node, highly available (HA)
control plane, a load balancer for the Kubernetes API with a floating IP attached,
and a single worker group with three nodes.
To inspect the progress of the cluster deployment, you can use the
[clusterctl CLI](https://cluster-api.sigs.k8s.io/clusterctl/overview.html):
```sh
clusterctl describe cluster my-cluster
```
To update the cluster, just modify the configuration as required and run the above
command again. Some examples of updates that can be performed are:
* Adding and removing node groups. A cluster can have several node groups, and
each node group can have a different flavor and machine count.
* Scaling the cluster. Change the machine count for the required node group(s)
to add or remove machines.
* Changing the image to update system packages or upgrade Kubernetes.
Once a new image is available, change the machine image and Kubernetes version
as required to trigger a rolling upgrade of the cluster nodes.
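For example, a rolling upgrade could be triggered by changing just these values (version number illustrative) and re-running the `helm upgrade` command:
```yaml
# Bump the Kubernetes version; the image name above is templated from it
kubernetesVersion: 1.22.2
machineImage: ubuntu-2004-kube-v{{ .Values.kubernetesVersion }}
```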
### Cluster addons
The cluster addons are enabled by default, although initially only a CNI and the
OpenStack CCM are deployed.
You can configure which addons are deployed and the configuration of those addons
by specifying values for the addons Helm chart:
```yaml
addons:
  values:
    nvidiaGPUOperator:
      enabled: true
```
The available options under `addons.values` correspond to the available options
for the [cluster-addons chart](../cluster-addons).
The cluster addons can also be disabled completely using the following configuration:
> **WARNING**
>
> If the cluster addons are disabled, you will need to manually install a CNI
> plugin and the OpenStack Cloud Controller Manager before the cluster deployment
> will complete successfully.
```yaml
addons:
  enabled: false
```
Note that changing this after the initial deployment will **not** uninstall any
addons that have already been installed, but it will prevent updates to addons
from being applied.
## Accessing a workload cluster
To access the cluster, use `clusterctl` to generate a kubeconfig file:
```sh
# Generate a kubeconfig and write it to a file
clusterctl get kubeconfig my-cluster > kubeconfig.my-cluster
# Use that kubeconfig to list pods on the workload cluster
kubectl --kubeconfig=./kubeconfig.my-cluster get po -A
```

View File

```diff
@@ -72,7 +72,7 @@ controlPlane:
   # The flavor to use for control plane machines
   machineFlavor:
   # The kubeadm config specification for the control plane
-  # By default, this uses a simple configuration that just enables the external cloud provider
+  # By default, this uses a simple configuration that enables the external cloud provider
   kubeadmConfigSpec:
     initConfiguration:
       nodeRegistration:
@@ -101,7 +101,7 @@ nodeGroupDefaults:
   machineFlavor:
   # The default kubeadm config specification for worker nodes
   # This will be merged with any configuration given for specific node groups
-  # By default, this uses a simple configuration that just enables the external cloud provider
+  # By default, this uses a simple configuration that enables the external cloud provider
   kubeadmConfigSpec:
     joinConfiguration:
       nodeRegistration:
@@ -138,4 +138,8 @@ addons:
   # Values for the addons
   # See https://github.com/stackhpc/capi-helm-charts/blob/main/charts/cluster-addons for details
   # The clouds.yaml used for cluster deployment will be given in addition to these
-  values: {}
+  values:
+    # By default, enable the OpenStack CCM
+    ccm:
+      enabled: true
+      type: openstack
```