3279 Commits

Author SHA1 Message Date
Parsons, Cliff (cp769u)
f38880b26e [ceph-mon] Correct Ceph Mon Check Ports
The ceph-mon-check pod only knew about the v1 port before, and didn't
have the proper mon_host configuration in its ceph.conf file. This
patchset adds knowledge about the v2 port also and correctly configures
the ceph.conf file. Also fixes a namespace hardcoding that was found
in the last ceph-mon-check fix.

Change-Id: I460e43864a2d4b0683b67ae13bf6429d846173fc
2021-10-14 16:14:45 +00:00
Phil Sphicas
25b0cdc7ec [ceph-client] Fix ceph-rbd-pool deletion race
In cases where the pool deletion feature [0] is used, but the pool does
not exist, a pool is created and then subsequently deleted.

This was broken by the performance optimizations introduced with [1], as
the job is trying to delete a pool that does not exist (yet).

This change makes the ceph-rbd-pool job wait for manage_pools to finish
before trying to delete the pool.

0: https://review.opendev.org/c/792851
1: https://review.opendev.org/c/806443

Change-Id: Ibb77e33bed834be25ec7fd215bc448e62075f52a
2021-10-13 17:23:23 -07:00
Stephen Taylor
4d629d3db6 [ceph-mon] Prevent mon-check from removing mons when down temporarily
A race condition exists that can cause the mon-check pod to delete
mons from the monmap that are only down temporarily. This sometimes
causes issues with the monmap when those mons come back up. This
change adds a check to see if the list of mons in the monmap is
larger than expected before removing anything. If not, the monmap
is left alone.

Change-Id: I43b186bf80741fc178c6806d24c179417d7f2406
2021-10-13 10:47:56 -06:00
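
The guard described in 4d629d3db6 can be sketched roughly as follows. This is a minimal illustration in Python, not the chart's actual script; the function name and the way the expected mon list is obtained are assumptions made for the example.

```python
def mons_to_remove(monmap_mons, expected_mons):
    """Return the mons that may be pruned from the monmap (hypothetical helper)."""
    monmap = set(monmap_mons)
    expected = set(expected_mons)
    # Guard from the commit message: only remove anything when the monmap
    # lists more mons than expected. Otherwise leave the monmap alone, so a
    # mon that is only temporarily down is never dropped.
    if len(monmap) <= len(expected):
        return []
    return sorted(monmap - expected)
```

With this shape, a mon that is merely unreachable for a moment does not change the size of the monmap, so nothing is removed until a genuinely extra entry appears.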
Zuul
f4a74884e5 Merge "Update lint job to use helm v3" 2021-10-13 16:09:20 +00:00
Gage Hugo
e3203bd7fe Improve osh-infra-deploy helm v3 job
This change improves the osh-infra-deploy job to
successfully deploy minikube with helm v3 along with
the necessary namespaces. Future changes will modify
the install scripts for each job to make them helm
v3 compatible.

Change-Id: I08a94046f86f7c92be7580fbf10751150d2fcecc
2021-10-11 17:02:06 +00:00
Gage Hugo
41e60f065c Update lint job to use helm v3
This change updates the lint job to use helm v3. This
is part of the effort to migrate from helm v2 to v3 and
to ensure each chart is compatible with helm v3.

Change-Id: Ibc8ba5d8fe8efc3637d64df61305602385e644e4
2021-10-11 16:09:52 +00:00
Phil Sphicas
05f2a42330 Use Kubernetes v1.19.15 in kubeadm-aio image
Update Kubernetes version to v1.19.15, the latest patch release of the
earliest supported version (as of 2021-09-15).

Change-Id: Ia8f398098dfafa7fc029c982c71bce4a876668de
2021-10-07 22:14:24 -07:00
Gage Hugo
22e50a5569 Update htk requirements
This change updates the helm-toolkit path in each chart as part
of the move to helm v3. This is due to a lack of helm serve.

Change-Id: I011e282616bf0b5a5c72c1db185c70d8c721695e
2021-10-06 01:02:28 +00:00
Chinasubbareddy Mallavarapu
6e1f2b4087 [ceph-provisioner] Add support to connect to rook-ceph cluster
This adds support for rook-ceph to the provisioner chart so that
clients that want to connect to a rook-ceph cluster can make use of it.

Change-Id: I26c28fac3fa0f5d0b0e71a288217b37a5ca8fb13
2021-10-05 16:30:17 +00:00
Stephen Taylor
46c8218fbf [ceph-client] Performance optimizations for the ceph-rbd-pool job
This change attempts to reduce the number of Ceph commands required
in the ceph-rbd-pool job by collecting most pool properties in a
single call and by setting only those properties where the current
value differs from the target value.

Calls to manage_pool() are also run in the background in parallel,
so all pools are configured concurrently instead of serially. The
script waits for all of those calls to complete before proceeding
in order to avoid issues related to the script finishing before all
pools are completely configured.

Change-Id: If105cd7146313ab9074eedc09580671a0eafcec5
2021-10-01 07:59:10 -06:00
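
The two ideas in 46c8218fbf (set only the properties that differ, and configure all pools concurrently while waiting for every one to finish) can be sketched as below. This is an illustration only: Python threads stand in for the background calls, and `get_properties` and `set_property` are hypothetical callables standing in for the Ceph commands the job issues.

```python
from concurrent.futures import ThreadPoolExecutor


def properties_to_set(current, target):
    """Keep only the target properties whose current value differs."""
    return {key: value for key, value in target.items() if current.get(key) != value}


def manage_pools(pools, get_properties, set_property):
    """Configure every pool concurrently and wait for all of them to finish.

    pools maps a pool name to its target properties. get_properties and
    set_property are hypothetical stand-ins for the real Ceph calls.
    """
    def manage_pool(name, target):
        current = get_properties(name)  # one call collects the pool's properties
        for key, value in properties_to_set(current, target).items():
            set_property(name, key, value)
        return name

    with ThreadPoolExecutor(max_workers=max(1, len(pools))) as executor:
        futures = [executor.submit(manage_pool, name, target)
                   for name, target in pools.items()]
        # result() blocks until each pool is done, so the caller only
        # proceeds once every pool is completely configured.
        return [future.result() for future in futures]
```

Waiting on every future before returning mirrors the commit's point that the script must not finish before all pools are configured.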
Sean Eagan
b1a247e7f5 Helm 3 - Fix Job labels
If labels are not specified on a Job, kubernetes defaults them
to include the labels of their underlying Pod template. Helm 3
injects metadata into all resources [0] including a
`app.kubernetes.io/managed-by: Helm` label. Thus when kubernetes
sees a Job's labels they are no longer empty and thus do not get
defaulted to the underlying Pod template's labels. This is a
problem since Job labels are depended on by
- Armada pre-upgrade delete hooks
- Armada wait logic configurations
- kubernetes-entrypoint dependencies

Thus for each Job template this adds labels matching the
underlying Pod template to retain the same labels that were
present with Helm 2.

[0]: https://github.com/helm/helm/pull/7649

Change-Id: I3b6b25fcc6a1af4d56f3e2b335615074e2f04b6d
2021-09-30 16:01:31 -05:00
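
The label behaviour described in b1a247e7f5 can be modelled with a small sketch. It is a simplified model of the defaulting rule as the commit message states it, not Kubernetes code; the function name and the dictionary layout of the Job are assumptions for illustration.

```python
def effective_job_labels(job):
    """Simplified model: a Job with no labels inherits its Pod template's labels."""
    labels = job.get("metadata", {}).get("labels") or {}
    if labels:
        # Any label at all (for example the one Helm 3 injects) means the
        # Job's labels are no longer empty, so no defaulting takes place.
        return dict(labels)
    pod_metadata = job["spec"]["template"]["metadata"]
    return dict(pod_metadata.get("labels", {}))
```

Under this model, a Job rendered by Helm 3 carries only the injected `app.kubernetes.io/managed-by` label unless the template also sets the Pod template's labels on the Job explicitly, which is what the change does.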
Zuul
0fa7e0fb7e Merge "feat(helm-toolkit): allow setting extra labels on pods" 2021-09-29 06:04:53 +00:00
Tin Lam
5f75ffa180 fix(ssl): fixes libvirt ssl job
Changes the override to use dynamically generated certs for the
libvirt-ssl jobs so they don't expire in the future. Also makes the
job voting again, as it was before.

Signed-off-by: Tin Lam <t@lam.wtf>
Change-Id: If7215961b0b9a7cad75afd7f78592515b74a7b58
2021-09-27 12:45:29 -05:00
Marlin Cremers
4340e272d7 feat(helm-toolkit): allow setting extra labels on pods
Currently it isn't possible to set extra labels on pods that use
the labels snippet. This means users are required to fork the helm
repository for OpenStack services to add custom labels. Use cases
for this are for example injecting Istio sidecars.

This change introduces the ability to set one set of labels on all
resources that use the labels snippet.

Change-Id: Iefc8465300f434b89c07b18ba75260fee0a05ef5
2021-09-27 18:44:47 +02:00
Neely, Travis (tn720x)
4a490b894c Fix issue with db backup error return code being eaten
The return code from the send_to_remote_server function is
being swallowed by an if statement, so the elif branch of the
code is never reached.

Change-Id: Id3e256c991421ad6624713f65212abb4881240c1
2021-09-26 16:22:39 -05:00
Tin Lam
418143f3e4 fix(gate): disable ssl job
This patch temporarily disables the ssl gate job and makes the check
job non-voting to unblock osh-infra. The certificate hardcoded in [0]
has expired.

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            5f:61:31:9d:0f:ff:99:81:ba:6d:50:1a
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = libvirt.org
        Validity
            Not Before: Sep 15 21:26:53 2020 GMT
            Not After : Sep 15 21:26:53 2021 GMT

The certificate will need to be updated or, better, no longer hardcoded at the gate.

[0] https://opendev.org/openstack/openstack-helm-infra/src/branch/master/tools/deployment/openstack-support/051-libvirt-ssl.sh#L27-L51

Signed-off-by: Tin Lam <t@lam.wtf>
Change-Id: I5ea58490c4fe4b65fec7bd3f11b4684cdc1a3e8b
2021-09-24 12:31:17 -05:00
Tin Lam
9061d08a5e fix(netpol): allows toggling the lockdown
This patch set allows disabling egress and ingress separately.

Signed-off-by: Tin Lam <t@lam.wtf>
Change-Id: I18250a009d62a05983e00db7b7309dd065b94069
2021-09-11 11:48:37 -05:00
Samuel Liu
b7b2048b35 add ingress resources
The current ingress deployment does not define resources, so this change adds them.

Change-Id: I9d610f13235c431ffdfa1d29b71660b3c1261e37
2021-09-09 19:43:47 +08:00
Zuul
56c4bfe3cd Merge "Update shaker helm3 compatability" 2021-09-08 19:25:13 +00:00
Zuul
d3a9198f36 Merge "Fix helm3 compatability" 2021-09-08 19:25:11 +00:00
Zuul
72d232af79 Merge "Remove unused jobs and related files" 2021-09-08 18:22:30 +00:00
Gage Hugo
a55f3a5aa2 Fix helm3 compatability
The prometheus-kube-state-metrics chart currently fails to lint
with helm3 due to an extra "-" character. This change removes
the extra dash character in order to allow us to lint and build
the chart via helm v3.

Change-Id: Ice1661b8e52fb7e2293d8b03a19e8e7ad43078ca
2021-09-08 10:58:06 -05:00
Gage Hugo
3c8fb39e54 Update shaker helm3 compatability
Currently the shaker chart fails to lint with helm3 due to
invalid yaml marking characters. This change removes the offending
characters to allow us to lint the chart successfully with helm3.

Change-Id: Ieb1ebbeadc4ce12711090060def659709c070b94
2021-09-08 10:55:29 -05:00
Gage Hugo
9030ff05da Remove unused jobs and related files
This change removes a bunch of unused and unmaintained files
and job declarations related to deploying osh-infra with armada.

Change-Id: I158a255132cd6b02607b6e1e77b8b9525cc8a3d5
2021-09-06 22:23:33 -05:00
zhen
6bc1f5a8b6 Modify the rbac_role to make secrets accessible
In the process of secondary development, we found
that we often need to access secrets from a pod.
However, it seems that helm-toolkit does not support
adding the secrets resource to a role. This commit
tries to fix that.

Change-Id: If384d6ccb7672a8da5a5e1403733fa655dfe40dd
2021-09-07 02:23:11 +00:00
Zuul
089d3f859c Merge "Add base helm3 job" 2021-09-06 02:55:56 +00:00
Zuul
5911ee7f97 Merge "Update log format stream for mariadb" 2021-09-04 22:05:45 +00:00
Gage Hugo
b70bdd6a71 Get kubeadm working again
This change fixes several issues with kubeadm, notably
the tiller image url/version, as well as fixing the
docker python library missing.

Change-Id: I35528bd45c08ac8580d9875dc54b300a2137fe73
2021-09-02 23:59:35 +00:00
Gage Hugo
21ada44f59 Add base helm3 job
This change adds a new script and job to deploy minikube with
helm3. This job will be improved upon in later changes as
part of the movement to helm3.

Change-Id: Ia7ef30a4e2af77508ad95191e5241d2c1b83a7c4
2021-09-02 04:54:52 +00:00
Zuul
3b06925560 Merge "Always set pg_num_min to the proper value" 2021-08-31 15:09:38 +00:00
Parsons, Cliff (cp769u)
b704b9ad02 Ceph OSD log-runner container should run as ceph user
This PS changes the log-runner user ID to run as the ceph user
so that it has the appropriate permissions to write to /var/log/ceph
files.

Change-Id: I4dfd956130eb3a19ca49a21145b67faf88750d6f
2021-08-27 21:04:15 +00:00
Ritchie, Frank (fr801x)
43fe7246fd Always set pg_num_min to the proper value
Currently, if pg_num_min is less than the value specified in values.yaml
or overrides, no change to pg_num_min is made during updates, even though
the value should be increased. This PS ensures the proper value is always set.

Change-Id: I79004506b66f2084402af59f9f41cda49a929794
2021-08-25 18:59:53 -05:00
Zuul
797658b730 Merge "cert-rotation: Correct and enhance the rotation script." 2021-08-25 21:02:13 +00:00
Gupta, Sangeet (sg774j)
222f7b6877 cert-rotation: Correct and enhance the rotation script.
Corrected the counter increment and enhanced the script to handle
the case where a certificate is stuck in the issuing state.

Change-Id: Ib8a84831a605bb3e5a1fc5b5a909c827ec864797
2021-08-25 15:57:35 +00:00
Parsons, Cliff (cp769u)
a0aec27ebc Fix Ceph checkDNS script
The checkDNS script which is run inside the ceph-mon pods has had
a bug for a while now. If a value of "up" is passed in, it adds
brackets around it, but then doesn't check for the brackets when
checking for a value of "up". This causes a value of "{up}" to be
written into the ceph.conf for the mon_host line and that causes
the mon_host to not be able to respond to ceph/rbd commands. It's
normally not a problem if DNS is working, but if DNS stops working
this can happen.

This patch changes the comparison to look for "{up}" instead of
"up" in three different files, which should fix the problem.

Change-Id: I89cf07b28ad8e0e529646977a0a36dd2df48966d
2021-08-25 14:17:54 +00:00
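
The comparison bug described in a0aec27ebc can be illustrated with a short sketch. The real checkDNS script is not shown here, so the function names below are hypothetical, and only the bracket handling stated in the commit message is modelled.

```python
def wrap(value):
    """Per the commit message, the script adds brackets around the value."""
    return "{" + value + "}"


def is_up_before_fix(wrapped):
    # Old comparison: looks for the bare string, which can never match
    # once the brackets have been added.
    return wrapped == "up"


def is_up_after_fix(wrapped):
    # New comparison: looks for the bracketed form.
    return wrapped == "{up}"
```

Because the old check never matched, the bracketed value fell through and ended up on the mon_host line, which is the failure the commit describes.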
Zuul
6f427495f3 Merge "Fix an attribute error" 2021-08-24 16:46:35 +00:00
Zuul
955f566e2c Merge "Remove Kibana indices before pod start up" 2021-08-24 15:40:13 +00:00
Lo, Chi (cl566n)
122dcef629 Remove Kibana indices before pod start up
This PS removes kibana indices from elasticsearch when a pod
comes up. It also removes the source code in values.yaml for
the flush job since it is not needed at this point.

Change-Id: Icb0376fed4872308b26e608d5be0fbac504d802d
2021-08-23 21:31:39 +00:00
zhaoleilc
e81a86d574 Fix an attribute error
The corresponding attribute in roles/build-images/defaults/main.yml
is helm_repo instead of google_helm_repo.

Change-Id: Id1be29773224ea496a3550642d7ba194fd1e83c2
2021-08-23 22:42:47 +08:00
Gage Hugo
1062d68eed Revert "chore(tiller): removes tiller chart"
This reverts commit b2adfeadd8adbf5d99187106cf5d2956f0afeeab.

Reason for revert: This breaks the kubeadm jobs; let's add this
back until a proper fix is implemented.

Change-Id: I9b93c86e3747f2e768956898a27dd6d63469a8ee
2021-08-20 16:08:44 +00:00
root
45b50160f6 Update log format stream for mariadb
It is useful for troubleshooting.

Change-Id: Ief9fb0c700e64717fe3a7f62b7b7c22ec1f84179
2021-08-20 16:43:40 +02:00
Parsons, Cliff (cp769u)
fa174c00db Fix ceph-provisioner rbd-healer error
This patchset fixes the following error which was recently introduced
by changing the cephcsi image version to v3.4.0:

E0816 18:37:30.966684   62307 rbd_healer.go:131] list volumeAttachments failed, err: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:ceph:clcp-ucp-ceph-provisioners-ceph-rbd-csi-nodeplugin" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
E0816 18:37:30.966758   62307 driver.go:208] healer had failures, err volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:ceph:clcp-ucp-ceph-provisioners-ceph-rbd-csi-nodeplugin" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope

Change-Id: Ia7cc61cf1df6690f25408b7aa8797e51d1c516ff
2021-08-17 19:24:55 +00:00
Roy Tang
3a76480c00 Update RabbitMQ probes
The current health check that is used for readiness and liveness
probes is considered intrusive and is prone to producing false
positives [0]. The command is also deprecated and will be removed
in a future version. This change updates the probes based on the
current recommendation from the community [1].

Ref:
[0] https://www.rabbitmq.com/monitoring.html#deprecations
[1] https://www.rabbitmq.com/monitoring.html#health-checks

Change-Id: I83750731150ff9a276f59e3c1288129581fceba5
2021-08-13 19:14:22 -04:00
Zuul
43ca88c91f Merge "[ceph-provisioner] Add ceph mon v2 port for ceph csi provisioner" 2021-08-12 15:05:38 +00:00
Lo, Chi (cl566n)
09dfafbd6b Enable TLS path between Curator and Elasticsearch
Elasticsearch is TLS enabled.  Curator needs to be configured to use
cacert when communicating with Elasticsearch.

Change-Id: Ia78458516d6c8f975e478d85643dc4436b70b87c
2021-08-11 18:28:05 +00:00
Chinasubbareddy Mallavarapu
c70b3fce5a [ceph-provisioner] Add ceph mon v2 port for ceph csi provisioner
This updates the ceph mon port from v1 to v2 for the csi-based rbd plugin.
It also updates the cephcsi image to 3.4.0.

Change-Id: Ib6153730216dbd5a8d2f3f7b7dd0e88c7fd4389d
2021-08-11 17:59:38 +00:00
Gage Hugo
67ac5da9ed Update helm repo url
The googleapi repo has been causing issues, and the latest
one is an unauthorized error when trying to download the
helm tarball.

This change moves the repo to use the official helm one.

Change-Id: I52607b0ca6d650d5f5e4a95045389970faa08cfb
2021-08-11 16:18:44 +00:00
Zuul
a121f3d1c2 Merge "Enable TLS path between Prometheus-elasticsearch-exporter and Elasticsearch" 2021-08-06 23:50:15 +00:00
Zuul
f68e3312e1 Merge "[ceph-osd] Change var crash mount propagation to HostToContainer" 2021-08-06 22:30:44 +00:00
Gupta, Sangeet (sg774j)
ba998fc142 cert-rotation: Return true if grep finds no match
If grep does not find a match, it returns 1, which fails the shell
script. Hence it is made to return true if no match is found.
Also removed returning an error from the script, because any failure
will cause the job to re-run, which may renew certificates again and
restart the pods again. This can continue for as long as the error persists.

Chaange-Id: I2a38b59789fd522e8163ff9b12ff847eb1fe2f3a
Change-Id: Ica456ef6c5bec2bd29f51aaeef7b5ce5e8681beb
2021-08-06 17:58:28 +00:00