3299 Commits

Author SHA1 Message Date
Parsons, Cliff (cp769u)
b704b9ad02 Ceph OSD log-runner container should run as ceph user
This PS changes the log-runner user ID to run as the ceph user
so that it has the appropriate permissions to write to /var/log/ceph
files.

Change-Id: I4dfd956130eb3a19ca49a21145b67faf88750d6f
2021-08-27 21:04:15 +00:00
Ritchie, Frank (fr801x)
43fe7246fd Always set pg_num_min to the proper value
Currently if pg_num_min is less than the value specified in values.yaml
or overrides no change to pg_num_min is made during updates when the value
should be increased. This PS will ensure the proper value is always set.

Change-Id: I79004506b66f2084402af59f9f41cda49a929794
2021-08-25 18:59:53 -05:00
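The behavior described in 43fe7246fd can be sketched as below. This is an illustration only, not the chart's actual code: the function name `pg_num_min_update` is hypothetical, and it assumes "the proper value" means the value configured in values.yaml or overrides, applied whenever the pool's current value differs.

```python
from typing import List, Optional


def pg_num_min_update(pool: str, current: int, desired: int) -> Optional[List[str]]:
    """Return the ceph CLI arguments needed to reconcile pg_num_min.

    Returns None when the pool already has the desired value. The value is
    reconciled in both directions, so a pool whose pg_num_min is too low is
    raised as well as one whose value is too high being lowered.
    """
    if current == desired:
        return None
    # `ceph osd pool set <pool> pg_num_min <n>` is the standard ceph CLI form.
    return ["ceph", "osd", "pool", "set", pool, "pg_num_min", str(desired)]
```

The function only builds the command, so a caller can log it or run it as needed.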
Zuul
797658b730 Merge "cert-rotation: Correct and enhance the rotation script." 2021-08-25 21:02:13 +00:00
Gupta, Sangeet (sg774j)
222f7b6877 cert-rotation: Correct and enhance the rotation script.
Corrected the counter increment and enhanced the script to handle
the case where the certificate is stuck in the issuing state.

Change-Id: Ib8a84831a605bb3e5a1fc5b5a909c827ec864797
2021-08-25 15:57:35 +00:00
Parsons, Cliff (cp769u)
a0aec27ebc Fix Ceph checkDNS script
The checkDNS script which is run inside the ceph-mon pods has had
a bug for a while now. If a value of "up" is passed in, it adds
brackets around it, but then doesn't check for the brackets when
checking for a value of "up". This causes a value of "{up}" to be
written into the ceph.conf for the mon_host line and that causes
the mon_host to not be able to respond to ceph/rbd commands. It's
normally not a problem if DNS is working, but if DNS stops working
this can happen.

This patch changes the comparison to look for "{up}" instead of
"up" in three different files, which should fix the problem.

Change-Id: I89cf07b28ad8e0e529646977a0a36dd2df48966d
2021-08-25 14:17:54 +00:00
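The comparison bug fixed in a0aec27ebc can be illustrated with a small sketch. It is written in Python for illustration only; the real checkDNS script is not shown in this log, and the function names and the "leave the current value untouched" behavior for the sentinel are assumptions.

```python
def wrap(endpoint: str) -> str:
    """Add braces around the value passed in, as the commit message describes."""
    return "{" + endpoint + "}"


def mon_host_buggy(endpoint: str, current: str) -> str:
    """Compare the wrapped value against the unwrapped sentinel."""
    wrapped = wrap(endpoint)
    if wrapped == "up":  # never true: the value already carries braces
        return current
    return wrapped  # so "{up}" ends up as the mon_host value


def mon_host_fixed(endpoint: str, current: str) -> str:
    """Compare against the sentinel in the same form it is stored in."""
    wrapped = wrap(endpoint)
    if wrapped == "{up}":
        return current  # assumption: the existing mon_host is left as-is
    return wrapped
```

With the buggy comparison, passing "up" yields "{up}" as the new mon_host value; with the fixed comparison the existing value is kept.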
Zuul
6f427495f3 Merge "Fix an attribute error" 2021-08-24 16:46:35 +00:00
Zuul
955f566e2c Merge "Remove Kibana indices before pod start up" 2021-08-24 15:40:13 +00:00
Lo, Chi (cl566n)
122dcef629 Remove Kibana indices before pod start up
The ps removes kibana indices from elasticsearch when a pod
comes up. It also removes the source code in values.yaml for
the flush job since it is not needed at this point.

Change-Id: Icb0376fed4872308b26e608d5be0fbac504d802d
2021-08-23 21:31:39 +00:00
zhaoleilc
e81a86d574 Fix an attribute error
The corresponding attribute in roles/build-images/defaults/main.yml
is helm_repo instead of google_helm_repo.

Change-Id: Id1be29773224ea496a3550642d7ba194fd1e83c2
2021-08-23 22:42:47 +08:00
Gage Hugo
1062d68eed Revert "chore(tiller): removes tiller chart"
This reverts commit b2adfeadd8adbf5d99187106cf5d2956f0afeeab.

Reason for revert: This breaks the kubeadm jobs. Let's add this
back until a proper fix is implemented.

Change-Id: I9b93c86e3747f2e768956898a27dd6d63469a8ee
2021-08-20 16:08:44 +00:00
root
45b50160f6 Update log format stream for mariadb
It is useful for troubleshooting.

Change-Id: Ief9fb0c700e64717fe3a7f62b7b7c22ec1f84179
2021-08-20 16:43:40 +02:00
Parsons, Cliff (cp769u)
fa174c00db Fix ceph-provisioner rbd-healer error
This patchset fixes the following error which was recently introduced
by changing the cephcsi image version to v3.4.0:

E0816 18:37:30.966684   62307 rbd_healer.go:131] list volumeAttachments failed, err: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:ceph:clcp-ucp-ceph-provisioners-ceph-rbd-csi-nodeplugin" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
E0816 18:37:30.966758   62307 driver.go:208] healer had failures, err volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:ceph:clcp-ucp-ceph-provisioners-ceph-rbd-csi-nodeplugin" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope

Change-Id: Ia7cc61cf1df6690f25408b7aa8797e51d1c516ff
2021-08-17 19:24:55 +00:00
Roy Tang
3a76480c00 Update RabbitMQ probes
The current health check that is used for readiness and liveness
probes is considered intrusive and is prone to produce false
positives[0]. The command is also deprecated and will be removed
in a future version. Update the probes based on the current
recommendation from the community[1].

Ref:
[0] https://www.rabbitmq.com/monitoring.html#deprecations
[1] https://www.rabbitmq.com/monitoring.html#health-checks

Change-Id: I83750731150ff9a276f59e3c1288129581fceba5
2021-08-13 19:14:22 -04:00
Zuul
43ca88c91f Merge "[ceph-provisioner] Add ceph mon v2 port for ceph csi provisioner" 2021-08-12 15:05:38 +00:00
Lo, Chi (cl566n)
09dfafbd6b Enable TLS path between Curator and Elasticsearch
Elasticsearch is TLS enabled.  Curator needs to be configured to use
cacert when communicating with Elasticsearch.

Change-Id: Ia78458516d6c8f975e478d85643dc4436b70b87c
2021-08-11 18:28:05 +00:00
Chinasubbareddy Mallavarapu
c70b3fce5a [ceph-provisioner] Add ceph mon v2 port for ceph csi provisioner
This updates the ceph mon port from v1 to v2 for the CSI-based rbd plugin.
It also updates the cephcsi image to 3.4.0.

Change-Id: Ib6153730216dbd5a8d2f3f7b7dd0e88c7fd4389d
2021-08-11 17:59:38 +00:00
Gage Hugo
67ac5da9ed Update helm repo url
The googleapi repo has been causing issues and the latest
one is giving an unauthorized error when trying to download
helm tarball.

This change moves the repo to use the official helm one.

Change-Id: I52607b0ca6d650d5f5e4a95045389970faa08cfb
2021-08-11 16:18:44 +00:00
Zuul
a121f3d1c2 Merge "Enable TLS path between Prometheus-elasticsearch-exporter and Elasticsearch" 2021-08-06 23:50:15 +00:00
Zuul
f68e3312e1 Merge "[ceph-osd] Change var crash mount propagation to HostToContainer" 2021-08-06 22:30:44 +00:00
Gupta, Sangeet (sg774j)
ba998fc142 cert-rotation: Return true if grep finds no match
If grep does not find a match, it returns 1, which fails the shell
script. Hence made it return true if no match is found.
Also removed the returning of errors from the script, because any failure
will cause the job to re-run, which may re-renew certificates and
restart the pods again. This can continue for as long as the error persists.

Chaange-Id: I2a38b59789fd522e8163ff9b12ff847eb1fe2f3a
Change-Id: Ica456ef6c5bec2bd29f51aaeef7b5ce5e8681beb
2021-08-06 17:58:28 +00:00
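The grep behavior that ba998fc142 works around can be sketched as follows. This is a Python translation for illustration, not the rotation script itself; `count_matching_lines` is a hypothetical helper and the sketch assumes a `grep` binary is on the PATH.

```python
import subprocess


def count_matching_lines(pattern: str, text: str) -> int:
    """Count lines of `text` that match `pattern` using `grep -c`.

    grep exits 0 when it finds a match, 1 when it finds none, and 2 or
    higher on a real error. Only the last case is treated as a failure.
    """
    result = subprocess.run(
        ["grep", "-c", "-e", pattern],
        input=text,
        capture_output=True,
        text=True,
    )
    if result.returncode > 1:
        raise RuntimeError(f"grep failed: {result.stderr.strip()}")
    # `grep -c` prints 0 when nothing matches, even though it exits 1.
    return int(result.stdout.strip())
```

Treating exit status 1 as a normal outcome keeps a "no match" result from aborting the caller, which is the same effect the commit describes for the shell script.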
Lo, Chi (cl566n)
830df06628 Enable TLS path between Prometheus-elasticsearch-exporter and Elasticsearch
Elasticsearch is TLS enabled.  Prometheus-elasticsearch-exporter
needs to be configured to use cacert when communicating with Elasticsearch.

Change-Id: I4a87226fed541777df78733f3650363859ff01b8
2021-08-06 10:02:18 -07:00
Gage Hugo
a4f300e3da Update helm 2 version to latest
The version of helm 2 that OSH has been using was older and seems
to have been removed from the googleapi repo that the jobs are
set up to use; this was causing job failures.

This change updates the version to the latest v2 release.

Change-Id: I675f539b24ea9c2355ac9eacc7dd8122c5236e5f
2021-08-06 10:56:50 -05:00
Zuul
b49919283d Merge "cert-rotation: New chart for certificate rotation" 2021-08-05 22:13:57 +00:00
Gupta, Sangeet (sg774j)
f94aed3c7a cert-rotation: New chart for certificate rotation
This chart creates a cronjob which monitors the expiry of the
certificates created by jetstack cert-manager. It rotates the
certificates and restarts the pods that mount the certificate
secrets so that the new certificate can take effect.

Change-Id: I492b5f319cf0f2e7ccbbcf516953e17aafc1c59f
2021-08-05 17:46:15 +00:00
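The expiry monitoring described in f94aed3c7a amounts to comparing a certificate's expiry time against a renewal window. The sketch below shows only that comparison; the function name `needs_rotation` and the `threshold` parameter are hypothetical and do not come from the chart.

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(not_after: datetime, now: datetime, threshold: timedelta) -> bool:
    """Return True when the certificate expires within `threshold` of `now`.

    An already expired certificate also returns True, because the time
    remaining is then negative.
    """
    return not_after - now <= threshold


# Example: a certificate expiring in 5 days, checked with a 7-day window.
now = datetime(2021, 8, 5, tzinfo=timezone.utc)
expiring_soon = needs_rotation(now + timedelta(days=5), now, timedelta(days=7))
```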
Chinasubbareddy Mallavarapu
7117c93772 [ceph-osd] Change var crash mount propagation to HostToContainer
- Because mounting anything under the /var partition into pods is a
security violation, change the mount propagation to HostToContainer.

Change-Id: If7a27304507a9d1bcb9efcef4fc1146f77080a4f
2021-08-05 14:33:06 +00:00
Zuul
8d00380469 Merge "Add Alertmanager dashboard to Grafana" 2021-08-03 18:25:20 +00:00
Zuul
48e4ce50ac Merge "namespace-config: Grant access to existing PSP" 2021-08-03 00:44:58 +00:00
Lo, Chi (cl566n)
5a290e1d83 Add Alertmanager dashboard to Grafana
This patch set adds a new Alertmanager dashboard to Grafana.  Note
that a new configmap is created for this instead of using the
same configmap which includes all the dashboards. Using the same
configmap would eventually run into the configmap size limit.

Change-Id: I10561c0b0b464c3b67d4a738f9f2cb70ef601b3d
2021-08-02 16:05:47 -07:00
Zuul
8351fdd0f1 Merge "Use focal libvirt image for victoria and wallaby" 2021-08-02 18:36:31 +00:00
DeJaeger, Darren (dd118r)
f26d4db145 Update mon-check with latest monmap outputs
This PS updates the mon-check reap-zombies python script to consider
the more recent Ceph changes, including the fact that there is now
a v1 and v2 backend. In addition, it executes the reap-zombies script
with the python3 binary, as the basic 'python' binary does not exist
in the container.

Change-Id: Id079671f03cc5ddbe694f2aa8c9d2480dc573983
2021-08-02 13:16:39 +00:00
Phil Sphicas
3c4ebf0172 namespace-config: Grant access to existing PSP
This change updates the namespace-config chart to (optionally) create
RBAC rules allowing service accounts in the namespace 'use' access to an
existing Pod Security Policy in the cluster. The policy is specified as:

    podSecurityPolicy:
      existingPsp: name-of-existing-psp

This aligns with the PSP deprecation guidance provided to date [0],
which suggests easing the transition to the "PSP Replacement Policy" by
establishing the standard PSPs (Restricted, Baseline, and Privileged),
assigning a cluster-wide default, and binding more-permissive policies
as needed in certain namespaces.

[0] https://kubernetes.io/blog/2021/04/06/podsecuritypolicy-deprecation-past-present-and-future/

Change-Id: I46da230abf822e0cc3553561fd779444439c34a7
2021-08-02 01:36:36 +00:00
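The RBAC rules described in 3c4ebf0172 could look roughly like the objects built below. This is a sketch under assumptions, not the chart's templates: the object names, the choice of a namespaced Role and RoleBinding, and the function `psp_rbac` are all hypothetical. The `use` verb on `podsecuritypolicies` in the `policy` API group and the `system:serviceaccounts:<namespace>` group are standard Kubernetes conventions.

```python
from typing import Any, Dict, List


def psp_rbac(namespace: str, existing_psp: str) -> List[Dict[str, Any]]:
    """Build a Role and RoleBinding granting 'use' on an existing PSP."""
    name = f"psp-{existing_psp}"  # hypothetical object name
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["policy"],
                "resources": ["podsecuritypolicies"],
                "verbs": ["use"],
                "resourceNames": [existing_psp],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": name,
        },
        # This group covers every service account in the namespace.
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "Group",
                "name": f"system:serviceaccounts:{namespace}",
            }
        ],
    }
    return [role, binding]
```

Calling `psp_rbac("openstack", "name-of-existing-psp")` returns the two objects as plain dictionaries, ready to be serialized to YAML or JSON.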
Andrii Ostapenko
15b43d939e Use focal libvirt image for victoria and wallaby
Change-Id: I70a989aeaac3d763b110cc854e00fa33d5f8861a
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2021-07-31 20:54:08 +00:00
Zuul
9797d1b034 Merge "Revoke all privileges for PUBLIC role in postgres dbs" 2021-07-30 22:53:04 +00:00
Zuul
de4d8a02b0 Merge "Limit Ceph OSD Container Security Contexts" 2021-07-30 22:44:18 +00:00
Maximilian Weiss
bc754e088e Revoke all privileges for PUBLIC role in postgres dbs
Change-Id: I98102bd9c72264c7e364b50e0683e4777b42b0e7
2021-07-30 17:16:58 +00:00
Parsons, Cliff (cp769u)
6e794561ac Limit Ceph Provisioner Container Security Contexts
Wherever possible, the ceph-provisioner containers need to run
with the least amount of privilege required. In some cases
privileges are granted that are not needed. This patchset modifies
those containers' security contexts to reduce them to only what
is needed.

Change-Id: I74bd31df4af5cacc26834e645b0816bf285e8428
2021-07-29 20:25:07 +00:00
Parsons, Cliff (cp769u)
b55143dec2 Limit Ceph OSD Container Security Contexts
Wherever possible, the ceph-osd containers need to run with the
least amount of privilege required. In some cases privileges are
granted that are not needed. This patchset modifies
those containers' security contexts to reduce them to only what
is needed.

Change-Id: I0d6633efae7452fee4ce98d3e7088a55123f0a78
2021-07-29 20:24:37 +00:00
Chinasubbareddy Mallavarapu
bf5f545c1c [ceph-provisioner] Add check for empty ceph endpoint
This adds a check for an empty ceph mon endpoint while
generating the ceph etc configmap for clients.

Change-Id: I6579a268c5f4bc458120dda66667988e5a529ee9
2021-07-29 12:23:26 +00:00
Haider, Nafiz (nh532m)
adab36be22 Helm-Toolkit: Make Rabbit-init job more robust
Change-Id: I36ef7b2cdcf747ed2503ca5d27bc7803349f287d
2021-07-27 20:19:56 +00:00
Zuul
5e20097998 Merge "[ceph-osd] Mount /var/crash inside ceph-osd pods" 2021-07-27 15:40:59 +00:00
Ritchie, Frank (fr801x)
0acc0ce3dd Fix placement target delete function
You must specify the zone or zonegroup.

Change-Id: Id2bb6d5576ba39fb3671f7426e48f174fcf0016b
2021-07-23 17:21:44 -05:00
Stephen Taylor
c2ca599923 [ceph-osd] Mount /var/crash inside ceph-osd pods
This change adds /var/crash as a host-path volume mount for
ceph-osd pods in order to facilitate core dump capture when
ceph-osd daemons crash.

Change-Id: Ie517c64e08b11504f71d7d570394fbdb2ac8e54e
2021-07-20 15:30:19 -06:00
Neely, Travis (tn720x)
6169504761 Update db backup/restore retry for sending to remote
There is an additional error status 'Service Unavailable' which can
indicate the service is temporarily unavailable. Adding that error
status to the retry list in case the issue is resolved during the
backup timeframe.

Change-Id: I9e2fc1a9b33dea3858de06b10d512da98a635015
2021-07-20 10:47:38 -05:00
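The retry change in 6169504761 can be sketched as a loop that retries while the returned status is in a retry list. This is an illustration only: `send_with_retry`, its parameters, and the status strings other than 'Service Unavailable' are hypothetical, and the real backup scripts are not shown in this log.

```python
import time
from typing import Callable, Iterable


def send_with_retry(
    send: Callable[[], str],
    retryable: Iterable[str],
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> str:
    """Call `send` until it returns a status that is not retryable.

    Returns the last status seen, which may still be a retryable one if
    all attempts were used up.
    """
    retry_set = set(retryable)
    status = ""
    for attempt in range(1, attempts + 1):
        status = send()
        if status not in retry_set:
            return status
        if attempt < attempts:
            time.sleep(delay_seconds)
    return status


# Example: the remote is unavailable once, then accepts the upload.
responses = iter(["Service Unavailable", "OK"])
final_status = send_with_retry(lambda: next(responses), ["Service Unavailable"])
```

Keeping the retry list as a parameter means a new transient status can be added without touching the loop.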
Zuul
2c1a7b772b Merge "RabbitMQ add preStop and prep 3.8.x feature flag" 2021-07-14 18:49:22 +00:00
Zuul
d2822cbe16 Merge "Enable probes override from values.yaml for libvirt" 2021-07-14 07:06:26 +00:00
Anjeev Kumar
b11b4ae6c3 Enable probes override from values.yaml for libvirt
This PS enables overriding liveness/readiness probes configurations
for libvirt pods via values.yaml. In addition, updating the values
for some of the fields of the probes as the default values seem to
be too aggressive.

Change-Id: I64033a1d67461851d8f2d86905ef7068c2ec43b6

Co-authored-by: Huy Tran <ht095u@att.com>
Change-Id: Ib10379829e2989d3de385ad6d1944565b2f9953f
2021-07-13 14:08:59 -05:00
Roy Tang
479a1c7335 RabbitMQ add preStop and prep 3.8.x feature flag
This ps updates the following:
- Add a preStop action to give the rabbitmq node a chance to shut
  down more gracefully
- Add support for RABBITMQ_FEATURE_FLAG in preparation for
  future 3.8.x upgrade.

Change-Id: I25d1e4fdb9dee370382e97a5a97b2b098f5ef11f
2021-07-13 14:57:03 -04:00
Zuul
4d2f78fee2 Merge "Added the helm hook for create user job for exporter" 2021-07-12 21:17:32 +00:00
Zuul
06a90742a1 Merge "Disable RGW crash dumps" 2021-07-12 21:12:03 +00:00
Zuul
db7aad8b84 Merge "Added helm hook for rabbitmq job cluster wait" 2021-07-12 20:45:59 +00:00