Pass a parameter from the job to allow parallelizing helm tests using
separate scripts.
Change-Id: I3e06c5590d51c75448dc5ff5978dc7fc90daca6f
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
With this commit, minikube is installed using the contents of a
pre-created minikube-aio image that bundles the installation script, all
required binaries, and the required images. Pulling a single image from
dockerhub via the opendev dockerhub proxy and loading the bundled images
saves up to 6 minutes of minikube installation time.
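The overall flow looks roughly like the sketch below; the image name,
tag, and mount paths are illustrative assumptions rather than the exact
ones used by the job:

  # Illustrative image name/tag; pull one prebuilt all-in-one image
  # (via the opendev dockerhub proxy when running in CI)
  docker pull docker.io/openstackhelm/minikube-aio:latest
  # Run it so its install script can copy the bundled minikube/kubectl binaries
  # onto the host and "docker load" the cached images from tar archives instead
  # of pulling each image individually (the mount path is an assumption)
  docker run --rm -v /:/target docker.io/openstackhelm/minikube-aio:latest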
Change-Id: I5936f440eb0567b8dcba2fdae614e4c5e88a7b9a
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This chart can deploy fluentd either as a Deployment or as a DaemonSet.
Both options use the deployment-fluentd template, with various sections
toggled off based on values.yaml.
I'd like to know: does anyone run this chart as a Deployment?
We can simplify the chart, and the zuul gates, by changing the chart to
deploy a DaemonSet only.
Change-Id: Ie88ceadbf5113fc60e5bb0ddef09e18fe07a192c
This change addresses a memory leak in the ceph-mgr deployment.
The leak has also been noted in:
https://review.opendev.org/#/c/711085
Without this change memory usage for the active ceph-mgr pod will
steadily increase by roughly 100MiB per hour until all available
memory has been exhausted. Reset messages will also be seen in the
active and standby ceph-mgr pod logs.
Sample messages:
---
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
---
The root cause of the resets and the associated memory leak appears to
be multiple Ceph pods sharing the same IP address (because hostNetwork
is true) and the same PID (because hostPID is false).
In the messages above, the "1" at the end of the line is the PID.
Ceph appears to use the Version:IP:Port/PID tuple (v2:10.0.0.226:6808/1)
as a unique identifier, so when hostPID is false conflicts arise.
Setting hostPID to true stops the reset messages and memory leak.
Change-Id: I9821637e75e8f89b59cf39842a6eb7e66518fa2c
1) Added to the service account name instead of the traditional pod
name, to resolve dynamic release names.
Change-Id: Ibf4c69415e69a7baca2e3b96bcb23851e68d07d8
This PS updates the wait_for_inactive_pgs function:
- Renamed the function to wait_for_pgs
- Added a query to get the status of the PGs
- Requires all PGs to be in an "active+" state on at least three
  consecutive checks, as sketched below
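The sketch below assumes a Nautilus-style "ceph pg ls -f json" layout;
the exact jq filter used by the script is an assumption:

  function wait_for_pgs () {
    pgs_ready=0
    # Require three consecutive polls in which every PG reports an "active+..." state
    while [[ ${pgs_ready} -lt 3 ]]; do
      # Illustrative query; the real script's filter may differ
      inactive=$(ceph --cluster "${CLUSTER:-ceph}" pg ls -f json | \
                 jq -r '.pg_stats[].state | select(startswith("active") | not)')
      if [[ -n "${inactive}" ]]; then
        pgs_ready=0
      else
        (( pgs_ready+=1 ))
      fi
      sleep 3
    done
  }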
Change-Id: Iecc79ebbdfaa74886bca989b23f7741a1c3dca16
This PS adds a check of the target OSD value. The expected number of
OSDs should always be greater than or equal to the number of existing
OSDs. If there are more OSDs than expected, the target value is not
correct.
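A minimal sketch of the added check; the variable names and the JSON
path are assumptions:

  # Illustrative variable names; the target value comes from the chart in practice
  EXPECTED_OSDS=${EXPECTED_OSDS:-3}
  # "ceph osd stat" reports how many OSDs are currently registered in the cluster
  EXISTING_OSDS=$(ceph osd stat -f json | jq '.num_osds // .osdmap.num_osds')
  if [[ ${EXISTING_OSDS} -gt ${EXPECTED_OSDS} ]]; then
    echo "Found ${EXISTING_OSDS} OSDs but expected at most ${EXPECTED_OSDS}; the target value is not correct"
    exit 1
  fi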
Change-Id: I117a189a18dbb740585b343db9ac9b596a34b929
Some scrape targets require the use of TLS client certificates, which
are referenced by filename in the tls_config section.
This change allows these client certs and keys to be provided, stores
them in a secret, and mounts them in the pod under /tls_configs.
Example:
  tls_configs:
    kubernetes-etcd:
      ca.pem: |
        -----BEGIN CERTIFICATE-----
        -----END CERTIFICATE-----
      crt.pem: |
        -----BEGIN CERTIFICATE-----
        -----END CERTIFICATE-----
      key.pem: |
        -----BEGIN RSA PRIVATE KEY-----
        -----END RSA PRIVATE KEY-----

  conf:
    prometheus:
      scrape_configs:
        template: |
          scrape_configs:
            - job_name: kubernetes-etcd
              scheme: https
              tls_config:
                ca_file: /tls_configs/kubernetes-etcd.ca.pem
                cert_file: /tls_configs/kubernetes-etcd.crt.pem
                key_file: /tls_configs/kubernetes-etcd.key.pem
Change-Id: I963c65dc39f1b5110b091296b93e2de9cdd980a4
Currently the Ceph helm tests pass even when the deployed Ceph cluster
is unhealthy. This change expands the cluster status testing logic to
pass when all PGs are active and to fail if any PG is inactive.
The PG autoscaler is currently causing unhealthy Ceph clusters to be
deployed, so this change also disables it. It should be re-enabled once
those issues are resolved.
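A rough sketch of the kind of check this adds; the JSON paths follow
the "ceph status -f json" output, and the helm test wiring is omitted:

  # Sketch only; the real test script is structured differently
  status=$(ceph status -f json)
  # Sum the PG counts for every state that is not "active+..."
  inactive=$(jq '[.pgmap.pgs_by_state[]
                  | select(.state_name | startswith("active") | not)
                  | .count] | add // 0' <<< "${status}")
  if [[ ${inactive} -gt 0 ]]; then
    echo "Ceph cluster has ${inactive} inactive PGs"
    exit 1
  fi
  # The autoscaler can be disabled cluster-wide until the issues are resolved:
  # ceph config set global osd_pool_default_pg_autoscale_mode off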
Change-Id: Iea1ff5006fc00e4570cf67c6af5ef6746a538058
There are three issues fixed here:
1) The "backoffLimit" and "activeDeadlineSeconds" attributes belong to
the Job, not the CronJob. Therefore, they should be placed in the Job
template part of cron-job-backup-mariadb.yaml.
2) The backup cronjob had two names in values.yaml, "backup_mariadb"
and "mariadb_backup", in various places.
3) When an empty table is encountered, the get_rows function of
restore_mariadb.sh exits with a code of 1, which causes the invoking
function to error out, as sketched below.
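A minimal sketch of the empty-table fix; the query, option names, and
defaults file path are illustrative rather than the script's exact ones:

  get_rows() {
    local db=$1 table=$2
    # The trailing "|| true" keeps an empty result set (grep matching nothing)
    # from propagating a non-zero exit status to the invoking function.
    # The mysql invocation below is an assumption about the script's query.
    mysql --defaults-file=/etc/mysql/admin_user.cnf --skip-column-names \
          -e "SELECT * FROM ${table};" "${db}" | grep -v '^$' || true
  }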
Change-Id: Ifa85b97f56e74f7994a2bde2e12c64fb0c9acafb
There are a couple of issues that need fixing:
1) The "backoffLimit" and "activeDeadlineSeconds" attributes are placed
in the CronJob part of cron-job-backup-postgres.yaml, but they should be
placed in the Job template part.
2) The backup cronjob had two names in values.yaml, "backup_postgresql"
and "postgresql_backup", in various places. It should be
"postgresql_backup" in all of those places so that the CronJob can be
deployed correctly.
Change-Id: Ifd1c7c03ee947763ac073e55c6d74c211615c343
This updates the Ceph scripts to create loopback devices in a single
script, and also updates the gate scripts accordingly.
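A hedged sketch of the consolidated loopback setup; the backing file
path, size, and device handling are assumptions:

  # Assumed path and size; the gate scripts pass their own values
  sudo mkdir -p /var/lib/openstack-helm
  # Create a sparse backing file and attach it to the first free loop device
  sudo truncate -s 10G /var/lib/openstack-helm/ceph-osd.img
  sudo losetup -f /var/lib/openstack-helm/ceph-osd.img
  # Show which /dev/loopN was attached so it can be passed to the ceph-osd chart
  losetup -a | grep ceph-osd.img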
Change-Id: Id6e3c09dca20d98fcbcc434e65f790c06b6272e8
Reverting this change as the health checks are failing with permission
denied. We need to dig in more and do thorough testing.
This reverts commit 0da55ad85ef621baa22887799e3146cecd93d368.
Change-Id: I9de78186a2c3a6d181bedfdb8b84abeecce46bd6