2651 Commits

Author SHA1 Message Date
Steven Fitzpatrick
fb7fc87d23 Prometheus: Render Rules as Templates
This change allows us to substitute values into our rules files.

Example:

- alert: my_region_is_down
  expr: up{region="{{ $my_region }}"} == 0
  
To support this change, rule annotations that used the expansion
{{ $labels.foo }} had to be surrounded with "{{` ... `}}" to render
correctly.

Change-Id: Ia7ac891de8261acca62105a3e2636bd747a5fbea
2020-08-10 18:16:35 +00:00
Zuul
3fa84d655f Merge "Add Application Armor to Ceph-Provisioners-config test" 2020-08-04 17:56:27 +00:00
Zuul
8d8d53c65c Merge "feat(tls): add tls to prometheus-openstack-exporter" 2020-08-04 14:54:26 +00:00
Zuul
8b09a07423 Merge "Fix overrides diff" 2020-08-03 23:05:18 +00:00
Zuul
262fa219d0 Merge "Remove updateStrategy of childresources of DaemonJobController." 2020-08-03 22:43:40 +00:00
Gupta, Sangeet (sg774j)
4d512f6eff feat(tls): add tls to prometheus-openstack-exporter
This patchset enables passing a TLS certificate to
OpenStack.

Change-Id: I370d69d8747ce894684dbff87b3580b6d1e82647
2020-08-03 22:20:34 +00:00
Zuul
9ed951aa32 Merge "[Ceph-client] Add check of target osd value" 2020-08-03 21:31:09 +00:00
Zuul
c0b86523a7 Merge "[ceph-client] update logic of inactive pgs check" 2020-08-03 20:12:06 +00:00
Frank Ritchie
5909bcbdef Use hostPID for ceph-mgr deployment
This change is to address a memory leak in the ceph-mgr deployment.
The leak has also been noted in:

https://review.opendev.org/#/c/711085

Without this change memory usage for the active ceph-mgr pod will
steadily increase by roughly 100MiB per hour until all available
memory has been exhausted. Reset messages will also be seen in the
active and standby ceph-mgr pod logs.

Sample messages:

---

0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1

---

The root cause of the resets and associated memory leak appears to
be due to multiple ceph pods sharing the same IP address (due to
hostNetwork being true) and PID (due to hostPID being false).
In the messages above the "1" at the end of the line is the PID.
Ceph appears to use the Version:IP:Port/PID (v2:10.0.0.226:6808/1)
tuple as a unique identifier. When hostPID is false conflicts arise.

Setting hostPID to true stops the reset messages and memory leak.
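
The reasoning above can be illustrated with a toy model. This is only a sketch of the argument in this message, not Ceph code: `entity_addr` is a hypothetical helper, the PIDs 48212 and 48213 are made-up values, and the sketch assumes the port component also coincides, which the message does not state.

```python
# Toy model of the Version:IP:Port/PID identifier discussed above.
# entity_addr is a hypothetical helper, not a Ceph API.

def entity_addr(version: str, ip: str, port: int, pid: int) -> str:
    """Format an identifier the way it appears in the sample log lines."""
    return f"{version}:{ip}:{port}/{pid}"

HOST_IP = "10.0.0.226"  # shared by every pod when hostNetwork is true

# hostPID false: each pod has its own PID namespace, so the process in
# each pod can report the same PID (1 in the sample messages).
isolated = {
    entity_addr("v2", HOST_IP, 6808, 1),  # pod A
    entity_addr("v2", HOST_IP, 6808, 1),  # pod B
}

# hostPID true: PIDs come from the host namespace, so they differ.
# 48212 and 48213 are hypothetical host PIDs.
shared = {
    entity_addr("v2", HOST_IP, 6808, 48212),  # pod A
    entity_addr("v2", HOST_IP, 6808, 48213),  # pod B
}

print(len(isolated), len(shared))
```

In this model the two identifiers collapse into one when each pod has a private PID namespace and stay distinct when the host PID namespace is shared.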

Change-Id: I9821637e75e8f89b59cf39842a6eb7e66518fa2c
2020-08-03 17:35:51 +00:00
dt241s@att.com
4c46b2662a Add Application Armor to Ceph-Provisioners-config test
1) Added  to service account name instead of traditional pod name
   to resolve for dynamic release names.

Change-Id: Ibf4c69415e69a7baca2e3b96bcb23851e68d07d8
2020-08-03 16:42:53 +00:00
Kabanov, Dmitrii
f6d6ae051d [ceph-client] update logic of inactive pgs check
The PS updates wait_for_inactive_pgs function:
- Changed the name of the function to wait_for_pgs
- Added a query for getting status of pgs
- All pgs should be in "active+" state at least three times in a row
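
The "three times in a row" rule can be sketched as a polling loop with a streak counter. This is a Python illustration only, not the chart's script: `query_states`, `required_passes`, `interval` and `max_polls` are hypothetical names, and treating an empty result as "not ready" is an assumption.

```python
import time
from typing import Callable, Iterable


def all_pgs_active(states: Iterable[str]) -> bool:
    """True when every PG state string starts with "active+"."""
    states = list(states)
    # Assumption: an empty result means the cluster is not ready yet.
    return bool(states) and all(s.startswith("active+") for s in states)


def wait_for_pgs(query_states: Callable[[], Iterable[str]],
                 required_passes: int = 3,
                 interval: float = 0.0,
                 max_polls: int = 100) -> bool:
    """Poll until all PGs are active on required_passes consecutive polls."""
    consecutive = 0
    for _ in range(max_polls):
        if all_pgs_active(query_states()):
            consecutive += 1
            if consecutive >= required_passes:
                return True
        else:
            consecutive = 0  # any failed poll resets the streak
        time.sleep(interval)
    return False


# Canned poll results standing in for real PG status queries.
samples = iter([
    ["active+clean", "peering"],
    ["active+clean"],
    ["active+clean"],
    ["active+clean", "active+clean"],
])
result = wait_for_pgs(lambda: next(samples), max_polls=4)
print(result)
```

Resetting the counter on any failed poll is what distinguishes "three times in a row" from "three times in total".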

Change-Id: Iecc79ebbdfaa74886bca989b23f7741a1c3dca16
2020-08-03 08:42:58 -07:00
Kabanov, Dmitrii
47ce52a5cf [Ceph-client] Add check of target osd value
The PS adds a check of the target OSD value. The expected number of OSDs
should always be greater than or equal to the number of existing OSDs. If
there are more OSDs than expected, the value is not correct.
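
A minimal sketch of that comparison, in Python rather than the chart's own scripting, with `check_target_osds` as a hypothetical name:

```python
def check_target_osds(expected: int, existing: int) -> None:
    """Raise if the cluster already has more OSDs than the target value."""
    if existing > expected:
        raise ValueError(
            f"target OSD value {expected} is lower than the "
            f"{existing} OSDs that already exist"
        )


# Equal or fewer existing OSDs than expected is accepted.
check_target_osds(expected=3, existing=3)
check_target_osds(expected=5, existing=3)
```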

Change-Id: I117a189a18dbb740585b343db9ac9b596a34b929
2020-08-03 15:38:14 +00:00
Zuul
3ce0170da8 Merge "Prometheus: Allow input of TLS client creds in values.yaml" 2020-08-01 22:47:56 +00:00
Andrii Ostapenko
cf90f32e8b Fix overrides diff
Check proper path.

Change-Id: Icd3d0711fb530b77d049227b09904c433e26dc78
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-31 21:52:59 -05:00
Zuul
f79704a8f0 Merge "[ceph-client] Fix a helm test issue and disable PG autoscaler" 2020-07-31 22:09:17 +00:00
Zuul
3f9006ccae Merge "Enable Read-Only for Node-Problem Detector" 2020-07-31 21:39:06 +00:00
Zuul
b5250bb517 Merge "Fix postgresql backup cronjob deployment issues" 2020-07-31 20:13:21 +00:00
Zuul
a57fde9051 Merge "Fix MariaDB backup cronjob" 2020-07-31 18:41:32 +00:00
Zuul
71ec51cdb4 Merge "[CEPH] OSH-INFRA: Update ceph scripts to create loopback devices" 2020-07-31 17:50:01 +00:00
Phil Sphicas
5d8cf965c1 Prometheus: Allow input of TLS client creds in values.yaml
Some scrape targets require the use of TLS client certificates, which
are specified as filenames as part of the tls_config.

This change allows these client certs and keys to be provided, stores
them in a secret, and mounts them in the pod under /tls_configs.

Example:

    tls_configs:
      kubernetes-etcd:
        ca.pem: |
          -----BEGIN CERTIFICATE-----
          -----END CERTIFICATE-----
        crt.pem: |
          -----BEGIN CERTIFICATE-----
          -----END CERTIFICATE-----
        key.pem: |
          -----BEGIN RSA PRIVATE KEY-----
          -----END RSA PRIVATE KEY-----

    conf:
      prometheus:
        scrape_configs:
          template: |
            scrape_configs:
              - job_name: kubernetes-etcd
                scheme: https
                tls_config:
                  ca_file: /tls_configs/kubernetes-etcd.ca.pem
                  cert_file: /tls_configs/kubernetes-etcd.crt.pem
                  key_file: /tls_configs/kubernetes-etcd.key.pem
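
The relationship between the `tls_configs` keys and the mounted file names can be sketched as follows. This assumes each file is mounted as `<job name>.<file name>` under /tls_configs, as the ca_file and key_file paths in the example suggest; `flatten_tls_configs` is a hypothetical helper, not part of the chart, and the PEM bodies are placeholders.

```python
from typing import Dict

MOUNT_DIR = "/tls_configs"


def flatten_tls_configs(tls_configs: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map every (job, file) pair to the path it would be mounted at."""
    files = {}
    for job, entries in tls_configs.items():
        for name, pem in entries.items():
            # Assumed naming scheme: <mount dir>/<job>.<file name>
            files[f"{MOUNT_DIR}/{job}.{name}"] = pem
    return files


example = {
    "kubernetes-etcd": {
        "ca.pem": "<placeholder CA certificate>",
        "key.pem": "<placeholder private key>",
    }
}
paths = sorted(flatten_tls_configs(example))
print(paths)
```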

Change-Id: I963c65dc39f1b5110b091296b93e2de9cdd980a4
2020-07-31 16:31:52 +00:00
Stephen Taylor
84f1557566 [ceph-client] Fix a helm test issue and disable PG autoscaler
Currently the Ceph helm tests pass when the deployed Ceph cluster
is unhealthy. This change expands the cluster status testing
logic to pass when all PGs are active and fail if any PG is
inactive.

The PG autoscaler is currently causing the deployment to deploy
unhealthy Ceph clusters. This change also disables it. It should
be re-enabled once those issues are resolved.

Change-Id: Iea1ff5006fc00e4570cf67c6af5ef6746a538058
2020-07-31 14:46:10 +00:00
diwakar thyagaraj
e986c6f8c3 Enable Read-Only for Node-Problem Detector
Change-Id: I1f45455abcd812d2c4df186f7047949230f210fd
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
2020-07-30 23:34:57 +00:00
Luna Das
1cfa619097 Remove updateStrategy of childresources of DaemonJobController.
Change updateStrategy from InPlace to the default OnDelete.

Change-Id: Ie85e2ba116ab399c65844e0bb66eecc66f6d9c90
2020-07-31 00:19:16 +05:30
Gupta, Sangeet (sg774j)
8633b93548 feat(tls): add tls to swift user and service of ceph-rgw
This patch adds certs needed for swift user and ceph service to
communicate with keystone.

Change-Id: I4de035f6fe2138c1d1022140c7571fac91ed1a84
2020-07-30 18:20:46 +00:00
Huang, Sophie (sh879n)
f57aad9822 Fix MariaDB backup cronjob
There are three issues fixed here:
1) The "backoffLimit" and "activeDeadlineSeconds" are attributes of
Job, not CronJob. Therefore, they should be placed in the Job template
part of the cron-job-backup-mariadb.yaml
2) The backup cronjob had two names in the values.yaml,
"backup_mariadb" and "mariadb_backup", in various places.
3) When an empty table is used, the get_rows function of
restore_mariadb.sh exits with a code of 1, which causes the invoking
function to error out.
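
The placement rule in item 1 can be sketched as a small check over a manifest loaded into a Python dict. `misplaced_job_fields` is a hypothetical helper, and the schedule and limit values are made up for illustration; in the Kubernetes API both fields belong to the Job spec, which a CronJob carries under spec.jobTemplate.spec.

```python
JOB_ONLY_FIELDS = ("backoffLimit", "activeDeadlineSeconds")


def misplaced_job_fields(cronjob: dict) -> list:
    """Return Job-only fields found directly under the CronJob spec."""
    spec = cronjob.get("spec", {})
    return [field for field in JOB_ONLY_FIELDS if field in spec]


# Fields set on the CronJob spec itself, where they do not belong.
broken = {
    "kind": "CronJob",
    "spec": {
        "schedule": "0 0 * * *",
        "backoffLimit": 1,
        "activeDeadlineSeconds": 600,
        "jobTemplate": {"spec": {"template": {}}},
    },
}

# The same fields moved into the Job template part.
fixed = {
    "kind": "CronJob",
    "spec": {
        "schedule": "0 0 * * *",
        "jobTemplate": {
            "spec": {
                "backoffLimit": 1,
                "activeDeadlineSeconds": 600,
                "template": {},
            }
        },
    },
}

print(misplaced_job_fields(broken), misplaced_job_fields(fixed))
```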

Change-Id: Ifa85b97f56e74f7994a2bde2e12c64fb0c9acafb
2020-07-30 15:51:30 +00:00
Zuul
d5aff1df64 Merge "Improve overrides diff script" 2020-07-30 05:53:17 +00:00
Parsons, Cliff (cp769u)
c10de970c3 Fix postgresql backup cronjob deployment issues
There are a couple of issues that need fixing:
1) "backoffLimit" and "activeDeadlineSeconds" attributes are placed in
the CronJob part of the cron-job-backup-postgres.yaml, but should be
placed in the Job template part.
2) The backup cronjob had two names in the values.yaml
"backup_postgresql" and "postgresql_backup" in various places. It should
be "postgresql_backup" in all of those places so that the CronJob can be
deployed correctly.

Change-Id: Ifd1c7c03ee947763ac073e55c6d74c211615c343
2020-07-29 22:39:59 +00:00
Gupta, Sangeet (sg774j)
738f62db5a mariadb: fix new line issue
Change-Id: Ibd45968900d06f7a3059aa184ed272fa99ad36d5
2020-07-29 19:17:35 +00:00
Chinasubbareddy Mallavarapu
4358251073 [CEPH] OSH-INFRA: Update ceph scripts to create loopback devices
This is to update ceph scripts to create loopback devices
in a single script and also to update gate scripts.

Change-Id: Id6e3c09dca20d98fcbcc434e65f790c06b6272e8
2020-07-29 10:05:37 -05:00
Rahul Khiyani
3978c6a33c Revert "Add missing pod level security context template for mariadb-backup"
Reverting this change as the health checks are failing with permission denied.
Need to dig more and do thorough testing.

This reverts commit 0da55ad85ef621baa22887799e3146cecd93d368.

Change-Id: I9de78186a2c3a6d181bedfdb8b84abeecce46bd6
2020-07-29 14:26:28 +00:00
diwakar thyagaraj
b82a146640 [FIX] Apparmor to Node-problem Detector
Change-Id: I11876e7ca9af3e37071716c34ccdb9361f98828d
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
2020-07-28 21:22:48 +00:00
Gupta, Sangeet (sg774j)
347ec225ed mariadb: Fix the indentation
Change-Id: Ibef80effb626024f9dc947bc1c372df3120bff2d
2020-07-28 12:29:13 +00:00
Gupta, Sangeet (sg774j)
d458e888a9 feat(tls): add tls to mariadb exporter charts
This patchset updates the .cnf files to support tls and mount
the certificates where needed.

Change-Id: I5aff6821f2649f55dd4444896379491b504415bb
2020-07-27 21:41:46 +00:00
Zuul
802655703e Merge "Use zuul docker mirror for functional jobs" 2020-07-27 11:18:08 +00:00
Zuul
189c72a7bf Merge "Fix tiller metrics port exposure issue for minikube" 2020-07-24 16:43:54 +00:00
Zuul
3cefa120cf Merge "Fluentd & Elasticsearch: Use the latest openstackhelm image tag" 2020-07-24 05:39:09 +00:00
Zuul
b5b321fdf1 Merge "Change for alertmanager v0.20" 2020-07-24 05:39:08 +00:00
Zuul
64c3ca4f17 Merge "Correct a typo in the code" 2020-07-24 05:39:06 +00:00
Andrii Ostapenko
d103da6c06 Fix tiller metrics port exposure issue for minikube
Along with fixing the bug, this decreases build time by more than
2 minutes for all jobs that use minikube and collect tiller metrics.

Change-Id: Ia166584eae48c643248f977b959aa6336e3a327e
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-23 23:39:35 -05:00
diwakar thyagaraj
936397b36a Add Application Armor to Ceph-Provisioners-key-generator
1) Added  to service account name instead of traditional pod name.

Change-Id: I1c7ba9081ccf396b037861b496110251f2248fd2
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
2020-07-23 14:15:04 +00:00
zhaoleilc
a4fcfaaa1f Correct a typo in the code
This patch changes 'feild' to 'field' in
helm-toolkit/templates/endpoints/
_endpoint_host_lookup.tpl

Change-Id: I14d346d74d5a72d67c290571e8b7812f01d4526e
2020-07-23 19:49:30 +08:00
Andrii Ostapenko
68097edf36 Improve overrides diff script
Change-Id: I4af33e57ee31c0d4f52afb3e2ff248039333f702
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-23 02:25:36 +00:00
willxz
c97c592216 Change for alertmanager v0.20
- Update alertmanager and prometheus discovery port from 6783 to 9094
- Update to support fqdn for discovery hostname
- Add one test alert to Prometheus to test alert pipeline
- update container name from alertmanger to prometheus-alertmanager

Change-Id: Iec5e758e4b576dff01e84591a2440d030d5ff3c4
2020-07-22 17:39:09 -04:00
Steven Fitzpatrick
68cd0027d1 Fluentd & Elasticsearch: Use the latest openstackhelm image tag
Also, removed an unnecessary image reference from the fluentd chart

Change-Id: Ic9ce88f5ddc5096b2eed2ed2286bc73fe6dd5e73
2020-07-22 16:35:16 -05:00
Zuul
68940203db Merge "Add openstack-helm-infra to required projects for infra jobs" 2020-07-22 19:59:56 +00:00
Zuul
58aa31c8bc Merge "[ceph] Add noup flag check to helm tests" 2020-07-22 19:59:54 +00:00
Zuul
e8a18f35ab Merge "Updated the Node Problem Detector chart" 2020-07-22 18:56:43 +00:00
Zuul
66b59f6ad5 Merge "Unpin nagios, osh-selenium and heat images for grafana and nagios" 2020-07-22 18:23:41 +00:00
Andrii Ostapenko
553af32beb Add openstack-helm-infra to required projects for infra jobs
Required to run osh-infra jobs from other projects.

Change-Id: Iba1deb6ff4c6e7c3582d90f9175b2d3953bfd4d8
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
2020-07-22 17:17:31 +00:00
Zuul
b985663954 Merge "Add missing pod level security context template for mariadb-backup" 2020-07-22 15:42:16 +00:00