With OSH now publishing charts with each change, there
needs to be a way to track the changes between chart versions.
This change adds a reno check job to publish release notes
based on the changes to each chart, by version, as a way to
track and document all the changes that are made to OSH-infra
and published to tarballs.o.o.
Change-Id: I5e6eccc4b34a891078ba816249795b2bf1921a62
This brings Grafana up to the current version and fixes the
Selenium helm test and gate test for the new login dashboard.
Change-Id: I0b65412f4689c763b3f035055ecbb4ca63c21048
The volume naming convention prefixes logical volume names with
ceph-lv-, ceph-db-, or ceph-wal-. The code that was added recently
to remove orphaned DB and WAL volumes does a string replacement of
"db" or "wal" with "lv" when searching for corresponding data
volumes. This causes DB volumes to get identified incorrectly as
orphans and removed when "db" appears in the PV UUID portion of
the volume name.
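For illustration, a minimal sketch of the failure mode, using a
made-up volume name and bash-style replacement (not the chart's
exact code):

    # DB volume whose PV UUID happens to contain the substring "db"
    db_volume="ceph-db-4cdb9a7a-9c13-4a27-b0a2-bb884173e776"

    # The orphan check swaps "db" for "lv" to derive the matching data volume
    data_volume="${db_volume//db/lv}"
    echo "${data_volume}"
    # -> ceph-lv-4clv9a7a-...  The "db" inside the UUID is replaced as well,
    #    so the real data volume (ceph-lv-4cdb9a7a-...) is never found and
    #    the DB volume is wrongly treated as an orphan and removed.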
Change-Id: I0c9477483b70c9ec844b37a6de10a50c0f2e1df8
This commit introduces the following helm test improvements for the
ceph-client chart:
1) Reworks the pg_validation function so that it allows some time for
peering PGs to finish peering, but fails if any other critical errors are
seen. The actual pg validation was split out into a function called
check_pgs(), and pg_validation now manages the looping aspects (see the
sketch after this list).
2) The check_cluster_status function now calls pg_validation if the
cluster status is not OK. This is very similar to what was happening
before, except that the logic is no longer duplicated.
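A rough sketch of the split described in 1), with illustrative
function bodies (the jq field path assumes a recent Ceph JSON layout
and is not the chart's verbatim code):

    check_pgs() {
      # Fail immediately on clearly unhealthy states, report peering so
      # the caller can retry, otherwise consider the PGs healthy.
      local states
      states=$(ceph pg ls -f json | jq -r '.pg_stats[].state')
      echo "${states}" | grep -qE 'down|incomplete|stale' && return 1
      echo "${states}" | grep -q 'peering' && return 2
      return 0
    }

    pg_validation() {
      local retries=0
      while true; do
        check_pgs; rc=$?
        [ ${rc} -eq 0 ] && return 0          # all PGs healthy
        [ ${rc} -eq 1 ] && return 1          # critical PG state seen, fail
        [ ${retries} -ge 10 ] && return 1    # peering never settled
        retries=$((retries + 1))
        sleep 30                             # give peering PGs time to finish
      done
    }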
Change-Id: I65906380817441bd2ff9ff9cfbf9586b6fdd2ba7
ClusterIssuer does not belong to a single namespace (unlike Issuer)
and can be referenced by Certificate resources from multiple different
namespaces. When internal TLS is added to multiple namespaces, the same
ClusterIssuer can be used instead of one Issuer per namespace.
Change-Id: I1576f486f30d693c4bc6b15e25c238d8004b4568
This change adds a cleanup mechanism for the archive directory via the
following steps:
1) Add archive_cleanup.sh under the /tmp directory
2) The script is triggered through start.sh
3) It runs every hour, checking the utilization of the archive directory
4) If utilization is above the threshold, it deletes the older half of
the files (sketched below)
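A simplified sketch of such a cleanup loop (the path and the 80%
threshold are illustrative, not the chart's actual defaults):

    ARCHIVE_DIR=${ARCHIVE_DIR:-/var/lib/archive}
    THRESHOLD=80

    while true; do
      used=$(df --output=pcent "${ARCHIVE_DIR}" | tail -1 | tr -dc '0-9')
      if [ "${used}" -gt "${THRESHOLD}" ]; then
        # Remove the older half of the archived files
        total=$(ls -1t "${ARCHIVE_DIR}" | wc -l)
        ls -1t "${ARCHIVE_DIR}" | tail -n "$((total / 2))" \
          | xargs -r -I{} rm -f "${ARCHIVE_DIR}/{}"
      fi
      sleep 3600   # check utilization once an hour
    done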
Change-Id: I918284b0aa5a698a6028b9807fcbf6559ef0ff45
Found another issue in disk_zap() where a needed update was missed when
https://review.opendev.org/c/openstack/openstack-helm-infra/+/745166
changed the logical volume naming convention.
The above patch set renamed volumes that followed the old convention,
so the old matching logic can never be correct and must be updated.
Also added logic to clean up orphaned DB/WAL volumes if they are
encountered and removed some cases where a data disk is marked as in use
when it isn't set up correctly.
Change-Id: I8deeecfdb69df1f855f287caab8385ee3d6869e0
For any host mounts that include /var/lib/kubelet, use HostToContainer
mountPropagation, which avoids creating extra references to mounts in
other containers.
Affects the following resources:
* ingress deployment
* openvswitch-vswitchd daemonset
Change-Id: I5964c595210af60d54158e6f7c962d5abe77fc2f
This PS addresses security best practices by running
containers as a non-privileged user and disallowing privilege
escalation. Ceph-client is used for the mgr and mds pods.
Change-Id: Idbd87408c17907eaae9c6398fbc942f203b51515
ADD: new snapshot policy template job which creates templates for
the ES SLM manager to snapshot indices instead of using curator.
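For reference, the kind of SLM policy such a template job might
create (the policy name, schedule, repository and index pattern
below are purely illustrative):

    curl -X PUT "http://elasticsearch:9200/_slm/policy/index-snapshots" \
      -H 'Content-Type: application/json' \
      -d '{
        "schedule": "0 30 1 * * ?",
        "name": "<snap-{now/d}>",
        "repository": "default_repo",
        "config": { "indices": ["logstash-*"] },
        "retention": { "expire_after": "30d" }
      }'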
Change-Id: I629d30691d6d3f77646bde7d4838056b117ce091
OSD logical volume names used to be based on the logical disk path,
e.g. /dev/sdb, but that has changed. The lvremove logic in disk_zap()
is still using the old naming convention. This change fixes that.
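Roughly, the removal now has to key off the current name prefixes
rather than a name derived from the disk path; an illustrative (not
verbatim) loop:

    # vg_name is assumed to be the volume group tied to the disk being zapped
    for lv in $(lvs --noheadings -o lv_name "${vg_name}" 2>/dev/null); do
      case "${lv}" in
        ceph-lv-*|ceph-db-*|ceph-wal-*)
          lvremove -y "/dev/${vg_name}/${lv}"
          ;;
      esac
    done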
Change-Id: If32ab354670166a3c844991de1744de63a508303
There are many race conditions possible when multiple ceph-osd
pods are initialized on the same host at the same time using
shared metadata disks. The locked() function was introduced a
while back to address these, but some commands weren't locked,
locked() was being called all over the place, and there was a file
descriptor leak in locked(). This change cleans that up by
maintaining a single, global file descriptor for the lock file
that is only opened and closed once, and also by aliasing all of
the commands that need to use locked() and removing explicit calls
to locked() everywhere.
The global_locked() function has also been removed as it isn't
needed when individual commands that interact with disks use
locked() properly.
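A minimal sketch of the pattern described above (the lock path, fd
number and aliased commands are illustrative, not the chart's actual
values):

    shopt -s expand_aliases
    LOCK_FILE=/var/lib/ceph/tmp/init-osd.lock

    # Open the lock file once on a global file descriptor
    exec 9>"${LOCK_FILE}"

    locked() {
      flock -w 600 9    # serialize disk operations across osd pods on the host
      "$@"
      local ret=$?
      flock -u 9        # release the lock but keep the fd open for reuse
      return ${ret}
    }

    # Alias every disk-touching command so call sites are locked implicitly
    alias ceph-volume='locked ceph-volume'
    alias pvcreate='locked pvcreate'
    alias vgcreate='locked vgcreate'
    alias lvcreate='locked lvcreate'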
Change-Id: I0018cf0b3a25bced44c57c40e33043579c42de7a
Add the fix for the openvswitch gate issue with systemd
237-3ubuntu10.43 to the multinode jobs as well, using the code from [0].
Additionally, made changes to support kubeadm version 1.18.9.
[0] https://review.opendev.org/c/openstack/openstack-helm-infra/+/763619
Change-Id: I2681feb1029e5535f3f278513e8aece821c715f1
This is to address zombie processes found in ceph-mon containers due
to the mon-check.sh monitoring script. With shareProcessNamespace the
/pause container will properly handle the defunct processes.
Change-Id: Ic111fd28b517f4c9b59ab23626753e9c73db1b1b
The build-chart playbook task that points each chart at helm-toolkit
has a find command that, when used with another repo, will
include the charts for osh-infra as well.
This change modifies the playbook to only modify the requirements
of charts in the repo being published.
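Illustrative only (not the playbook's actual task), the idea is to
anchor the find at the repo being published, e.g.:

    # REPO_DIR is assumed to point at the chart repo being published; each
    # chart's requirements.yaml is assumed to list only helm-toolkit.
    find "${REPO_DIR}" -maxdepth 2 -name requirements.yaml \
      | xargs -r sed -i 's#^\(\s*repository:\).*#\1 file://../helm-toolkit#'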
Change-Id: I493b4c64fe2525bac0acae06bd40c3896c918e20
With the current version of rabbitmq-exporter, data retrieval
sometimes fails with RabbitMQ timeout issues. The RabbitMQ timeout
threshold is set to 10 seconds and is not configurable in that
version. This updates the rabbitmq-exporter image to
kbudde/rabbitmq-exporter:v1.0.0-RC7.1
(default "RABBITMQ_TIMEOUT" of 30 seconds)
to resolve the timeout issues.
Change-Id: Ia51f368a1bba2b0fd9195cf9991b55864cdebfc1
This reverts commit 42f3b3eaf5a8794b1f247915fffbef68137e6c1c.
Reason for revert: Docker Hub now sets a hard limit on daily pulls, so let's switch back to using the opendev docker proxy.
Change-Id: I87e399c89d5736f39d7bdba2011655e5f5766180
This PS makes the RABBIT_TIMEOUT parameter configurable
with the kbudde/rabbitmq-exporter:v1.0.0-RC7.1 image.
Change-Id: I8faf8cd706863f65afb5137d93a7627d421270e9
The default, directory-based OSD configuration doesn't appear to work
correctly and isn't really being used by anyone. It has been commented
out and the comments have been enhanced to document the OSD config
better. With this change there is no default configuration anymore, so
the user must configure OSDs properly in their environment in
values.yaml in order to deploy OSDs using this chart.
Change-Id: I8caecf847ffc1fefe9cb1817d1d2b6d58b297f72
This change updates the fluentd chart to use HTK probe templates
to allow configuration via values overrides.
Change-Id: I97a3cc0832554a31146cd2b6d86deb77fd73db41
OSD failures during an update can cause degraded and misplaced
objects. The post-apply job restarts OSDs in failure domain
batches in order to accomplish the restarts efficiently. There is
already a wait for degraded objects to ensure that OSDs are not
restarted on degraded PGs, but misplaced objects could mean that
multiple object replicas exist in the same failure domain, so the
job should wait for those to recover as well before restarting
OSDs in order to avoid potential disruption under these failure
conditions.
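Conceptually, the extra wait looks something like the following (the
JSON field name assumes a recent Ceph release and is not the job's
verbatim code):

    wait_for_misplaced_objects() {
      while true; do
        misplaced=$(ceph -s -f json | jq -r '.pgmap.misplaced_objects // 0')
        [ "${misplaced}" -eq 0 ] && break
        echo "waiting for ${misplaced} misplaced objects to recover"
        sleep 30
      done
    }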
Change-Id: I39606e388a9a1d3a4e9c547de56aac4fc5606ea2
According to the get-values-overrides.sh script, the directory is
expected to be named values_overrides, not value_overrides.
Change-Id: I53744117af6962d51519bc1d96329129473d9970
A recent change to wait_for_pods() to allow for fault tolerance
appears to be causing wait_for_pgs() to fail and exit the post-
apply script prematurely in some cases. The existing
wait_for_degraded_objects() logic won't pass until pods and PGs
have recovered while the noout flag is set, so the pod and PG
waits can simply be removed.
Change-Id: I5fd7f422d710c18dee237c0ae97ae1a770606605