openstack-helm-infra

Author	SHA1	Message	Date
Doug Aaser	c5a85ee117	Pg_rewind failure fix This commit fixes a small issue with Patroni where sometimes pg_rewind would fail due to limitations in Postgres 9.5. To combat pg_rewind failures, we can enable remove_data_directory_on_rewind_failure, which cleans up the data directory on the pod and recreates it as a replica so that the pod can restart fresh rather than churning in an error state. This commit also sets remove_data_directory_on_diverged_timelines to give Patroni a greater ability to combat timeline divergence errors. Change-Id: Ic9f75dbfa0dd990e2b215ed204e55cd67a5d1159	2019-08-26 18:37:12 +00:00
Scott Hussey	5a7693cd62	(postgres) Add override for termination period - Allow configuration of the termination grace period for the Patroni pod with a default of 180s to ensure the database has time to gracefully spin down, even on slow disk. Change-Id: I420cbd601bbffa50217b717bd4a636d48d324617	2019-08-25 07:21:53 -05:00
Zuul	f0306ce33d	Merge "Sync wait-for-pods script with the one from openstack-helm"	2019-08-24 07:45:38 +00:00
Kabanov, Dmitrii	ed8ff0d6fa	Ceph-RGW: fix helm test This PS allows the tests to run when both options (rgw_ks and rgw_s3) are enabled at the same time. Change-Id: I262baa38b7c65ff9335a3db6a6e2a454c3ff3f5f	2019-08-22 17:00:40 +00:00
Pete Birley	a5682e7db3	MariaDB: Move all config to be values driven This PS moves to drive all mariadb config via the values fed to the chart. Change-Id: I4ed3624737af4d5c90b1b5de451a0a0b75a5eda1 Signed-off-by: Pete Birley <pete@port.direct>	2019-08-21 14:08:25 -05:00
Pete Birley	aba044cb0e	Mariadb: define timeouts for wsrep This PS updates the wsrep_provider_options to define the timeouts explicitly for evs.suspect_timeout and gmcast.peer_timeout. Their defaults are PT5S and PT3S respectively, which are increased by a factor of approx 5 to accommodate network instability that may occur during node outage events. Change-Id: Ie5cdd06d91299e5e2632b70cb9b50a7ad14f62b1 Signed-off-by: Pete Birley <pete@port.direct>	2019-08-21 14:48:05 +00:00
Zuul	7c2c148fb0	Merge "Enable probes override from values.yaml for ovs"	2019-08-21 12:08:55 +00:00
Zuul	6639d0916b	Merge "Enhance HTK Job Manifests to be more flexible"	2019-08-20 17:45:31 +00:00
rajesh.kudaka	2b66685594	Enable probes override from values.yaml for ovs This commit enables overriding liveness/readiness probes configurations for openvswitch pods from values.yaml Change-Id: I4ec2b9e88bf8ed57e8ac9293f333969b63cef335	2019-08-19 16:34:03 +00:00
Chinasubbareddy Mallavarapu	1ff4811f06	[ceph-provisioner] Enable pvc resize feature This is to enable the pvc resize feature so that a pvc can be resized when needed. Change-Id: Ib5840b10087b39884cfd2249017c974aac407b30	2019-08-16 16:21:05 -05:00
Pai, Radhika (rp592h)	f6ff42061f	Grafana: Updated the Ceph-Cluster variable sorting Earlier the query used to sort variables in asc alphabetical order, updated to sort the cluster in desc alphabetical order Change-Id: I8f08a44b05d0159ad1a043f052751d44b4625f1d	2019-08-16 16:07:47 +00:00
sg774j	87afa2fb8c	Rabbitmq: Correct reset_rabbit function Made correction to this function to not attempt to delete /var/lib/rabbitmq/ Change-Id: Ied16be1ec83d528f2660ef96389c3f236983aa79	2019-08-15 18:22:01 +00:00
BARTRA, RICK	f5df62d836	Run rabbitmq container with rabbitmq user This change makes rabbitmq container run with the rabbitmq user instead of the root user. As the rabbitmq user doesn't have write access to '/run' directory, the templates are updated to use the '/tmp' directory instead which the rabbitmq user has write access to. Change-Id: Ia35c3f741fefe3172c93bb042bf8d26bf7672cfc	2019-08-14 17:48:40 +00:00
Zuul	20dafdaddb	Merge "Nagios – API Handling – HTTP Security Headers Not Present"	2019-08-14 00:59:23 +00:00
Zuul	a381200e8c	Merge "Disable cephfs provisioner in multinode jobs"	2019-08-14 00:48:32 +00:00
Zuul	e11e9734bd	Merge "Minikube: Expose Tiller http port for metrics"	2019-08-13 21:50:28 +00:00
Zuul	eb3ec04325	Merge "AIO multinode: Add root user directive to Kubelet"	2019-08-13 16:55:10 +00:00
Zuul	3f0cda712b	Merge "Remove stale images from openstack-helm-infra"	2019-08-13 16:43:59 +00:00
Steve Wilkerson	d547063c37	Disable cephfs provisioner in multinode jobs This disables the cephfs provisioner in the multinode periodic jobs. It seems the helm tests for the ceph provisioner chart that test cephfs fail more often than not in the multinode jobs while passing reliably in the single node check and gate jobs. As cephfs is still gated, disabling the cephfs provisioner in the periodic jobs allows for further investigation into this issue without causing potential regressions Change-Id: I36e68cc2e446afac8769fb9ab753105909341f24 Signed-off-by: Steve Wilkerson <sw5822@att.com>	2019-08-13 14:49:27 +00:00
Drew Walters	354d53c4c3	AIO multinode: Add root user directive to Kubelet Systemd units run as the root user by default; however, environment variables in spawned processes are not populated for the root user unless "User=root" is specified for a particular unit [0]. This change adds the "User=root" declaration to the Kubelet systemd unit so that Kubelet will look in the root user's home directory for Docker configuration information. Without this change, Docker configuration information, such as authentication keys for private repositories, are ignored by Kubelet even though the Docker daemon honors them. [0] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Environment%20variables%20in%20spawned%20processes Change-Id: I209de0f4f04c078d39b1e8bf18195e51e965cbf3 Signed-off-by: Drew Walters <andrew.walters@att.com>	2019-08-12 15:56:47 +00:00
Zuul	9b9309fe31	Merge "(postgresql) Cert auth for replication connections"	2019-08-08 21:16:15 +00:00
RAHUL KHIYANI	ac65a37b0b	Nagios – API Handling – HTTP Security Headers Not Present Added new X-Content-Type-Options: nosniff header to make sure the browser does not try to detect a different Content-Type than what is actually sent (can lead to XSS) Added new X-Frame-Options: sameorigin header to protect against drag and drop clickjacking attacks in older browsers Added new Content-Security-Policy: script-src self for implementation Added new HTTP Security header X-XSS-Protection:1 mode=block to sanitize the page; when an XSS attack is detected, the browser will prevent rendering of the page Change-Id: Ic79bbb96484a7f1a497c001883783338fd26a47a	2019-08-07 19:08:48 +00:00
Steve Wilkerson	8573957fce	Minikube: Expose Tiller http port for metrics This updates the Minikube deployment to patch the tiller-deploy service to add a port definition for the http (44135) port for tiller, which is used to expose metrics for Prometheus to scrape Change-Id: I2eb5d4001c37935674ce64012b2744030addc127 Signed-off-by: Steve Wilkerson <sw5822@att.com>	2019-08-07 13:25:23 -05:00
Steve Wilkerson	443832a8fd	Remove stale images from openstack-helm-infra This removes the artifacts associated with images for libvirt, mariadb, and vbmc from openstack-helm-infra as these images now live in openstack-helm-images. Change-Id: I5c97d2db89068c71ec1a56a5ac17007682711182 Signed-off-by: Steve Wilkerson <sw5822@att.com>	2019-08-07 08:56:51 -05:00
Zuul	b310caef4f	Merge "Grafana: Code for Calico Dashboard"	2019-08-06 21:39:48 +00:00
Zuul	4a8f788532	Merge "Generate CA crt and key if needed"	2019-08-06 18:14:08 +00:00
Hussey, Scott (sh8121)	9c27dd7576	(postgresql) Cert auth for replication connections - Change the Postgres configuration to use x509 client certs for authenticating the connections for replicating between Patroni nodes. This is a straightforward solution for supporting credential rotation for the replication user. Password authentication is problematic due to the declarative nature of helm charts and requiring an existing replication connection to replicate the rotated password. Change-Id: I0c5456a01b3a36fee8ee4c986d25c4a1d807cb77	2019-08-06 00:03:54 -05:00
Zuul	8f749dd061	Merge "RabbitMQ: Dont remove definitions.json and erlang cookie when resetting"	2019-08-02 15:03:18 +00:00
Pete Birley	eef8ea131a	RabbitMQ: Dont remove definitions.json and erlang cookie when resetting This PS updates the reset node function to leave the assets generated via init containers in place when resetting the node. Change-Id: Iac52ca82e95bb372dbcbca0eeea3b262215e9c12 Signed-off-by: Pete Birley <pete@port.direct>	2019-08-02 02:05:00 +00:00
Steve Wilkerson	bc20c6c8b6	Elasticsearch: Add cron job to verify snapshot repositories This adds a cron job to manually verify all snapshot repositories are registered to any active master and data nodes. This is to address scenarios where master and data nodes do not have the desired snapshot repositories registered following node outages or reboots Change-Id: Ie6f42e95c3ca4dc2ec70f2852a2bde11e59ec097 Signed-off-by: Steve Wilkerson <sw5822@att.com>	2019-08-02 02:02:14 +00:00
Zuul	26ed62352b	Merge "Ceph-Client: update configmap name for defragosds cronjob"	2019-08-02 00:21:41 +00:00
Zuul	ea303850cd	Merge "Elasticsearch: Manually verify snapshot repositories"	2019-08-01 18:36:37 +00:00
Chinasubbareddy Mallavarapu	acd5d11bc2	Ceph-Client: update configmap name for defragosds cronjob This is to update the configmap names used by the defragosds cronjob. Change-Id: I29608cd8b6ce1e30615a0f92853939d7bbae9972	2019-08-01 12:22:48 -05:00
Zuul	d3d898de1b	Merge "Nagios: Updated the alert for Ceph OSD Down"	2019-08-01 16:15:51 +00:00
Cliff Parsons	e059f4f827	Enhance HTK Job Manifests to be more flexible This patch enhances the HTK job manifest functions so that each job can be configured to use the desired backoffLimit and activeDeadlineSeconds, and can mount the command/script from either a configMap or a secret instead of being confined to using only configMaps. Change-Id: I5231e53b98e3e55e3e93070876d8694f37ad642d	2019-08-01 09:20:12 -05:00
Pai, Radhika (rp592h)	a37925c7e8	Grafana: Code for Calico Dashboard Appended the code that will add the calico dashboard to the Grafana. This will display the felix metrics which are collected by the prometheus. Change-Id: If18a18949f8093747b3f9ba819e036778c40b84e	2019-07-31 20:53:55 +00:00
Zuul	85b8d62830	Merge "Provide option to switch between dpdk and non-dpdk"	2019-07-31 20:38:22 +00:00
Manuel Buil	a71f1b4d33	Provide option to switch between dpdk and non-dpdk We can select if we want an image with dpdk support by adding: FEATURE_GATES=dpdk That way we can reuse the same script for different distros by using openstack-helm/tools/deployment/common/get-values-overrides.sh Change-Id: Ia2c53556be650899fdd67c1ec06f5c68ae63c9d4 Signed-off-by: Manuel Buil <mbuil@suse.com>	2019-07-31 15:54:51 +00:00
Ahmad Mahmoudi	db164a2925	Generate CA crt and key if needed Generate CA cert and CA key, if they are not present in the values. Change-Id: I14610ab66b72ddd5e6e45f57b56968e462416234	2019-07-30 13:16:03 -05:00
Arun Kant	7a8bb7058b	Removing deprecated option usage in gather pod logs logic As per PR https://github.com/kubernetes/kubernetes/pull/60210, the show-all option in kubectl get is deprecated and no longer needed. Presumably that is now the default behavior. Also, in the current log gathering logic we are interested in capturing only pod names, so removing that option is harmless. We are seeing related failures in local CI when the kubectl version is 1.15.x, so this removes the option. Change-Id: I3886c792fe28bc8b80504d8c91e9524039131b15	2019-07-30 08:19:38 -07:00
Steve Wilkerson	8130e6bdc5	Elasticsearch: Manually verify snapshot repositories This updates the script for registering snapshot repositories to include a manual verification of the repositories created. This simply allows for inspection of all master and data nodes the repository is verified with to provide additional visibility into the state of all repositories Change-Id: I6e5386386e2b79b1cb0f41fc1f9b78817695f8f3 Signed-off-by: Steve Wilkerson <sw5822@att.com>	2019-07-24 15:37:23 -05:00
Zuul	17a7eb5cdc	Merge "Restore overrides functionality after regression"	2019-07-24 16:25:49 +00:00
Anderson, Craig (ca846m)	ab8c81f2ee	Restore overrides functionality after regression Revert 833d426da8e4b049277ca9847830f6e6beee40c3 https://review.opendev.org/#/c/667022 introduced a regression in the overrides functionality, which caused the corresponding gate test to fail. This "fixed" a problem by breaking the override capability. This patchset reverts the previous to restore override functionality and make gates green again. Deep copy is added in order to resolve the original problem that 667022 attempted to resolve. Change-Id: I6c052c0fabe0067612d6a3d9d3bfac4df59202d7	2019-07-24 12:18:44 +00:00
Chinasubbareddy Mallavarapu	dc66254c42	Ceph-RGW: fix file permission issue This is to fix the issue we are facing with file permission on the file /var/lib/ceph/bootstrap-rgw/ceph.keyring since the owner of the file will be root. This happens when a node with rgw reboots and the rgw pods fail at init after the reboot; this is happening on single node deployments. issue: ceph-rgw-5db485fbd9-dv778 0/1 Init:CrashLoopBackOff 5 6m49s logs: + chown -R ceph. /run/ceph/ /var/lib/ceph/bootstrap-rgw /var/lib/ceph/radosgw /var/lib/ceph/tmp chown: changing ownership of '/var/lib/ceph/bootstrap-rgw/ceph.keyring': Operation not permitted Change-Id: Idcb648c205053b2f03357b59173e70e02f28688c	2019-07-23 10:52:31 -05:00
Zuul	b7a7e81056	Merge "Updated the CEPH Cluster Health Panel values"	2019-07-22 15:29:36 +00:00
Zuul	f4a9e2b43c	Merge "Fix mon_host hosts when hostname contains 'ip'"	2019-07-19 21:24:59 +00:00
Pai, Radhika (rp592h)	47565d2d19	Nagios: Updated the alert for Ceph OSD Down Earlier the Nagios alert monitor was percent based as in when the percent of OSD down is greater than 80, it will send alert. >check_prom_alert!ceph_osd_down_pct_high!CRITICAL- CEPH OSDs down is more than 80 percent!OK- CEPH OSDs down is less than 80 percent Updated the code in nagios values.yaml to send alert when even 1 OSD is down: >check_prom_alert!ceph_osd_down!CRITICAL- One or more CEPH OSDs are down >for more than 5 minutes!OK- All the CEPH OSDs are up Change-Id: Id24c4a0cca64674890dae3599edc0c90d9534e90	2019-07-19 19:25:53 +00:00
Doug Aaser	9a36becf20	Cleanup unused Postgres config values This patch is part of an effort to cleanup the values.yaml file for Postgres, which has gotten messy since the introduction of Patroni. This patch specifically removes unused configuration values which were causing unnecessary bloat and complexity. Change-Id: I96180fd9c91200ba7558e58bd503b4ef9ebc183e	2019-07-19 17:16:04 +00:00
Daniel Pawlik	0b58aea135	Fix mon_host hosts when hostname contains 'ip' The ceph-mon template script parses mon_host incorrectly when the hostname contains the word 'ip', e.g. airship. Change-Id: I0a097443d42ad2e9b6be6c61facd7932ddb4b3bb Story: 2006255	2019-07-19 10:49:50 +00:00
Pete Birley	af270934d4	Rabbit: Eradicate potential crashes in wait job while upgrading cluster When upgrading/reconfiguring a rabbit cluster it's possible that the nodes will not return the cluster status for some time. This PS allows us to cope with this much more gracefully than simply crashing a few times before proceeding. Change-Id: Ibf525df9e3a9362282f70e5dbb136430734181fd Signed-off-by: Pete Birley <pete@port.direct>	2019-07-18 23:07:32 +00:00

2026 Commits