41 Commits

Author SHA1 Message Date
Pete Birley
6ea6a85198 Ceph: Update default to use OSH image
This PS updates the default image in the chart to the latest OSH image.

Change-Id: Ib8d2a72ad48049fe02560dc4405f0088890b6f64
Signed-off-by: Pete Birley <pete@port.direct>
2019-02-01 21:25:13 +00:00
Zuul
b30012a616 Merge "[CEPH] Fixes for the OSD defrag cronjob" 2019-01-31 16:05:14 +00:00
Matthew Heler
fc76091261 [CEPH] Fixes for the OSD defrag cronjob
Fix a naming issue with the cronjob's binary, and schedule the cron
job to run every 15 minutes for the gates. Additionally, check to
ensure we are only running on block devices. Also update the
script to work with ceph-volume created devices.

Change-Id: I8aedab0ac41c191ef39a08034fff3278027d7520
2019-01-31 06:13:05 -06:00
Matthew Heler
f48c365cd3 [CEPH] Clean up PG troubleshooting option specific to Luminous
Clean up the PG troubleshooting method that was needed for
Luminous images. Since we are now on Mimic, this function is no
longer needed.

Change-Id: Iccb148120410b956c25a1fed5655b3debba3412c
2019-01-29 18:57:23 +00:00
Zuul
f0f1b57b3c Merge "[CEPH] Journal automation and disk cleanup updates" 2019-01-28 06:05:45 +00:00
Matthew Heler
61b93c6b46 [CEPH] Journal automation and disk cleanup updates
Refactor the OSD Block initialization code that performs clean ups
to use all the commands that ceph-disk zap uses.

Extend the functionality when an OSD initializes to create journal
partitions automatically. For example if /dev/sdc3 is defined as a
journal disk, the chart will automatically create that partition.
The size of the journal partition is determined by the
osd_journal_size that is defined in ceph.conf.

Change the OSD_FORCE_ZAP option to OSD_FORCE_REPAIR to automatically
recreate/self-heal Filestore OSDs. This option will now call a
function to repair a journal disk and recreate partitions. One
caveat to this is that the device partitions must be defined (ex.
/dev/sdc1) for a journal. Otherwise the OSD is zapped and re-created
if the whole disk (ex. /dev/sdc) is defined as the journal disk.

Change-Id: Ied131b51605595dce65eb29c0b64cb6af979066e
2019-01-24 11:47:30 -06:00
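For illustration only, a minimal sketch of the journal-partition creation
described above, assuming a /dev/sdc3-style journal definition and an
osd_journal_size (in MB) in /etc/ceph/ceph.conf; names and paths are
assumptions, not the chart's actual code:

    #!/bin/bash
    set -ex

    JOURNAL_PART="/dev/sdc3"            # example journal partition from the chart values
    DISK="${JOURNAL_PART%[0-9]*}"       # -> /dev/sdc
    PART_NUM="${JOURNAL_PART##*[a-z]}"  # -> 3

    # osd_journal_size is expressed in MB in ceph.conf
    JOURNAL_SIZE_MB=$(awk -F' *= *' '/^ *osd_journal_size/ {print $2}' /etc/ceph/ceph.conf)

    # Only create the journal partition if it does not already exist
    if [ ! -b "${JOURNAL_PART}" ]; then
      sgdisk --new="${PART_NUM}:0:+${JOURNAL_SIZE_MB}M" \
             --change-name="${PART_NUM}:ceph journal" "${DISK}"
      partprobe "${DISK}"
    fi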
Matthew Heler
d966085321 [CEPH] Setup a cronjob to run OSD defrags for FileStore
Create a cron and associated script to run monthly OSD defrags.
When the script runs it will switch the OSD disk to the CFQ I/O
scheduler to ensure that this is a non-blocking operation for ceph.
While this cron job will run monthly, it will only execute on OSDs
that are HDD based with Filestore.

Change-Id: I06a4679e0cbb3e065974d610606d232cde77e0b2
2019-01-22 04:27:41 +00:00
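As a hedged sketch of the defrag flow described in the commit above (the
device and mount point are examples, not the chart's actual variables):

    #!/bin/bash
    set -e

    OSD_DEV="sdc"                          # block device backing the OSD (example)
    OSD_MOUNT="/var/lib/ceph/osd/ceph-0"   # Filestore mount point (example)
    SCHED="/sys/block/${OSD_DEV}/queue/scheduler"

    # Only defrag spinning disks; SSDs gain nothing from this.
    [ "$(cat /sys/block/${OSD_DEV}/queue/rotational)" = "1" ] || exit 0

    OLD_SCHED=$(sed -e 's/.*\[\(.*\)\].*/\1/' "${SCHED}")
    echo cfq > "${SCHED}"                  # CFQ honours ionice, keeping the defrag non-blocking
    ionice -c3 xfs_fsr -v "${OSD_MOUNT}"   # Filestore data partitions are XFS
    echo "${OLD_SCHED}" > "${SCHED}"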
Matthew Heler
b0da8d78d1 [CEPH] Fix a race condition with udev on OSD start
Under some conditions udev may not trigger correctly and create
the proper uuid symlinks required by Ceph. In order to work around
this we manually create the symlinks.

Change-Id: Icadce2c005864906bcfdae4d28117628c724cc1c
2019-01-18 15:03:27 -06:00
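A minimal sketch of the workaround described above (the disk path is an
example; the chart's actual implementation may differ):

    #!/bin/bash
    set -e

    DISK="/dev/sdc"   # example OSD disk
    mkdir -p /dev/disk/by-partuuid

    # Create any by-partuuid symlinks that udev failed to populate.
    for part in $(lsblk -nrpo NAME "${DISK}" | tail -n +2); do
      uuid=$(blkid -o value -s PARTUUID "${part}" || true)
      if [ -n "${uuid}" ] && [ ! -e "/dev/disk/by-partuuid/${uuid}" ]; then
        ln -s "${part}" "/dev/disk/by-partuuid/${uuid}"
      fi
    done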
kranthi guttikonda
6771440f4a Fix for ceph-osd regression
When the ceph-osd journal is a directory and the data store is
a block device, ceph-osd fails to deploy while
waiting for the journal file in
/var/lib/ceph/journal/journal.<id>

Added a check for the directory case before the bluestore check,
and removed the same check later in the script.

Closes-Bug: #1811154
Change-Id: Ibd4cf0be5ed90dfc4de5ffab554a91da1b62e5f4
Signed-off-by: Kranthi Guttikonda <kranthi.guttikonda@b-yond.com>
Signed-off-by: kranthi guttikonda <kranthi.guttikonda9@gmail.com>
2019-01-11 18:11:23 -05:00
Matthew Heler
e9c7aab6fd [CEPH] Directory OSD regression fix
Fix a regression with the Directory OSD logic.

Change-Id: I793cf0869bda5c640eb945cbb8190cd89b30c4d0
2019-01-08 13:45:32 -06:00
Matthew Heler
4a85c21996 [CEPH] OSD directory permission fixes
In the event the base image is changed, the uid of the ceph OSD
directory may not align with the uid of the ceph user of the image.
In this case we check permissions and set them correctly.

Change-Id: I3bef7f6323d1de7c62320ccd423c929349bedb42
2019-01-07 19:08:11 -06:00
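Roughly, the check amounts to something like the sketch below (the OSD path
is an example, not the chart's actual code):

    #!/bin/bash
    set -e

    OSD_PATH="/var/lib/ceph/osd/ceph-0"   # example OSD data directory
    CEPH_UID=$(id -u ceph)
    CEPH_GID=$(id -g ceph)

    # Re-own the directory if a previous image left a different uid/gid behind.
    if [ "$(stat -c %u "${OSD_PATH}")" != "${CEPH_UID}" ] || \
       [ "$(stat -c %g "${OSD_PATH}")" != "${CEPH_GID}" ]; then
      chown -R "${CEPH_UID}:${CEPH_GID}" "${OSD_PATH}"
    fi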
Matthew Heler
c0d028e245 Uplift Ceph charts to the Mimic release
Change the release of Ceph from 12.2.3 (Luminous) to the latest 13.2.2
(Mimic). Additionally, use supported RHEL/CentOS images rather than
Ubuntu images, which are now considered deprecated by Red Hat.

- Uplift all Ceph images to the latest 13.2.2 ceph-container images.
- RadosGW by default will now use the Beast backend.
- RadosGW has relaxed settings enabled for S3 naming conventions.
- Increased RadosGW resource limits due to backend change.
- All Luminous specific tests now test for both Luminous/Mimic.
- Gate scripts will remove all non-required ceph packages. This is
required to avoid conflicting with the pid/gid that the Red Hat
container uses.

Change-Id: I9c00f3baa6c427e6223596ade95c65c331e763fb
2019-01-05 14:38:38 +00:00
Chris Wedgwood
0c4e37391f 'NOP' cleanup for more consistent white-space use in charts
Where we have the style '{{ ...' we should use the style '... }}'.

Change-Id: Ic3e779e4681370d396f95d3804ca27db5b9d3642
2019-01-03 22:45:49 +00:00
Matthew Heler
e581a79807 [CEPH] Cleanup the ceph-osd helm-chart
- Split off duplicate code across multiple bash scripts into a common
file.
- Simplify the way journals are detected for block devices.
- Cleanup unused portions of the code.
- Standardize the syntax across all the code.
- Use sgdisk for zapping disks rather than ceph-disk.

Change-Id: I13e4a89cab3ee454dd36b5cdedfa2f341bf50b87
2018-12-28 13:09:21 -06:00
Zuul
5cca3e74d4 Merge "[CEPH] Fix race conditions with OSD POD initialization" 2018-12-24 22:48:53 +00:00
Matthew Heler
30b57ba671 [CEPH] Fix race conditions with OSD POD initialization
Under POD restart conditions there is a race condition with lsblk
causing the helm chart to zap a fully working OSD disk. We refactor
the code to remove this requirement.

Additionally, the new automatic journal partitioning code has a race
condition in which the same journal partition could be picked twice
for OSDs on the same node. To resolve this we share a common tmp
directory from the node to all of the OSD pods on that node.

Change-Id: I807074c4c5e54b953b5c0efa4c169763c5629062
2018-12-21 15:05:54 -06:00
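One way the shared node directory can serialise journal selection is with a
simple flock, sketched below; the directory path and lock name are
assumptions, not the chart's actual identifiers:

    #!/bin/bash
    set -e

    LOCK_DIR="/var/lib/openstack-helm/tmp"   # assumed shared hostPath mounted into every OSD pod
    mkdir -p "${LOCK_DIR}"

    (
      # Only one OSD pod on this node selects a journal partition at a time.
      flock -x 9
      # ... pick the next free journal partition and record the choice ...
    ) 9>"${LOCK_DIR}/journal.lock"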
Matthew Heler
e1a3819a0d [CEPH] Support a troubleshooting option to reset PG metadata
Ceph upstream bug: https://tracker.ceph.com/issues/21142 is
impacting the availability of our sites in pipeline. Add an option
to reset the past interval metadata time on an OSD's PG to solve for
this issue if it occurs.

Change-Id: I1fe0bee6ce8aa402c241f1ad457bbf532945a530
2018-12-18 23:26:18 -06:00
Matthew Heler
de69c68365 [Ceph] Update ceph helm tests
- Ensure the helm tests are logging all commands and variables

Change-Id: I4f4c553a3fbb4d77e9d1ab41c1c0c763c963cfd3
2018-12-15 13:47:43 -06:00
Zuul
b76acd6dd6 Merge "Ceph: Journal partition automation" 2018-12-14 18:37:15 +00:00
Zuul
62dce1852e Merge "Increase the cpu and memory resource limits for Ceph OSDs" 2018-12-14 01:52:57 +00:00
Renis Makadia
17df1c5df5 Ceph: Journal partition automation
- Use the whole-disk format (e.g. /dev/sdc).
- Don't specify a partition; let the ceph-osd utility create
and manage partitions.
- On an OSD disk failure, the journal partition for the failed OSD
should be deleted during the maintenance window. This will allow
the ceph-osd utility to reuse the space for a new partition.
- The disk partition count will continue to
increase as more OSDs fail.

Change-Id: I87522db8cabebe8cb103481cdb65fc52f2ce2b07
2018-12-13 16:37:15 +00:00
Pete Birley
c256cce537 Ceph: Allow multiple test pods to be present in clusters
This PS allows multiple ceph test pods to be present in clusters with
more than one ceph deployment.

Change-Id: I002a8b4681d97ed6ab95af23e1938870c28f5a83
Signed-off-by: Pete Birley <pete@port.direct>
2018-12-12 07:29:01 -06:00
Matthew Heler
2e67eeb955 Increase the cpu and memory resource limits for Ceph OSDs
The minimum requirements for a Ceph OSD have changed in the latest
Luminous release to accommodate Bluestore changes. We need to support
these changes as we look into upgrading Ceph to the latest Luminous
release and beyond.

Change-Id: I3eddffe73cfd188ff012db7c74702de6921711e7
2018-12-11 01:27:43 +00:00
Pete Birley
7608d2c9d7 Ceph: Update failure domain overrides to support dynamic config
This PS updates the ceph failure domain overrides to support
dynamic configuration based on host/label based overrides.

Also fixes typo identified in the following ps for directories:
 * https://review.openstack.org/#/c/623670/1

Change-Id: Ia449be23353083f9a77df2b592944571c907e277
Signed-off-by: Pete Birley <pete@port.direct>
2018-12-08 13:54:17 -06:00
Matthew Heler
d50bd2daad Fix detection of failure domain type
Small typo in the logic filtering of the failure domain type for
an OSD pod. This wasn't initially found since it didn't break any
expected behavior tests.

Change-Id: I2b895bbc83c6c71fffe1a0db357b120b3ffb7f56
2018-12-08 12:45:07 -06:00
Matthew Heler
4ad893eb1a Additional Ceph tunning parameters for openstack-helm
osd_scrub_load_threshold set to 10.0 (default 0.5)
 - With the number of multi-core processors nowadays, it's fairly
   typical to see systems over a load of 1.0. We need to adjust the
   scrub load threshold so that scrubbing runs as scheduled even
   when a node is moderately/lightly under load.

filestore_max_sync_interval set to 10s (default 5s)
 - Larger default journal sizes (>1GB) will not be effectively used
   unless the max sync interval time is increased for Filestore. The
   benefit of this change is increased performance especially around
   sequential write workloads.

mon_osd_down_out_interval set to 1800s (default 600s)
 - OSD PODs can take longer than several minutes to boot up. Mark
   an OSD as 'out' in the CRUSH map only after 15 minutes of being
   'down'.

Change-Id: I62d6d0de436c270d3295671f8c7f74c89b3bd71e
2018-12-04 20:27:52 -06:00
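For reference, a sketch of how these values would land in ceph.conf; the
section placement shown here is an assumption, and in the chart they come
from the conf overrides rather than being appended like this:

    {
      echo "[osd]"
      echo "osd_scrub_load_threshold = 10.0"
      echo "filestore_max_sync_interval = 10"
      echo "[mon]"
      echo "mon_osd_down_out_interval = 1800"
    } >> /etc/ceph/ceph.conf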
Matthew Heler
35cce6cb43 Switch Ceph to IPs when DNS is down
Add helper scripts that are called by a POD to switch
Ceph from DNS to IPs. This POD will loop every 5 minutes
to catch cases where the DNS might be unavailable.

On a POD's service start, switch ceph.conf to using IPs rather
than DNS.

Change-Id: I402199f55792ca9f5f28e436ff44d4a6ac9b7cf9
2018-12-03 10:51:37 -06:00
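A rough sketch of such a helper loop, with a made-up service name and IPs;
the chart's real script will differ in detail:

    #!/bin/bash

    MON_HOST_DNS="ceph-mon.ceph.svc.cluster.local"   # assumed mon service name
    MON_HOST_IPS="10.0.0.11,10.0.0.12,10.0.0.13"     # assumed, discovered while DNS was healthy

    while true; do
      # If the mon service name no longer resolves, fall back to the cached IPs.
      if ! getent hosts "${MON_HOST_DNS}" > /dev/null; then
        sed -i "s/^mon_host = .*/mon_host = ${MON_HOST_IPS}/" /etc/ceph/ceph.conf
      fi
      sleep 300   # re-check every 5 minutes, per the commit message
    done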
Renis Makadia
b1005b23b4 Helm tests for Ceph-OSD and Ceph-Client charts
Change-Id: If4a846f0593b8679558662205a8560aa3cbb18ae
2018-12-01 08:08:00 +00:00
Matthew Heler
6e8c289c13 Add failure domains, and device classes for custom CRUSH rules
Largely inspired and taken from Kranthi's PS.

- Add support for creating custom CRUSH rules based on failure
domains and device classes (ssd & hdd).
- Basic logic around the PG calculator to autodetect the number of
OSDs globally and per device class (required when using custom CRUSH
rules that specify device classes).

Change-Id: I13a6f5eb21494746c2b77e340e8d0dcb0d81a591
2018-11-27 09:37:30 -06:00
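The resulting rules are of the kind that can be created with the standard
ceph CLI, for example (rule names here are illustrative):

    # Replicated rules restricted to a device class, with host failure domain
    ceph osd crush rule create-replicated hdd_rule default host hdd
    ceph osd crush rule create-replicated ssd_rule default host ssd

    # Per-class OSD counts, the kind of input the PG calculator needs
    ceph osd crush class ls-osd hdd | wc -l
    ceph osd crush class ls-osd ssd | wc -l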
Matthew Heler
5ce9f2eb3b Enable Ceph charts to be rack aware for CRUSH
Add support for a rack level CRUSH map. Rack level CRUSH support is
enabled by using the "rack_replicated_rule" crush rule.

Change-Id: I4df224f2821872faa2eddec2120832e9a22f4a7c
2018-11-20 09:07:36 -06:00
Matthew Heler
55446e1f41 Move default CEPH journal size from 5GB to 10GB
Request from downstream to use 10GB journal sizes. Currently journals
are created manually, but there is upcoming work to have the
journals created by the Helm charts themselves. This value needs to be
put in as a default to ensure journals are sized appropriately.

Change-Id: Idaf46fac159ffc49063cee1628c63d5bd42b4bc6
2018-11-08 17:34:12 +00:00
Steve Wilkerson
45da8c2b69 Ceph: Update log directory host mount path
This updates the ceph-mon and ceph-osd charts to use the release
name in the hostPath defined for mounting the /var/log/ceph
directories. This gives us a mechanism for creating unique log
directories for multiple releases of the same chart without the
need to specify an override for each deployment of that chart.
Change-Id: Ie6e05b99c32f24440fbade02d59c7bb14d8aa4c8
2018-10-29 13:05:46 -05:00
Matthew Heler
6ef48d3706 Further performance tuning changes for Ceph
- Throttle down snap trimming to lessen its performance impact
(Setting just osd_snap_trim_priority isn't effective enough to throttle
down the impact)
osd_snap_trim_sleep: 0.1 (default 0)
osd_pg_max_concurrent_snap_trims: 1 (default 2)

- Align filestore_merge_threshold with upstream Ceph values
(A negative number disables this function, no change in behavior)
filestore_merge_threshold: -10 (formerly -50, default 10)

- Increase RGW pool thread size for more concurrent connections
rgw_thread_pool_size: 512 (default 100)

- Disable in-memory logs for the ms subsystem.
debug_ms: 0/0 (default 0/5)

- Formatting cleanups

Change-Id: I4aefcb6e774cb3e1252e52ca6003cec495556467
2018-10-26 15:10:50 +00:00
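As a hedged note, the runtime-changeable OSD values above can also be tried
on a live cluster with injectargs before being persisted via the chart:

    # OSD options from the list above; filestore_merge_threshold and
    # rgw_thread_pool_size still need to go through ceph.conf and a restart.
    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1 --osd_pg_max_concurrent_snap_trims 1 --debug_ms 0/0'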
Chinasubbareddy M
a1b8f394b2 ceph: make log directory configurable
This makes the log directory configurable so that another mon or
OSD running on the same host can point to a different directory.

Change-Id: I2db6dffd45599386f8082db8f893c799d139aba3
2018-10-25 14:34:14 +00:00
Matthew Heler
f8ac6c3f21 ceph co-location journal and permission fixes
Support co-located journals with the Ceph helm chart.
Ensure proper ownership is set on OSD/journal disks.

Change-Id: Ic954d75c8bd7532991dc9b3184ad6d74b97855d1
2018-10-25 08:21:31 +00:00
Steve Wilkerson
92717bdc72 Ceph: Remove fluentbit sidecars, mount hostpath for logs
This removes the fluentbit sidecars from the ceph-mon and ceph-osd
charts. Instead, we mount /var/log/ceph as a hostPath and use the
fluentbit daemonset to target the mounted log files.

This also updates the fluentd configuration to better handle the
correct configuration type for flush_interval (time vs int), as
well as updating the fluentd elasticsearch output values to help
address the gate failures resulting from the Elasticsearch bulk
endpoints failing.

Change-Id: If3f2ff6371f267ed72379de25ff463079ba4cddc
2018-10-17 11:05:03 -05:00
Matthew Heler
5efac315f7 Initialize OSDs with a crush weight of 0 to prevent automatic rebalancing.
Weight the OSDs based on reported disk size when the ceph-client chart runs.

Change-Id: I9f4080a9843f1a63564cf71154841b351382bfe2
2018-10-16 21:33:49 +00:00
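For illustration, the two steps look roughly like this with the ceph CLI
(the OSD id, device, and host are examples, not the charts' actual code):

    # OSD joins the CRUSH map with zero weight, so no data moves yet
    ceph osd crush add osd.7 0.0 host="$(hostname -s)"

    # Later, the ceph-client chart weights it by its reported size in TiB
    OSD_SIZE_TB=$(echo "scale=4; $(blockdev --getsize64 /dev/sdc) / (1024^4)" | bc)
    ceph osd crush reweight osd.7 "${OSD_SIZE_TB}"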
Zuul
c10f9ce59e Merge "Modify Ceph default settings for improved performance" 2018-09-20 22:44:11 +00:00
Jean-Charles Lopez
c6cad19d11 Modify Ceph default settings for improved performance
Change-Id: Ia0d856e53f3bfdc1414264b468b576003dc23b6e
2018-09-13 07:47:42 -07:00
Pete Birley
bb3ff98d53 Add release uuid to pods and rc objects
This PS adds the ability to attach a release uuid to pods and rc
objects as desired. A follow-up PS will add the ability to add arbitrary
annotations to the same objects.

Change-Id: Iceedba457a03387f6fc44eb763a00fd57f9d84a5
Signed-off-by: Pete Birley <pete@port.direct>
2018-09-13 05:35:35 +00:00
Steve Wilkerson
25bc83b580 Ceph: Move Ceph charts to openstack-helm-infra
This continues the work of moving infrastructure-related services
out of openstack-helm by moving the ceph charts to
openstack-helm-infra instead.

Change-Id: I306ccd9d494f72a7946a7850f96d5c22f36eb8a0
2018-08-28 15:03:35 -05:00