This change updates all of the charts that use Ceph images to use
new images based on the Squid 19.2.1 release.
Rook is also updated to 1.16.3 and is configured to deploy Ceph
19.2.1.
Change-Id: Ie2c0353a4bfa181873c98ce5de655c3388aa9574
This is the action item to implement the spec:
doc/source/specs/2025.1/chart_versioning.rst
Also add environment variables for values overrides:
- OSH_VALUES_OVERRIDES_PATH
- OSH_INFRA_VALUES_OVERRIDES_PATH
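For example (the paths shown are hypothetical):
    export OSH_VALUES_OVERRIDES_PATH=${OSH_PATH}/values_overrides
    export OSH_INFRA_VALUES_OVERRIDES_PATH=${OSH_INFRA_PATH}/values_overrides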
This commit temporarily disables all jobs that involve scripts
in the OSH git repo because they need to be updated to work
with the new values_overrides structure in the OSH-infra repo.
Once I4974785c904cf7c8730279854e3ad9b6b7c35498 is merged, all of
these disabled test jobs must be re-enabled.
Depends-On: I327103c18fc0e10e989a17f69b3bff9995c45eb4
Change-Id: I7bfdef3ea2128bbb4e26e3a00161fe30ce29b8e7
The Ceph defragosds cronjob script used to connect to OSD pods
without explicitly specifying the ceph-osd-default container, so it
sometimes ended up running the defrag script in the log-runner
container, where the script is mounted with 0644 permissions and the
shell fails to run it.
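Illustrative only (pod name and script path are hypothetical):
    # Run the defrag script in the ceph-osd-default container explicitly,
    # not in whatever container kubectl picks by default (e.g. log-runner).
    kubectl -n ceph exec ceph-osd-default-abc12 -c ceph-osd-default \
      -- /tmp/utils-defragOSDs.sh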
Change-Id: I4ffc6653070dbbc6f0766b278acf0ebe2b4ae1e1
Use quay.io/airshipit/kubernetes-entrypoint:latest-ubuntu_focal
by default instead of 1.0.0, which is a v1-format image that is
no longer supported by Docker.
Change-Id: I6349a57494ed8b1e3c4b618f5bd82705bef42f7a
This change updates the Ceph images to 18.2.2 images patched with a
fix for https://tracker.ceph.com/issues/63684. It also reverts the
package repository in the deployment scripts to use the debian-reef
directory on download.ceph.com instead of debian-18.2.1. The issue
with the repo that prompted the previous change to debian-18.2.1
has been resolved and the more generic debian-reef directory may
now be used again.
Change-Id: I85be0cfa73f752019fc3689887dbfd36cec3f6b2
This change converts the readiness and liveness probes in the Ceph
charts to use the functions from the Helm toolkit rather than
having hard-coded probe definitions. This allows probe configs to
be overridden in values.yaml without rebuilding charts.
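As a rough illustration (the daemonset/container keys below are
examples, not taken verbatim from the charts), probe settings can
then be tuned via a values override:
    # Hypothetical override file; adjust keys to the chart in question.
    tee /tmp/probes-override.yaml <<EOF
    pod:
      probes:
        osd:
          ceph-osd-default:
            liveness:
              enabled: true
              params:
                initialDelaySeconds: 120
                periodSeconds: 60
    EOF
    # e.g. helm upgrade --install ceph-osd ./ceph-osd -f /tmp/probes-override.yaml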
Change-Id: I68a01b518f12d33fe4f87f86494a5f4e19be982e
Sometimes errors appear in the 'ceph osd pool get' output before
the JSON string. The returned string is saved and is assumed to
contain only the JSON string with the pool properties. When errors
appear in the string, pool properties are not read properly, which
can cause pools to be misconfigured. This change filters that
output so only the expected JSON string is returned. It can then be
parsed correctly.
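A minimal sketch of the kind of filtering involved (the chart's
actual filter may differ):
    # Keep only the JSON object, discarding any error text printed
    # before it, then parse it normally.
    props=$(ceph osd pool get rbd all -f json 2>/dev/null | grep -o '{.*}')
    echo "${props}" | jq -r '.size'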
Change-Id: I83347cc32da7e7af160b5cacc2a99de74eebebc7
This change allows the target pg_num_min value (global for all
pools) to be overridden on a per-pool basis by specifying a
pg_num_min value in an individual pool's values. A global value for
all pools may not suffice in all cases.
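A sketch of such an override (key layout follows the ceph-client
pool values; details may differ):
    tee /tmp/pg-num-min-override.yaml <<EOF
    conf:
      pool:
        target:
          pg_num_min: 8          # global target for all pools
        spec:
          - name: rbd
            application: rbd
            replication: 3
            percent_total_data: 40
            pg_num_min: 16       # per-pool override of the global value
    EOF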
Change-Id: I42c55606d48975b40bbab9501289a7a59c15683f
This is simply to document the fact that mon_allow_pool_size_one
must be configured via cluster_commands in the ceph-client chart.
Adding it to ceph.conf via the conf values in the ceph-mon chart
doesn't seem to configure the mons effectively.
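For reference, the underlying command and a hedged values sketch
(the cluster_commands key layout is assumed here; check the chart's
values.yaml):
    # The Ceph command the cluster_commands entry needs to run:
    ceph config set mon mon_allow_pool_size_one true
    # Hypothetical override sketch:
    # conf:
    #   features:
    #     cluster_commands:
    #       - config set mon mon_allow_pool_size_one true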
Change-Id: Ic7e9a0eade9c0b4028ec232ff7ad574b8574615d
This change updates all Ceph image references to use Focal images
for all charts in openstack-helm-infra.
Change-Id: I759d3bdcf1ff332413e14e367d702c3b4ec0de44
The Pacific release of Ceph disabled 1x replication by default, and
some of the gate scripts have not been updated to allow this explicitly.
Some gate jobs fail in some configurations as a result, so this
change adds 'mon_allow_pool_size_one = true' to those Ceph gate
scripts that don't already have it, along with
--yes-i-really-mean-it added to commands that set pool size.
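The commands/settings involved look roughly like this (sketch; the
gate scripts set the option via ceph.conf):
    # Allow pools with a single replica, then set size 1 explicitly.
    ceph config set mon mon_allow_pool_size_one true
    ceph osd pool set rbd size 1 --yes-i-really-mean-it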
Change-Id: I5fb08d3bb714f1b67294bb01e17e8a5c1ddbb73a
This change adjusts the minimum OSD count check to be based on the
osd value, and the maximum OSD count check to be based on the
final_osd value. This logic supports both full deployments and
partial deployments, with the caveat that it may allow partial
deployments to over-provision storage.
Change-Id: I93aac65df850e686f92347d406cd5bb5a803659d
The target OSD count and the final target OSD count may differ in
cases where a deployment may not include all of the hardware it is
expected to include eventually. This change corrects the check for
more OSDs running than expected to be based on the final OSD count
rather than the intermediate one to avoid false failures when the
intermediate target is exceeded and the final target is not.
Change-Id: I03a13cfe3b9053b6abc5d961426e7a8e92743808
The Ceph Pacific release added a noautoscale flag to enable and
disable the PG autoscaler for all pools globally. This change uses
that flag to enable and disable the autoscaler when the Ceph major
version is 16 or later.
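A rough sketch of the logic (CEPH_MAJOR is a hypothetical variable,
assumed to be derived elsewhere, e.g. from "ceph version"):
    # On Pacific (16) and later, toggle the autoscaler globally.
    if [ "${CEPH_MAJOR:-16}" -ge 16 ]; then
      ceph osd pool set noautoscale        # disable autoscaler for all pools
      # ceph osd pool unset noautoscale    # ...and re-enable it
    fi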
Change-Id: Iaa3f2d238850eb413f26b82d75b5f6835980877f
Based on spec in openstack-helm repo,
support-OCI-image-registry-with-authentication-turned-on.rst
Each Helm chart can configure an OCI image registry and the
credentials to use. A Kubernetes Secret is then created with this
information, and ServiceAccounts reference it via imagePullSecrets.
Any pod using one of these ServiceAccounts can then pull images from
an authenticated container registry.
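To illustrate only the underlying Kubernetes mechanism (the charts
render this via templates; names here are examples):
    # Create a docker-registry Secret holding the registry credentials.
    kubectl -n ceph create secret docker-registry ceph-oci-registry-key \
      --docker-server=registry.example.com \
      --docker-username=deployer --docker-password=secret
    # Reference it from a ServiceAccount so its pods can pull images.
    kubectl -n ceph patch serviceaccount ceph-mon \
      -p '{"imagePullSecrets": [{"name": "ceph-oci-registry-key"}]}'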
Change-Id: Iebda4c7a861aa13db921328776b20c14ba346269
The major reason for the addition of this feature is to facilitate
an upgrade to the Pacific Ceph release, which now requires the
require-osd-release flag to be set to the proper release in order
to avoid a cluster warning scenario. Any Ceph command can be run
against the cluster using this feature, however.
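For example, the command this feature was primarily added to run
(shown directly here for illustration):
    # Required after upgrading to Pacific to clear the cluster warning.
    ceph osd require-osd-release pacific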
Change-Id: I194264c420cfda8453c139ca2b737e56c63ef269
The mon version check in the rbd-pool job can cause the script to
error and abort if there are multiple mon versions present in the
Ceph cluster. This change chooses the lowest-numbered major version
from the available mon versions when performing the version check
since the check is performed in order to determine the right way to
parse JSON output from a mon query.
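A hedged sketch of how the lowest mon major version can be selected
(the job's actual parsing may differ):
    # "ceph mon versions" returns a JSON object keyed by version strings,
    # e.g. {"ceph version 14.2.22 (...) nautilus (stable)": 3}.
    LOWEST_MON_MAJOR=$(ceph mon versions -f json | jq -r 'keys[]' \
      | awk '{print $3}' | cut -d. -f1 | sort -n | head -1)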
Change-Id: I51cc6d1de0034affdc0cc616298c2d2cd3476dbb
Currently if multiple instances of the ceph-client chart are
deployed in the same Kubernetes cluster, the releases will
conflict because the clusterrole-checkdns ClusterRole is a global
resource and has a hard-coded name. This change scopes the
ClusterRole name by release name to address this.
Change-Id: I17d04720ca301f643f6fb9cf5a9b2eec965ef537
A Ceph cluster needs only one active manager to function properly.
This PS converts the ceph-client-tests rules related to the ceph-mgr
deployment from errors into warnings when the number of standby mgrs
is less than expected.
Change-Id: I53c83c872b95da645da69eabf0864daff842bbd1
This is a code improvement to reuse the Ceph monitor discovery
function in different templates. Calling the above-mentioned function
from a single place (helm-infra snippets) allows less code
maintenance and simplifies further development.
Rev. 0.1 Charts version bump for ceph-client, ceph-mon, ceph-osd,
ceph-provisioners and helm-toolkit
Rev. 0.2 Mon endpoint discovery functionality added for
the rados gateway. ClusterRole and ClusterRoleBinding added.
Rev. 0.3 checkdns is allowed to correct ceph.conf for RGW deployment.
Rev. 0.4 Added RoleBinding to the deployment-rgw.
Rev. 0.5 Remove _namespace-client-ceph-config-manager.sh.tpl and
the appropriate job, because of duplicated functionality.
Related configuration has been removed.
Rev. 0.6 RoleBinding logic has been changed to meet rules:
checkdns namespace - HAS ACCESS -> RGW namespace(s)
Change-Id: Ie0af212bdcbbc3aa53335689deed9b226e5d4d89
This change moves the ceph-mgr deployment from the ceph-client
chart to the ceph-mon chart. Its purpose is to facilitate the
proper Ceph upgrade procedure, which prescribes restarting mgr
daemons before mon daemons.
There will be additional work required to implement the correct
daemon restart procedure for upgrades. This change only addresses
the move of the ceph-mgr deployment.
Change-Id: I3ac4a75f776760425c88a0ba1edae5fb339f128d
This change updates the ceph.conf update job as follows:
* renames it to "ceph-ns-client-ceph-config"
* consolidates some Roles and RoleBindings
This change also moves the logic of figuring out the mon_host addresses
from the kubernetes endpoint object to a snippet, which is used by the
various bash scripts that need it.
In particular, this logic is added to the rbd-pool job, so that it does
not depend on the ceph-ns-client-ceph-config job.
Note that the ceph.conf update job has a race with several other jobs
and pods that mount ceph.conf from the ceph-client-etc configmap while
it is being modified. Depending on the restartPolicy, pods (such as the
one created for the ceph-rbd-pool job) may linger in StartError state.
This is not addressed here.
Change-Id: Id4fdbfa9cdfb448eb7bc6b71ac4c67010f34fc2c
This change fixes two issues with the recently introduced [0] job that
updates "ceph.conf" inside ceph-client-etc configmap with a discovered
mon_host value:
1. adds missing metadata.labels to the job
2. allows the job to be disabled
(fixes rendering when manifests.job_ns_client_ceph_config = false)
0: https://review.opendev.org/c/openstack/openstack-helm-infra/+/812159
Change-Id: I3a8f1878df4af5da52d3b88ca35ba0b97deb4c35
As Ceph clients expect the mon_host config shown below for Ceph
Nautilus and later releases, this change updates the ceph-client-etc
configmap to reflect the correct mon endpoint specification.
mon_host = [v1:172.29.1.139:6789/0,v2:172.29.1.139:3300/0],
[v1:172.29.1.140:6789/0,v2:172.29.1.140:3300/0],
[v1:172.29.1.145:6789/0,v2:172.29.1.145:3300/0]
Change-Id: Ic3a1cb7e56317a5a5da46f3bf97ee23ece36c99c
In cases where the pool deletion feature [0] is used but the pool
does not exist, a pool is created and then subsequently deleted.
This was broken by the performance optimizations introduced with [1], as
the job is trying to delete a pool that does not exist (yet).
This change makes the ceph-rbd-pool job wait for manage_pools to finish
before trying to delete the pool.
0: https://review.opendev.org/c/792851
1: https://review.opendev.org/c/806443
Change-Id: Ibb77e33bed834be25ec7fd215bc448e62075f52a
This change updates the helm-toolkit path in each chart as part
of the move to Helm v3, since helm serve is no longer available.
Change-Id: I011e282616bf0b5a5c72c1db185c70d8c721695e
This change attempts to reduce the number of Ceph commands required
in the ceph-rbd-pool job by collecting most pool properties in a
single call and by setting only those properties where the current
value differs from the target value.
Calls to manage_pool() are also run in the background in parallel,
so all pools are configured concurrently instead of serially. The
script waits for all of those calls to complete before proceeding
in order to avoid issues related to the script finishing before all
pools are completely configured.
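A simplified sketch of the parallelization pattern (function and
variable names are illustrative):
    # Configure every pool concurrently, then wait for all of them so
    # the script cannot exit before configuration is complete.
    for pool_name in ${POOL_NAMES}; do
      manage_pool "${pool_name}" &
    done
    wait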
Change-Id: If105cd7146313ab9074eedc09580671a0eafcec5
If labels are not specified on a Job, kubernetes defaults them
to include the labels of the underlying Pod template. Helm 3
injects metadata into all resources [0], including an
`app.kubernetes.io/managed-by: Helm` label. As a result, when
kubernetes sees a Job's labels they are no longer empty and do not
get defaulted to the underlying Pod template's labels. This is a
problem since Job labels are depended on by
- Armada pre-upgrade delete hooks
- Armada wait logic configurations
- kubernetes-entrypoint dependencies
Thus for each Job template this adds labels matching the
underlying Pod template to retain the same labels that were
present with Helm 2.
[0]: https://github.com/helm/helm/pull/7649
Change-Id: I3b6b25fcc6a1af4d56f3e2b335615074e2f04b6d
Currently, if the existing pg_num_min is less than the value specified
in values.yaml or in overrides, no change to pg_num_min is made during
updates even though the value should be increased. This PS ensures the
proper value is always set.
Change-Id: I79004506b66f2084402af59f9f41cda49a929794
The checkDNS script which is run inside the ceph-mon pods has had
a bug for a while now. If a value of "up" is passed in, it adds
brackets around it, but then doesn't check for the brackets when
checking for a value of "up". This causes a value of "{up}" to be
written into the ceph.conf for the mon_host line and that causes
the mon_host to not be able to respond to ceph/rbd commands. It's
normally not a problem if DNS is working, but if DNS stops working
this can happen.
This patch changes the comparison to look for "{up}" instead of
"up" in three different files, which should fix the problem.
Change-Id: I89cf07b28ad8e0e529646977a0a36dd2df48966d
This change configures Ceph daemon pods so that
/var/lib/ceph/crash maps to a hostPath location that persists
when the pod restarts. This will allow for post-mortem examination
of crash dumps to attempt to understand why daemons have crashed.
Change-Id: I53277848f79a405b0809e0e3f19d90bbb80f3df8
This will ease mirroring capabilities for the docker official images.
Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I0f9177b0b83e4fad599ae0c3f3820202bf1d450d
Two new values, "delete" and "delete_all_pool_data," have been
added to the Ceph pool spec to allow existing pools to be deleted
in a brownfield deployment. For deployments where a pool does not
exist, either for greenfield or because it has been deleted
previously, the pool will be created and then deleted in a single
step.
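A sketch of a pool spec using the new values (layout follows the
ceph-client pool spec; treat details as illustrative):
    tee /tmp/delete-pool-override.yaml <<EOF
    conf:
      pool:
        spec:
          - name: unused-pool
            application: rbd
            replication: 3
            percent_total_data: 0
            delete: true
            delete_all_pool_data: true
    EOF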
Change-Id: Ic22acf02ae2e02e03b834e187d8a6a1fa58249e7
A new value "rename" has been added to the Ceph pool spec to allow
pools to be renamed in a brownfield deployment. For greenfield the
pool will be created and renamed in a single deployment step, and
for a brownfield deployment in which the pool has already been
renamed previously no changes will be made to pool names.
Change-Id: I3fba88d2f94e1c7102af91f18343346a72872fde
The current pool init job only allows PGs to be found in the
"peering" or "activating" (or active) states, but it should also
allow the other states that can occur while the PG autoscaler is
running ("unknown", "creating" and "recover"). The helm test already
allows these states, so the pool init job is being changed to allow
them as well for consistency.
Change-Id: Ib2c19a459c6a30988e3348f8d073413ed687f98b
This patchset makes the current ceph-client helm test more specific
about checking each of the PGs that are transitioning through inactive
states during the test. If any single PG spends more than 30 seconds in
any of these inactive states (peering, activating, creating, unknown,
etc), then the test will fail.
Also, once the three-minute PG checking period has expired, we will
no longer fail the helm test, as it is very possible that the
autoscaler could still be adjusting the PGs for several minutes after
a deployment is done.
Change-Id: I7f3209b7b3399feb7bec7598e6e88d7680f825c4
This patchset will add the capability to configure the
Ceph RBD pool job to leave failed pods behind for debugging
purposes, if it is desired. Default is to not leave them
behind, which is the current behavior.
Change-Id: Ife63b73f89996d59b75ec617129818068b060d1c
This patch resolves a helm test problem where the test was failing
if it found a PG state of "activating". It could also potentially
find a number of other states, like premerge or unknown, that
could also fail the test. Note that if these transient PG states are
found for more than 3 minutes, the helm test fails.
Change-Id: I071bcfedf7e4079e085c2f72d2fbab3adc0b027c
When autoscaling is disabled after pools are created, there is an
opportunity for some autoscaling to take place before autoscaling
is disabled. This change checks to see if autoscaling needs to be
disabled before creating pools, then checks to see if it needs to
be enabled after creating pools. This ensures that autoscaling
won't happen when autoscaler is disabled and autoscaling won't
start prematurely as pools are being created when it is enabled.
Change-Id: I8803b799b51735ecd3a4878d62be45ec50bbbe19
The autoscaler was introduced in the Nautilus release. This
change only sets the pg_num value for a pool if the autoscaler
is disabled or the Ceph release is earlier than Nautilus.
When pools are created with the autoscaler enabled, a pg_num_min
value specifies the minimum value of pg_num that the autoscaler
will target. That default was recently changed from 8 to 32,
which severely limits the number of pools in a small cluster, per
https://github.com/rook/rook/issues/5091. This change overrides
the default pg_num_min value of 32 with a value of 8 (matching
the default pg_num value of 8) using the optional --pg-num-min
<value> argument at pool creation and the pg_num_min property for
existing pools.
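For reference, the underlying Ceph commands (shown directly here for
illustration):
    # New pools: request the smaller autoscaler floor at creation time.
    ceph osd pool create rbd 8 --pg-num-min 8
    # Existing pools: lower the autoscaler floor explicitly.
    ceph osd pool set rbd pg_num_min 8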
Change-Id: Ie08fb367ec8b1803fcc6e8cd22dc8da43c90e5c4
Currently pool quotas and pg_num calculations are both based on
percent_total_data values. This can be problematic when the amount
of data allowed in a pool doesn't necessarily match the percentage
of the cluster's data expected to be stored in the pool. It is
also more intuitive to define absolute quotas for pools.
This change adds an optional pool_quota value that defines an
explicit value in bytes to be used as a pool quota. If pool_quota
is omitted for a given pool, that pool's quota is set to 0 (no
quota).
A check_pool_quota_target() Helm test has also been added to
verify that the sum of all pool quotas does not exceed the target
quota defined for the cluster if present.
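For reference, pool quotas correspond to the following Ceph commands
(shown directly here for illustration):
    # Explicit quota in bytes for a pool (100 GiB here).
    ceph osd pool set-quota rbd max_bytes 107374182400
    # pool_quota omitted -> quota of 0, i.e. no quota.
    ceph osd pool set-quota rbd max_bytes 0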
Change-Id: I959fb9e95d8f1e03c36e44aba57c552a315867d0
This reverts commit 910ed906d0df247f826ad527211bc86382e16eaa.
Reason for revert: May be causing upstream multinode gates to fail.
Change-Id: I1ea7349f5821b549d7c9ea88ef0089821eff3ddf