This PS is to address security best practices concerning running
containers as a non-privileged user and disallowing privilege
escalation. Ceph-client is used for the mgr and mds pods.
Change-Id: Idbd87408c17907eaae9c6398fbc942f203b51515
ADD: new snapshot policy template job that creates templates for
the ES SLM manager to snapshot indices instead of using curator.
Change-Id: I629d30691d6d3f77646bde7d4838056b117ce091
OSD logical volume names used to be based on the logical disk path,
e.g. /dev/sdb, but that has changed. The lvremove logic in disk_zap()
is still using the old naming convention. This change fixes that.
Change-Id: If32ab354670166a3c844991de1744de63a508303
There are many race conditions possible when multiple ceph-osd
pods are initialized on the same host at the same time using
shared metadata disks. The locked() function was introduced a
while back to address these, but some commands weren't locked,
locked() was being called all over the place, and there was a file
descriptor leak in locked(). This change cleans that up
by maintaining a single, global file descriptor for the lock file
that is only opened and closed once, and also by aliasing all of
the commands that need to use locked() and removing explicit calls
to locked() everywhere.
The global_locked() function has also been removed as it isn't
needed when individual commands that interact with disks use
locked() properly.
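
A minimal sketch of the single-descriptor approach described above,
assuming flock from util-linux is available. The lock file path, the
descriptor number 9, the timeout, and the wrapped command names are
illustrative assumptions, not the chart's actual values. Wrapper
functions stand in for the aliases mentioned above.

```shell
# Hypothetical lock file path; the real script may use a different one.
LOCK_FILE="${LOCK_FILE:-/tmp/osd-init.lock}"

# Open the lock file once on descriptor 9 for the lifetime of the script.
exec 9>"${LOCK_FILE}"

# Run a command while holding an exclusive lock on the shared descriptor.
locked() {
  flock -w "${LOCK_TIMEOUT:-600}" 9 || return 1
  "$@"
  locked_rc=$?
  flock -u 9
  return "${locked_rc}"
}

# Wrap disk-related commands so callers never call locked() directly.
# "command" bypasses the wrapper function and runs the real binary.
lvs() { locked command lvs "$@"; }
vgs() { locked command vgs "$@"; }
```

Because the descriptor is opened once and reused, repeated calls to
locked() do not open additional descriptors.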
Change-Id: I0018cf0b3a25bced44c57c40e33043579c42de7a
Apply the fix for the openvswitch gate issue with systemd
237-3ubuntu10.43 to the multinode jobs as well, using code from [0].
Additionally, add support for kubeadm version 1.18.9.
[0] https://review.opendev.org/c/openstack/openstack-helm-infra/+/763619
Change-Id: I2681feb1029e5535f3f278513e8aece821c715f1
This is to address zombie processes found in ceph-mon containers due
to the mon-check.sh monitoring script. With shareProcessNamespace the
/pause container will properly handle the defunct processes.
Change-Id: Ic111fd28b517f4c9b59ab23626753e9c73db1b1b
The build-chart playbook task that points each chart to helm-toolkit
has a find command that, when used with another repo, also
includes the charts for osh-infra.
This change modifies the playbook to only modify requirements
in charts in the repo being published.
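
A rough sketch of scoping the search to a single repository, assuming a
layout of <repo_dir>/<chart>/requirements.yaml. The function name and
the layout are hypothetical; the playbook's actual find invocation may
differ.

```shell
# List requirements.yaml files for charts directly under one repo root.
# repo_dir is the checkout of the repo being published (assumed layout:
# <repo_dir>/<chart>/requirements.yaml).
find_chart_requirements() {
  repo_dir=$1
  find "${repo_dir}" -mindepth 2 -maxdepth 2 -type f \
    -name requirements.yaml | sort
}
```

Passing the repo's own directory, rather than a parent directory shared
with other checkouts, keeps charts from sibling repos out of the result.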
Change-Id: I493b4c64fe2525bac0acae06bd40c3896c918e20
The current version of rabbitmq-exporter sometimes fails to retrieve
data because of RabbitMQ timeouts. The timeout threshold is set to
10 seconds and is not configurable in the current version.
Update rabbitmq-exporter to kbudde/rabbitmq-exporter:v1.0.0-RC7.1
(default "RABBIT_TIMEOUT" of 30 seconds) to resolve the timeout issues.
Change-Id: Ia51f368a1bba2b0fd9195cf9991b55864cdebfc1
This reverts commit 42f3b3eaf5a8794b1f247915fffbef68137e6c1c.
Reason for revert: dockerhub now sets a hard limit on daily pulls, so let's switch back to using the opendev docker proxy.
Change-Id: I87e399c89d5736f39d7bdba2011655e5f5766180
This PS makes the RABBIT_TIMEOUT parameter configurable
for the kbudde/rabbitmq-exporter:v1.0.0-RC7.1 version.
Change-Id: I8faf8cd706863f65afb5137d93a7627d421270e9
The default, directory-based OSD configuration doesn't appear to work
correctly and isn't really being used by anyone. It has been commented
out and the comments have been enhanced to document the OSD config
better. With this change there is no default configuration anymore, so
the user must configure OSDs properly in their environment in
values.yaml in order to deploy OSDs using this chart.
Change-Id: I8caecf847ffc1fefe9cb1817d1d2b6d58b297f72
This change updates the fluentd chart to use HTK probe templates
to allow configuration through value overrides.
Change-Id: I97a3cc0832554a31146cd2b6d86deb77fd73db41
OSD failures during an update can cause degraded and misplaced
objects. The post-apply job restarts OSDs in failure domain
batches in order to accomplish the restarts efficiently. There is
already a wait for degraded objects to ensure that OSDs are not
restarted on degraded PGs, but misplaced objects could mean that
multiple object replicas exist in the same failure domain, so the
job should wait for those to recover as well before restarting
OSDs in order to avoid potential disruption under these failure
conditions.
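
A minimal sketch of such a wait, assuming the cluster status text
mentions "degraded" or "misplaced" while recovery is still in progress.
The function names and the CEPH_STATUS_CMD and POLL_INTERVAL variables
are hypothetical and exist only to keep the example self-contained.

```shell
# Hypothetical hook: set CEPH_STATUS_CMD to a stub in tests.
# By default the sketch calls "ceph -s".
CEPH_STATUS_CMD="${CEPH_STATUS_CMD:-ceph -s}"

# Succeeds while the status text still reports degraded or misplaced
# objects.
has_unrecovered_objects() {
  printf '%s\n' "$1" | grep -Eq 'degraded|misplaced'
}

# Poll cluster status until neither degraded nor misplaced objects
# remain.
wait_for_degraded_and_misplaced_objects() {
  while has_unrecovered_objects "$(${CEPH_STATUS_CMD})"; do
    sleep "${POLL_INTERVAL:-3}"
  done
}
```

Note that this sketch treats empty status output as recovered, so a
real script would also need to handle a failing status command.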
Change-Id: I39606e388a9a1d3a4e9c547de56aac4fc5606ea2
According to the get-values-overrides.sh script, the directory is
expected to be named values_overrides, not value_overrides.
Change-Id: I53744117af6962d51519bc1d96329129473d9970
A recent change to wait_for_pods() to allow for fault tolerance
appears to be causing wait_for_pgs() to fail and exit the post-
apply script prematurely in some cases. The existing
wait_for_degraded_objects() logic won't pass until pods and PGs
have recovered while the noout flag is set, so the pod and PG
waits can simply be removed.
Change-Id: I5fd7f422d710c18dee237c0ae97ae1a770606605
The new systemd 237-3ubuntu10.43 raises the memlock limit from 16 MB to
64 MB [1], which seems to cause issues with eBPF-related operations in
containers run as root; see [2] for a possible root cause.
The options are to downgrade systemd to the previous available version
or to restore the previous default memlock limit, either in the systemd
defaults or in the docker unit. This commit sets the systemd
DefaultLimitMEMLOCK.
[1] https://launchpad.net/ubuntu/+source/systemd/237-3ubuntu10.43
[2] https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1837580/comments/9
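
A minimal sketch of pinning the limit through a systemd manager drop-in.
The function name, the drop-in file name 10-memlock.conf, and the use of
a drop-in rather than editing system.conf directly are assumptions made
for illustration; this commit may set the value differently.

```shell
# Write a systemd manager drop-in that pins the default memlock limit.
# conf_dir defaults to the standard drop-in directory for system.conf;
# limit defaults to 16M, the previous default mentioned above.
write_memlock_dropin() {
  conf_dir=${1:-/etc/systemd/system.conf.d}
  limit=${2:-16M}
  mkdir -p "${conf_dir}"
  printf '[Manager]\nDefaultLimitMEMLOCK=%s\n' "${limit}" \
    > "${conf_dir}/10-memlock.conf"
}

# On a real host (as root), follow with: systemctl daemon-reexec
```

Services that are already running would likely need a restart before
they pick up the restored limit.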
Change-Id: I55d14ffa47a7a29d059f2f3b502bb38be0a5dd3d
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This patch set changes the source of the rabbitmq-exporter's admin user
credential to leverage the existing secret rather than the values in the
values.yaml file.
Change-Id: I1ad48ade3984e455d07be3a8b8ee3d9b25b449a2
Signed-off-by: Tin Lam <tin@irrational.io>
This PS updates the post-apply job so that it checks multiple times
for inactive PGs that are not peering. The wait_for_pgs() function
fails after 10 consecutive positive checks.
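
A simplified sketch of the consecutive-check logic. The function name,
its arguments, and the convention that a zero exit status from the check
means "problem found" are assumptions; the real wait_for_pgs() has more
conditions than shown here.

```shell
# Run check_cmd repeatedly. A zero exit status means "problem found"
# (for example, inactive PGs that are not peering). Give up with
# status 1 after max_hits consecutive positive checks; return 0 on the
# first clean check.
fail_after_consecutive_hits() {
  check_cmd=$1
  max_hits=${2:-10}
  delay=${3:-3}
  hits=0
  while "${check_cmd}"; do
    hits=$((hits + 1))
    if [ "${hits}" -ge "${max_hits}" ]; then
      return 1
    fi
    sleep "${delay}"
  done
  return 0
}
```

Requiring several positive checks in a row avoids failing the job on a
single transient observation.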
Change-Id: I98359894477c8e3556450b60b25d62773666b034
This change removes the non-voting divingbell job from the
openstack-helm-infra checks because it does not test much
functionality.
Change-Id: I343b4cdc98d637522ac854211a974cc86d49cae6
This patchset adds the capability to delete any archives that are stored
in the local file system or archives that are stored on the remote RGW
data store.
Change-Id: I68cade39e677f895e06ec8f2204f55ff913ce327
- Check issuer type to distinguish the annotation between
clusterissuer and issuer
- Add one more annotation "certmanager.k8s.io/xx" for the old version
Change-Id: I320c1fe894c84ac38a2878af33e41706fb067422
It looks like using the docker proxy is slower and less stable than
pulling from dockerhub directly and contributes to unstable builds.
This reverts commit e3f14aaff35364b84acedf53b3778111cbae0373.
Change-Id: I9735ad35ce9240f610479a56eaa38715defa2e04
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>