With the current version of rabbitmq-exporter, data retrieval
sometimes fails with RabbitMQ timeout errors. The RabbitMQ timeout
threshold is fixed at 10 seconds and is not configurable in the
current version.
Update rabbitmq-exporter to
kbudde/rabbitmq-exporter:v1.0.0-RC7.1
(default "RABBITMQ_TIMEOUT" is 30 seconds)
to resolve the RabbitMQ timeout issues.
Change-Id: Ia51f368a1bba2b0fd9195cf9991b55864cdebfc1
This reverts commit 42f3b3eaf5a8794b1f247915fffbef68137e6c1c.
Reason for revert: dockerhub now sets a hard limit on daily pulls, so let's switch back to using the opendev docker proxy.
Change-Id: I87e399c89d5736f39d7bdba2011655e5f5766180
This PS makes the RABBIT_TIMEOUT parameter configurable
with the kbudde/rabbitmq-exporter:v1.0.0-RC7.1 version.
Change-Id: I8faf8cd706863f65afb5137d93a7627d421270e9
This change updates the fluentd chart to use HTK probe templates
so that the probes can be configured through values overrides.
Change-Id: I97a3cc0832554a31146cd2b6d86deb77fd73db41
OSD failures during an update can cause degraded and misplaced
objects. The post-apply job restarts OSDs in failure domain
batches in order to accomplish the restarts efficiently. There is
already a wait for degraded objects to ensure that OSDs are not
restarted on degraded PGs, but misplaced objects could mean that
multiple object replicas exist in the same failure domain, so the
job should wait for those to recover as well before restarting
OSDs in order to avoid potential disruption under these failure
conditions.
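
The gating condition described above can be sketched as follows. This
is only an illustration, not the job's actual code: the function name
safe_to_restart_osds and the sample status lines are hypothetical, and
the sketch assumes the cluster status text mentions "degraded" or
"misplaced" while such objects exist.

```python
def safe_to_restart_osds(status_text: str) -> bool:
    """Return True only when the status text reports neither degraded
    nor misplaced objects (hypothetical helper for illustration)."""
    lowered = status_text.lower()
    return "degraded" not in lowered and "misplaced" not in lowered


# Illustrative status snippets, not captured from a real cluster.
healthy = "pgs: 128 active+clean"
misplaced = "123/456 objects misplaced (26.974%)"

print(safe_to_restart_osds(healthy))
print(safe_to_restart_osds(misplaced))
```

Under these assumptions, the next failure domain batch would be
restarted only once the check returns True.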
Change-Id: I39606e388a9a1d3a4e9c547de56aac4fc5606ea2
The get-values-overrides.sh script expects a values_overrides
directory, not value_overrides.
Change-Id: I53744117af6962d51519bc1d96329129473d9970
A recent change to wait_for_pods() to allow for fault tolerance
appears to be causing wait_for_pgs() to fail and exit the post-
apply script prematurely in some cases. The existing
wait_for_degraded_objects() logic won't pass until pods and PGs
have recovered while the noout flag is set, so the pod and PG
waits can simply be removed.
Change-Id: I5fd7f422d710c18dee237c0ae97ae1a770606605
The new systemd 237-3ubuntu10.43 bumps the memlock limit from 16 to
64 MB [1], which appears to be a possible root cause of issues with
eBPF-related operations in containers run as root [2].
The options are to downgrade systemd to the previous available
version, or to restore the previous default memlock limit in either the
systemd defaults or the docker unit. This commit sets systemd
DefaultLimitMEMLOCK.
[1] https://launchpad.net/ubuntu/+source/systemd/237-3ubuntu10.43
[2] https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1837580/comments/9
Change-Id: I55d14ffa47a7a29d059f2f3b502bb38be0a5dd3d
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This patch set changes the source of the rabbitmq-exporter's admin user
credential to leverage the existing secret rather than the values in the
Values.yaml file.
Change-Id: I1ad48ade3984e455d07be3a8b8ee3d9b25b449a2
Signed-off-by: Tin Lam <tin@irrational.io>
This PS updates the post-apply job so that it checks multiple times
for inactive PGs that are not peering. The wait_for_pgs() function
fails after 10 sequential positive checks.
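
A minimal sketch of the retry logic, assuming that a "positive check"
means one that finds inactive PGs that are not peering, and that any
other result resets the counter. The tuple-based input and the limit
parameter are hypothetical, chosen so the logic can be run without a
cluster; the real job polls Ceph and sleeps between checks.

```python
def wait_for_pgs(observations, limit=10):
    """Consume (all_clean, stuck_inactive) observations in order.

    Returns True as soon as the PGs are reported clean, and False once
    `limit` consecutive observations report stuck inactive PGs or the
    observations run out.
    """
    consecutive = 0
    for all_clean, stuck_inactive in observations:
        if all_clean:
            return True
        if stuck_inactive:
            consecutive += 1
            if consecutive >= limit:
                return False
        else:
            # A check without stuck PGs breaks the sequence.
            consecutive = 0
    return False


# Ten positive checks in a row cause a failure.
print(wait_for_pgs([(False, True)] * 10))
```

Counting only sequential positive checks lets the job tolerate PGs that
are briefly inactive during a restart.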
Change-Id: I98359894477c8e3556450b60b25d62773666b034
This change removes the non-voting divingbell job from the
openstack-helm-infra checks because it does not test much
functionality.
Change-Id: I343b4cdc98d637522ac854211a974cc86d49cae6
This patchset adds the capability to delete any archives that are stored
in the local file system or archives that are stored on the remote RGW
data store.
Change-Id: I68cade39e677f895e06ec8f2204f55ff913ce327
- Check issuer type to distinguish the annotation between
clusterissuer and issuer
- Add one more annotation "certmanager.k8s.io/xx" for old version
Change-Id: I320c1fe894c84ac38a2878af33e41706fb067422
It looks like using the docker proxy is slower and less stable than
pulling from dockerhub directly, and it contributes to unstable builds.
This reverts commit e3f14aaff35364b84acedf53b3778111cbae0373.
Change-Id: I9735ad35ce9240f610479a56eaa38715defa2e04
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This PS updates the wait_for_pods() function in the post-apply script.
The change allows wait_for_pods() to pass once the required percentage
of OSDs is reached (REQUIRED_PERCENT_OF_OSDS). It also removes code
that is no longer needed.
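
The threshold test can be sketched as follows. Only the
REQUIRED_PERCENT_OF_OSDS name comes from this change; the function name,
its parameters, and the decision to treat zero expected OSDs as a
failure are assumptions made for illustration.

```python
def enough_osds_ready(ready: int, expected: int, required_percent: int) -> bool:
    """Return True when at least `required_percent` percent of the
    expected OSDs are ready (hypothetical helper)."""
    if expected <= 0:
        # Nothing to compare against; treated as not ready in this sketch.
        return False
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return ready * 100 >= required_percent * expected


# 3 of 4 OSDs ready is exactly 75 percent.
print(enough_osds_ready(3, 4, 75))
```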
Change-Id: I56f1292682cf2aa933c913df162d6f615cf1a133
This reverts commit 982e3754a5755cc227552b6f1fcc195e8793589c.
"Add default reject rule end in Postgres pg_hba.conf to ensure all
connections must be explicitly allowed."
The original commit introduced a breaking change when installing with
the chart defaults - before, all remote connections with md5 auth were
allowed, and after the change, only explicit users are allowed.
This is fully overridable, but the original defaults are more
conservative.
Change-Id: Ib297e480bccd3ac7c0cf15985b3def2c8b3e889e
* add preStop hook to trigger Fast Shutdown
* disable readiness probe by default
When Kubernetes terminates a pod, the container runtime typically sends
a SIGTERM signal to pid 1 in each container [0]. PostgreSQL interprets
SIGTERM as a request to do a "Smart Shutdown" [1]. This can take minutes
(often exhausting the termination grace period), and during this time,
new connections are not being serviced.
Now that postgresql has a single replica, this behavior is undesirable.
If we kill the pod (e.g. in an upgrade), we probably want it to come
back as soon as possible.
This change adds a preStop hook that sends a SIGINT to postgresql in
order to trigger a "Fast Shutdown". In addition, the readiness probe is
disabled by default, since it adds no value in a single-replica
scenario.
0: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
1: https://www.postgresql.org/docs/9.6/server-shutdown.html
Change-Id: Ib5f3d2a49e55332604c91f9a011e87d78947dbef
Uses the standard helm-toolkit macros for liveness and readiness probes,
allowing them to be enabled or disabled, and params to be overridden.
The existing hard-coded settings are preserved as the chart defaults.
Change-Id: Idd063e6b8721126c88fa22c459f93812151d7b64
Part 2. This patch set adjusts the url once the initial packages are
made available.
Change-Id: Idfb69146d606b43c98c552d1d2c5680ccd503282
Signed-off-by: Tin Lam <tin@irrational.io>
This corrects the ability to sync artifacts to tarballs.o.o.
Change-Id: Icb2b6653f263aaab173d1479d05c0209e7390c50
Signed-off-by: Tin Lam <tin@irrational.io>
Some services attempt to recreate the default domain
with both the values of "default" and "Default". Since this
domain already exists when keystone is deployed, this
creates redundant API calls that only result in conflicts.
This change enables nocasematch for string checking in order
to avoid making multiple unnecessary calls to keystone.
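
The change itself relies on the shell's nocasematch option. The same
idea is sketched below in Python as an analogue only; the function name
should_create_domain is hypothetical and not part of the chart.

```python
def should_create_domain(name: str) -> bool:
    """Skip creation when the name matches the pre-existing default
    domain, ignoring case (hypothetical helper)."""
    return name.casefold() != "default"


# Both spellings map to the existing domain, so neither triggers a call.
for requested in ("default", "Default", "service"):
    print(requested, should_create_domain(requested))
```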
Change-Id: I698fd420dc41eae211a511269cb021d4ab7a5bfc
This patch set updates the ability to package (and subsequently
publish) the charts in the OpenStack-Helm-Infra repository.
Change-Id: I6175325b0e7a668c22a7ec3ab08cae51ad4f9ab8
Signed-off-by: Tin Lam <tin@irrational.io>
This fixes the logic that disables the autoscaler on pools, as
it does not consider newly created pools.
Change-Id: I76fe106918d865b6443453b13e3a4bd6fc35206a
There are race conditions in the ceph-volume osd-init script that
occasionally cause deployment and OSD restart issues. This change
attempts to resolve those and stabilize the script when multiple
instances run simultaneously on the same host.
Change-Id: I79407059fa20fb51c6840717a083a8dc616ba410
This change attempts to address the chart publish issue. It also makes
the job periodic.
Change-Id: I806da82a7eb07ce8e83ae8c023a014fa3b917193
Signed-off-by: Tin Lam <tin@irrational.io>
This improves the logic that detects used OSD disks so that the scripts
do not zap OSD disks aggressively.
It also adds a debugging mode for pvdisplay commands to capture more
logs during failure scenarios, and reads the OSD force repair flag from
values.
Change-Id: Id2996211dd92ac963ad531f8671a7cc8f7b7d2d5