This commit fixes a small issue with Patroni where sometimes pg_rewind
would fail due to limitations in Postgres 9.5. To combat pg_rewind
failures, we can enable remove_data_directory_on_rewind_failure which
will cleanup the data directory on the pod and recreates it as a
replica so that the pod can restart from fresh, rather than churning in
an error state. This commit also sets
remove_data_directory_on_diverged_timelines to give Patroni a greater
ability to combat timeline divergence errors.
Change-Id: Ic9f75dbfa0dd990e2b215ed204e55cd67a5d1159
- Allow configuration of the termination grace period
for the Patroni pod with a default of 180s to ensure
the database has time to gracefully spin down, even
on slow disk.
Change-Id: I420cbd601bbffa50217b717bd4a636d48d324617
The PS allows to run the tests when both options (rgw_ks and rgw_s3)
are enabled at the same time.
Change-Id: I262baa38b7c65ff9335a3db6a6e2a454c3ff3f5f
This PS moves to drive all mariadb config via the values fed
to the chart.
Change-Id: I4ed3624737af4d5c90b1b5de451a0a0b75a5eda1
Signed-off-by: Pete Birley <pete@port.direct>
This PS updates the wsrep_provider_options to define the timeouts
explitlcitly for evs.suspect_timeout, gmcast.peer_timeout. Their
defaults are PT5S, and PT3S respectively, which are increased by
a factor of approx 5, to accomdate network instability that may
occur during node outage events.
Change-Id: Ie5cdd06d91299e5e2632b70cb9b50a7ad14f62b1
Signed-off-by: Pete Birley <pete@port.direct>
This commit enables overriding liveness/readiness probes
configurations for openvswitch pods from values.yaml
Change-Id: I4ec2b9e88bf8ed57e8ac9293f333969b63cef335
Earlier the query used to sort variables in asc alphabetical order,
updated to sort the cluster in desc alphabetical order
Change-Id: I8f08a44b05d0159ad1a043f052751d44b4625f1d
This change makes rabbitmq container run with the rabbitmq user
instead of the root user. As the rabbitmq user doesn't have write
access to '/run' directory, the templates are updated to use the
'/tmp' directory instead which the rabbitmq user has write access
to.
Change-Id: Ia35c3f741fefe3172c93bb042bf8d26bf7672cfc
This disables the cephfs provisioner in the multinode
periodic jobs. It seems the helm tests for the ceph
provisioner chart that test cephfs fail more often than
not in the multinode jobs while passing reliably in the
single node check and gate jobs. As cephfs is still
gated, disabling the cephfs provisioner in the periodic
jobs allows for further investigation into this issue
without causing potential regressions
Change-Id: I36e68cc2e446afac8769fb9ab753105909341f24
Signed-off-by: Steve Wilkerson <sw5822@att.com>
Systemd units run as the root user by default; however, environment
variables in spawned processes are not populated for the root user
unless "User=root" is specified for a particular unit [0]. This change
adds the "User=root" declaration to the Kubelet systemd unit so that
Kubelet will look in the root user's home directory for Docker
configuration information. Without this change, Docker configuration
information, such as authentication keys for private repositories, are
ignored by Kubelet even though the Docker daemon honors them.
[0] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Environment%20variables%20in%20spawned%20processes
Change-Id: I209de0f4f04c078d39b1e8bf18195e51e965cbf3
Signed-off-by: Drew Walters <andrew.walters@att.com>
Added new X-Content-Type-Options: nosniff header to make sure the browser
does not try to detect a different Content-Type than what is actually
sent (can lead to XSS)
Added new X-Frame-Options: sameorigin header to protect against
drag and drop clickjacking attacks in older browsers
Added new Content-Security-Policy: script-src self for implementation
Added new HTTP Security header X-XSS-Protection:1 mode=block to
sanitize the page, when a XSS attack is detected, the browser will
prevent rendering of the page
Change-Id: Ic79bbb96484a7f1a497c001883783338fd26a47a
This updates the Minikube deployment to patch the tiller-deploy
service to add a port definition for the http (44135) port for
tiller, which is used to expose metrics for Prometheus to scrape
Change-Id: I2eb5d4001c37935674ce64012b2744030addc127
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This removes the artifacts associated with images for libvirt,
mariadb, and vbmc from openstack-helm-infra as these images now
live in openstack-helm-images.
Change-Id: I5c97d2db89068c71ec1a56a5ac17007682711182
Signed-off-by: Steve Wilkerson <sw5822@att.com>
- Change the Postgres configuration to use x509 client
certs for authenticating the connections for replicating
between Patroni nodes. This is a straightforward solution
for support credential rotation for the replication user.
Password authentication is problematic due to the declartive
nature of helm charts and requiring an existing replication
connection to replicate the rotated password.
Change-Id: I0c5456a01b3a36fee8ee4c986d25c4a1d807cb77
This PS udpated the reset node function to leave the assets generated
via init containers in place when resetting the node.
Change-Id: Iac52ca82e95bb372dbcbca0eeea3b262215e9c12
Signed-off-by: Pete Birley <pete@port.direct>
This adds a cron job to manually verify all snapshot repositories
are registered to any active master and data nodes. This is to
address scenarios where master and data nodes do not have the
desired snapshot repositories registered following node outages
or reboots
Change-Id: Ie6f42e95c3ca4dc2ec70f2852a2bde11e59ec097
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This patch enhances the HTK job manifest functions so that each job can
be configured to use the desired backoffLimit and activeDeadlineSeconds,
and can mount the command/script from either a configMap or a secret
instead of being confined to using only configMaps.
Change-Id: I5231e53b98e3e55e3e93070876d8694f37ad642d
Appended the code that will add the calico dashboard to the Grafana. This will
display the felix metrics which are collected by the prometheus.
Change-Id: If18a18949f8093747b3f9ba819e036778c40b84e
We can select if we want an image with dpdk support by adding:
FEATURE_GATES=dpdk
That way we can reuse the same script for different distros by using
openstack-helm/tools/deployment/common/get-values-overrides.sh
Change-Id: Ia2c53556be650899fdd67c1ec06f5c68ae63c9d4
Signed-off-by: Manuel Buil <mbuil@suse.com>
As per PR, https://github.com/kubernetes/kubernetes/pull/60210,
in kubectl get show-all option is deprecated and no longer needed.
Presumably now that's the default behavior.
Also in current logs gathering logic, we are interested in capturing
only pod names, so removing that option is harmless.
We are seeing related failures in local CI when kubectl version is
1.15.x. So removing this option.
Change-Id: I3886c792fe28bc8b80504d8c91e9524039131b15
This updates the script for registering snapshot repositories to
include a manual verification of the repositories created. This
simply allows for inspection of all master and data nodes the
repository is verified with to provide additional visibility into
the state of all repositories
Change-Id: I6e5386386e2b79b1cb0f41fc1f9b78817695f8f3
Signed-off-by: Steve Wilkerson <sw5822@att.com>
Revert 833d426da8e4b049277ca9847830f6e6beee40c3
https://review.opendev.org/#/c/667022 introduced a regression in the
overrides functionality, which caused the corresponding gate test to
fail. This "fixed" a problem by breaking the override capability.
This patchset reverts the previous to restore override functionality and
make gates green again. Deep copy is added in order to resolve the
original problem that 667022 attempted to resolve.
Change-Id: I6c052c0fabe0067612d6a3d9d3bfac4df59202d7
This is to fix the issue we are facing with file permision on the file
/var/lib/ceph/bootstrap-rgw/ceph.keyring since owner of the file
will be root.
This is happening when node with rgw reboots and rgw pods fails at
init after reboot,this is happening on sinlge node deplyoments.
issue:
ceph-rgw-5db485fbd9-dv778 0/1 Init:CrashLoopBackOff 5 6m49s
logs:
+ chown -R ceph. /run/ceph/ /var/lib/ceph/bootstrap-rgw /var/lib/ceph/radosgw
/var/lib/ceph/tmp
chown: changing ownership of
'/var/lib/ceph/bootstrap-rgw/ceph.keyring': Operation not permitted
Change-Id: Idcb648c205053b2f03357b59173e70e02f28688c
Earlier the Nagios alert monitor was percent based as in when the percent of OSD
down is greater than 80, it will send alert.
>check_prom_alert!ceph_osd_down_pct_high!CRITICAL- CEPH OSDs down is
more than 80 percent!OK- CEPH OSDs down is less than 80 percent
Updated the code in nagios values.yaml to send alert when even 1 OSD is
down:
>check_prom_alert!ceph_osd_down!CRITICAL- One or more CEPH OSDs are down
>for more than 5 minutes!OK- All the CEPH OSDs are up
Change-Id: Id24c4a0cca64674890dae3599edc0c90d9534e90
This patch is part of an effort to cleanup the values.yaml file for
Postgres, which has gotten messy since the introduction of Patroni. This
patch specifically removes unused configuration values which were
causing unnecessary bloat and complexity.
Change-Id: I96180fd9c91200ba7558e58bd503b4ef9ebc183e
When upgrading/reconfiguring a rabbit cluster its possible that the nodes
will not return the cluster status for some time, this ps allows us to
cope with this much more gracefully than simply crashing a few times, before
proceeding.
Change-Id: Ibf525df9e3a9362282f70e5dbb136430734181fd
Signed-off-by: Pete Birley <pete@port.direct>