osd_scrub_load_threshold set to 10.0 (default 0.5)
- With today's multi-core processors, it is fairly typical to
see systems with a load average above 1.0. We need to raise the
scrub load threshold so that scrubbing runs as scheduled even
when a node is under light to moderate load.
filestore_max_sync_interval set to 10s (default 5s)
- Larger default journal sizes (>1GB) will not be used effectively
unless the max sync interval is increased for Filestore. The
benefit of this change is increased performance, especially for
sequential write workloads.
mon_osd_down_out_interval set to 1800s (default 600s)
- OSD PODs can take longer than several minutes to boot up. Mark
an OSD as 'out' in the CRUSH map only after 15 minutes of being
'down'.
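Taken together, the overrides above would look like the following
ceph.conf fragment (a sketch; placing them under [global] is an
assumption, not taken from this change):

```ini
[global]
# Allow scrubbing even on moderately loaded multi-core nodes
osd_scrub_load_threshold = 10.0
# Let Filestore make use of journals larger than 1GB
filestore_max_sync_interval = 10
# Only mark an OSD 'out' after 30 minutes of being 'down'
mon_osd_down_out_interval = 1800
```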
Change-Id: I62d6d0de436c270d3295671f8c7f74c89b3bd71e
The update ensures the OpenStack services' cephx
user capabilities match security best practices
after a site or software update.
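As a hedged illustration of the kind of capabilities involved (the
client name and pool names below are assumptions, not taken from this
change), upstream Ceph guidance for OpenStack clients restricts a
user to RBD profiles on specific pools:

```ini
[client.cinder]
    caps mon = "profile rbd"
    caps osd = "profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images"
```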
Change-Id: I7c241cdb5d92463ac59c557ca7847ca5688d158b
This PS moves the single-node gate to use a lightweight
minikube-based environment.
Change-Id: I285c4222795b66f3527f0daaf62a91973da5dca8
Co-authored-by: Krishna Venkata <kvenkata986@gmail.com>
Signed-off-by: Pete Birley <pete@port.direct>
Add helper scripts that are called by a POD to switch
Ceph from DNS to IPs. This POD will loop every 5 minutes
to catch cases where the DNS might be unavailable.
On a POD's service start, switch ceph.conf to use IPs rather
than DNS.
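A minimal sketch of the DNS-to-IP switch the helper script performs;
the mon service FQDN and the function names are illustrative
assumptions, and the resolver is stubbed so the sketch runs without
cluster DNS:

```python
import re
import socket


def mon_host_ips(fqdn):
    """Resolve the mon service DNS name to a comma-separated IP list."""
    _, _, ips = socket.gethostbyname_ex(fqdn)
    return ",".join(sorted(ips))


def switch_to_ips(conf_text, fqdn, resolver=mon_host_ips):
    """Rewrite the mon_host line of ceph.conf to use IPs rather than DNS."""
    return re.sub(r"(?m)^mon_host\s*=.*$",
                  "mon_host = " + resolver(fqdn), conf_text)


conf = "[global]\nmon_host = ceph-mon.ceph.svc.cluster.local\n"
# Stub resolver for illustration only:
print(switch_to_ips(conf, "ceph-mon.ceph.svc.cluster.local",
                    resolver=lambda f: "10.0.0.10,10.0.0.11"))
```

In the pod, the real resolver would be used and the loop would re-run
this rewrite every 5 minutes.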
Change-Id: I402199f55792ca9f5f28e436ff44d4a6ac9b7cf9
This PS updates the mariadb chart to support adoption of a
single mariadb instance deployed by the earlier bash-driven
chart, which did not support reforming a Galera cluster, by
tracking state in a configmap. Additionally, basic logic is added
for upgrading the database as part of the normal rolling-update
flow.
Change-Id: I412de507112b38d6d2534e89f2a02f84bef3da63
Signed-off-by: Pete Birley <pete@port.direct>
This commit adds roles to the kubernetes-keystone-webhook policy
with permissions similar to the cluster-admin, edit, and view
ClusterRoles present in Kubernetes.
The check.sh script is also modified to test and verify the new
roles.
Change-Id: I43621d2e1036259064c805d97b340589a5b68c93
This PS breaks out the helper container images, which is required
now that the ingress image is more compact.
Change-Id: I6afb08954f37eda1ed913a4b3acdaf6e2b89d30e
Signed-off-by: Pete Birley <pete@port.direct>
This patch takes into consideration that there could be multiple
options for mandatory access control in a cluster. The previously
defined Helm toolkit function for generating a MAC annotation can
now be specified generically, like in this example:
mandatory_access_control:
  type: apparmor
  glance-api:
    init: runtime/default
    glance-api: runtime/default
    glance-perms: runtime/default
    ceph-keyring-placement: runtime/default
  glance-registry:
    init: runtime/default
    glance-registry: runtime/default
If no MAC is required, then the "type" can be set to null,
and no annotation would be generated. The only MAC type supported
at the moment is "apparmor".
Change-Id: I6b45533d73af82e8fff353b0ed9f29f0891f24f1
This removes the tolerations key from the labels entries. As the
boolean check is on the pod.tolerations.enabled key instead, the
labels.foo.tolerations key is no longer used and should be removed.
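A hedged sketch of the resulting values layout (the component name
"foo" and the selector keys are illustrative, not taken from this
change): the enable flag lives under pod.tolerations, and the labels
entry no longer carries a tolerations key:

```yaml
pod:
  tolerations:
    foo:
      enabled: true
labels:
  foo:
    node_selector_key: openstack-control-plane
    node_selector_value: enabled
```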
Change-Id: I00536dabadf9bd354219058d8efd054c60952bbd
Largely inspired and taken from Kranthi's PS.
- Add support for creating custom CRUSH rules based on failure
domains and device classes (ssd & hdd)
- Add basic logic around the PG calculator to autodetect the number
of OSDs globally and per device class (required when using custom
CRUSH rules that specify device classes).
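The PG calculation logic can be sketched as below; the function name
and the target of 100 PGs per OSD are illustrative assumptions, and
the autodetection of OSD counts (global or per device class) is
elided — only the rounding rule is shown:

```python
def pg_count(num_osds, target_pgs_per_osd=100, replicas=3):
    """Compute a per-pool PG count, rounded up to a power of two."""
    raw = num_osds * target_pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


print(pg_count(12))  # 12 OSDs * 100 / 3 = 400, rounded up to 512
```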
Change-Id: I13a6f5eb21494746c2b77e340e8d0dcb0d81a591
To integrate TungstenFabric (Contrail) with Airship,
there should be the ability to redefine ports that may conflict.
Change-Id: Id15658c65339577cec03f25ebd22dd664bb5976a
Long hostnames can cause the 63-character name limit to be exceeded.
Truncate the hostname if it is longer than 20 characters.
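A minimal sketch of the truncation described above (the function name
is an illustrative assumption): chart-derived object names append
suffixes to the hostname, so capping it at 20 characters keeps the
final names under the 63-character Kubernetes limit.

```python
def short_hostname(hostname, limit=20):
    """Truncate the hostname only when it exceeds the limit."""
    return hostname[:limit] if len(hostname) > limit else hostname


print(short_hostname("compute-node-with-a-very-long-name"))  # first 20 chars
```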
Change-Id: Ieb7e4dafb41d1fe3ab3d663d2614f75c814afee6
This adds basic charts for Elastic Metricbeat, Filebeat,
Packetbeat, and the Elastic APM server. It also adds an
experimental job for deploying the Elastic Beats along with
Elasticsearch and Kibana.
Change-Id: Idcdc1bfa75bcdcaa68801dbb8999f0853652af0f
This adds session affinity to Prometheus's ingress, allowing
cookies to be used to maintain session affinity.
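The nginx ingress controller expresses cookie-based affinity through
annotations; a minimal sketch (the cookie name is an assumption, not
taken from this change):

```yaml
annotations:
  nginx.ingress.kubernetes.io/affinity: "cookie"
  nginx.ingress.kubernetes.io/session-cookie-name: "prometheus-session"
```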
Change-Id: I2e7e1d1b5120c1fb3ddecb5883845e46d61273de
This updates the Nagios image tag to include the updated plugin
for querying Elasticsearch, used to alert on logged events.
Change-Id: Idd61d82463b79baab0e94c20b32da1dc6a8b3634
This PS updates the version of the ingress controller image used.
This brings in the ability to update the ingress configuration without
reloading nginx. Some changes may also be needed for
Prometheus-based monitoring:
* https://github.com/kubernetes/ingress-nginx/blob/master/Changelog.md#0100
Change-Id: Ia0bf3dbb9b726f3a5cfb1f95d7ede456af13374a
Signed-off-by: Pete Birley <pete@port.direct>
This PS updates the ingress chart to allow the status port to be
changed.
Change-Id: Ia38223c56806f6113622a809e792b4fedd010d87
Signed-off-by: Pete Birley <pete@port.direct>
Add support for a rack level CRUSH map. Rack level CRUSH support is
enabled by using the "rack_replicated_rule" crush rule.
Change-Id: I4df224f2821872faa2eddec2120832e9a22f4a7c
This updates the host used for the Ceph health checks: we
should query the ceph-mgr service directly for Ceph metrics
instead of trying to curl the host directly.
This also changes the ceph_health_check to use the base-os
hostgroup instead of the placeholder ceph-mgr host group, as we're
just executing a simple check against the ceph-mgr service.
This also adds default configuration values for
max_concurrent_checks (60) and check_workers (4) instead
of leaving them at the Nagios defaults (0 and the number of
cores, respectively).
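In nagios.cfg terms, the defaults described above correspond to:

```ini
max_concurrent_checks=60
check_workers=4
```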
Change-Id: Ib4072fcd545d8c05d5e9e4a93085a8330be6dfe0
This updates the Nagios image to use a tag that includes a fix for
the service discovery mechanism used for updating host checks.
After moving the Nagios chart to run in either shared or host
PID namespaces, the service discovery mechanism no longer worked
to the plugin attempting to restart PID 1 instead of determining
the appropriate PID to restart.
For reference, see:
https://review.gerrithub.io/#/c/att-comdev/nagios/+/432205/
Change-Id: Ie01c3a93dd109a9dc99cfac5d27991583546605a