915 Commits

Author SHA1 Message Date
Jean-Charles Lopez
f7e03d4763 Helm Tests for Ceph-RGW chart
Co-Authored-By: Renis Makadia <renis.makadia@att.com>

Change-Id: I81cc0cb498b2ca911d5b7bfa7c3bd9b8552e0e2b
2018-12-01 08:08:28 +00:00
Zuul
5316586d9e Merge "Fluentbit/Node Exporter: Remove unused tolerations key" 2018-11-29 15:40:03 +00:00
Pete Birley
4803fe31d1 Ingress: Break out helper container images
This PS breaks out the helper container images, which is required
now that the ingress image is more compact.

Change-Id: I6afb08954f37eda1ed913a4b3acdaf6e2b89d30e
Signed-off-by: Pete Birley <pete@port.direct>
2018-11-28 20:54:35 -06:00
Zuul
2bec7040a9 Merge "Add failure domains, and device classes for custom CRUSH rules" 2018-11-29 00:42:05 +00:00
Cliff Parsons
598faeb8db Make access control annotations more generic.
This patch takes into consideration that there could be multiple
options for mandatory access control in a cluster. The previously
defined Helm toolkit function for generating a MAC annotation can
now be specified generically, like in this example:

  mandatory_access_control:
    type: apparmor
    glance-api:
      init: runtime/default
      glance-api: runtime/default
      glance-perms: runtime/default
      ceph-keyring-placement: runtime/default
    glance-registry:
      init: runtime/default
      glance-registry: runtime/default

If no MAC is required, then the "type" can be set to null,
and no annotation would be generated. The only MAC type supported
at the moment is "apparmor".
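
As a rough illustration of the behaviour described above, the mapping from these values to pod annotations could look like the Python sketch below. It is not the Helm toolkit template itself: the function name `mac_annotations` is hypothetical, and the annotation key prefix is assumed to be the Kubernetes AppArmor annotation of that era.

```python
from typing import Dict

# Assumption: the Kubernetes AppArmor annotation key prefix; the container
# name is appended to it and the profile is the annotation value.
APPARMOR_PREFIX = "container.apparmor.security.beta.kubernetes.io/"


def mac_annotations(values: dict, component: str) -> Dict[str, str]:
    """Build MAC annotations for one component (hypothetical helper)."""
    mac = values.get("mandatory_access_control") or {}
    mac_type = mac.get("type")
    if mac_type is None:
        # A null type means no MAC is required, so nothing is generated.
        return {}
    if mac_type != "apparmor":
        # "apparmor" is the only MAC type the commit message lists as supported.
        raise ValueError(f"unsupported MAC type: {mac_type}")
    containers = mac.get(component) or {}
    return {APPARMOR_PREFIX + name: profile for name, profile in containers.items()}


values = {
    "mandatory_access_control": {
        "type": "apparmor",
        "glance-registry": {
            "init": "runtime/default",
            "glance-registry": "runtime/default",
        },
    }
}
for key, profile in mac_annotations(values, "glance-registry").items():
    print(f"{key}: {profile}")
```

Keeping the "type" key separate from the per-container profiles is what would let another MAC mechanism be added later without changing the shape of the values.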

Change-Id: I6b45533d73af82e8fff353b0ed9f29f0891f24f1
2018-11-28 08:54:15 +00:00
Zuul
04c8f03532 Merge "Add charts for Elastic Beats" 2018-11-27 20:25:42 +00:00
Steve Wilkerson
26c3773983 Fluentbit/Node Exporter: Remove unused tolerations key
This removes the tolerations key from the labels entries. As the
boolean check is on the pod.tolerations.enabled key instead, the
labels.foo.tolerations key is no longer used and should be removed

Change-Id: I00536dabadf9bd354219058d8efd054c60952bbd
2018-11-27 12:38:16 -06:00
Zuul
42249d4243 Merge "Truncate long host names for overrides" 2018-11-27 16:01:53 +00:00
Matthew Heler
6e8c289c13 Add failure domains, and device classes for custom CRUSH rules
Largely inspired and taken from Kranthi's PS.

- Add support for creating custom CRUSH rules based off of failure
  domains and device classes (ssd & hdd)
- Basic logic around the PG calculator to autodetect the number of
  OSDs globally and per device class (required when using custom crush
  rules that specify device classes).

Change-Id: I13a6f5eb21494746c2b77e340e8d0dcb0d81a591
2018-11-27 09:37:30 -06:00
Andrey Pavlov
5ac56d9307 add parameter to allow redefining of server port for ingress
To allow integrating TungstenFabric (Contrail) with Airship, it
must be possible to redefine ports that could otherwise conflict.

Change-Id: Id15658c65339577cec03f25ebd22dd664bb5976a
2018-11-27 13:15:32 +03:00
Anderson, Craig (ca846m)
48a0c09fea Truncate long host names for overrides
Long hostnames can cause the 63 character name limit to be exceeded.
Truncate the hostname if it is longer than 20 characters.
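
The truncation rule can be sketched in Python as follows. This is an illustration only, not the chart's template code: the function names and the way the prefix and hostname are joined are assumptions, while the 20 and 63 character limits come from the message above.

```python
MAX_HOSTNAME_CHARS = 20   # truncation threshold from the commit message
NAME_LIMIT = 63           # name length limit from the commit message


def truncate_hostname(hostname: str, limit: int = MAX_HOSTNAME_CHARS) -> str:
    """Return the hostname unchanged if short enough, else its first `limit` chars."""
    return hostname if len(hostname) <= limit else hostname[:limit]


def override_name(prefix: str, hostname: str) -> str:
    """Hypothetical example of building a per-host resource name."""
    name = f"{prefix}-{truncate_hostname(hostname)}"
    if len(name) > NAME_LIMIT:
        raise ValueError(f"name still exceeds {NAME_LIMIT} characters: {name}")
    return name


print(override_name("fluentbit-etc", "compute-node-0001.rack12.example.internal"))
```

Note that plain truncation can make two long hostnames with the same first 20 characters collide; the sketch does not try to handle that case.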

Change-Id: Ieb7e4dafb41d1fe3ab3d663d2614f75c814afee6
2018-11-26 17:04:58 -08:00
Steve Wilkerson
4c18a421ee Add charts for Elastic Beats
This adds basic charts for Elastic metricbeat, filebeat,
packetbeat, and elastic APM server.  This also adds an experimental
job for deploying the elastic beats along with Elasticsearch and
Kibana

Change-Id: Idcdc1bfa75bcdcaa68801dbb8999f0853652af0f
2018-11-26 20:19:57 +00:00
Zuul
0730df5973 Merge "Prometheus: Add session affinity to ingress" 2018-11-26 18:21:14 +00:00
Zuul
4b76f8c280 Merge "Nagios: Update image tag" 2018-11-26 17:40:20 +00:00
Steve Wilkerson
71c1a16758 Prometheus: Add session affinity to ingress
This adds session affinity to Prometheus's ingress. This allows for
the use of cookies for Prometheus's session affinity

Change-Id: I2e7e1d1b5120c1fb3ddecb5883845e46d61273de
2018-11-26 14:30:08 +00:00
Steve Wilkerson
439079693d Nagios: Update image tag
This updates the Nagios image tag to include the updated plugin
for querying Elasticsearch for alerting on logged events

Change-Id: Idd61d82463b79baab0e94c20b32da1dc6a8b3634
2018-11-26 08:29:22 -06:00
Zuul
8e369d2c9c Merge "Ingress: Update version of ingress controller image" 2018-11-23 20:39:38 +00:00
Zuul
89b651dc1d Merge "Ingress: Make healthz port configurable" 2018-11-21 20:01:26 +00:00
Pete Birley
4d2085f0af Ingress: Update version of ingress controller image
This PS updates the version of the ingress controller image used.

This brings in the ability to update the ingress configuration without
reloading nginx. There may also need to be some changes for prom based
monitoring:
 * https://github.com/kubernetes/ingress-nginx/blob/master/Changelog.md#0100

Change-Id: Ia0bf3dbb9b726f3a5cfb1f95d7ede456af13374a
Signed-off-by: Pete Birley <pete@port.direct>
2018-11-21 19:21:40 +00:00
Zuul
16072765bf Merge "Ingress: Allow status port to be customised" 2018-11-20 18:29:16 +00:00
Pete Birley
ea875b1dcc Ingress: Make healthz port configurable
This PS updates the healthz port to be configurable

Change-Id: Ifa5ea4b7b422156a7309886ecc21668fc096065b
Signed-off-by: Pete Birley <pete@port.direct>
2018-11-20 12:28:14 -06:00
Pete Birley
f3e1fa4e72 Ingress: Allow status port to be customised
This PS updates the ingress chart to allow the status port to be
changed.

Change-Id: Ia38223c56806f6113622a809e792b4fedd010d87
Signed-off-by: Pete Birley <pete@port.direct>
2018-11-20 09:57:56 -06:00
Matthew Heler
5ce9f2eb3b Enable Ceph charts to be rack aware for CRUSH
Add support for a rack level CRUSH map. Rack level CRUSH support is
enabled by using the "rack_replicated_rule" crush rule.

Change-Id: I4df224f2821872faa2eddec2120832e9a22f4a7c
2018-11-20 09:07:36 -06:00
Zuul
5d356f9265 Merge "Document how to recover from a Ceph namespace deletion" 2018-11-15 17:27:45 +00:00
Matthew Heler
cfc2d4abd8 Document how to recover from a Ceph namespace deletion
Change-Id: Ib1b03cd046fbdad6f18478cfa9c9f0bf70ec9430
2018-11-14 13:31:16 -06:00
Zuul
dd6b2a0a1d Merge "Additional Ceph RGW tuning and cleanups" 2018-11-14 18:48:36 +00:00
Zuul
5bf9c26bd8 Merge "Move default CEPH journal size from 5GB to 10GB" 2018-11-13 05:28:45 +00:00
Matthew Heler
225b85eb5f Additional Ceph RGW tuning and cleanups
Set RGW rados handles from 1 to 4
Remove support for fastcgi (it's no longer supported)

Change-Id: Ie260a3e1e5eab2065ec6a4d0637c144965a4214d
2018-11-12 20:13:33 +00:00
Zuul
2640e7422d Merge "This fixes host-specific overrides" 2018-11-10 02:37:53 +00:00
Zuul
2c9ff8bee8 Merge "Fix the checkPGs cronjob" 2018-11-09 22:57:50 +00:00
Ian Howell
9b132225c6 This fixes host-specific overrides
This properly assigns k8s secrets to volumes, rather than using
configMaps

Change-Id: Ifcabd3565fb2abee063f5da117d83ac3a5602536
2018-11-09 16:24:03 -06:00
Steve Wilkerson
dfb4654fba Nagios: Configuration updates
This moves to update the host used for the ceph health checks, as
we should be checking the ceph-mgr service directly for ceph
metrics instead of trying to curl the host directly.

This also changes the ceph_health_check to use the base-os
hostgroup instead of the placeholder ceph-mgr host group, as we're
just executing a simple check against the ceph-mgr service.

This also adds default configuration values for the
max_concurrent_checks (60) and check_workers (4) values instead
of leaving them at the defaults Nagios uses (0 and # cores,
respectively)

Change-Id: Ib4072fcd545d8c05d5e9e4a93085a8330be6dfe0
2018-11-09 13:28:50 -06:00
Steve Wilkerson
325b3cea4d Nagios: Update host check mechanism
This updates the Nagios image to use a tag that includes a fix for
the service discovery mechanism used for updating host checks.
After moving the Nagios chart to either run in shared or host PID
namespaces, the service discovery mechanism no longer worked due
to the plugin attempting to restart PID 1 instead of determining
the appropriate PID to restart.

For reference, see:
https://review.gerrithub.io/#/c/att-comdev/nagios/+/432205/

Change-Id: Ie01c3a93dd109a9dc99cfac5d27991583546605a
2018-11-09 09:12:16 -06:00
Zuul
b55e9b10a7 Merge "Nagios: Add session affinity to ingress" 2018-11-09 04:45:36 +00:00
Zuul
98c9b148f3 Merge "Nagios: Update ceph_health check" 2018-11-09 03:24:23 +00:00
Steve Wilkerson
2c6aa8ad1b Nagios: Add session affinity to ingress
This adds session affinity to Nagios's ingress. This allows for
the use of cookies for Nagios's session affinity

Change-Id: I6054a92f644dc533dd06d35a2541fb44d46cba88
2018-11-09 02:07:39 +00:00
Zuul
a90ebb784c Merge "Prometheus: Update discovery configuration for ceph-mgr services" 2018-11-09 01:01:54 +00:00
Zuul
77772547e2 Merge "RGW: Fix multinode deploy for ceph rgw" 2018-11-08 22:54:01 +00:00
Zuul
d530635348 Merge "Do not use OSH_INFRA_PATH in osh-infra" 2018-11-08 22:54:00 +00:00
Meg Heisler
774e0cb654 RGW: Fix multinode deploy for ceph rgw
Change deployment script for rgw to not use the docker
bridge for public and cluster network overrides. Instead,
calculate network values in same way as other ceph multinodes
deployment steps

Change-Id: I2bacd1af1cc331d76a5d61f3b589ca6ef80b1b2e
2018-11-08 11:39:23 -06:00
Matthew Heler
55446e1f41 Move default CEPH journal size from 5GB to 10GB
Request from downstream to use 10GB journal sizes. Journals are
currently created manually, but there is upcoming work to have the
journals created by the Helm charts themselves. This value needs to be
put in as a default to ensure journals are sized appropriately.

Change-Id: Idaf46fac159ffc49063cee1628c63d5bd42b4bc6
2018-11-08 17:34:12 +00:00
Zuul
7274c5f95f Merge "Revert "Fix rally deployment config to rally 1.2.0"" 2018-11-07 22:26:22 +00:00
Zuul
47d49bcfd4 Merge "prometheus ceph.rules changes" 2018-11-07 20:51:42 +00:00
Pete Birley
b7e77dfea0 Revert "Fix rally deployment config to rally 1.2.0"
This reverts commit 5c2859c3e9026e464bf0c35b591aaae810ff2a1c.

This commit breaks the ability to declare users to use with rally/helm test - and needs to be refactored to match the commit message's intent.

Change-Id: I2bc66ef40694c277058b4324b8a3528f4f25d1d1
2018-11-07 19:31:49 +00:00
Zuul
b28aed8331 Merge "Fix rally deployment config to rally 1.2.0" 2018-11-07 14:12:32 +00:00
Matthew Heler
e1c82f3465 Fix the checkPGs cronjob
Currently the cronjob is broken due to syntax and
permission issues.

Additionally move the cronjob from once a month to
every 15 minutes, and automatically disable the job
unless explicitly enabled.

Change-Id: Id72bdb286c805ccb0ea4e9fcf65fabca94a180dd
2018-11-06 19:39:23 -06:00
Steve Wilkerson
ba22b0e726 Nagios: Update ceph_health check
The ceph_health check in Nagios incorrectly sets the warning and
error level to 0. The ceph_health_status metric's value of 0
indicates the cluster is healthy, while 1 indicates a warning and
2 indicates an error state. The Nagios check for ceph_health is
updated to reflect these values
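
The intended mapping can be sketched in Python as below. This is not the Nagios plugin's code: the function name is hypothetical, and the return values assume the conventional Nagios plugin states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from typing import Tuple

# ceph_health_status values as described in the commit message:
# 0 = healthy, 1 = warning, 2 = error.
WARNING_LEVEL = 1
CRITICAL_LEVEL = 2


def ceph_health_state(metric_value: int) -> Tuple[int, str]:
    """Map the ceph_health_status metric to a Nagios-style (code, label) pair."""
    if metric_value < 0:
        # Assumption: values outside the documented range are reported as UNKNOWN.
        return 3, "UNKNOWN"
    if metric_value >= CRITICAL_LEVEL:
        return 2, "CRITICAL"
    if metric_value >= WARNING_LEVEL:
        return 1, "WARNING"
    return 0, "OK"


for value in (0, 1, 2):
    print(value, ceph_health_state(value))
```

With both thresholds set to 0, as in the previous configuration, a healthy cluster reporting 0 would already meet the warning and critical levels, which is the bug this change addresses.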

Change-Id: Iffe80f1c34f6edee6370dd7e707e5f55f83f1ec1
2018-11-06 14:51:40 -06:00
Steve Wilkerson
e0f2d66ee3 Prometheus: Update discovery configuration for ceph-mgr services
This updates the Prometheus scrape configuration to use the
service based discovery mechanism instead of endpoints. This
removes issues associated with multiple ceph-mgr replicas deployed

Change-Id: I2c557af0c7200d0c4aea646c5f9ecd1a070db33e
2018-11-06 13:56:37 -06:00
Jean-Philippe Evrard
ff1f75fc45 Do not use OSH_INFRA_PATH in osh-infra
OSH_INFRA_PATH is not otherwise used in the openstack-helm-infra
repository, as all other references use relative paths.

The keystone script is the exception: it does not use a relative path,
and relies on OSH_INFRA_PATH being defined to work.

This is a problem, because when it is not defined, the expected path
for ldap chart is /ldap, which is an incorrect path.

This fixes the problem by ensuring the path is relative.
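
The failure mode is easy to reproduce in isolation. The Python sketch below only mimics how an unset variable expands to an empty string in a shell script; the function names are hypothetical and the actual deployment script is not shown here.

```python
def ldap_chart_path_with_variable(env: dict) -> str:
    """Mimic "${OSH_INFRA_PATH}/ldap": an unset variable expands to ""."""
    return env.get("OSH_INFRA_PATH", "") + "/ldap"


def ldap_chart_path_relative() -> str:
    """Relative form, which does not depend on the environment at all."""
    return "./ldap"


# With the variable unset, the path collapses to the filesystem root.
print(ldap_chart_path_with_variable({}))
# With the variable set, the same expression works as intended.
print(ldap_chart_path_with_variable({"OSH_INFRA_PATH": "../openstack-helm-infra"}))
# The relative form gives the same result either way.
print(ldap_chart_path_relative())
```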

Change-Id: I04a8d5c074b7c1e6fa66617bbb907f2ad4dcb3af
2018-11-05 13:36:03 +00:00
Zuul
fca344900f Merge "Enable the mgr balancer module by default." 2018-11-02 22:36:13 +00:00