1113 Commits

Author SHA1 Message Date
Zuul
d1b77b2bea Merge "Prometheus: Update pod container status alerts" 2019-01-25 19:39:47 +00:00
Zuul
7fe287bf11 Merge "Add liveness probe to fluentd" 2019-01-25 19:39:46 +00:00
Zuul
4132b40d4f Merge "[CEPH] Setup a cronjob to run OSD defrags for FileStore" 2019-01-24 23:51:43 +00:00
Steve Wilkerson
9bb603ed0c Update Helm to version 2.12.1
This updates Helm from version 2.11.0 to 2.12.1

Change-Id: I9bc37c330b068388df9840eb84dfb12e2536c173
2019-01-23 16:43:36 -06:00
Steve Wilkerson
9f5b1a77bc Add liveness probe to fluentd
This adds a liveness probe to the fluentd chart. This probe will
simply perform a tcpSocket check on the same port the readiness
probe executes the check on.

Change-Id: I768b23d36d50d6f6938f5588bea71e97aeb624b9
2019-01-23 11:47:34 -06:00
Steve Wilkerson
87ff958fb8 Prometheus: Update pod container status alerts
This updates the Prometheus pod container status alerts. This
ensures there are alerts defined for ImagePullBackOff,
ErrImagePull, and CreateContainerConfigError errors.

This also updates the Nagios service checks to include correct
checks for those alerts

Change-Id: I91544e7dff8c6aac8c79cd8aa7d8f7bc03adaa9a
2019-01-23 16:26:39 +00:00
Stamatis Katsaounis
032740957e Pin pip to 18.1 to allow build of docker images
Task: 29045
Story: 2004843

This patch pins pip to 18.1 as the latest pip 19.0 has a problem with
--no-cache-dir option. This problem is causing the build of docker
images of mariadb and kubeadm-aio to fail when they upgrade the
setuptools package.

Change-Id: If2b76249eeacec519a6a76605607ba6f3f81ac7d
Signed-off-by: Stamatis Katsaounis <mokats@intracom-telecom.com>
2019-01-23 11:56:10 +02:00
Zuul
067a37f76f Merge "Add exception handling to Kibana Selenium test" 2019-01-22 17:18:30 +00:00
Matthew Heler
d966085321 [CEPH] Setup a cronjob to run OSD defrags for FileStore
Create a cron and associated script to run monthly OSD defrags.
When the script runs it will switch the OSD disk to the CFQ I/O
scheduler to ensure that this is a non-blocking operation for ceph.
While this cron job will run monthly, it will only execute on OSDs
that are HDD based with Filestore.

Change-Id: I06a4679e0cbb3e065974d610606d232cde77e0b2
2019-01-22 04:27:41 +00:00
Meg Heisler
1bf24051c5 Add exception handling to Kibana Selenium test
This adds exception handling to the Kibana Selenium tests
to address the test failures due to TimeoutExceptions when
the dashboard loads slowly. Only TimeoutExceptions are handled
so if there is an issue with the page itself an error will still
cause the gate to fail as intended. When a TimeoutException
occurs an error message is logged and a screenshot is taken
of the current page.

Change-Id: I16cd3a61ffce2e5fdc39bd7731cc068b8a6ec41f
2019-01-21 13:26:43 -06:00
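The handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the change: `load_dashboard`, `load_page`, `save_screenshot`, and the screenshot filename are hypothetical names, and the built-in `TimeoutError` stands in for Selenium's `TimeoutException` so the sketch runs without a browser.

```python
import logging

log = logging.getLogger("kibana-selenium")


def load_dashboard(load_page, save_screenshot, timeout_exc=TimeoutError):
    """Run load_page(); on a timeout, log an error and take a screenshot.

    timeout_exc is a stand-in for Selenium's TimeoutException (assumption).
    Any other exception propagates, so a genuinely broken page still fails.
    """
    try:
        load_page()
        return True
    except timeout_exc:
        # Only the timeout case is handled, as the commit message describes.
        log.error("Timed out waiting for the dashboard to load")
        save_screenshot("dashboard_timeout.png")
        return False
```

With a real WebDriver, `save_screenshot` would presumably be the driver's own `save_screenshot` method and `timeout_exc` the `TimeoutException` raised by an explicit wait.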
Steve Wilkerson
cd4ec0b4b2 Grafana: Update Ceph dashboards for Mimic release
This updates the Ceph dashboards for Grafana, as some of the ceph
metrics have changed with the Mimic release.  This fixes issues
with the ceph OSD metrics that broke some Grafana panels, and also
removes the Ceph panel for displaying the number of monitors in
quorum, as that metric has been removed in Mimic

Change-Id: If6cbbfa7d2972ddd0e44b29a6c8277188d2d9ff0
2019-01-21 09:25:57 -06:00
Zuul
c09c10443a Merge "[Calico] Update TLS settings for Calico" 2019-01-19 21:57:05 +00:00
Zuul
537912e976 Merge "HTK: Dont display keystone user password in ks-user job" 2019-01-19 05:52:22 +00:00
Zuul
2c7f0cbb49 Merge "[CEPH] Fix a race condition with udev on OSD start" 2019-01-18 23:37:59 +00:00
Matthew Heler
b0da8d78d1 [CEPH] Fix a race condition with udev on OSD start
Under some conditions udev may not trigger correctly and create
the proper uuid symlinks required by Ceph. In order to work around
this we manually create the symlinks.

Change-Id: Icadce2c005864906bcfdae4d28117628c724cc1c
2019-01-18 15:03:27 -06:00
Pete Birley
f7b7f17c12 HTK: Dont display keystone user password in ks-user job
This PS updates the ks user job script to not display the password
on stdout.

Change-Id: I3c11601a409d6d5993c351170c7057217cfabd8a
Signed-off-by: Pete Birley <pete@port.direct>
2019-01-18 20:44:20 +00:00
Dmitrii Kabanov
0c5e2c4830 [Calico] Update TLS settings for Calico
This PS makes it possible to use TLS with etcd (for Calico).
The Ansible scripts were updated as well.

Change-Id: I522a78043a125660153aaa60f13d61ba8e325e75
2019-01-18 19:53:46 +00:00
Steve Wilkerson
b3097f6a25 Selenium: Add "|| true" to kibana selenium execution
This temporarily adds a "|| true" suffix to the kibana
selenium script execution, as we've noticed rare cases where the
tests fail due to the paths not being ready in time. Until we have
a path forward for waiting to ensure the path is ready,
we should allow for periodic failures of the kibana selenium tests.

Change-Id: I6c406ad8907cc87425562dee56eec6b8a0502142
2019-01-18 11:22:29 -06:00
Zuul
3d9a78c80a Merge "HTK: dont set keystone password on user creation" 2019-01-18 06:54:01 +00:00
Pete Birley
7f1d3be62e HTK: dont set keystone password on user creation
This PS updates the Keystone user job script to not
set the user password upon potential creation, falling back
to the set password command later in the script. This is both
slightly cleaner, and avoids potential race conditions when
running multiple keystone servers.

Change-Id: Ibe775df23fe7b747aea5137ca85975e067b8cea3
Signed-off-by: Pete Birley <pete@port.direct>
2019-01-17 19:16:17 -06:00
Zuul
958127477d Merge "Additional Selenium tests for Kibana dashboard" 2019-01-17 23:46:14 +00:00
Zuul
9ff0a5afdf Merge "Nagios: Update logging, add readiness probe" 2019-01-17 19:56:50 +00:00
Meg Heisler
9289cd0987 Additional Selenium tests for Kibana dashboard
This helps verify Kibana is working properly by using
Selenium Webdriver to navigate to different index dashboards
and take a screenshot of each one. It also adds the scripts to
the gates for single and multinode deployments.

Change-Id: Ic2c91734d1eaac0ea4e7985bf69082942166715d
2019-01-17 11:24:19 -06:00
Steve Wilkerson
046742c9c6 Nagios: Update logging, add readiness probe
This updates the Nagios chart configuration to not use syslog for
logging, removes the logging of notifications, and drastically
increases the number of concurrent checks executed.

This also removes the hostPath for Nagios logs, as it seems to add
no value over what's already reported to the console.  Finally, as
Nagios's log file has the potential to grow very rapidly while the
service has no means to disable logging to disk, this adds a
readiness probe that both checks whether Nagios's endpoint is
being served and clears out the log file by redirecting the
no-op command's output to the Nagios log file.

Change-Id: I81151c48ef4e0b7877f595c271f55b8fd479e8c1
2019-01-17 11:12:16 -06:00
Zuul
17581c1c19 Merge "Handle root vHost declaration" 2019-01-17 16:14:15 +00:00
Zuul
379d918a20 Merge "Update Elasticsearch health status expressions" 2019-01-16 21:14:44 +00:00
Bryan Strassner
34b2a965cb Handle root vHost declaration
If the source chart does not declare a vHost value, or uses the value of
"/", the script would fail upon trying to declare the vhost. This change
avoids the declaration of the "/" vhost, and continues with setting the
specified user with permissions to "/"

Change-Id: I28619c0aef22049c632c92a2f9a9d3831f8c284c
2019-01-16 13:34:40 -06:00
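The control flow this commit describes can be sketched as below. It is an illustration only: `vhost_steps` and the step names are hypothetical and do not come from the script itself, which is not shown in this log.

```python
def vhost_steps(user, vhost=None):
    """Return the ordered steps for a user and an optional vHost.

    A missing vHost is treated the same as the root vHost "/".
    """
    target = vhost or "/"
    steps = []
    if target != "/":
        # Only non-root vHosts are declared; declaring "/" is skipped.
        steps.append(("declare_vhost", target))
    # Permissions are set in every case, including for "/".
    steps.append(("set_permissions", user, target))
    return steps
```

For example, `vhost_steps("nova")` and `vhost_steps("nova", "/")` both yield only the permissions step, while a named vHost yields a declaration followed by the permissions step.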
Steve Wilkerson
9e5a295465 Update Elasticsearch health status expressions
This updates the Elasticsearch health status expressions used in
Prometheus, Nagios and Grafana.  The previous Prometheus rule
defined for Elasticsearch health checked for a status that was
> 0 to trigger an alarm for a green health status. The correct
returned values are: 1 for green, 0 for both red and yellow. This
changes the expression to use arithmetic operators to give us a
result that maps to: 2 for green, 1 for yellow, 0 for red.

This also updates the Elasticsearch dashboard in Grafana to add a
new mapping for the updated 2g,1y,0r scale.

Finally, this also updates the Nagios service check to be a bit
more verbose in its output.

For reference, see:
https://github.com/justwatchcom/elasticsearch_exporter/issues/120

Change-Id: I6ef2a7c308c6ebfdb693b46127a285bceb6ba872
2019-01-16 11:11:59 -06:00
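The 2/1/0 mapping mentioned in this commit can be illustrated with a small arithmetic sketch. It assumes the exporter reports one 0/1 value per color, with exactly one of them set; the function name `health_score` is hypothetical, and the actual Prometheus expression used in the chart is not reproduced here.

```python
def health_score(green, yellow, red):
    """Combine per-color 0/1 flags into one value.

    Result: 2 for green, 1 for yellow, 0 for red.
    """
    if green + yellow + red != 1:
        raise ValueError("exactly one color flag should be 1")
    # Weight each flag so the sum identifies the active color.
    return 2 * green + 1 * yellow + 0 * red
```

A single value on this scale lets an alert fire on anything below 2, which a plain "> 0" check on the green flag alone cannot express.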
Steve Wilkerson
00b40480a3 Nagios: Fix elasticsearch query clause volume mount
This fixes the Nagios volume mount for the Elasticsearch query
file. Previously, the check for adding the volumemount to the
pod definition was incorrect. This fixes the conditional check,
and also adds the same conditional check to the configuration
secret

This adds a simple check to the monitoring and multinode jobs to
validate the resulting json gets mounted into the pod successfully

Change-Id: I2af289ccc4e1cff1669cb5e6e829514781b14dd3
2019-01-15 16:18:01 -06:00
Zuul
6bd70a9fc6 Merge "Gate: simplify playbooks" 2019-01-14 17:55:19 +00:00
Zuul
1509383894 Merge "Foundation for LMA docs" 2019-01-14 08:10:03 +00:00
Pete Birley
a8b3787a1b Gate: simplify playbooks
This PS simplifies the gate playbooks

Change-Id: I4dd7c892090f8eec10edf083b68dc3e3cc99ece9
Signed-off-by: Pete Birley <pete@port.direct>
2019-01-14 07:55:32 +00:00
Zuul
e7d169f62a Merge "Fluentd: Update buffer output settings for Elasticsearch" 2019-01-14 00:03:09 +00:00
Zuul
adaa773598 Merge "Ceph : cleanup ceph charts" 2019-01-13 04:09:23 +00:00
Zuul
defeed8952 Merge "Fix for ceph-osd regression" 2019-01-13 04:00:11 +00:00
Zuul
898752bdb1 Merge "Helm-toolkit: Check radosgw endpoint scheme for bucket creation" 2019-01-13 01:17:21 +00:00
Zuul
10c9651601 Merge "Basic support for BGP communities in calico" 2019-01-12 23:16:58 +00:00
Steve Wilkerson
2483d35640 Helm-toolkit: Check radosgw endpoint scheme for bucket creation
This updates the helm-toolkit s3 bucket creation script and job
manifest to account for situations where the radosgw endpoint
might require the --no-ssl flag. The update checks for the
radosgw endpoint scheme to determine whether to use the flag in
order to preserve previous behavior

Change-Id: I75f441f55ca29b7864c09c70d875e48b366ebf52
2019-01-12 13:58:52 -06:00
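A scheme check of this kind might look like the sketch below. It assumes that a plain "http" endpoint is the case that needs the --no-ssl flag; the function name `bucket_flags` and the example endpoints are hypothetical, and the real script is not shown in this log.

```python
from urllib.parse import urlparse


def bucket_flags(endpoint):
    """Return extra CLI flags for bucket creation based on the URL scheme."""
    scheme = urlparse(endpoint).scheme
    # Assumption: only a non-TLS ("http") endpoint needs --no-ssl.
    return ["--no-ssl"] if scheme == "http" else []
```

Under that assumption, `bucket_flags("http://radosgw.example:8088")` returns `["--no-ssl"]`, and an "https" endpoint returns an empty list.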
kranthi guttikonda
6771440f4a Fix for ceph-osd regression
When the ceph-osd journal is a directory and the data is
a block device, ceph-osd fails to deploy while
waiting for the journal file in
/var/lib/ceph/journal/journal.<id>

Added the directory condition before the bluestore check
and removed the same condition from later in the script

Closes-Bug: #1811154
Change-Id: Ibd4cf0be5ed90dfc4de5ffab554a91da1b62e5f4
Signed-off-by: Kranthi Guttikonda <kranthi.guttikonda@b-yond.com>
Signed-off-by: kranthi guttikonda <kranthi.guttikonda9@gmail.com>
2019-01-11 18:11:23 -05:00
Chinasubbareddy M
1abcde851e Ceph : cleanup ceph charts
This cleans up the ceph charts by removing unused variables and
leftovers from the ceph chart split

Change-Id: Iec50599a031ae7acacc8eb0504f7146647450306
2019-01-11 19:53:36 +00:00
Zuul
ded5de14fa Merge "Running agents on all nodes." 2019-01-11 04:15:27 +00:00
Steve Wilkerson
181d7ebb34 Fluentd: Update buffer output settings for Elasticsearch
This updates the fluentd configuration to use 8 threads for the
Elasticsearch output configuration by default. This uses the
correct buffer output settings for the fluent-elasticsearch
plugin

This also updates the buffer output settings to the defaults used
for fluentd

Change-Id: I976cddaa973e850dabe4de495cd3bf1a4acdd4e7
2019-01-10 14:51:41 -06:00
Michael Beaver
e34270c51e Basic support for BGP communities in calico
This creates a new section in calico/values.yaml that enables
BGP communities to be applied to a cidr by using the bird_ipam
templates.

Change-Id: I4dbbc8d8e761e0484eeb7c8bf0fefa28d29493e5
2019-01-10 14:02:16 -06:00
Sungil Im
b9e864a456 Running agents on all nodes.
With only a node selector, the prometheus-process-exporter cannot
run on the master node. This PS changes the scheduling to use
either taint/toleration or the node selector.

Change-Id: Ie84b2d2e0354fa927c1010c18392667dad171483
2019-01-10 05:46:53 -05:00
lijunjie
32b3ac3723 Fix the misspelling of "argument"
Change-Id: If78a27fe0d28a60d3dbbe0ee21d8209b2cfd633c
2019-01-10 16:41:17 +08:00
Zuul
730e7811c2 Merge "Add PodSecurityPolicy chart" 2019-01-10 07:24:16 +00:00
Evgeny L
8662018a4d Fix json parsing error for rally config
Change-Id: If573af721df73dd791bbf3b9bd5272ae8453aaa5
2019-01-09 15:25:25 +00:00
Zuul
f743caa254 Merge "Fix rally deployment config to rally 1.3.0" 2019-01-09 06:16:01 +00:00
Zuul
13124286e2 Merge "Kibana: Include kernel and journal indexes in register job" 2019-01-08 23:51:28 +00:00
Zuul
2a3740f349 Merge "[CEPH] Directory OSD regression fix" 2019-01-08 22:17:50 +00:00