1002 Commits

Author SHA1 Message Date
Meg Heisler
c3bef9e88f Selenium Tests for OSH Infra
This adds scripts using Selenium Webdriver to verify
the dashboards for Grafana, Nagios, and Prometheus are
reachable and functioning as expected. The scripts
create screenshots of each dashboard as well as
pages that can be navigated to.

It also adds the scripts to the gates for the single
and multinode deployments.

Change-Id: I1699e0ba8ff82ce8f59342cc71aad10cff7d2516
2019-01-07 15:59:42 -06:00
Steve Wilkerson
281b0799f0 Write libvirt logs to host
This modifies the libvirt chart to write logs directly to the
host by default. This also modifies the fluentbit and fluentd
charts to capture libvirt logs from the host and index them into
Elasticsearch

Change-Id: I0bbc49d2c0d4cf4895f797e48f309f308ffd021f
2018-12-28 17:43:12 +00:00
Zuul
13a58c5530 Merge "[Calico] Update to v3.2.4" 2018-12-27 20:07:16 +00:00
Zuul
5cca3e74d4 Merge "[CEPH] Fix race conditions with OSD POD initialization" 2018-12-24 22:48:53 +00:00
Matthew Heler
89745aad06 [Ceph] Update rbd-provisioner and cephfs-provisioner
- Move from docker tag v0.1.1 to v1.1.0-k8s1.10

Change-Id: I5a2afbdeb87c732a17da64916de8bb301f12cbb3
2018-12-22 17:31:29 +00:00
Matthew Heler
30b57ba671 [CEPH] Fix race conditions with OSD POD initialization
Under POD restart conditions there is a race condition with lsblk
causing the helm chart to zap a fully working OSD disk. We refactor
the code to remove this requirement.

Additionally, the new automatic journal partitioning code has a race
condition in which the same journal partition could be picked twice
for OSDs on the same node. To resolve this we share a common tmp
directory from the node to all of the OSD pods on that node.
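
One way a directory shared between the OSD pods on a node can prevent the same journal partition from being picked twice is exclusive lock files. The sketch below illustrates that idea only and is not the chart's actual code: `claim_journal_partition`, the lock-file naming scheme, and the example device names are all hypothetical.

```python
import os
import tempfile


def claim_journal_partition(candidates, lock_dir):
    """Return the first candidate partition this caller manages to claim.

    A claim is an empty lock file created with O_CREAT | O_EXCL, which
    fails if the file already exists, so two callers sharing lock_dir
    cannot both claim the same partition.
    """
    for partition in candidates:
        # Hypothetical naming scheme: /dev/sdb1 -> dev_sdb1.lock
        lock_name = partition.strip("/").replace("/", "_") + ".lock"
        lock_path = os.path.join(lock_dir, lock_name)
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # another OSD on this node already holds this one
        os.close(fd)
        return partition
    return None  # every candidate is already claimed


if __name__ == "__main__":
    # Stand-in for the tmp directory shared from the node to its OSD pods.
    shared = tempfile.mkdtemp()
    partitions = ["/dev/sdb1", "/dev/sdb2"]
    print(claim_journal_partition(partitions, shared))
    print(claim_journal_partition(partitions, shared))
    print(claim_journal_partition(partitions, shared))
```

Each call stands in for one OSD pod starting up; because the lock files live in the shared directory, a second caller skips any partition the first has already claimed.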

Change-Id: I807074c4c5e54b953b5c0efa4c169763c5629062
2018-12-21 15:05:54 -06:00
Zuul
0513c779bd Merge "(calico) Add network policy safety valve" 2018-12-21 08:06:29 +00:00
Zuul
5ea831964e Merge "[gnocchi] don't randomize job names" 2018-12-21 04:34:38 +00:00
Scott Hussey
048b18a50f (calico) Add network policy safety valve
- If a rule set in the network policy override for the calico
  chart is empty, it causes the calico-settings job to fail. This
  safety valve should handle the empty list gracefully.
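
The general shape of such a safety valve can be sketched as a guard that emits nothing for an empty rule set. This is an illustration under assumptions, not the chart's template: `render_rule_section`, the `action` field, and the output layout are hypothetical.

```python
def render_rule_section(name, rules):
    """Render one rule set as YAML-like text, or nothing if it is empty."""
    if not rules:
        # Safety valve: an empty (or missing) rule set yields no section
        # rather than a dangling key with no entries under it.
        return ""
    lines = [f"{name}:"]
    for rule in rules:
        lines.append(f"  - action: {rule['action']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(repr(render_rule_section("ingress", [{"action": "Allow"}])))
    print(repr(render_rule_section("egress", [])))
```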

Change-Id: I4b8a39941f05a8eb86734ff129b2d73830883236
2018-12-20 11:02:32 -06:00
Chris Wedgwood
41508d39e2 [Calico] Update to v3.2.4
Upstream container updates only, no chart changes required.

Change-Id: I3cdc6f23269a5beac231575ac1b5faf654e424b7
2018-12-19 17:18:32 +00:00
Matthew Heler
e1a3819a0d [CEPH] Support a troubleshooting option to reset PG metadata
Upstream Ceph bug https://tracker.ceph.com/issues/21142 is
impacting the availability of our sites in the pipeline. Add an option
to reset the past interval metadata time on an OSD's PG to work around
this issue if it occurs.

Change-Id: I1fe0bee6ce8aa402c241f1ad457bbf532945a530
2018-12-18 23:26:18 -06:00
Zuul
4233c25308 Merge "[Ceph] Tunables for rgw buckets" 2018-12-18 17:37:10 +00:00
Zuul
1b0d47bb01 Merge "[Ceph] Update ceph helm tests" 2018-12-17 18:23:51 +00:00
Matthew Heler
54efa7922d [Ceph] Tunables for rgw buckets
Set rgw_override_bucket_index_max_shards to 8 (default: 0)

By default, create 8 shards per bucket with Ceph RadosGW. This allows
up to ~800k-1M objects to be in a bucket before seeing performance
slowdowns. The only downside to this change is that a directory listing
for a bucket may take slightly longer to finish.
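
The figures above imply roughly 100k objects per index shard at the lower bound (800k objects over 8 shards). A minimal sketch of that arithmetic, assuming the per-shard figure holds linearly; `OBJECTS_PER_SHARD` and `shards_needed` are illustrative names, not Ceph settings.

```python
import math

# Assumption derived from the commit message: 8 shards ~ 800k objects,
# i.e. about 100k objects per bucket index shard (lower bound).
OBJECTS_PER_SHARD = 100_000


def shards_needed(expected_objects):
    """Smallest shard count that keeps each shard at or under the assumed limit."""
    return max(1, math.ceil(expected_objects / OBJECTS_PER_SHARD))


if __name__ == "__main__":
    for count in (50_000, 800_000, 1_000_000):
        print(count, shards_needed(count))
```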

Change-Id: I96c7ac81501a41d29927e102a6029bf432bd3d21
2018-12-16 19:35:00 +00:00
Zuul
bc32affe0c Merge "[Calico] Logging fixes/updates" 2018-12-16 16:52:46 +00:00
Zuul
6d354f0f7b Merge "Revert "Add Egress Helm-toolkit function & enforce the nework policy at OSH-INFRA"" 2018-12-16 08:57:09 +00:00
Chris Wedgwood
3f79066797 [Calico] Logging fixes/updates
Expose the early logging level for calico-node.

Use conf.node.FELIX_LOGSEVERITYSCREEN to set the logging level in
BGPConfiguration and FelixConfiguration (whilst this is an odd
name/location, it is backwards compatible and will in most cases set
things as expected).

Change-Id: I70c3028423eddb4721456f645c4475da4af7ced5
2018-12-16 07:21:31 +00:00
Pete Birley
0bf3674539 Revert "Add Egress Helm-toolkit function & enforce the nework policy at OSH-INFRA"
This reverts commit 8d33a2911cda0c9e88406b9eeacbd8dfa70286f2.

Change-Id: Ic861b9bf9b337449b47a3558da8355e7a5bcacee
2018-12-16 04:21:46 +00:00
Zuul
158c223256 Merge "Ceph: Allow multiple test pods for ceph-client to be present in clusters" 2018-12-16 03:30:03 +00:00
Matthew Heler
de69c68365 [Ceph] Update ceph helm tests
- Ensure the helm tests are logging all commands and variables

Change-Id: I4f4c553a3fbb4d77e9d1ab41c1c0c763c963cfd3
2018-12-15 13:47:43 -06:00
Zuul
b90bf10b89 Merge "Add Egress Helm-toolkit function & enforce the nework policy at OSH-INFRA" 2018-12-15 09:32:21 +00:00
Mike Pham
8d33a2911c Add Egress Helm-toolkit function & enforce the nework policy at OSH-INFRA
This PS implements the helm-toolkit function to generate the
Egress section of the Kubernetes network policy manifest based on
overridable values. It also enables the K8s network policy at the
OSH-Infra gate.

Change-Id: Icbe2a18c98dba795d15398dcdcac64228f6a7b4c
2018-12-14 16:32:40 -05:00
Zuul
96ef3188aa Merge "Revert "helm-toolkit: Support standard kubernetes/helm labels"" 2018-12-14 20:36:42 +00:00
Zuul
ef0f26988f Merge "Helm-toolkit: Fix hasKey call for security context snippet" 2018-12-14 20:01:53 +00:00
Zuul
b76acd6dd6 Merge "Ceph: Journal partition automation" 2018-12-14 18:37:15 +00:00
Chris Wedgwood
6a6b9db2da [gnocchi] don't randomize job names
Random job names mean `helm upgrade`, or indeed anything that looks for
changes in rendered templates, will see changes when there are none,
causing churn and restarts.
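
The effect can be sketched by rendering the same manifest twice and diffing the output, as a change-detecting tool would. This is a simplified model, not Helm's implementation: `render_job`, `has_changes`, and the job name `gnocchi-db-init` are hypothetical.

```python
import uuid


def render_job(name_suffix):
    """Stand-in for rendering a Job manifest from a chart template."""
    return f"kind: Job\nmetadata:\n  name: gnocchi-db-init{name_suffix}\n"


def has_changes(previous, current):
    """Mimic a tool that diffs rendered output to decide whether to act."""
    return previous != current


if __name__ == "__main__":
    # Random suffix: two renders of unchanged inputs still differ.
    first = render_job("-" + uuid.uuid4().hex)
    second = render_job("-" + uuid.uuid4().hex)
    print("random names changed:", has_changes(first, second))

    # Fixed name: two renders of unchanged inputs are identical.
    first = render_job("")
    second = render_job("")
    print("fixed names changed:", has_changes(first, second))
```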

Change-Id: I59e6a60d6c4c601c5c8cecbd8238af6b7c5f389e
2018-12-14 18:26:04 +00:00
Matthew Heler
f9d3c16d1c Revert "helm-toolkit: Support standard kubernetes/helm labels"
This reverts commit 75e0c2d0f526d29ea947e03e3d1ea2ea34a48881.

This commit was blocking chart upgrades.

Change-Id: I15aa5f507beeeadd04a0bddec241f5dd7ca272c9
2018-12-14 18:00:55 +00:00
Steve Wilkerson
2f46948259 Helm-toolkit: Fix hasKey call for security context snippet
This fixes the hasKey call in the pod security context snippet
template, as the call requires 2 args: a map and a key. This
addresses the problem by indexing the provided map on the
application key, before passing it to the hasKey call
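
The fix can be pictured with a Python analogue of a two-argument `hasKey`: index the outer map by application first, then test the nested map for the key. The values layout, the `grafana` application name, and the `has_key` helper below are hypothetical stand-ins, not the chart's template code.

```python
def has_key(mapping, key):
    """Python stand-in for a template hasKey: takes a map and a key."""
    return key in mapping


# Hypothetical values layout, keyed by application name.
security_context = {
    "grafana": {"pod": {"runAsUser": 104}},
}

application = "grafana"

# Index the map on the application key first, then ask about the nested key.
app_context = security_context.get(application, {})

if __name__ == "__main__":
    print(has_key(app_context, "pod"))
    print(has_key(app_context, "container"))
```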

Change-Id: I95264c933b51e2a8e38f63faa1e239bb3c1ebfda
2018-12-14 10:32:21 -06:00
Zuul
80d5b4a84e Merge "Update documentation for recent failure domain options in Ceph" 2018-12-14 15:48:37 +00:00
Pete Birley
99f5fe22f2 Ingress: Share container PID namespaces under docker
This PS shares PID namespaces for containers in pods under docker,
bringing behavior under this runtime in line with other runc-based
container backends and allowing the pause process in the pod to act as
a reaper.

Change-Id: I70965a62b585de31fb953ba98189a84021dba1cb
Signed-off-by: Pete Birley <pete@port.direct>
2018-12-14 04:52:44 +00:00
Pete Birley
6a9c16862a OpenvSwitch: Share container PID namespaces under docker
This PS shares PID namespaces for containers in pods under docker,
bringing behavior under this runtime in line with other runc-based
container backends and allowing the pause process in the pod to act as
a reaper.

Change-Id: I1e511b1cd11a4b2f4818a772a91e8a8dfd342be3
Signed-off-by: Pete Birley <pete@port.direct>
2018-12-14 04:52:39 +00:00
Pete Birley
2da8ad396a Memcached: Share container PID namespaces under docker
This PS shares PID namespaces for containers in pods under docker,
bringing behavior under this runtime in line with other runc-based
container backends and allowing the pause process in the pod to act as
a reaper.

Change-Id: I43bea4cd9e91f9d27a846879dfc329cfa26f8ee7
Signed-off-by: Pete Birley <pete@port.direct>
2018-12-14 04:52:22 +00:00
Zuul
62dce1852e Merge "Increase the cpu and memory resource limits for Ceph OSDs" 2018-12-14 01:52:57 +00:00
Zuul
f81e2c54d1 Merge "Update Ceph-rgw helm tests" 2018-12-13 22:39:22 +00:00
Matthew Heler
b18dd351d2 Update documentation for recent failure domain options in Ceph
Change-Id: Id6707566753b834da598b62f630f4caafe8ee234
2018-12-13 13:05:52 -06:00
Zuul
23967559a6 Merge "Add securityContext helm-toolkit function" 2018-12-13 18:24:10 +00:00
Renis Makadia
458b8f6692 Update Ceph-rgw helm tests
Change-Id: I7b328da18ef10840baf8454e2fb3abaeeb542068
2018-12-13 11:21:13 -06:00
Renis Makadia
17df1c5df5 Ceph: Journal partition automation
- Use whole disk /dev/sdc format.
- Don't specify a partition; let the ceph-osd util create
  and manage the partition.
- On an OSD disk failure, the journal partition for the failed
  OSD should be deleted during the maintenance window.
  This will allow the ceph-osd util to reuse the space for a new partition.
- The disk partition count will continue to
  increase as more OSDs fail.

Change-Id: I87522db8cabebe8cb103481cdb65fc52f2ce2b07
2018-12-13 16:37:15 +00:00
Zuul
6589af54db Merge "Fluentbit: Add Decode_Field_As config to docker parser" 2018-12-13 01:35:59 +00:00
Zuul
d984f3c782 Merge "Elasticsearch: Define success criteria for adding snapshot repo" 2018-12-12 23:23:48 +00:00
Zuul
fe24873310 Merge "Elasticsearch: Update helm test" 2018-12-12 22:30:58 +00:00
Steve Wilkerson
f4e10f8839 Fluentbit: Add Decode_Field_As config to docker parser
This adds the Decode_Field_As configuration key to the docker
parser for fluentbit. This is required to escape utf-8 encoded
characters appropriately in the log field

Change-Id: Ie2600cfe22045e3ab651fddf61ed2f676ab8a1d5
2018-12-12 22:24:09 +00:00
Zuul
ef2e415ec8 Merge "Ingress: Remove server headers from response" 2018-12-12 22:13:53 +00:00
Steve Wilkerson
7be42d3cd5 Elasticsearch: Define success criteria for adding snapshot repo
This adds a simple check to the Elasticsearch snapshot repo job
that will cause the job to fail if the repository isn't added
successfully
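
A check of this kind can be sketched as follows, assuming the job inspects the JSON body returned when the repository is registered (Elasticsearch replies with `{"acknowledged": true}` on success). The helper names are hypothetical, and the chart's actual check may differ.

```python
import json


def repo_registered(response_body):
    """Return True only if the response body reports acknowledged: true."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("acknowledged") is True


def exit_code_for(response_body):
    """A non-zero exit status makes the surrounding job fail."""
    return 0 if repo_registered(response_body) else 1


if __name__ == "__main__":
    print(exit_code_for('{"acknowledged": true}'))
    print(exit_code_for('{"error": "repository_exception", "status": 500}'))
```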

Change-Id: I9dca6ef545b43c52a37542319fa2f706b174c44b
2018-12-12 14:44:49 -06:00
Steve Wilkerson
d3e046d803 Elasticsearch: Update helm test
This updates the Elasticsearch helm test to clean up the test index
before attempting to create it, in case a stranded test index
already exists

Change-Id: I87533f94f6ea55b0b2f929543f8d3e75baa81bed
2018-12-12 12:43:13 -06:00
Pete Birley
5695a15f93 Ceph: Allow multiple test pods for ceph-client to be present in clusters
This PS allows multiple ceph test pods to be present in clusters with
more than one ceph deployment.

Change-Id: Ib8be8fc58e3a374dfcf6845988668433cf43655a
Signed-off-by: Pete Birley <pete@port.direct>
2018-12-12 10:03:15 -06:00
Pete Birley
c256cce537 Ceph: Allow multiple test pods to be present in clusters
This PS allows multiple ceph test pods to be present in clusters with
more than one ceph deployment.

Change-Id: I002a8b4681d97ed6ab95af23e1938870c28f5a83
Signed-off-by: Pete Birley <pete@port.direct>
2018-12-12 07:29:01 -06:00
Zuul
7f1ad7b03c Merge "Ingress: Update sleep function to not require dumb-init" 2018-12-11 22:01:30 +00:00
Zuul
d93c591e9e Merge "Elasticsearch: Remove default Curator action configuration" 2018-12-11 20:58:50 +00:00
Pete Birley
337ac99234 Ingress: Update sleep function to not require dumb-init
This PS updates the sleep function to not require dumb-init to be
present in images.

Change-Id: I9ee7270f2c101a3a85b2aecd01097a70014ea4a6
Signed-off-by: Pete Birley <pete@port.direct>
2018-12-11 12:53:38 -06:00