This adds scripts using Selenium WebDriver to verify that
the dashboards for Grafana, Nagios, and Prometheus are
reachable and functioning as expected. The scripts
take screenshots of each dashboard as well as of the
pages that can be navigated to.
It also adds the scripts to the gates for the single
and multinode deployments.
Change-Id: I1699e0ba8ff82ce8f59342cc71aad10cff7d2516
This modifies the libvirt chart to write logs directly to the
host by default. This also modifies the fluentbit and fluentd
charts to capture libvirt logs from the host and index them into
Elasticsearch.
Change-Id: I0bbc49d2c0d4cf4895f797e48f309f308ffd021f
Under pod restart conditions there is a race condition with lsblk
that causes the Helm chart to zap a fully working OSD disk. We refactor
the code to remove this requirement.
Additionally, the new automatic journal partitioning code has a race
condition in which the same journal partition could be picked twice
for OSDs on the same node. To resolve this we share a common tmp
directory from the node with all of the OSD pods on that node.
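
The idea of serializing partition selection through a directory shared
by every OSD pod on the node can be sketched as follows. This is a
minimal illustration, not the chart's actual code: the function name,
the lock file name, and the JSON claims file are all hypothetical, and
it assumes a Unix host where `fcntl.flock` is available.

```python
import fcntl
import json
import os


def claim_journal_partition(shared_dir, candidates, osd_id):
    """Claim one unused journal partition for osd_id (hypothetical helper).

    All pods on the node mount shared_dir, so the exclusive lock taken
    here ensures only one of them picks a partition at a time.
    """
    os.makedirs(shared_dir, exist_ok=True)
    lock_path = os.path.join(shared_dir, "journal.lock")      # assumed name
    state_path = os.path.join(shared_dir, "journal-claims.json")  # assumed name
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            claims = {}
            if os.path.exists(state_path):
                with open(state_path) as f:
                    claims = json.load(f)
            # Reuse an earlier claim so a pod restart gets the same partition.
            for partition, owner in claims.items():
                if owner == osd_id:
                    return partition
            # Otherwise take the first candidate nobody has claimed yet.
            for partition in candidates:
                if partition not in claims:
                    claims[partition] = osd_id
                    with open(state_path, "w") as f:
                        json.dump(claims, f)
                    return partition
            raise RuntimeError("no free journal partition")
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Because the read, the choice, and the write all happen while the lock
is held, two OSDs on the same node cannot both see a partition as free.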
Change-Id: I807074c4c5e54b953b5c0efa4c169763c5629062
- If a rule set in the network policy override for the calico
chart is empty, it causes the calico-settings job to fail. This
safety valve should handle the empty list gracefully.
Change-Id: I4b8a39941f05a8eb86734ff129b2d73830883236
Ceph upstream bug https://tracker.ceph.com/issues/21142 is
impacting the availability of our sites in pipeline. Add an option
to reset the past interval metadata time on an OSD's PG to work
around this issue if it occurs.
Change-Id: I1fe0bee6ce8aa402c241f1ad457bbf532945a530
Set rgw_override_bucket_index_max_shards to 8 (default: 0)
By default, create 8 shards per bucket with Ceph RadosGW. This allows
up to ~800k-1M objects to be in a bucket before seeing performance
slowdowns. The only downside to this change is that a directory listing
for a bucket may take slightly longer to finish.
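
The capacity estimate above is simple arithmetic, sketched below. The
per-shard figures are assumptions back-derived from the ~800k-1M range
quoted in this message (that range divided by 8 shards). They are not
values read from the chart or from Ceph.

```python
def bucket_capacity(shards, objects_per_shard):
    """Approximate object count a bucket can hold before listing slows."""
    return shards * objects_per_shard


SHARDS = 8                      # rgw_override_bucket_index_max_shards
PER_SHARD_LOW = 100_000         # assumption: 800k / 8
PER_SHARD_HIGH = 125_000        # assumption: 1M / 8

low = bucket_capacity(SHARDS, PER_SHARD_LOW)
high = bucket_capacity(SHARDS, PER_SHARD_HIGH)
print(f"{low:,} to {high:,} objects per bucket")
```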
Change-Id: I96c7ac81501a41d29927e102a6029bf432bd3d21
Expose the early logging level for calico-node.
Use conf.node.FELIX_LOGSEVERITYSCREEN to set logging level in
BGPConfiguration and FelixConfiguration (whilst this is an odd
name/location, it is backwards compatible and will in most cases set
things as expected).
Change-Id: I70c3028423eddb4721456f645c4475da4af7ced5
This PS implements the helm-toolkit function to generate the
egress section of the Kubernetes network policy manifest based on
overridable values.
It also enables the K8s network policy in the OSH-infra gate.
Change-Id: Icbe2a18c98dba795d15398dcdcac64228f6a7b4c
Random job names mean that `helm upgrade`, or indeed anything that
looks for changes in rendered templates, will see changes when there
are none, causing churn and restarts.
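
The effect can be illustrated with a toy renderer. This is a sketch
only: `render_job`, `random_name`, and `stable_name` are hypothetical
helpers, not functions from helm-toolkit.

```python
import uuid


def render_job(name):
    """Render a minimal Job manifest for the given name."""
    return (
        "apiVersion: batch/v1\n"
        "kind: Job\n"
        "metadata:\n"
        f"  name: {name}\n"
    )


def random_name(prefix):
    """Name with a random suffix: differs on every render."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


def stable_name(prefix):
    """Deterministic name: identical on every render."""
    return prefix


# Two renders of the same chart with unchanged values.
churn = render_job(random_name("db-init")) != render_job(random_name("db-init"))
quiet = render_job(stable_name("db-init")) == render_job(stable_name("db-init"))
print("random names differ between renders:", churn)
print("stable names match between renders:", quiet)
```

Any tool that diffs rendered output treats the first case as a change
on every run, even though no values were modified.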
Change-Id: I59e6a60d6c4c601c5c8cecbd8238af6b7c5f389e
This reverts commit 75e0c2d0f526d29ea947e03e3d1ea2ea34a48881.
This commit was blocking chart upgrades.
Change-Id: I15aa5f507beeeadd04a0bddec241f5dd7ca272c9
This fixes the hasKey call in the pod security context snippet
template, as the call requires two arguments: a map and a key. This
addresses the problem by indexing the provided map on the
application key before passing it to the hasKey call.
Change-Id: I95264c933b51e2a8e38f63faa1e239bb3c1ebfda
This PS shares PID namespaces for containers in pods under Docker,
bringing this runtime in line with other runc-based container
backends and allowing the pause process in the pod to act as a reaper.
Change-Id: I70965a62b585de31fb953ba98189a84021dba1cb
Signed-off-by: Pete Birley <pete@port.direct>
This PS shares PID namespaces for containers in pods under Docker,
bringing this runtime in line with other runc-based container
backends and allowing the pause process in the pod to act as a reaper.
Change-Id: I1e511b1cd11a4b2f4818a772a91e8a8dfd342be3
Signed-off-by: Pete Birley <pete@port.direct>
This PS shares PID namespaces for containers in pods under Docker,
bringing this runtime in line with other runc-based container
backends and allowing the pause process in the pod to act as a reaper.
Change-Id: I43bea4cd9e91f9d27a846879dfc329cfa26f8ee7
Signed-off-by: Pete Birley <pete@port.direct>
- Use the whole-disk /dev/sdc format.
- Don't specify a partition; let the ceph-osd util create
and manage the partition.
- On an OSD disk failure, the journal partition for the failed
OSD should be deleted during the maintenance window.
This will allow the ceph-osd util to reuse the space for a new partition.
- The disk partition count will continue to
increase as more OSDs fail.
Change-Id: I87522db8cabebe8cb103481cdb65fc52f2ce2b07
This adds the Decode_Field_As configuration key to the docker
parser for fluentbit. This is required to escape UTF-8 encoded
characters appropriately in the log field.
Change-Id: Ie2600cfe22045e3ab651fddf61ed2f676ab8a1d5
This adds a simple check to the Elasticsearch snapshot repo job
that will cause the job to fail if the repository isn't added
successfully.
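
A check of this kind can be sketched as below. It assumes the job has
the response body from the repository registration request available
as a string, and relies on Elasticsearch answering a successful
registration with `{"acknowledged": true}`. The function names are
hypothetical and the actual job script may differ.

```python
import json
import sys


def repository_registered(response_body):
    """Return True only if the response acknowledges the repository."""
    try:
        payload = json.loads(response_body)
    except ValueError:
        # Not JSON at all (for example an empty or truncated reply).
        return False
    return isinstance(payload, dict) and payload.get("acknowledged") is True


def exit_code_for(response_body):
    """Map the response to a process exit code so the job fails loudly."""
    if not repository_registered(response_body):
        print("snapshot repository was not registered", file=sys.stderr)
        return 1
    return 0
```

A job script would finish with `sys.exit(exit_code_for(body))`, so a
failed registration marks the Kubernetes Job as failed.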
Change-Id: I9dca6ef545b43c52a37542319fa2f706b174c44b
This updates the Elasticsearch helm test to delete the test index
before attempting to create it, in case a stranded test index
from an earlier run still exists.
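
The delete-then-create order can be sketched as follows. The
`request(method, path)` callable and the `InMemoryCluster` stand-in
are hypothetical and exist only to make the example self-contained.
The status codes mirror Elasticsearch behaviour: 404 when deleting a
missing index, 400 when creating one that already exists.

```python
class InMemoryCluster:
    """Hypothetical stand-in for an Elasticsearch endpoint."""

    def __init__(self, indices=()):
        self.indices = set(indices)
        self.calls = []

    def request(self, method, path):
        self.calls.append((method, path))
        name = path.lstrip("/")
        if method == "DELETE":
            if name in self.indices:
                self.indices.remove(name)
                return 200
            return 404
        if method == "PUT":
            if name in self.indices:
                return 400
            self.indices.add(name)
            return 200
        return 405


def recreate_test_index(request, index):
    """Remove any stranded index, then create it fresh."""
    status = request("DELETE", f"/{index}")
    # 404 just means there was nothing to clean up.
    if status not in (200, 404):
        raise RuntimeError(f"could not clean index {index}: HTTP {status}")
    status = request("PUT", f"/{index}")
    if status != 200:
        raise RuntimeError(f"could not create index {index}: HTTP {status}")


# A stranded index from an earlier run no longer breaks the test.
stale = InMemoryCluster(indices=["test_index"])
recreate_test_index(stale.request, "test_index")
```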
Change-Id: I87533f94f6ea55b0b2f929543f8d3e75baa81bed
This PS allows multiple Ceph test pods to be present in a cluster with
more than one Ceph deployment.
Change-Id: Ib8be8fc58e3a374dfcf6845988668433cf43655a
Signed-off-by: Pete Birley <pete@port.direct>
This PS allows multiple Ceph test pods to be present in a cluster with
more than one Ceph deployment.
Change-Id: I002a8b4681d97ed6ab95af23e1938870c28f5a83
Signed-off-by: Pete Birley <pete@port.direct>
This PS updates the sleep function to not require dumb-init to be
present in images.
Change-Id: I9ee7270f2c101a3a85b2aecd01097a70014ea4a6
Signed-off-by: Pete Birley <pete@port.direct>