1157 Commits

Author SHA1 Message Date
Steve Wilkerson
3614d025dc Fluentbit: Remove database used in tail inputs
This updates the fluentbit configuration for tail inputs to remove
the values for utilizing SQLite databases to track its location
in each file it's configured to tail.  This is intended to reduce
the pressure fluentbit exerts on the host through writing to
/var/log/foo.db. To help mitigate large amounts of traffic
sent from fluentbit to fluentd upon a pod restart, this also
adds a throttle filter to fluentbit.

As a result, Fluentbit no longer needs a writable mount to its
hostPath on /var/log on the host.  Thus, this change includes
updating the Fluentbit daemonset's mount on /var/log to be
readOnly

Change-Id: If4381f4ff47e887f3ea10beded4f6172edaf08ba
2019-02-01 16:56:31 +00:00
Steve Wilkerson
25e4e5662e Update network-policy ldap deployment and test
This updates the script for deploying ldap in the network policy
job to accept ingress traffic from prometheus pods.

This also updates the network policy test to account for return
values with more than one result when checking for a pod to use,
as well as selecting pods by application and component labels
instead of simply grepping for a name (as this could cause issues
with grepping for 'fluentd', when that could return both fluentd
and fluentd-exporter pods, for example)

Change-Id: I12a4029f574ea7d5b250709adef21b07d8cf0220
2019-01-31 21:29:40 +00:00
Zuul
6ef3f58fb8 Merge "Add pre-fixes to the Selenium jobs and remove "|| true"" 2019-01-31 20:39:40 +00:00
Zuul
b30012a616 Merge "[CEPH] Fixes for the OSD defrag cronjob" 2019-01-31 16:05:14 +00:00
Matthew Heler
fc76091261 [CEPH] Fixes for the OSD defrag cronjob
Fix a naming issue with the cronjob's binary, and schedule the cron
job to run every 15 minutes for the gates. Additionally check to
ensure we are only running on block devices. Also update the
script to work with ceph-volume created devices.

Change-Id: I8aedab0ac41c191ef39a08034fff3278027d7520
2019-01-31 06:13:05 -06:00
Deokjin Kim
cbb9ec0748 Fix calling wrong variable name in gnocchi
Checking test_version seems right; test_mimic does not exist.

Change-Id: I2cbfed0f7da0b22eb753ed7bce833872a7ff707f
Signed-off-by: Deokjin Kim <deokjin81.kim@samsung.com>
2019-01-31 00:34:21 +00:00
Zuul
c3a8063fdb Merge "Fluentd: remove unused configuration section" 2019-01-30 23:30:50 +00:00
Zuul
3bd3b70e51 Merge "[Calico] Configuration robustness improvements" 2019-01-30 22:16:55 +00:00
Steve Wilkerson
f01e9d2391 Fluentd: remove unused configuration section
This removes an unused section of configuration for fluentd, as
well as cleans up the values for filtering fluentd logs

Change-Id: I0c58d3ac236af7723c64c3b9fcba877736b1f606
2019-01-30 16:03:59 -06:00
Chris Wedgwood
b7b7c5ea44 [alertmanager] default to 1 replica, multinode gate uses 3
Change-Id: Ifb1420f8dcf7237349a79f1f97aea5e547bafeab
2019-01-30 08:43:18 +00:00
Chris Wedgwood
47a2da5af0 [Calico] Configuration robustness improvements
No longer use networking.settings.ippool.ipip.mode, rather take from
conf.node.CALICO_IPV4POOL_IPIP (this avoids duplication and
possibility of setting them differently).

Logging values previously required Titlecase in some places, lower in
others (and it changed across versions); have the chart DTRT where it
matters to avoid configuration problems.

Change-Id: Idb7ccb5be8f9e1cb184ed86a9fd0875704912564
2019-01-30 06:33:22 +00:00
Zuul
33178a529d Merge "Fluentd: Remove unused liveness port" 2019-01-30 04:12:48 +00:00
Zuul
8028bcb641 Merge "[tiller] Disable monitoring by default, enable in gate" 2019-01-30 04:12:47 +00:00
Zuul
0963980b51 Merge "[Prometheus] Relax disk IO constraints" 2019-01-30 04:12:46 +00:00
Zuul
ba68a8c745 Merge "[Prometheus] Fix filesystem space checks" 2019-01-30 04:12:45 +00:00
Zuul
3fa8fbea1a Merge "[ingress] explicitly specify the Prometheus scrape port" 2019-01-30 04:12:44 +00:00
Meg Heisler
98fbc9a1e2 Add pre-fixes to the Selenium jobs and remove "|| true"
This adds xxx-job name prefixes to the Selenium jobs for consistency

This will also remove the "|| true" suffix that was added temporarily to
ensure the Kibana selenium job did not error. The fix for the issue
was merged so the quick fix is no longer needed and may prevent an
error when an issue actually occurs.

Change-Id: I16881974cbf618b31813964b17c090dbfe33fe51
2019-01-29 20:24:57 -06:00
Pete Birley
bf4713f04b HTK: Support tls secrets on non-fqdn overridden hosts in ingress
This PS adds support for tls secrets on non-fqdn overridden hosts
in ingress rules.

Change-Id: I134af614e7c2ac3fae6eba2bc4bda9f8b41f7f78
Signed-off-by: Pete Birley <pete@port.direct>
2019-01-29 23:34:18 +00:00
Zuul
a6aabe0feb Merge "Liveness probes for OpenVSwitch daemons." 2019-01-29 23:06:07 +00:00
Steve Wilkerson
39410b16bc Fluentd: Remove unused liveness port
This removes an unused port for a previous implementation of the
fluentd liveness probe

Change-Id: I80367bcf6fedc75b3ee7054eba9c382fbb4bc79d
2019-01-29 14:31:50 -06:00
Zuul
4aca509aaf Merge "[CEPH] Clean up PG troubleshooting option specific to Luminous" 2019-01-29 20:23:53 +00:00
Hemachandra Reddy
aef0ff7810 Liveness probes for OpenVSwitch daemons.
Uses ovs-vsctl for ovs-db
Uses ovs-appctl for ovs-vswitchd as "ovs-vsctl show" does not
talk to ovs-vswitchd.

Change-Id: Ia0b84e3546ff1693676ca61370e1344d75b6e308
2019-01-29 20:10:41 +00:00
Zuul
6051d5e450 Merge "Helm-Toolkit: Make ingress manifest work for more than public endpoints" 2019-01-29 20:06:01 +00:00
Chris Wedgwood
a6fa47eea5 [tiller] Disable monitoring by default, enable in gate
Change-Id: Idb7a1f0046e96261a7042d30eedfaea031b27209
2019-01-29 18:57:58 +00:00
Matthew Heler
f48c365cd3 [CEPH] Clean up PG troubleshooting option specific to Luminous
Clean up the PG troubleshooting method that was needed for
Luminous images. Since we are now on Mimic, this function is no
longer needed.

Change-Id: Iccb148120410b956c25a1fed5655b3debba3412c
2019-01-29 18:57:23 +00:00
Zuul
7b5d6e9237 Merge "OSH-Infra: Update multinode and aio-monitoring/logging jobs" 2019-01-29 17:03:18 +00:00
Zuul
2de223b863 Merge "Add proxy support to Minikube gate script" 2019-01-29 17:03:17 +00:00
Pete Birley
3eb0517fc9 Helm-Toolkit: Make ingress manifest work for more than public endpoints
This PS enables the ingress manifest function to work for all endpoints
rather than just public.

Change-Id: I3b454bb24a763f51896e845b767fd9d28f5b07dc
Signed-off-by: Pete Birley <pete@port.direct>
2019-01-29 08:53:06 -06:00
Chris Wedgwood
d7808468fc [Prometheus] Relax disk IO constraints
Relax the timing constraints for disk IO to accommodate rotating disks;
a "measured IO" might be the result of a small number of physical IOs,
allow for enough time for a small number of disk rotations (this isn't
perfect but seems to be about right in testing under load).

Change-Id: Ifb067a2218528e5918d2f4b2ba169b6e739084e0
2019-01-29 06:41:51 +00:00
Chris Wedgwood
4fb6ee6e35 [Prometheus] Fix filesystem space checks
Change-Id: Id527ea6e08070cb7d2634417a7c203c1c5c3d97c
2019-01-29 06:34:54 +00:00
Chris Wedgwood
03ee843b22 [ingress] explicitly specify the Prometheus scrape port
Change-Id: I9e191257c436ca6ab74d013feb07bb0ffed2d532
2019-01-29 04:42:26 +00:00
Pete Birley
0a077f8996 HTK: update fqdn hostname lookup to support host keys
This PS adds support for maps containing `host` for use within
the endpoint host lookup functions as well as a simple string

Change-Id: Ifddfb935bf12510a8b8fac25a4a18b4314845230
Signed-off-by: Pete Birley <pete@port.direct>
2019-01-28 15:32:51 -06:00
Pete Birley
633d99c2ff HTK: Update host and port function to call correct host function
This PS updates the host and port function to call the correct
host function to allow ip addresses to be rendered if required.

Change-Id: I55c91bd911875b537a54ac76cda03a126649af80
Signed-off-by: Pete Birley <pete@port.direct>
2019-01-28 12:14:20 -06:00
Drew Walters
fd7add74ac Add proxy support to Minikube gate script
This commit introduces proxy support to the Minikube gate script by
leveraging existing `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY`
environment variables.  Additionally, this adds the ability to interpret
DNS nameservers when running behind a proxy server and use those in
`/etc/resolv.conf` over the Google DNS servers.

Change-Id: I508dd00fb7df33945e8ee96af250a8eff9db389a
2019-01-28 11:51:12 -06:00
Pete Birley
26fd3f6be3 HTK: support a map for endpoint host lookups
This PS adds support for maps containing `host` for use within
the endpoint host lookup functions as well as a simple string

Change-Id: I21818676e3e907452912b7c7e3c5765e53aebc64
Signed-off-by: Pete Birley <pete@port.direct>
2019-01-28 11:21:29 -06:00
Pete Birley
52f6591f70 HTK: update .gitignore to exclude htk development files
This PS updates the .gitignore to not add the files commonly used
for htk development by default.

Change-Id: Ic7b3711c3311ecef43b55342ae487078b5e004de
Signed-off-by: Pete Birley <pete@port.direct>
2019-01-28 10:56:35 -06:00
Pete Birley
bf3871b739 HTK: support a map for hostname lookups
This PS adds support for maps containing `host` for use within
the hostname lookup functions as well as a simple string.

Change-Id: I6fc5ebfb349c6581d40fe2d8723771d16ba1f9ec
Signed-off-by: Pete Birley <pete@port.direct>
2019-01-28 10:15:54 -06:00
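The "string or map containing `host`" behaviour described in the HTK lookup commits above can be illustrated outside of Helm templating. The following is a minimal Python sketch of the idea only, not the helm-toolkit implementation; the function name `resolve_host` and the example values are hypothetical.

```python
from collections.abc import Mapping
from typing import Union

# Hypothetical type: a host entry is either a plain string or a map with a `host` key.
HostValue = Union[str, Mapping]


def resolve_host(value: HostValue) -> str:
    """Return the hostname from a plain string or from a map containing `host`."""
    if isinstance(value, str):
        # Simple string form: the value is the hostname itself.
        return value
    if isinstance(value, Mapping) and "host" in value:
        # Map form: the hostname lives under the `host` key; other keys are ignored here.
        return value["host"]
    raise ValueError("expected a string or a map containing 'host'")
```

Both `resolve_host("keystone")` and `resolve_host({"host": "keystone"})` return the same hostname, so callers do not need to know which form was configured.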
Zuul
f0f1b57b3c Merge "[CEPH] Journal automation and disk cleanup updates" 2019-01-28 06:05:45 +00:00
Zuul
404930ae75 Merge "Helm: Update version to 2.12.3" 2019-01-26 23:45:13 +00:00
Zuul
901fe70298 Merge "Pentest - NC1.0 K8S –Security HTTP Headers Not Present – TCP 6443" 2019-01-26 07:03:51 +00:00
Zuul
d1b77b2bea Merge "Prometheus: Update pod container status alerts" 2019-01-25 19:39:47 +00:00
Zuul
7fe287bf11 Merge "Add liveness probe to fluentd" 2019-01-25 19:39:46 +00:00
Zuul
4132b40d4f Merge "[CEPH] Setup a cronjob to run OSD defrags for FileStore" 2019-01-24 23:51:43 +00:00
Matthew Heler
61b93c6b46 [CEPH] Journal automation and disk cleanup updates
Refactor the OSD Block initialization code that performs clean ups
to use all the commands that ceph-disk zap uses.

Extend the functionality when an OSD initializes to create journal
partitions automatically. For example if /dev/sdc3 is defined as a
journal disk, the chart will automatically create that partition.
The size of the journal partition is determined by the
osd_journal_size that is defined in ceph.conf.

Change the OSD_FORCE_ZAP option to OSD_FORCE_REPAIR to automatically
recreate/self-heal Filestore OSDs. This option will now call a
function to repair a journal disk, and recreate partitions. One
caveat to this is that the device partitions must be defined (ex.
/dev/sdc1) for a journal. Otherwise the OSD is zapped and re-created
if the whole disk (ex. /dev/sdc) is defined as the journal disk.

Change-Id: Ied131b51605595dce65eb29c0b64cb6af979066e
2019-01-24 11:47:30 -06:00
Steve Wilkerson
ea0b94a052 Helm: Update version to 2.12.3
This updates Helm from version 2.12.1 to 2.12.3

Change-Id: Ie85e4ae3b55ce2e9f67a5e67af1e785540a6637e
2019-01-24 09:09:00 -06:00
Steve Wilkerson
9bb603ed0c Update Helm to version 2.12.1
This updates Helm from version 2.11.0 to 2.12.1

Change-Id: I9bc37c330b068388df9840eb84dfb12e2536c173
2019-01-23 16:43:36 -06:00
Jagan Kavva
c49207819e Pentest - NC1.0 K8S –Security HTTP Headers Not Present – TCP 6443
The server should send an X-Content-Type-Options: nosniff to make sure
the browser does not try to detect a different Content-Type than what is
actually sent (can lead to XSS).

Additionally the server should send an X-Frame-Options: deny to protect
against drag-and-drop clickjacking attacks in older browsers.

Change-Id: I779c519cf75bbee23d3a8348291c0fd053e61e4e
2019-01-23 16:21:32 -06:00
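The two response headers named in the pentest commit above can be shown with a small generic sketch. The commit message does not say how the change applies them, so this is only an illustration of the header names and values; the helper `with_security_headers` and the sample input are hypothetical.

```python
# Header names and values are taken from the commit message above.
SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",  # stop browsers from guessing a different Content-Type
    "X-Frame-Options": "deny",            # refuse to be rendered inside a frame
}


def with_security_headers(headers):
    """Return a copy of `headers` with the two security headers set."""
    merged = dict(headers)           # copy so the caller's dict is not modified
    merged.update(SECURITY_HEADERS)  # security headers override any existing values
    return merged
```

For example, `with_security_headers({"Content-Type": "text/html"})` keeps the original `Content-Type` entry and adds both security headers.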
Steve Wilkerson
9f5b1a77bc Add liveness probe to fluentd
This adds a liveness probe to the fluentd chart. This probe will
simply perform a tcpSocket check on the same port the readiness
probe executes the check on.

Change-Id: I768b23d36d50d6f6938f5588bea71e97aeb624b9
2019-01-23 11:47:34 -06:00
Steve Wilkerson
87ff958fb8 Prometheus: Update pod container status alerts
This updates the Prometheus pod container status alerts. This
ensures there are alerts defined for ImagePullBackOff,
ErrImagePull, and CreateContainerConfigError errors.

This also updates the Nagios service checks to include correct
checks for those alerts

Change-Id: I91544e7dff8c6aac8c79cd8aa7d8f7bc03adaa9a
2019-01-23 16:26:39 +00:00
Steve Wilkerson
1e40765d88 OSH-Infra: Update multinode and aio-monitoring/logging jobs
This proposes moving the multinode job to a periodic job to
match the approach used in the openstack-helm repo.

This also adds the openstack-exporter to the aio monitoring job as
it was previously missing.

This also proposes moving the aio-logging and aio-monitoring jobs
to voting

Change-Id: Idcd4544e03facdcd2430683b66bd80c79e73a372
2019-01-23 08:49:48 -06:00