1954 Commits

Author SHA1 Message Date
Steve Wilkerson
8573957fce Minikube: Expose Tiller http port for metrics
This updates the Minikube deployment to patch the tiller-deploy
service to add a port definition for the http (44135) port for
tiller, which is used to expose metrics for Prometheus to scrape

Change-Id: I2eb5d4001c37935674ce64012b2744030addc127
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-08-07 13:25:23 -05:00
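As a rough illustration of the change above, the following sketch builds a JSON Patch document that appends a port definition to a Service. The port number 44135 comes from the commit message; the port name `http`, the `TCP` protocol, and the use of a JSON Patch `add` operation are assumptions made for this example, not details taken from the actual change.

```python
import json

# Port number taken from the commit message. The port name "http" and the
# JSON Patch "add" operation are assumptions made for illustration only.
TILLER_HTTP_PORT = 44135


def tiller_http_port_patch(port: int = TILLER_HTTP_PORT) -> str:
    """Return a JSON Patch document that appends a port to a Service."""
    patch = [
        {
            "op": "add",
            "path": "/spec/ports/-",  # "-" appends to the existing list
            "value": {
                "name": "http",
                "port": port,
                "targetPort": port,
                "protocol": "TCP",
            },
        }
    ]
    return json.dumps(patch)


if __name__ == "__main__":
    print(tiller_http_port_patch())
```

A document like this could be supplied to `kubectl patch service tiller-deploy --type json -p '<patch>'`, assuming the service lives in the namespace currently selected for kubectl.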
Steve Wilkerson
443832a8fd Remove stale images from openstack-helm-infra
This removes the artifacts associated with images for libvirt,
mariadb, and vbmc from openstack-helm-infra as these images now
live in openstack-helm-images.

Change-Id: I5c97d2db89068c71ec1a56a5ac17007682711182
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-08-07 08:56:51 -05:00
Zuul
b310caef4f Merge "Grafana: Code for Calico Dashboard" 2019-08-06 21:39:48 +00:00
Zuul
4a8f788532 Merge "Generate CA crt and key if needed" 2019-08-06 18:14:08 +00:00
Hussey, Scott (sh8121)
9c27dd7576 (postgresql) Cert auth for replication connections
- Change the Postgres configuration to use x509 client
  certs for authenticating the connections for replicating
  between Patroni nodes. This is a straightforward solution
  for supporting credential rotation for the replication user.
  Password authentication is problematic due to the declarative
  nature of helm charts and the need for an existing replication
  connection to replicate the rotated password.

Change-Id: I0c5456a01b3a36fee8ee4c986d25c4a1d807cb77
2019-08-06 00:03:54 -05:00
Zuul
8f749dd061 Merge "RabbitMQ: Dont remove definitions.json and erlang cookie when resetting" 2019-08-02 15:03:18 +00:00
Pete Birley
eef8ea131a RabbitMQ: Dont remove definitions.json and erlang cookie when resetting
This PS updated the reset node function to leave the assets generated
via init containers in place when resetting the node.

Change-Id: Iac52ca82e95bb372dbcbca0eeea3b262215e9c12
Signed-off-by: Pete Birley <pete@port.direct>
2019-08-02 02:05:00 +00:00
Steve Wilkerson
bc20c6c8b6 Elasticsearch: Add cron job to verify snapshot repositories
This adds a cron job to manually verify all snapshot repositories
are registered to any active master and data nodes. This is to
address scenarios where master and data nodes do not have the
desired snapshot repositories registered following node outages
or reboots

Change-Id: Ie6f42e95c3ca4dc2ec70f2852a2bde11e59ec097
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-08-02 02:02:14 +00:00
Zuul
26ed62352b Merge "Ceph-Client: update configmap name for defragosds cronjob" 2019-08-02 00:21:41 +00:00
Zuul
ea303850cd Merge "Elasticsearch: Manually verify snapshot repositories" 2019-08-01 18:36:37 +00:00
Chinasubbareddy Mallavarapu
acd5d11bc2 Ceph-Client: update configmap name for defragosds cronjob
This updates the configmap names used by the defragosds cronjob.

Change-Id: I29608cd8b6ce1e30615a0f92853939d7bbae9972
2019-08-01 12:22:48 -05:00
Zuul
d3d898de1b Merge "Nagios: Updated the alert for Ceph OSD Down" 2019-08-01 16:15:51 +00:00
Cliff Parsons
e059f4f827 Enhance HTK Job Manifests to be more flexible
This patch enhances the HTK job manifest functions so that each job can
be configured to use the desired backoffLimit and activeDeadlineSeconds,
and can mount the command/script from either a configMap or a secret
instead of being confined to using only configMaps.

Change-Id: I5231e53b98e3e55e3e93070876d8694f37ad642d
2019-08-01 09:20:12 -05:00
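To make the options named in the commit above concrete, here is a minimal sketch that builds a Kubernetes Job manifest as a Python dictionary. `backoffLimit`, `activeDeadlineSeconds`, and the `configMap` and `secret` volume types are standard Kubernetes fields; the function name, parameter names, default values, and mount path are hypothetical and do not reflect the actual helm-toolkit templates.

```python
from typing import Any, Dict


def job_manifest(
    name: str,
    image: str,
    script_source_name: str,
    script_source_type: str = "configMap",  # or "secret"
    backoff_limit: int = 6,                 # hypothetical default
    active_deadline_seconds: int = 1800,    # hypothetical default
) -> Dict[str, Any]:
    """Build a Job manifest whose script comes from a configMap or a secret."""
    if script_source_type == "configMap":
        source = {"configMap": {"name": script_source_name, "defaultMode": 0o555}}
    elif script_source_type == "secret":
        source = {"secret": {"secretName": script_source_name, "defaultMode": 0o555}}
    else:
        raise ValueError("script_source_type must be 'configMap' or 'secret'")

    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": backoff_limit,
            "activeDeadlineSeconds": active_deadline_seconds,
            "template": {
                "spec": {
                    "restartPolicy": "OnFailure",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "command": ["/tmp/scripts/run.sh"],
                            "volumeMounts": [
                                {"name": "scripts", "mountPath": "/tmp/scripts"}
                            ],
                        }
                    ],
                    # The volume is a configMap or a secret, chosen above.
                    "volumes": [{"name": "scripts", **source}],
                }
            },
        },
    }
```

Passing `script_source_type="secret"` switches only the volume definition, which is the kind of flexibility the commit message describes.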
Pai, Radhika (rp592h)
a37925c7e8 Grafana: Code for Calico Dashboard
Adds the code for a Calico dashboard in Grafana. The dashboard displays
the Felix metrics collected by Prometheus.

Change-Id: If18a18949f8093747b3f9ba819e036778c40b84e
2019-07-31 20:53:55 +00:00
Zuul
85b8d62830 Merge "Provide option to switch between dpdk and non-dpdk" 2019-07-31 20:38:22 +00:00
Manuel Buil
a71f1b4d33 Provide option to switch between dpdk and non-dpdk
We can select if we want an image with dpdk support by adding:

FEATURE_GATES=dpdk

That way we can reuse the same script for different distros by using
openstack-helm/tools/deployment/common/get-values-overrides.sh

Change-Id: Ia2c53556be650899fdd67c1ec06f5c68ae63c9d4
Signed-off-by: Manuel Buil <mbuil@suse.com>
2019-07-31 15:54:51 +00:00
Ahmad Mahmoudi
db164a2925 Generate CA crt and key if needed
Generate CA cert and CA key, if they are not present in
the values.

Change-Id: I14610ab66b72ddd5e6e45f57b56968e462416234
2019-07-30 13:16:03 -05:00
Arun Kant
7a8bb7058b Removing deprecated option usage in gather pod logs logic
As per PR https://github.com/kubernetes/kubernetes/pull/60210,
the show-all option of kubectl get is deprecated and no longer needed.
Presumably that is now the default behavior.

Also in current logs gathering logic, we are interested in capturing
only pod names, so removing that option is harmless.

We are seeing related failures in local CI when kubectl version is
1.15.x. So removing this option.

Change-Id: I3886c792fe28bc8b80504d8c91e9524039131b15
2019-07-30 08:19:38 -07:00
Steve Wilkerson
8130e6bdc5 Elasticsearch: Manually verify snapshot repositories
This updates the script for registering snapshot repositories to
include a manual verification of the repositories created. This
simply allows for inspection of all master and data nodes the
repository is verified with to provide additional visibility into
the state of all repositories

Change-Id: I6e5386386e2b79b1cb0f41fc1f9b78817695f8f3
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-07-24 15:37:23 -05:00
Zuul
17a7eb5cdc Merge "Restore overrides functionality after regression" 2019-07-24 16:25:49 +00:00
Anderson, Craig (ca846m)
ab8c81f2ee Restore overrides functionality after regression
Revert 833d426da8e4b049277ca9847830f6e6beee40c3

https://review.opendev.org/#/c/667022 introduced a regression in the
overrides functionality, which caused the corresponding gate test to
fail. This "fixed" a problem by breaking the override capability.

This patchset reverts the previous change to restore override functionality
and make gates green again. Deep copy is added in order to resolve the
original problem that 667022 attempted to resolve.

Change-Id: I6c052c0fabe0067612d6a3d9d3bfac4df59202d7
2019-07-24 12:18:44 +00:00
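The commit above does not describe the original problem in detail. As a purely hypothetical illustration of why a deep copy can matter when applying overrides, the sketch below assumes the issue was shared nested state being mutated during a merge. The function names and data are invented, and the actual chart logic is written in Helm templates rather than Python.

```python
import copy
from typing import Any, Dict


def _merge_into(target: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Merge one level of nested dictionaries from override into target."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            target[key].update(value)  # mutates the nested dict in place
        else:
            target[key] = value
    return target


def apply_override_shallow(defaults: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    # dict() copies only the top level, so nested dicts are still shared
    # with `defaults` and the merge leaks into them.
    return _merge_into(dict(defaults), override)


def apply_override_deep(defaults: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    # deepcopy gives the merge its own nested dicts, leaving `defaults` intact.
    return _merge_into(copy.deepcopy(defaults), override)
```

With the shallow variant, applying an override for one host would also change the defaults seen by every later host; the deep variant keeps each result independent.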
Chinasubbareddy Mallavarapu
dc66254c42 Ceph-RGW: fix file permission issue
This fixes a file permission issue on the file
/var/lib/ceph/bootstrap-rgw/ceph.keyring, which occurs because the
owner of the file is root.

This happens when a node running rgw reboots and the rgw pods fail at
init after the reboot. It is seen on single node deployments.

issue:

ceph-rgw-5db485fbd9-dv778  0/1  Init:CrashLoopBackOff   5  6m49s

logs:
+ chown -R ceph. /run/ceph/ /var/lib/ceph/bootstrap-rgw /var/lib/ceph/radosgw
/var/lib/ceph/tmp
chown: changing ownership of
'/var/lib/ceph/bootstrap-rgw/ceph.keyring': Operation not permitted

Change-Id: Idcb648c205053b2f03357b59173e70e02f28688c
2019-07-23 10:52:31 -05:00
Zuul
b7a7e81056 Merge "Updated the CEPH Cluster Health Panel values" 2019-07-22 15:29:36 +00:00
Zuul
f4a9e2b43c Merge "Fix mon_host hosts when hostname contains 'ip'" 2019-07-19 21:24:59 +00:00
Pai, Radhika (rp592h)
47565d2d19 Nagios: Updated the alert for Ceph OSD Down
Previously, the Nagios alert was percentage based: it fired only when more
than 80 percent of the OSDs were down.
>check_prom_alert!ceph_osd_down_pct_high!CRITICAL- CEPH OSDs down is
more than 80 percent!OK- CEPH OSDs down is less than 80 percent

Updated the code in the nagios values.yaml to send an alert when even one
OSD is down:
>check_prom_alert!ceph_osd_down!CRITICAL- One or more CEPH OSDs are down
>for more than 5 minutes!OK- All the CEPH OSDs are up

Change-Id: Id24c4a0cca64674890dae3599edc0c90d9534e90
2019-07-19 19:25:53 +00:00
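The difference between the two alert conditions described above can be sketched as follows. The 80 percent threshold comes from the commit message; the function names are hypothetical, and the sketch ignores the "for more than 5 minutes" duration that the real alert applies.

```python
def pct_down_alert(total_osds: int, down_osds: int, threshold_pct: float = 80.0) -> bool:
    """Old behaviour: alert only when the share of down OSDs exceeds the threshold."""
    if total_osds <= 0:
        return False
    return 100.0 * down_osds / total_osds > threshold_pct


def any_down_alert(down_osds: int) -> bool:
    """New behaviour: alert as soon as one or more OSDs are down."""
    return down_osds >= 1
```

For a cluster of 10 OSDs with a single OSD down, `pct_down_alert(10, 1)` stays quiet while `any_down_alert(1)` fires, which is the gap the change closes.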
Doug Aaser
9a36becf20 Cleanup unused Postgres config values
This patch is part of an effort to clean up the values.yaml file for
Postgres, which has gotten messy since the introduction of Patroni. This
patch specifically removes unused configuration values which were
causing unnecessary bloat and complexity.

Change-Id: I96180fd9c91200ba7558e58bd503b4ef9ebc183e
2019-07-19 17:16:04 +00:00
Daniel Pawlik
0b58aea135 Fix mon_host hosts when hostname contains 'ip'
The ceph-mon template script parses mon_host incorrectly when a
hostname contains the string 'ip', e.g. airship.

Change-Id: I0a097443d42ad2e9b6be6c61facd7932ddb4b3bb
Story: 2006255
2019-07-19 10:49:50 +00:00
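The exact parsing logic is not shown in the commit above. The following is a hypothetical reconstruction of this class of bug, assuming the script selected lines by looking for the text `ip` anywhere in pretty-printed endpoint JSON. The sample lines, function names, and regular expression are all invented for illustration.

```python
import re
from typing import List

# Invented sample of pretty-printed endpoint fields, one per line.
ENDPOINT_LINES = [
    '"ip": "10.0.0.5",',
    '"hostname": "airship-node1",',
    '"nodeName": "airship-node1",',
]

# Matches only lines whose JSON key is exactly "ip".
IP_KEY = re.compile(r'^\s*"ip"\s*:')


def naive_ips(lines: List[str]) -> List[str]:
    """Buggy variant: any line containing 'ip' is treated as an address line."""
    return [line.split('"')[3] for line in lines if "ip" in line]


def strict_ips(lines: List[str]) -> List[str]:
    """Stricter variant: only lines keyed by "ip" are treated as address lines."""
    return [line.split('"')[3] for line in lines if IP_KEY.match(line)]
```

Here `naive_ips(ENDPOINT_LINES)` also returns the hostname because "airship" contains "ip", whereas `strict_ips(ENDPOINT_LINES)` returns only the address.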
Pete Birley
af270934d4 Rabbit: Eradicate potential crashes in wait job while upgrading cluster
When upgrading/reconfiguring a rabbit cluster it's possible that the nodes
will not return the cluster status for some time. This PS allows us to
cope with this much more gracefully than simply crashing a few times before
proceeding.

Change-Id: Ibf525df9e3a9362282f70e5dbb136430734181fd
Signed-off-by: Pete Birley <pete@port.direct>
2019-07-18 23:07:32 +00:00
Zuul
2c8b18aeb8 Merge "Openvswitch: Fix typo in image overrides" 2019-07-18 20:30:45 +00:00
Zuul
0c3a46ae6e Merge "Helm-Toolkit: Add a function to return quoted csv sting from a list" 2019-07-18 20:15:12 +00:00
Zuul
e29022f8ae Merge "Revert "CI: Make openstack-support and keystone-auth jobs nonvoting"" 2019-07-18 19:47:54 +00:00
Manuel Buil
dc1b4dd1c5 Openvswitch: Fix typo in image overrides
The tag is pointing to a libvirt image. It should point to the
openvswitch image

Change-Id: If95a7b9cce2cadcb644389c28799fff48572c549
Signed-off-by: Manuel Buil <mbuil@suse.com>
2019-07-18 18:43:25 +00:00
Pete Birley
af17153627 RabbitMQ: prune any extra nodes from cluster if scaling down
This PS updates the cluster wait job to prune any extra nodes from
the cluster if scaling down.

Change-Id: I58d22121a07cd99448add62502582a6873776622
Signed-off-by: Pete Birley <pete@port.direct>
2019-07-18 17:21:37 +00:00
cheng li
776885458a Revert "CI: Make openstack-support and keystone-auth jobs nonvoting"
This reverts commit 5e3f729ffe5692e6e37d0fe6378906662d94bbd0.

Change-Id: I65cb5d24f0538fbd0d6cd28e5e6313e679d87655
2019-07-17 14:06:21 +00:00
Pete Birley
e96bdd9fb6 Ingress: Clean up tmp dir entirely on container start
This PS cleans up the container dir entirely on container restart,
as sometimes remnants of previous runs can cause issues.

Change-Id: I873667a8a57bca6096cbe777ee83ef8648a368d4
Signed-off-by: Pete Birley <pete@port.direct>
2019-07-16 01:21:02 +00:00
Alexander Noskov
3b5a1c7909 Take dnsPolicy from .Values.pod.dns_policy variable
Change-Id: Iae7caa5bdefe7749231c031c6003591a6251fa97
2019-07-15 17:31:16 +00:00
Zuul
769d0980f0 Merge "Prometheus: Fix volume utilization alert expression" 2019-07-14 04:49:18 +00:00
Zuul
e01741589a Merge "Tenant-Ceph: Enable cephfs storage class provisioning" 2019-07-13 16:16:54 +00:00
Zuul
79c9777bf4 Merge "Remove quotes for bind-address in ingress Chart" 2019-07-13 14:21:48 +00:00
Alexander Noskov
0eff94f51c Remove quotes for bind-address in ingress Chart
Currently, we are getting `bind-address: null` in ingress-conf for the ingress pod in the kube-system namespace.
In that case, nginx starts on 0.0.0.0:80, which breaks other ingress controllers, such as maas-ingress.
All further ingress controllers fail to start because they can't bind to port 80.

Change-Id: Ie7e9563bf14fe347969bea0d3c900c8d87d06de0
2019-07-12 17:10:00 -05:00
Drew Walters
8ba46703ee CI: Restore Xenial compatibility in K8s script
Recently, the Minikube gate script was modified to support Ubuntu Bionic
[0]; however, the change made the script incompatible with Ubuntu Xenial
because libxtables12 is not available on Ubuntu Xenial. OpenStack-Helm
still supports Ubuntu Xenial, and this script should too.

This change modifies the gate script to install iptables instead of
libxtables12. The iptables package depends on libxtables11 on Ubuntu
Xenial and libxtables12 on Ubuntu Bionic, so this achieves the same
result.

[0] https://review.opendev.org/650523

Change-Id: I5afbcfeca6e7b30857a44aed35a360595eeb5037
Signed-off-by: Drew Walters <andrew.walters@att.com>
2019-07-12 13:50:22 +00:00
Steve Wilkerson
7e55710a42 Tenant-Ceph: Enable cephfs storage class provisioning
This updates the tenant ceph job to provision the cephfs storage
class by removing the override that prevents it. This is required
for the ceph namespace activation deployment for osh-infra to
successfully pass its helm tests

Change-Id: I3f801cb2a369f6a073105296d7cc4f98fddf6a68
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-07-12 13:45:40 +00:00
Steve Wilkerson
ae3c07b853 Ceph: Update default test pod timeout for provisioners
This moves the default timeout for the ceph provisioners helm test
pod to 600 seconds, as 120 seconds is fairly aggressive. This
also adds the required --timeout flag to the helm test command in
each job for the ceph provisioners, as well as adding the required
helm test configuration to the armada-lma manifest

Change-Id: I5a3b98de9132fe83cf09b1e5b3fcc513bd496650
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-07-12 13:43:38 +00:00
Zuul
aead8ca0b9 Merge "Extended OVS chart with support for DPDK" 2019-07-12 13:30:48 +00:00
Zuul
c3ac26a35d Merge "Pentest-NC1.0 Nova–Security HTTP Headers Not Present" 2019-07-11 22:28:11 +00:00
Zuul
e40c903cda Merge "Armada: Fix issues with armada-lma manifest" 2019-07-11 19:06:43 +00:00
Zuul
36b31af88a Merge "Disable systemd-resolved service in nameserver role" 2019-07-11 19:04:00 +00:00
Zuul
639dcc2da3 Merge "Enable calico prometheus metrics for minikube" 2019-07-11 19:03:59 +00:00
Pai, Radhika (rp592h)
891e259d66 Updated the CEPH Cluster Health Panel values
Change-Id: Id4016d1ce6c0e2acadef31496102667ee79f030f
2019-07-11 15:25:39 +00:00
Alexander Noskov
b191d4ae99 Update symlink for 110-kibana.sh
070-kibana.sh was renamed in https://review.opendev.org/#/c/661753/1/tools/deployment/osh-infra-logging/075-kibana.sh

Change-Id: I043179d259f51734056d168058304ca9a8ff4de4
2019-07-10 18:12:27 -05:00