61 Commits

Author SHA1 Message Date
Emilien Macchi
b3a7cfc43f ansible: replace yum module by package module when possible
Problem: RHEL and CentOS 8 will deprecate the usage of Yum.

From DNF release note:
DNF is the next upcoming major version of yum, a package
manager for RPM-based Linux distributions.
It roughly maintains CLI compatibility with YUM and defines a strict API for
extensions.

Solution: Use "package" Ansible module instead of "yum".

The "package" module is smarter when it comes to detecting which package
manager runs on the system. The goal of this patch is to support both
yum and dnf (dnf will be the default in RHEL/CentOS 8) from a single
Ansible module.
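The change can be sketched as follows; the package name is illustrative, not taken from the patch:

```yaml
# Before: hardcodes yum, which breaks once dnf is the system package manager
- name: Install pacemaker
  yum:
    name: pacemaker
    state: present

# After: "package" autodetects the package manager (yum or dnf)
- name: Install pacemaker
  package:
    name: pacemaker
    state: present
```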

Change-Id: I8e67d6f053e8790fdd0eb52a42035dca3051999e
2018-07-21 00:17:25 +00:00
Carlos Camacho
00ed1a2d39 FFU Create cell0 db points to the nova-api bootstrap node.
In case the nova-api service is not running on the MySQL
master node, the FFU tasks will fail, as the nova-api node might not
have MySQL installed.

Avoid executing Nova DB tasks during FFU if MySQL is not installed;
point to the MySQL server instead.

Resolves: rhbz#1593910
Closes-bug: 1780425

Change-Id: I02bc48d535707d579ecd590f970b1a08962a0111
2018-07-09 16:22:26 +02:00
Zuul
4dfb598bcc Merge "Upgrades: Refactor playbooks to set facts" 2018-06-14 11:31:56 +00:00
Michele Baldessari
a0dfc6c0c6 rerun *_init_bundles all the time
In the same spirit as change I1f07272499b419079466cf9f395fb04a082099bd
we want to rerun all pacemaker _init_bundles all the time. For a few main
reasons:
1) We will eventually support scaling-up roles that contain
   pacemaker-managed services and we need to rerun _init_bundles so that
   pacemaker properties are created for the newly added nodes.
2) When you replace a controller the pacemaker properties will be
   recreated for the newly added node.
3) We need to create appropriate iptables rules whenever we add a
   service to an existing deployment.

We do this by adding the DeployIdentifier to the environment so that
paunch will retrigger a run at every redeploy.
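A minimal sketch of the mechanism, with a hypothetical service name and simplified template syntax:

```yaml
# Hypothetical docker_config fragment: because DeployIdentifier changes on
# every "openstack overcloud deploy", the container definition changes too,
# so paunch detects a delta and re-runs the *_init_bundle container.
docker_config:
  step_2:
    mysql_init_bundle:
      start_order: 1
      detach: false
      net: host
      environment:
        # value of the DeployIdentifier heat parameter, new on each deploy
        - list_join: ['=', ['TRIPLEO_DEPLOY_IDENTIFIER', {get_param: DeployIdentifier}]]
```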

Partial-Bug: #1775196
Change-Id: Ifd48d74507609fc7f4abc269b61b2868bfbc9272
2018-06-09 10:22:15 +02:00
Michele Baldessari
a6389da22d Introduce restart_bundle containers to detect config changes and restart pacemaker resources
During the containerization work we regressed on the restart of
pacemaker resources when a config change for the service was detected.
In baremetal we used to do the following:
1) If a puppet config change was detect we'd touch a file with the
   service name under /var/lib/tripleo/pacemaker-restarts/<service>
2) A post deployment bash script (extraconfig/tasks/pacemaker_resource_restart.sh)
   would test for the service file's existence and restart the pcs service via
   'pcs resource restart --wait=600 service' on the bootstrap node.

With this patchset we make use of paunch's ability to detect whether a
config hash change happened, to respawn a temporary container (called
<service>_restart_bundle) which always restarts the pacemaker
service from the bootstrap node whenever invoked, but only if the pcmk
resource already exists. For this reason we add config_volume and bind-
mount it inside the container, so that the TRIPLEO_CONFIG_HASH env
variable gets generated for these *_restart_bundle containers.
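The shape of such a container can be sketched as follows; names, volumes and start order are illustrative, not the exact template:

```yaml
# Hypothetical <service>_restart_bundle definition: the bind-mounted
# config_volume makes paunch compute TRIPLEO_CONFIG_HASH for it, so the
# container is respawned exactly when the service's config hash changes.
mysql_restart_bundle:
  start_order: 20
  detach: false
  net: host
  volumes:
    - /var/lib/config-data/puppet-generated/mysql:/var/lib/kolla/config_files/src:ro
    - /etc/corosync/corosync.conf:/etc/corosync/corosync.conf:ro
  command:
    - /usr/bin/bootstrap_host_exec
    - mysql
    - /bin/bash
    - -c
    # restart only if the pcmk resource already exists
    - pcs resource show galera-bundle && pcs resource restart galera-bundle
```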

We tested this change as follows:
A) Deployed an HA overcloud with this change and observed that pcmk resources
   were not restarted needlessly during initial deploy
B) Rerun the exact same overcloud deploy with no changes, observed that
   no spurious restarts would take place
C) Added an env file to trigger a config change of haproxy [1], redeployed and observed that it restarted
   haproxy only:
   Jun 06 16:22:37 overcloud-controller-0 dockerd-current[15272]: haproxy-bundle restart invoked
D) Added a trigger [2] for mysql config change, redeployed and observed restart:
   Jun 06 16:40:52 overcloud-controller-0 dockerd-current[15272]: galera-bundle restart invoked
E) Added a trigger [3] for a rabbitmq config change, redeployed and observed restart:
   Jun 06 17:03:41 overcloud-controller-0 dockerd-current[15272]: rabbitmq-bundle restart invoked
F) Added a trigger [4] for a redis config change, redeployed and observed restart:
   Jun 07 08:42:54 overcloud-controller-0 dockerd-current[15272]: redis-bundle restart invoked
G) Rerun a deploy with no changes and observed that no spurious restarts
   were triggered

[1] haproxy config change trigger:
parameter_defaults:
  ExtraConfig:
    tripleo::haproxy::haproxy_globals_override:
      'maxconn': 1111

[2] mysql config change trigger:
parameter_defaults:
  ExtraConfig:
    mysql_max_connections: 1111

[3] rabbitmq config change trigger (default partition handling is 'ignore'):
parameter_defaults:
  ExtraConfig:
    rabbitmq_config_variables:
      cluster_partition_handling: 'pause_minority'
      queue_master_locator: '<<"min-masters">>'
      loopback_users: '[]'

[4] redis config change trigger:
parameter_defaults:
  ExtraConfig:
    redis::tcp_backlog: 666
    redis::params::tcp_backlog: 666

Change-Id: I62870c055097569ceab2ff67cf0fe63122277c5b
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Closes-Bug: #1775196
2018-06-08 16:06:24 +02:00
Lukas Bezdicka
56bec75c02 Upgrades: Refactor playbooks to set facts
To avoid redefining the same variable multiple times in each service, we
run the check only once and set a fact. To increase readability of the
generated playbook, we add a block per step in the services.
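A sketch of the pattern, with illustrative variable and resource names:

```yaml
# Evaluate the check once...
- name: Check if this node bootstraps the galera cluster
  set_fact:
    is_bootstrap_node: "{{ mysql_short_bootstrap_node_name|lower == ansible_hostname|lower }}"

# ...then reuse the fact inside a block per step.
- block:
    - name: Stop the pacemaker cluster
      pacemaker_cluster: state=offline
  when:
    - step|int == 2
    - is_bootstrap_node
```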

Change-Id: I2399a72709d240f84e3463c5c3b56942462d1e5c
2018-06-08 11:46:12 +02:00
mandreou
adb10e6586 Pike to Queens controller upgrade guard rerun with no images
As discussed at [0], if current overcloud resources are removed
manually or by some error, and even container images are deleted, the
upgrade tasks should be able to guard against that.

We already have a similar guard for the fetch&retag at [1]

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1584809
[1] 8824e7abcd/docker/services/pacemaker/haproxy.yaml (L305-L324)

Change-Id: I2c81e6d0f73fbef0f2a347b9fd4d27df91c2fdd1
2018-06-01 12:13:42 +03:00
Carlos Camacho
44ef2a3ec1 Change template names to rocky
The new master branch should point now to rocky.

So, HOT templates should specify that they might contain features
for the rocky release [1].

Also, this submission updates the yaml validation to use only the latest
heat_version alias. There are cases in which we will need to set
the version for specific templates, i.e. mixed versions, so a variable
is added to assign specific templates to specific heat_version
aliases, avoiding the introduction of errors by bulk-replacing the
old version in new releases.

[1]: https://docs.openstack.org/heat/latest/template_guide/hot_spec.html#rocky
Change-Id: Ib17526d9cc453516d99d4659ee5fa51a5aa7fb4b
2018-05-09 08:28:42 +02:00
Bogdan Dobrelya
04fd6ff1b1 Copy-in redis certs via kolla extended/start
Instead of bind-mounting directly into the redis container,
follow the established approach for distributing certificates
in containers.

Partial-bug: #1767998

Change-Id: Iff1a757c4893698ba550143d786088e5b9ffd714
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2018-04-30 13:16:02 +02:00
Zuul
0a1b9c27d8 Merge "Make pcs resource bundle image name update tolerant of rerun" 2018-04-25 18:26:50 +00:00
mandreou
8530dd9ddc Make pcs resource bundle image name update tolerant of rerun
The parent review at Ic87a66753b104b9f15db70fdccbd66d88cef94df
allows us to update the name for pcs resource bundle resources
if this is changed as part of the upgrade configuration.

If the upgrade is interrupted the target pacemaker resource bundle
may not even have been created yet. This groups stop/update/start
the bundle resource and adds a new conditional to check if the
cluster resource exists before trying to update the container
image being used. Otherwise a re-run of the upgrade tasks may
fail if the cluster resource doesn't exist.
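The guard can be sketched like this; resource and image names are illustrative:

```yaml
- name: Check whether the haproxy cluster resource exists yet
  shell: pcs resource show haproxy-bundle
  failed_when: false
  register: haproxy_res

- name: Update the bundle's container image only if the resource exists
  shell: >
    pcs resource bundle update haproxy-bundle
    container image=cluster.registry/namespace-haproxy:pcmklatest
  when: haproxy_res.rc == 0
```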

Related-Bug: 1763001
Change-Id: Ifc6f78d73bc71a5b5edfadfbfacaa3560fe7c2df
2018-04-23 11:42:24 +00:00
Zuul
34ef2efc6f Merge "Upgrade: make bundles use new container image name after upgrade" 2018-04-21 05:17:18 +00:00
Damien Ciabrini
a246549303 Reduce verbosity during mysql bootstrap
During the initial deployment, a one-time container is used to bootstrap the
mysql database on disk, create the required users and set their passwords. The
script that does this is too verbose and logs some credentials in the
container's logs and in the journal.

Use kolla_extend_start directly instead of kolla_start to stop tracing shell
commands and reduce logging to the bare minimum for troubleshooting.

Closes-Bug: #1765339

Change-Id: I90827feff0d1b9fd8badb72e68e4c8dd8db8aea5
2018-04-19 19:14:03 +00:00
Damien Ciabrini
e8a1fc25d7 Upgrade: make bundles use new container image name after upgrade
The major_upgrade tasks for HA services only allow changing the container
image tag used by bundles. They don't work when the image name changes.

Fix this unwanted behaviour by updating the bundle's attribute in pacemaker
to use container image <NEW>:pcmklatest instead of <CURRENT>:pcmklatest.
We are constrained as to the steps at which we can modify the bundle:
  . Image update must stay at step 3 when pacemaker is stopped.
  . image name used by the bundle must be available in docker when the
    bundle is updated

So we re-use the double tagging idiom to perform the image update:
  . At step 0, we tag the image pointed to by <CURRENT>:pcmklatest
    with an additional temporary tag <NEW>:pcmklatest.
    => this ensures that at step1, the new tag is available on all
       controller nodes.
  . At step 1, we update the resource bundle to use the new image
    name <NEW>:pcmklatest.
    => at the end of step1, the bundle will be configured with the
       new name, and be able to start even if the real container
       image hasn't been pulled yet.
  . At step 3, the existing code will download the real image
    <NEW>:<NEWTAG> and make tag <NEW>:pcmklatest point to it.
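The three steps above can be sketched as shell tasks; image names are illustrative:

```yaml
# Step 0: give the currently used image an extra <NEW>:pcmklatest tag,
# so the new name resolves on every controller before step 1.
- shell: docker tag old-ns/mariadb:pcmklatest new-ns/mariadb:pcmklatest
  when: step|int == 0

# Step 1: point the bundle at the new image name while pacemaker still runs.
- shell: pcs resource bundle update galera-bundle container image=new-ns/mariadb:pcmklatest
  when: step|int == 1

# Step 3: pull the real new image and move pcmklatest onto it.
- shell: |
    docker pull new-ns/mariadb:newtag
    docker tag new-ns/mariadb:newtag new-ns/mariadb:pcmklatest
  when: step|int == 3
```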

Since the bundle is always modified, we now stop and restart the
bundle resources unconditionally.

Also, move the mariadb upgrade task to step 3, when pacemaker is
guaranteed to be stopped, because the task assumes that no mysql
is running while it runs. Fix the mysql permission after rpm
upgrade on the host.

Change-Id: Ic87a66753b104b9f15db70fdccbd66d88cef94df
Closes-Bug: #1763001
2018-04-17 08:04:10 +00:00
Juan Antonio Osorio Robles
6c40b1586a Always run mysql init bundle
This init container runs docker-puppet manually and is responsible for
provisioning the mysql users and passwords. This currently doesn't get
run every time, since the configuration stays the same even if the users
or passwords change (which are gotten from hieradata). Allowing this to
run every time will allow us to change database passwords.

Closes-Bug: #1762991
Change-Id: I1f07272499b419079466cf9f395fb04a082099bd
2018-04-11 10:52:56 +00:00
Damien Ciabrini
624fedb114 Upgrade data on disk on mariadb major upgrade
Add an ansible task to run mysql_upgrade whenever a container
image upgrade causes a major upgrade of mariadb (e.g. 5.5 -> 10.1)

. If the overcloud was containerized prior to the major upgrade, the
  mysql upgrade job is run in an ephemeral container (where the latest
  version of mysql comes from) and uses credentials from the Kolla
  configuration.

. Otherwise the upgrade job is run from the host (once the mysql
  rpm has been updated) and uses credentials from the host.

We log the output of the script in the journal. Also, the mysql server
needs to be started temporarily, so use a temporary log file for it
when run from the ephemeral container.

Change-Id: Id330d634ee214923407ea893fdf7a189dc477e5c
2018-03-25 19:18:07 +00:00
Damien Ciabrini
f4a45b751b Make HA containers log to /var/log/containers after upgrade
HA containerized services currently log under
/var/log/pacemaker/bundles/{service-replica}.

Move the logging of those HA services into /var/log/containers,
like all the paunch-managed containers. Also leave a readme.txt
in the previous location to notify the change (taken from
Ic8048b25a33006a3fb5ba9bf8f20afd2de2501ee)

Only the main service log is being moved, e.g. for mysql:
  . mysqld.log now ends up in /var/log/containers/mysqld.log
  . pacemaker logs stay under /var/log/pacemaker/bundles/{service-replica}

Note: some HA services don't need to be changed during upgrade:
  . cinder-{backup|volume} log under /var/log/containers/cinder
  . manila-share log under /var/log/containers/manila
  . haproxy only logs to the journal

Change-Id: Icb311984104eac16cd391d75613517f62ccf6696
Co-Authored-By: Jiri Stransky <jistr@redhat.com>
Partial-Bug: #1731969
2018-03-23 16:19:03 +00:00
Damien Ciabrini
f37c06cd9d Fix update of pacemaker container images during major upgrade
Currently, the idiomatic "download image and retag to pcmklatest"
happens at step 2 during upgrade. This doesn't work if the stack
is already containerized before the upgrade, because pacemaker
is still running at step 2.

Reshuffle the steps at which the various upgrade tasks are run,
while keeping the ordering guarantees of the upgrade flow:

  . Deletion of non-containerized resources happens at step 1,
    to allow calling pcs while pacemaker is running.
  . Pacemaker is stopped at step 2.
  . Docker images for containerized resources are upgraded at
    step 3, after the cluster is guaranteed to be stopped.
  . Pacemaker is restarted at step 4 as before, once we know
    that all resources have been upgraded, yum packages updated
    and any potential docker restart has been executed.

Also change the way we detect containerized resources, so that
the predicate still remains valid past step 2 when pacemaker
has been stopped and has deleted its containerized resources.

Change-Id: I85e11dd93c7fd2c42e71b467f46b0044d4516524
2018-03-20 22:36:31 +00:00
Lukas Bezdicka
3f38dd6e46 FFU: Upgrades: fix pacemaker checks
Check if pacemaker resource is defined, not if it's running.
Ensure we try disabling pacemaker resources during FFU.

Change-Id: I9be9118490a28ee9c24d9c8c89a8daee75e5b817
2018-03-15 15:00:24 +01:00
Steven Hardy
3a7baa8fa6 Convert ServiceNetMap evals to hiera interpolation
Since https://review.openstack.org/#/c/514707/ added the net_ip_map
to hieradata, we can look up the per-network bind IPs via hiera
interpolation instead of heat map_replace.

In some cases the ServiceNetMap lookup is used for other things,
but anywhere we make use of the "magic" translation via NetIpMap,
it is changed in the same way.

This will enable more of the configuration data to be exposed per
role vs per node in a future patch (to simplify our ansible
workflow).
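As a sketch, a hieradata entry like the following resolves at puppet run time instead of heat stack time (the hiera key and parameter names are illustrative):

```yaml
# Before: heat performed a map_replace of the network name into an IP.
# After: hiera interpolates against the net_ip_map data published per node.
tripleo::profile::base::database::mysql::bind_address: "%{hiera('internal_api')}"
```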

Co-authored-by: Bogdan Dobrelya <bdobreli@redhat.com>
Change-Id: Ie3da9fedbfce87e85f74d8780e7ad1ceadda79c8
2018-03-10 08:18:30 +00:00
marios
029ec62b79 Add pacemaker upgrade_tasks for P..Q major upgrade
This adds pacemaker upgrade_tasks for Pike to Queens. We need
to handle both cases:
 - Upgrade from baremetal so we should kill the systemd things.
 - Upgrade from containers so we should kill the container pull
   and retag the image.

Change-Id: Icacb31b79da3a18b7ab0986779a021dfe6a5553f
2018-02-15 10:37:24 +00:00
Zuul
9604728016 Merge "Fix Redis TLS setup and its HA deployment" 2018-02-13 23:34:49 +00:00
Lukas Bezdicka
0cb5c847f3 Always evaluate step first in conditional
If we use variables defined in a later step in a conditional before
checking which step we are on, we will fail.
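A minimal illustration, with a hypothetical registered variable:

```yaml
# Unsafe: galera_res is only registered at step 2, so evaluating it first
# fails when this task runs at earlier steps.
when: galera_res.rc == 0 and step|int == 2

# Safe: the step check short-circuits before the variable is referenced.
when: step|int == 2 and galera_res.rc == 0
```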

Resolves: rhbz#1535457
Closes-Bug: #1743764
Change-Id: Ic21f6eb5c4101f230fa894cd0829a11e2f0ef39b
2018-02-09 17:12:29 +01:00
Damien Ciabrini
91db2020df Fix Redis TLS setup and its HA deployment
This patch reverts the revert of Redis TLS [1,2], and update the
pacemaker redis template to configure Redis to encrypt the
replication traffic between Redis nodes.

[1] a3769c03175cb36f0066c173477749a26f767566
[2] ebc8414cd0c18426ff80d9d65c964e91a7fe447f

Depends-On: I6cc818973fab25b4cd6f7a0d040aaa05a35c5bb1
Change-Id: I7f7be4bba6d41c04385f074857c82507cc8c2617
Closes-Bug: #1737707
2018-02-05 14:05:12 +00:00
Zuul
eda047f075 Merge "Remove unused env var during mysql bootstrap" 2018-01-31 10:05:09 +00:00
Zuul
1af7729939 Merge "Convert tags to when statements for Q major upgrade workflow" 2018-01-13 09:39:38 +00:00
Martin André
5e8bec8d53 Remove unused env var during mysql bootstrap
In TripleO we exported the KOLLA_KUBERNETES variable to skip the cluster
readiness check and work around a limitation of the mariadb bootstrap
script in Kolla, which expected MariaDB 10.0 coming from the MariaDB
repository and didn't work with the MariaDB 10.1 from RDO.

Luckily this was fixed in Kolla with
Ia2acb09e877a586243fc1acb49d8d140cf27d7b5 and we can now remove this
tech debt from t-h-t.

Change-Id: Iba62e436a16ddb3cfc87fc4ec03b599e55841681
Related-Bug: #1740060
2018-01-11 10:40:15 +01:00
Alex Schultz
6f834f60e6 Use docker_config_scripts for puppet apply
There are some configuration applies that we need to do during the
deployment. These currently live as manually constructed bash runs which
are missing the --detailed-exitcodes handling to know when we have
failures.  In order to reduce the duplicated code and simplify this
execution, this change creates a docker_config_scripts with
docker_puppet_run.sh in containers-common that can be reused by any of
the docker services. This allows us to properly handle
--detailed-exitcodes while also reducing the amount of duplicated code
bits that we have within THT.

Additionally this change adds a new shared value for ContainersCommon to
pull the required volumes for the docker_puppet_apply.sh script into a
single place. Unfortunately the existing volumes from ContainersCommon
includes a mount for /etc/puppet to /etc/puppet which causes problems
because we need to be able to write out a hiera value.  The /etc/puppet
mount is needed for the bootstrap_host_exec function which is consumed
by various docker_config tasks but the mount conflicts with the puppet
apply logic being used.

Depends-On: I24e5e344b7f657ce5d42a7c7c45be7b5ed5e6445
Change-Id: Icf4a64ed76635e39bbb34c3a088c55e1f14fddca
Related-Bug: #1741345
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
2018-01-09 17:17:13 -07:00
marios
dec003def8 Convert tags to when statements for Q major upgrade workflow
This converts "tags: stepN" to "when: step|int == N" for the direct
execution as an ansible playbook, with a loop variable 'step'.
The tasks all include the explicit cast |int.

This also adds a set_fact task for handling of the package removal
with the UpgradeRemovePackages parameter (no change to the interface)

The yaml-validate also now checks for duplicate 'when:' statements
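The conversion can be sketched as follows; the task itself is illustrative:

```yaml
# Before: task selection by tag
- name: Stop pacemaker cluster
  tags: step2
  pacemaker_cluster: state=offline

# After: direct playbook execution with a loop variable 'step'
- name: Stop pacemaker cluster
  when: step|int == 2
  pacemaker_cluster: state=offline
```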

Q upgrade spec @ Ibde21e6efae3a7d311bee526d63c5692c4e27b28
Related Blueprint: major-upgrade-workflow
[0]: 394a92f761/tripleo_common/utils/config.py (L141)
Change-Id: I6adc5619a28099f4e241351b63377f1e96933810
2018-01-08 13:57:47 +02:00
Michele Baldessari
4d7e03be85 Add proper debug switch on init_bundles
When deploying with -e environments/config-debug.yaml, which sets
ConfigDebug to true, it is expected that puppet is run with --debug
--verbose. This has happened for most of the puppet uses (via
LP#1722752), but we missed enabling it for the init_bundle under
docker/services.

While we're at it we also add '--color=false' to the puppet apply
command of the init_bundle containers as that is what we use in the
other puppet apply runs.

Closes-Bug: #1738764

Change-Id: If529b83a7342b3ad17d705517978539d1c6b949e
2017-12-18 15:27:36 +01:00
Yurii Prokulevych
09dcd7e26c Search for containers within stopped containers.
During minor update, the pcs cluster is stopped at step 1.
Then we search for pcs-managed containers at step 2.
But since the pcs cluster is stopped, 'docker ps' won't report the
stopped containers.
This change adds the '--all' option to show all the containers.
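A sketch of the fix; the name filter is illustrative:

```yaml
# Without --all, exited containers are invisible once the cluster is down.
- name: List pcs-managed containers, running or stopped
  shell: docker ps --all --quiet --filter "name=-bundle-"
  register: bundle_containers
```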

Change-Id: If38a4f7e25d4d1f4679d9684ad2c0db8475d679b
Closes-Bug: #1737548
2017-12-13 11:34:26 +01:00
Sofer Athlan-Guyot
4a708af34a Add modulepath option when applying puppet inside docker.
When new modules are added, we may miss the symlink in
/etc/puppet/modules. And for consistency, as we mount the
/usr/share/openstack-puppet/modules directory, it's better to add it
to the modulepath.
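A sketch of the resulting invocation; the manifest path and task shape are illustrative:

```yaml
- name: Apply puppet inside the container with both module locations
  shell: >
    puppet apply
    --modulepath=/etc/puppet/modules:/usr/share/openstack-puppet/modules
    /etc/config.pp
```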

Change-Id: I963aede41403ebbe3b9afb55a725b304a30a0cbb
Closes-Bug: #1736980
2017-12-07 20:09:13 +01:00
Pradeep Kilambi
ebc8414cd0 Redis replication does not work with TLS
Let's revert the TLS support until we know it works.

Revert "Enable redis TLS proxy in HA deployments"

This reverts commit c6d8df01d7aa8b44af9ac152b3bb08f07e2e02b7.

Closes-bug: #1735259

Change-Id: If98acec1b8d0a179d56b8412e5c0ad9341719cea
2017-11-29 17:43:42 -05:00
Carlos Camacho
927495fe3d Change template names to queens
The new master branch should point now to queens instead of pike.

So, HOT templates should specify that they might contain features
for the queens release [1]

[1]: https://docs.openstack.org/heat/latest/template_guide/hot_spec.html#queens

Change-Id: I7654d1c59db0c4508a9d7045f452612d22493004
2017-11-23 10:15:32 +01:00
Dan Prince
a307fe7ffc Drop step_config as top level docker requirement
Step config is only required within the puppet_configs section
of docker/services/*. This patch drops the top level 'step_config'
and updates the unit tests accordingly.

Change-Id: I7dc7cfae3ef1965ec95b1d9ef23e7f162418c034
2017-11-15 16:01:16 -05:00
Jenkins
afed8d7daa Merge "Make containerized galera use mysql_network everywhere" 2017-10-07 09:14:06 +00:00
marios
a953bda0ae Adds pacemaker update_tasks for Pike minor update workflow
Adds update_tasks for the minor update workflow. These will be
collected into playbooks during an initial 'update init' heat
stack update and then invoked later by the operator as ansible
playbooks.

Current understanding/workflow:
 Step=1: stop the cluster on the updated node
 Step=2: Pull the latest image and retag it as pcmklatest
 Step=3: yum upgrade happens on the host
 Step=4: Restart the cluster on the node
 Step=5: Verification: test pacemaker services are running.
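The flow above can be sketched as update_tasks; module usage, image and resource names are illustrative:

```yaml
update_tasks:
  - name: Stop the cluster on the node being updated
    when: step|int == 1
    pacemaker_cluster: state=offline
  - name: Pull the latest image and retag it as pcmklatest
    when: step|int == 2
    shell: |
      docker pull registry/namespace-haproxy:latest
      docker tag registry/namespace-haproxy:latest registry/namespace-haproxy:pcmklatest
  # step 3: the host yum upgrade happens outside these tasks
  - name: Restart the cluster on the node
    when: step|int == 4
    pacemaker_cluster: state=online
  - name: Verify that pacemaker resources are running again
    when: step|int == 5
    shell: pcs status | grep -q Started
```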

https://etherpad.openstack.org/p/tripleo-pike-updates-upgrades

Related-Bug: 1715557
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Co-Authored-By: Sofer Athlan-Guyot <sathlang@redhat.com>
Change-Id: I101e0f5d221045fbf94fb9dc11a2f30706843806
2017-10-05 14:35:26 +00:00
Juan Badia Payno
5dbe1121e9 docker: add logging(source & groups)
The services that docker depends on have logging_sources and logging_groups;
but those are not set on the docker outputs, so they are not used when the
docker services are deployed.

Added logging_source & logging_groups as docker optional parameters in
tools/yaml-validate.py

Closes-Bug: #1718110
Change-Id: I8795eaf4bd06051e9b94aa50450dee0d8761e526
2017-09-27 07:37:14 +00:00
Damien Ciabrini
e10aa591dc Make containerized galera use mysql_network everywhere
The containerized galera service generates a galera.cnf which uses the
short hostname to identify itself rather than the fqdn from the
mysql_network (e.g. overcloud-x.internalapi.cloudname).

This breaks when internal TLS is in use, because the mysql certificate
does not reference this short hostname.

Fix the appropriate hiera parameter to make it behave like the
non-containerized galera service.

Change-Id: I904cde38f2baeddab5178e8ad48d34a0c73629af
Closes-Bug: #1719599
2017-09-26 15:23:11 +02:00
Damien Ciabrini
b0f50db80b Disable all uses of wsrep-provider in mysql_bootstrap container
During the bootstrap of the mariadb database, galera replication
must be disabled while the users credentials are being set up. This
is done by setting wsrep-provider=none when starting mysqld_safe.

Icf67fd2fbf520e8a62405b4d49e8d5169ff3925b already disabled it
when the clustercheck credentials are being set up, but Kolla also
start a temporary server for setting up the root password.

Disable the setting directly at the end of the mysql.cnf in the
running container. That way, the default setting from galera.cnf will
be overriden, all mysqld_safe calls will disable WSREP and the setting
will stay ephemeral.

Change-Id: If14e22992b46a35a05a16a9db5ecb360ea13df8f
Closes-Bug: #1717250
2017-09-15 09:35:56 +02:00
Marius Cornea
64d7be1e3d One time delete pacemaker resources during upgrade to containers
This change allows running the major upgrade composable docker
steps multiple times by not trying to delete the pacemaker resources
if they're not reported as started or in master state.

Closes-bug: 1716031
Depends-On: I8da03f5c4a6d442617b81be5793a9724cc8842bf
Change-Id: Ifcf9de8c82550a90a9fb118052d43fdbcdc6ca7e
2017-09-11 10:54:00 +00:00
Martin André
c6d8df01d7 Enable redis TLS proxy in HA deployments
Redis does not have TLS out of the box. Let's use a proxy container for
TLS termination.

This commit enables redis TLS proxy for the HA deployment.

bp tls-via-certmonger

Change-Id: I45e539872a03878337def33c681c4577c1a5629e
2017-09-09 09:18:12 +00:00
Mathieu Bultel
e92430d8d0 Retry if the pacemaker_resource commands failed
Add a retry when the pacemaker_resource command
wasn't applied correctly; more info here:
https://bugzilla.redhat.com/show_bug.cgi?id=1482116

This is the same approach puppet-pacemaker uses
and provides eventual consistency when multiple
nodes change the cluster CIB concurrently.
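The retry idea can be sketched with a plain shell task (the actual change uses the pacemaker_resource module from the ansible-pacemaker package; the resource name and retry counts are illustrative):

```yaml
- name: Disable the resource, retrying on transient CIB conflicts
  shell: pcs resource disable galera --wait=600
  register: output
  retries: 5
  delay: 5
  until: output.rc == 0
```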

This change depends-on :
https://review.gerrithub.io/375982

The return code is not available in the current
ansible-pacemaker package.

Change-Id: I8da03f5c4a6d442617b81be5793a9724cc8842bf
2017-09-08 08:30:53 +00:00
Michele Baldessari
c19968ca85 Add --wsrep-provider=none to the mysql_bootstrap container
Depending on the version of mariadb/galera installed, the mysql_bootstrap
command might fail with the following unrevealing error:

openstack-mariadb-docker:2017-08-28.10 "bash -ec 'if [ -e /v" 3 hours ago Exited (124) 3 hours ago

The timeout is actually due to the fact that the following snippet does
not complete within 60 seconds:
"""
if [ -e /var/lib/mysql/mysql ]; then exit 0; fi
kolla_start
mysqld_safe --skip-networking --wsrep-on=OFF &
timeout ${DB_MAX_TIMEOUT} /bin/bash -c 'until mysqladmin -uroot -p"${DB_ROOT_PASSWORD}" ping 2>/dev/null; do sleep 1; done'
mysql -uroot -p"${DB_ROOT_PASSWORD}" -e "CREATE USER 'clustercheck'@'localhost' IDENTIFIED BY '${DB_CLUSTERCHECK_PASSWORD}';"
mysql -uroot -p"${DB_ROOT_PASSWORD}" -e "GRANT PROCESS ON *.* TO 'clustercheck'
"""

The problem is that with older mariadb versions:
galera-25.3.16-3.el7ost.x86_64
mariadb-5.5.56-2.el7.x86_64

The mysqld_safe process starts in galera mode (as opposed to single
local mode):
170830 17:03:05 [Note] WSREP: Start replication
170830 17:03:05 [Note] WSREP: GMCast version 0
...
170830 17:03:05 [ERROR] WSREP: wsrep::connect() failed: 7
170830 17:03:05 [ERROR] Aborting

That means that even though we specified --wsrep-on=OFF it is still
starting in cluster mode. Let's add the extra --wsrep-provider=none
which older versions required.

Let's also add a '-x' to this transient container, as that would have
helped us understand right away that it was mysqld_safe that was not
starting. I tested this successfully on an environment that showed the
problem. The new option is still accepted by newer DB versions in any
case.

Closes-Bug: #1714057

Change-Id: Icf67fd2fbf520e8a62405b4d49e8d5169ff3925b
Co-Authored-By: Mike Bayer <mbayer@redhat.com>
2017-08-30 19:35:24 +02:00
Michele Baldessari
bf02ad9d7c Tag the ha containers with 'pcmklatest' at deploy time
We need to tag the HA containers with a special tag so
that the RA definition never changes. We do this step in THT
as opposed to puppet because we need to guarantee
that all images are tagged on all nodes *before* step 2 where the bundle
gets created.

NB: Getting the image name without the tag will require some more
yaql work to get all the cases right. Right now this works only
if we enforce that the image has a ':tag' at the end of the name.
So far this is always the case. If things change we will need to
amend this code.

Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Co-Authored-By: Sofer Athlan-Guyot <sathlang@redhat.com>

Change-Id: I362e6cf26fba77d3f949b7d2fc4b35a3eab9087e
2017-08-18 15:59:17 +02:00
Jenkins
5dea7c99d2 Merge "Do not run clustercheck on the host after O->P upgrade" 2017-08-15 23:53:57 +00:00
Jenkins
6976b8f650 Merge "Enable TLS configuration for containerized Galera" 2017-08-14 23:03:35 +00:00
Damien Ciabrini
ac79bf92d0 Enable TLS configuration for containerized Galera
In non-containerized deployments, Galera can be configured to use TLS
for gcomm group communication when enable_internal_tls is set to true.

Fix the metadata service definition and update the Kolla configuration
to make gcomm use TLS in containers, if configured.

bp tls-via-certmonger-containers

Change-Id: Ibead27be81910f946d64b8e5421bcc41210d7430
Co-Authored-By: Juan Antonio Osorio Robles <jaosorior@redhat.com>
Closes-Bug: #1708135
Depends-On: If845baa7b0a437c28148c817b7f94d540ca15814
2017-08-11 04:26:41 +00:00
Damien Ciabrini
7968f37f6e MariaDB: create clustercheck user at container bootstrap
In HA overclouds, the helper script clustercheck is called by HAProxy to poll
the state of the galera cluster. Make sure that a dedicated clustercheck user
is created at deployment, like it is currently done in Ocata.

The creation of the clustercheck user happens on all controller nodes, right
after the database creation. This way, it does not need to wait for the galera
cluster to be up and running.

Partial-Bug: #1707683
Change-Id: If8e0b3f9e4f317fde5328e71115aab87a5fa655f
2017-07-31 12:37:36 -04:00
Damien Ciabrini
090b33ed3b Do not run clustercheck on the host after O->P upgrade
Once an Ocata overcloud is upgraded to Pike, clustercheck should only be
running in a dedicated container, and xinetd should no longer manage it on
the host. Fix the mysql upgrade_task accordingly.

Change-Id: I01acacc2ff7bcc867760b298fad6ff11742a2afb
Closes-Bug: #1706612
2017-07-27 14:48:35 +00:00