Since the undercloud is localhost, ansible skips ssh and just runs local
commands. That causes problems when running ansible-playbook under
the mistral workflow, because the mistral user cannot use sudo. Set
become: false on all the undercloud plays, as sudo is not actually needed.
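A minimal sketch of the shape of such a play (play and task names are
hypothetical):

  - hosts: undercloud
    name: Undercloud deploy step tasks
    gather_facts: false
    become: false   # local connection; the mistral user cannot sudo
    tasks:
      - name: Example step task that does not need root
        command: hostname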
Change-Id: I2980c584d2f4ee5c2de3720eecfc80cc43ee1fa6
implements: blueprint ansible-config-download
When SELinux is enforcing, use the :z docker volume mount flag for the
docker-puppet tool's bind-mounted volumes in RW mode.
Note that if a volume is mounted with :Z, the label will be specific
to the container, and the volume will not be shareable between containers.
Volumes from /etc/pki mounted RO do not require the context changes.
For those RO volumes that do require it, use :ro,z.
For deploy-steps, make sure ansible file resources in /var/lib/
enforce the same SELinux context attributes that docker's :z
provides.
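A minimal sketch of the matching Ansible side; the setype value is an
assumption about the label docker's :z applies:

  - name: Create a docker-puppet working directory with the shared container context
    file:
      path: /var/lib/docker-puppet
      state: directory
      setype: svirt_sandbox_file_t   # assumed label; aliased to container_file_t on newer policies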
Partial-bug: #1682179
Related-bug: #1723003
Change-Id: Idc0caa49573bd88e8410d3d4217fd39e9aabf8f2
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
The overcloud deployment playbook consists of several plays. Facts
gathered in a single playbook persist between plays, but by default
each play gathers facts from involved nodes before applying any
roles/tasks. That resulted in gathering facts many times.
This commit changes the deployment playbook so that we gather facts
once at the beginning, and then reuse them for subsequent plays.
Also any_errors_fatal is added to make sure that when one host fails,
subsequent tasks aren't attempted on the other hosts either.
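A minimal sketch of the resulting play structure (host group and task
bodies are placeholders):

  - hosts: overcloud
    name: Gather facts
    gather_facts: yes
    any_errors_fatal: yes

  - hosts: overcloud
    name: Host deploy steps
    gather_facts: no          # facts from the first play persist
    any_errors_fatal: yes     # a failure on one host stops the others
    tasks:
      - name: Example deploy step task
        command: hostname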
Change-Id: I192ea99105bd188554d45a6e4290bb33d1f08ff1
The puppet host run never fails, even when it should, since we moved
to the ansible way of applying it. The reason is the following code:
  - name: Run puppet host configuration for step {{step}}
    command: >-
      puppet apply
      --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules
      --logdest syslog --logdest console --color=false
      /var/lib/tripleo-config/puppet_step_config.pp
The above is missing the --detailed-exitcodes switch, so puppet will never
really error out on us and the deployment will keep running all the
steps even though a previous puppet manifest might have failed. This
causes extra hard-to-debug failures.
Initially the issue was observed on the puppet host runs, but this
switch is also missing from docker-puppet.py, so let's add it there
as well; it makes sense to return proper error codes whenever we call
puppet.
Besides this being a good idea in general, we actually *have* to do it
because puppet does not fail correctly without this option due to the
following puppet bug:
https://tickets.puppetlabs.com/browse/PUP-2754
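With --detailed-exitcodes, exit code 2 (changes applied) must also be
treated as success, so the task needs a failed_when; a minimal sketch:

  - name: Run puppet host configuration for step {{step}}
    register: outputs
    failed_when: outputs.rc not in [0, 2]   # 0 = no changes, 2 = changes applied
    command: >-
      puppet apply --detailed-exitcodes
      --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules
      --logdest syslog --logdest console --color=false
      /var/lib/tripleo-config/puppet_step_config.pp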
Depends-On: I607927c2ee5c29b605e18e9294b0f91d37337680
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Change-Id: Ie9df4f520645404560a9635fb66e3af42b966f54
Closes-Bug: #1723163
These defaults are needed because the variables aren't defined when
deploying without config-download-environment.yaml and
https://review.openstack.org/#/c/508189/, but it's still useful to allow
re-running the deploy steps for debugging, e.g.:
ansible-playbook -v -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml
Currently this doesn't work, because these variables are undefined.
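A minimal sketch of the defaulting pattern (the variable and task are
hypothetical):

  - name: Write the deployment identifier
    copy:
      dest: /var/lib/tripleo-config/deployment-id
      content: "{{ deploy_identifier | default('') }}"   # default lets the play run standalone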
Change-Id: I2d99c1cb8bf4ccd8581e78d914d438e5de544219
Before pike we used to be able to add -e environments/config-debug.yaml
and that would give us debug logs for puppet. With the move to ansible
running puppet we lost this feature.
Let's make sure that the old ConfigDebug variable still works with
the ansible playbook-based deploy steps. With this patch and ConfigDebug
set to true, we correctly get the puppet debug logs:
TASK [debug] *******************************************************************
ok: [localhost] => {
    "(outputs.stderr|default('')).split('\n')|union(outputs.stdout_lines|default([]))": [
        "Warning: Undefined variable 'deploy_config_name'; ",
        "   (file & line not available)",
        "Warning: This method is deprecated, please use the stdlib validate_legacy function, with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 54]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        "   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:25:in `deprecation')",
        "Debug: Runtime environment: puppet_version=4.8.2, ruby_version=2.0.0, run_mode=user, default_encoding=UTF-8",
        "Debug: Loading external facts from /etc/puppet/modules/openstacklib/facts.d",
        "Debug: Loading external facts from /var/lib/puppet/facts.d",
        ....
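A minimal sketch of how ConfigDebug might be wired into the puppet apply
task (the ansible variable name is hypothetical):

  - name: Run puppet host configuration for step {{step}}
    command: >-
      puppet apply
      {{ '--debug --verbose' if enable_puppet_debug | default(false) else '' }}
      --detailed-exitcodes
      --logdest syslog --logdest console --color=false
      /var/lib/tripleo-config/puppet_step_config.pp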
Change-Id: Ia726fb8ca4a6f7bbbd7a1284d76ff42df6825d01
Closes-Bug: #1722752
Services can define external_deploy_tasks, which are meant to be
executed on the undercloud node. They are step-based like the other
Ansible tasks we have, and they get executed during each deployment
step before the puppet and docker tasks.
These tasks can be used to perform complex actions from the
undercloud, such as executing nested installers like kubespray or
ceph-ansible. This should allow deploying the overcloud with a single
Ansible playbook, and without creating an Ansible->Mistral->Ansible loop.
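A minimal sketch of how a service template might declare these (service
name and task body are hypothetical):

  outputs:
    role_data:
      value:
        service_name: external_example
        external_deploy_tasks:
          - name: Run a nested installer from the undercloud
            when: step|int == 2   # step-based, like the other task types
            command: echo "this is where e.g. ceph-ansible would be invoked"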
Implements: blueprint ansible-config-download
Change-Id: I3dcafb96f5cea5fdcebe2b2012b61a38b0568834
Depends-On: I8491540edf78711f3229eabeda22a17cd55e99c8
Presently, "openstack overcloud config download" does not support all
Deployment resources, only those included in the RoleData and are
natively of type group:ansible.
This patch adds support for also pulling all the deployment data for
OS::Heat::SoftwareDeployment (singular) resources applied to individual
servers of any group type. Those resources are mapped to a new nested
stack via the config-download-environment.yaml environment.
The nested stack has the same interface as a SoftwareDeployment but only
creates a OS::Heat::Value resource. The "config download" code will be
updated in a separate patch to read the deployment data from these Value
resources and apply them via ansible.
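A sketch of the nested stack's shape (the parameter set is abbreviated
and assumed):

  parameters:
    server: {type: string}
    config: {type: string}
    name: {type: string, default: ''}
  resources:
    deployment:
      type: OS::Heat::Value
      properties:
        value:
          server: {get_param: server}
          config: {get_param: config}
          name: {get_param: name}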
The related tripleo-common patch (which depends on this patch) is:
I7d7f6b831b8566390d8f747fb6f45e879b0392ba
implements: blueprint ansible-config-download
Change-Id: Ic2af634403b1ab2924c383035f770453f39a2cd5
In the deploy steps playbook downloaded via "openstack overcloud config
download", all the tasks require sudo. The tasks should use "become:
true", otherwise they fail with permission denied errors.
Change-Id: I561b5ef6dee0ee7cac67ba798eda284fb7f7a8d0
Closes-Bug: #1717298
We should start the sequence at 1 instead of 0, since all our puppet
manifests assume the first step is 1. Trying to run our puppet manifests
with a hieradata value of step=0 actually results in an error because no
classes are included.
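In the j2 templates this amounts to starting the range at 1, roughly
(loop variable names assumed from the template):

  {% for step in range(1, deploy_steps_max) %}
  # renders steps 1 .. deploy_steps_max-1; range(deploy_steps_max) would emit a step 0
  {% endfor %}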
Change-Id: I93dc8b4cefbd729ba7afa3a4d81b4ac95344cac2
Closes-Bug: #1717292
During the controlplane upgrade, the host_prep_tasks are being
executed on the disable_upgrade_deployment roles too.
This sets the role-specific host_prep_tasks to an empty list for
those roles during an upgrade, as executing them during the
controlplane upgrade (during -e
major-upgrade-composable-steps-docker.yaml) causes problems.
They will be executed as part of the non-controller upgrade, as they
are written to the stack outputs to be used as ansible playbooks
(see bug 1708115 for more info on this).
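One plausible shape of the guard in the j2 template (the parameter
wiring is an assumption):

  {% if role.disable_upgrade_deployment|default(false) %}
  host_prep_tasks: []   # emptied during the controlplane upgrade
  {% else %}
  host_prep_tasks: {get_param: {{role.name}}HostPrepTasks}   # hypothetical parameter name
  {% endif %}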
Change-Id: I42c963440b9b1e8222097c3d4e83ffcbe820886c
Closes-Bug: 1719604
After landing https://review.openstack.org/#/c/503484/ we run the
puppet host configuration steps twice. This change removes the
deploy_steps_tasks.yaml playbook in order to run the puppet steps
only once.
Closes-bug: 1717244
Change-Id: I09461094618124915841c8390c8bce8daf64d029
Using the service_ prefix here is inconsistent with its use in
service_config_settings (vs config_settings).
Change-Id: Ia39f181415bee0071409dabddfa0c5c312915e1f
This reverts commit a7a02f0da866c66dce9757a42bf56144cfa70d5a.
This change requires Heat functionality which is not yet available, so the scenario001-containers job fails because of the new tags. Reverting to unblock CI; this should be restored once we have a Heat promotion.
Change-Id: Ib0fed291c1c4e41d1ea0bb7fc2ccbdabac1d336b
Closes-Bug: #1716915
This adds a new config/deployment per role that will come after any
post deploy steps. It drives the same ansible config as the
upgrade_tasks but instead collects the post_upgrade_tasks for any
service in the given role.
The workflow is upgrade_tasks, then post deploy steps (either
puppet/ or docker/ depending on the env) and then the
post_upgrade_tasks added here.
This is added to the pacemaker/cinder-volume.yaml service for now;
see the bug below for more info.
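A minimal sketch of what a service might declare (the task body is
hypothetical):

  outputs:
    role_data:
      value:
        post_upgrade_tasks:
          - name: Example post-upgrade action
            when: step|int == 1
            command: echo "runs after upgrade_tasks and the post deploy steps"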
Change-Id: Iced34fecf02ebddc91df9302de54d2f4c2cab680
Closes-Bug: 1706951
I96ec09bc788836584c4b39dcce5bf9b80e914c71 added this output to
deploy-steps.j2, but missed adding it to the major upgrade template,
which means the overcloud RoleConfig output is broken after the upgrade
(until the converge update switches back to the deploy-steps.j2 derived
template).
Closes-Bug: #1716404
Change-Id: I331fa18b456ca2d6c124316d513374e3fe5a5007
This is useful to easily filter the workflows created by the templates
for a specific stack.
Change-Id: I0a26cacaf5ad5709881043434694c9254a9e710b
Related-Bug: #1715389
Use a more restrictive mode for these files, as some may contain sensitive data
which shouldn't be world-readable.
Closes-Bug: #1714986
Change-Id: Ib1e79b1d4e25d6e329938402b1ca776bdab81bdd
Where applicable, use list_concat instead of yaql to build new lists: it
should be more resilient to errors, easier to debug, and less expensive.
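For example, a yaql list concatenation (resource and attribute names
hypothetical):

  value:
    yaql:
      expression: $.data.first + $.data.second
      data:
        first: {get_attr: [ResourceA, some_list]}
        second: {get_attr: [ResourceB, some_list]}

can instead be written as:

  value:
    list_concat:
      - {get_attr: [ResourceA, some_list]}
      - {get_attr: [ResourceB, some_list]}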
Change-Id: I6d3dbc7ee8eac50f46023a35af4ec7f2d378fd87
Related-Bug: #1714005
For bug 1708115 and the O..P upgrade, for the upgrade of
'non-controllers' we are now generating ansible playbooks from
collected service upgrade_tasks, and these are executed instead
of the legacy tripleo_upgrade_node.sh.
To clarify, by 'non-controllers' it is meant any node for which
the corresponding roles_data.yaml role has the
disable_upgrade_deployment flag set True.
As a first pass, I am removing the workarounds from the script but
keeping its delivery mechanism for now in case it is still needed.
We can either update here to remove it, or keep it until next cycle.
The most important part for now is that we no longer 'manually'
run puppet here. Instead the post_deploy_steps are also collected
into a playbook and will be executed after the upgrade_tasks
(see the bug for discussion of the mechanism and related reviews)
Change-Id: Ib017b0ab435ca9558cf8659d434489cdf01df955
Related-Bug: 1708115
docker-puppet.py is very aggressive about running concurrently.
It uses python multiprocessing to run multiple config generating
containers at once. This seems to work well in general, but in some
cases, perhaps when the registry is slow or under heavy load, timeouts
can occur. Lately I'm seeing several 'container did not start before
the specified timeout' errors that always seem to occur when config
files are generated (when docker-puppet.py is initially executed).
A couple of contributing factors:
- when config files are generated, this is the first time most of the
  containers are pulled to each host machine during deployment
- docker-puppet.py runs many of these processes at once; some of them
  run faster, others not
- the docker daemon's pull limit defaults to 3; this would throttle
  the above a bit, perhaps contributing to the likelihood of a timeout
One solution that seems to work for me is to set the PROCESS_COUNT
in docker-puppet.py to 3. As this matches the docker daemon's default,
it is probably safer, at the cost of being slightly slower in some
cases.
Change-Id: I17feb3abd9d36fe7c95865a064502ce9902a074e
Closes-bug: #1713188
Force the count to start at 0 to ensure that the
update step loop starts at 0 and executes
update step 0.
Closes-Bug: #1712498
Change-Id: I71be55c1f56e53e5c565bec281795d63e5845ff6
To get this to work, upgrade_tasks need to be rewritten with 'when'
statements like the update tasks (in the parent review from shardy).
So that we don't break the existing upgrades workflow, we add these
as part of the config download; see the Depends-On.
Related-Bug: 1708115
Depends-On: Ief593dc758a2ffe33c1cbcbda9289393fcf023e4
Change-Id: Ib01b96a2c26721747d81d98e3d57c4c388663004
This enables either deploying without configuring any services, or
temporarily disabling the deploy steps, as will be required
for minor updates where we want to re-run the rolling update outside
of heat.
To deploy directly via ansible-playbook you can do e.g.:
openstack overcloud config download --config-dir tmpconfig
cd tmpconfig/tripleo-6b02U7-config
ansible-playbook -vvv -b -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml
Which will run the same ansible steps as we normally run via heat.
Change-Id: I59947b67523dfcc43d454d4ac7d82b06804cf71d
These work the same way as upgrade_tasks *but* they use a step variable
instead of tags, so we can iterate over a count/sequence, which isn't
possible via a wrapper playbook with tags (we may want to align upgrade
tasks with the same approach if this works out well).
Note the tasks can be run via ansible-playbook on the undercloud, like:
openstack overcloud config download --config-dir tmpconfig
cd tmpconfig/tripleo-HCrDA6-config
ansible-playbook -b -i /usr/bin/tripleo-ansible-inventory update_steps_playbook.yaml --limit controller
The above will do a rolling update for the Controller role (note the inconsistent
capitalization; we probably need to fix the group naming in tripleo-ansible-inventory)
because we specify serial: 1 in the playbook.
You can also trigger an update explicitly on one node like this, which is useful for debugging:
ansible-playbook -vvv -b -i /usr/bin/tripleo-ansible-inventory update_steps_playbook.yaml --limit overcloud-controller-0
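A minimal sketch of the play driving the step variable (file layout and
variable names assumed from the generated config):

  - hosts: Controller
    serial: 1               # rolling update, one node at a time
    tasks:
      - include: Controller/update_tasks.yaml
        with_sequence: start=0 end={{update_steps_max|default(5)}}
        loop_control:
          loop_var: step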
Change-Id: I20bb3e26ab9d9cadf1a31fd304de8a014a901aa9
This exposes the deploy workflow for all roles from deploy-steps
via overcloud.j2.yaml, which means we can write it via the new
openstack overcloud config download command and/or run the workflow
outside of heat via mistral.
With https://review.openstack.org/#/c/485732/ applied to
tripleoclient it becomes possible to do:
openstack overcloud config download --config-dir tmpconfig
cd tmpconfig/tripleo-EvEZk0-config
ansible-playbook -b -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml
This runs the deploy steps via ansible-playbook for all overcloud
nodes, exactly the same as they are normally run via heat (--limit can
be used to restrict to specific nodes/roles).
Change-Id: I96ec09bc788836584c4b39dcce5bf9b80e914c71
This isn't set unless the playbook is run via heat, so default it to false
to enable easier use via ansible-playbook combined with tripleo-ansible-inventory
Change-Id: I9705e4533831a019dd0051e5522d4b7958682506
If we consolidate these, we can focus on one implementation (the new
ansible-based one used for docker-steps).
Change-Id: Iec0ad2278d62040bf03613fc9556b199c6a80546
Depends-On: Ifa2afa915e0fee368fb2506c02de75bf5efe82d5
The key_name default is ignored because the parameter is used in
some mutually exclusive environments where the default doesn't
need to be the same.
Change-Id: I77c1a1159fae38d03b0e59b80ae6bee491d734d7
Partial-Bug: 1700664
This makes the RolesData output more accurate, and we can rework
things so docker-puppet only gets run when there is a non-empty
file calculated (e.g. there are tasks to run).
Change-Id: I8cdab3c857977c80fe2e359ab9e05740a838d66b
This stores the result of the yaql queries etc. for easier debugging, and
also so there's no risk we constantly re-evaluate the expensive query,
which can happen with some heat versions and configurations.
This also gives a nicer error when things go wrong: when a query fails
you know which resource had an error, and the validation on resources
is currently stricter due to bug #1599114. We also get some additional
type validation from each OS::Heat::Value resource, e.g. it checks if the
calculated value is a valid map or list.
The final advantage (and the original motivation for doing this) is that
we can easily filter null values for any outputs where this isn't already
done, which makes the config data written via openstack overcloud config
download cleaner.
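A minimal sketch of the pattern (resource names and query are
hypothetical):

  resources:
    RoleDataValue:
      type: OS::Heat::Value
      properties:
        value:
          yaql:
            expression: $.data.role_data.where($ != null)   # filter null entries
            data:
              role_data: {get_attr: [ServiceChain, role_data]}
  outputs:
    role_data:
      value: {get_attr: [RoleDataValue, value]}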
Change-Id: Ia6697cf2e47f3f7b727d620536e0873a985c98c4
Moving these means we get a more accurate output from the overcloud
RoleData output, which more closely reflects what is actually
deployed.
Change-Id: I154f36c1597cf4abe29ca0bfe15a54f507433fb1
Makes it possible to resolve network subnets within a service
template; the data is transported into a new property ServiceData,
wired into every service, which hopefully is generic enough to
be extended in the future to transport more data.
Data can be consumed in service templates to set config values
which need to know the subnet where a daemon operates (for
example the Ceph Public vs Cluster network).
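A minimal sketch of a service template consuming it (the net_cidr_map
key is an assumption about ServiceData's contents):

  parameters:
    ServiceData:
      default: {}
      description: Dictionary packing service data
      type: json
  outputs:
    role_data:
      value:
        config_settings:
          ceph::cluster_network: {get_param: [ServiceData, net_cidr_map, storage_mgmt]}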
Change-Id: I28e21c46f1ef609517175f7e7ee19e28d1c0cba2
This new directory has now been added to the RDO packaging, so we
can move things common to both the puppet and container architectures
here, starting with the recently combined services.yaml.
Change-Id: If2ce27188c4c15002b3ad830e8d6eb9504d2f3d2
Moving to one common services.yaml not only reduces the duplication, but it
should improve performance for the docker/services.yaml case, because we were
creating two ResourceChains with $many services, which we know can be really
slow (especially since we seem to be missing concurrent: true on one).
Change-Id: I76f188438bfc6449b152c2861d99738e6eb3c61b