Due to a bug in Ansible, kolla-ansible deploy currently fails in nova with
the following error when used with Ansible earlier than 2.8:
TASK [nova : Waiting for nova-compute services to register themselves] *********
task path: /home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml:30
fatal: [primary]: FAILED! => {
    "failed": true,
    "msg": "The field 'vars' has an invalid value, which includes an undefined
    variable. The error was: 'nova_compute_services' is undefined\n\nThe error
    appears to have been in
    '/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml':
    line 30, column 3, but may be elsewhere in the file depending on the exact
    syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Waiting
    for nova-compute services to register themselves\n  ^ here\n"
}
Example:
http://logs.openstack.org/00/669700/1/check/kolla-ansible-centos-source/81b65b9/primary/logs/ansible/deploy
This was caused by
https://review.opendev.org/#/q/I2915e2610e5c0b8d67412e7ec77f7575b8fe9921,
which hits upon an Ansible bug described here:
https://github.com/markgoddard/ansible-experiments/tree/master/05-referencing-registered-var-do-until.
We can work around this by not using an intermediate variable.
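As a sketch of the workaround (task, command and retry values are
illustrative, not the exact kolla-ansible task): reference the registered
variable directly in the until condition instead of going through an
intermediate entry under 'vars':

  # On Ansible < 2.8, referencing the registered variable through an
  # intermediate 'vars' entry fails with "'nova_compute_services' is
  # undefined"; referencing it directly in 'until' works.
  - name: Waiting for nova-compute services to register themselves
    command: openstack compute service list --service nova-compute -f json
    register: nova_compute_services
    until: nova_compute_services.stdout | from_json | length > 0
    retries: 20
    delay: 10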
Change-Id: I58f8fd0a6e82cb614e02fef6e5b271af1d1ce9af
Closes-Bug: #1835817
* Fix wsrep sequence number detection. The log message format is
'WSREP: Recovered position: <UUID>:<seqno>', but we were picking out the
UUID rather than the sequence number, which is as good as random (see the
sketch after this list).
* Add become: true to log file reading and removal since
I4a5ebcedaccb9261dbc958ec67e8077d7980e496 added become: true to the
'docker cp' command which creates it.
* Don't run handlers during recovery. If the config files change we
would end up restarting the cluster twice.
* Wait for wsrep recovery container completion (don't detach). This
avoids a potential race between wsrep recovery and the subsequent
'stop_container'.
* Finally, we now wait for the bootstrap host to report that it is in
an OPERATIONAL state. Without this we can see errors where the
MariaDB cluster is not ready when used by other services.
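A minimal sketch of the corrected sequence number extraction from the first
point (variable names and the example log line are illustrative):

  # Capture the sequence number after the last colon, not the UUID before
  # it. The seqno may be negative (e.g. -1), hence the optional minus sign.
  - name: Find wsrep sequence number
    vars:
      wsrep_log_line: "WSREP: Recovered position: 5a4e2b2a-8c1f-11e9-87b6-0242ac120002:1234"
    set_fact:
      seqno: "{{ wsrep_log_line | regex_search('Recovered position:\\s*[\\w-]+:(-?\\d+)', '\\1') | first }}"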
Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583
Closes-Bug: #1834467
There are now several good tools for deploying Ceph, including Ceph
Ansible and ceph-deploy. Maintaining our own Ceph deployment is a
significant maintenance burden, and we should focus on our core mission
of deploying OpenStack. Given that this is currently a significant part
of Kolla Ansible, we will need a long deprecation period and a migration
path to another tool.
Change-Id: Ic603c85c04d8794580a19f9efaa7a8589565f4f6
Partially-Implements: blueprint remove-ceph
There is a race condition during nova deploy since we wait for at least
one compute service to register itself before performing cells v2 host
discovery. It's quite possible that other compute nodes will not yet
have registered and will therefore not be discovered. This leaves them
not mapped into a cell, and results in the following error if the
scheduler picks one when booting an instance:
Host 'xyz' is not mapped to any cell
The problem has been exacerbated by merging a fix [1][2] for a nova race
condition, which disabled the dynamic periodic discovery mechanism in
the nova scheduler.
This change fixes the issue by waiting for all expected compute services
to register themselves before performing host discovery. This includes
both virtualised compute services and bare metal compute services.
[1] https://bugs.launchpad.net/kolla-ansible/+bug/1832987
[2] https://review.opendev.org/665554
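A sketch of the new wait condition (names such as expected_compute_hosts
and the command are illustrative; the real task lives in
discover_computes.yml):

  # Wait until every expected compute host (virtualised and bare metal)
  # appears in the nova-compute service list before host discovery runs.
  - name: Waiting for nova-compute services to register themselves
    command: openstack compute service list --service nova-compute -f json
    register: nova_compute_services
    until: >-
      (nova_compute_services.stdout | from_json | map(attribute='Host') | list)
      is superset(expected_compute_hosts)
    retries: 20
    delay: 10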
Change-Id: I2915e2610e5c0b8d67412e7ec77f7575b8fe9921
Closes-Bug: #1835002
This is to ensure that a Depends-On does not cause Zuul to skip picking up
the change for gating due to the lack of notifications between queues.
Previously, W+1-ing a change which depended on an unmerged change from the
other project caused it to remain in the same state.
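One plausible shape for such a change (illustrative only; the queue name
and job are assumptions, not the actual patch): making both projects share
a gate queue in .zuul.yaml, so that Zuul can track cross-project
Depends-On relationships during gating:

  # .zuul.yaml (sketch): projects sharing the 'kolla' gate queue are gated
  # together, so a Depends-On between them is handled within one queue.
  - project:
      gate:
        queue: kolla
        jobs:
          - kolla-ansible-centos-source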
Change-Id: Ib2d88471ac5730c00b5a9721066d1fb3f2998c9c
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Otherwise ara had only the stderr part and the logs only the stdout part,
which made ordered analysis harder.
Additionally add -vvv for the bootstrap-servers run.
Change-Id: Ia42ac9b90a17245e9df277c40bda24308ebcd11d
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
This proposal adds support to Kolla-Ansible for deploying CloudKitty with
InfluxDB as its storage system. Support for InfluxDB as a storage backend
for CloudKitty was introduced in the following commit:
https://github.com/openstack/cloudkitty/commit/c4758e78b49386145309a44623502f8095a2c7ee
Problem Description
===================
With the addition of support for InfluxDB in CloudKitty, which became
generally available in the Stein release, we need a method to easily
configure/support this storage backend system via Kolla-Ansible.
Kolla-ansible is already able to deploy and configure an InfluxDB
system. Therefore, this proposal will use the InfluxDB deployment
configured via Kolla-ansible to connect to CloudKitty and use it as a
storage backend.
If we do not provide a method for users (operators) to manage the
CloudKitty storage backend via Kolla-Ansible, they have to apply these
configurations manually (or via some other set of automated scripts),
which creates a distributed set of configuration files and configuration
scripts with different versioning schemes and life cycles.
Proposed Change
===============
Architecture
------------
We propose a flag that users can set to make Kolla-Ansible configure
CloudKitty to use InfluxDB as its storage backend. When this flag is
enabled, Kolla-Ansible will also automatically enable the deployment of
InfluxDB.
CloudKitty will be configured according to [1] and [2]. We will also
externalize the "retention_policy", "use_ssl", and "insecure" options to
allow fine-grained configuration by operators. All of these options will
only be applied when explicitly configured; when they are not set, the
default value/behavior defined in CloudKitty will be used. Moreover, when
"use_ssl" is set to "true", the user will be able to set "cafile" to a
custom trusted CA file. Again, if these variables are not set, the
defaults in CloudKitty will be used.
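For illustration, the externalized options might be set as follows in
`globals.yml` (the values here are examples, not proposed defaults;
options left unset fall back to CloudKitty's behavior)::

    # Example operator overrides; option names are from this spec.
    cloudkitty_influxdb_retention_policy: "autogen"
    cloudkitty_influxdb_use_ssl: true
    cloudkitty_influxdb_cafile: "/etc/ssl/certs/ca-bundle.crt"  # with use_ssl
    cloudkitty_influxdb_insecure_connections: false
    cloudkitty_influxdb_name: "cloudkitty"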
Implementation
--------------
We need to introduce a new variable called
`cloudkitty_storage_backend`. Valid options are `sqlalchemy` or
`influxdb`. The default value in Kolla-Ansible is `sqlalchemy`, for
backward compatibility. The first step is then to change the definition
of the following variable:
`/ansible/group_vars/all.yml:enable_influxdb: "{{ enable_monasca |
bool }}"`
We also need to enable InfluxDB when CloudKitty is configured to use
it as the storage backend. Afterwards, we need to create tasks in the
CloudKitty role to create the InfluxDB schema and adjust the
configuration files accordingly.
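A sketch of the resulting `group_vars/all.yml` definitions (the exact
expression may differ in the implementation)::

    # New backend selector; defaults to sqlalchemy for backward
    # compatibility.
    cloudkitty_storage_backend: "sqlalchemy"

    # InfluxDB remains enabled for Monasca, and is now also enabled when
    # CloudKitty uses it as the storage backend.
    enable_influxdb: "{{ enable_monasca | bool or (enable_cloudkitty | bool and cloudkitty_storage_backend == 'influxdb') }}"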
Alternatives
------------
The alternative would be to apply the configurations manually or handle
them via a different set of scripts and configuration files, which can
become cumbersome over time.
Security Impact
---------------
None identified by the author of this spec.
Notifications Impact
--------------------
Operators that are already deploying CloudKitty with InfluxDB as the
storage backend would need to convert their configurations to
Kolla-Ansible (if they wish to adopt Kolla-Ansible to execute these
tasks).
Also, deployments (OpenStack environments) that were created with
CloudKitty using storage v1 will need to migrate all of their data to v2
before enabling InfluxDB as the storage system.
Other End User Impact
---------------------
None.
Performance Impact
------------------
None.
Other Deployer Impact
---------------------
New configuration options will be available for CloudKitty.
* cloudkitty_storage_backend
* cloudkitty_influxdb_retention_policy
* cloudkitty_influxdb_use_ssl
* cloudkitty_influxdb_cafile
* cloudkitty_influxdb_insecure_connections
* cloudkitty_influxdb_name
Developer Impact
----------------
None
Implementation
==============
Assignee
--------
* `Rafael Weingärtner <rafaelweingartne>`
Work Items
----------
* Extend the InfluxDB "enable/disable" variable
* Add new tasks to configure CloudKitty according to the new variables
  presented above
* Write documentation and release notes
Dependencies
============
None
Documentation Impact
====================
New documentation for the feature.
References
==========
[1] `https://docs.openstack.org/cloudkitty/latest/admin/configuration/storage.html#influxdb-v2`
[2] `https://docs.openstack.org/cloudkitty/latest/admin/configuration/collector.html#metric-collection`
Change-Id: I65670cb827f8ca5f8529e1786ece635fe44475b0
Signed-off-by: Rafael Weingärtner <rafael@apache.org>
This performs the same steps as deploy-bifrost, but first stops the
bifrost services and container if they are running.
This can help where a plain 'docker stop' may lead to an ungraceful
shutdown, possibly due to running multiple services in one container.
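A sketch of the stop step (the exact tasks may differ; kolla_docker is
kolla-ansible's own docker module):

  # Stop the bifrost_deploy container gracefully before redeploying,
  # rather than relying on 'docker stop' against a running container
  # during the upgrade.
  - name: Stopping bifrost_deploy container
    kolla_docker:
      action: stop_container
      common_options: "{{ docker_common_options }}"
      name: bifrost_deploy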
Change-Id: I131ab3c0e850a1d7f5c814ab65385e3a03dfcc74
Implements: blueprint bifrost-upgrade
Closes-Bug: #1834332
Some kolla-ansible jobs failed because they used external mirrors instead
of local ones.
This was caused by not using the template override provided by Kolla.
This patch fixes that.
Depends-On: https://review.opendev.org/668226
Change-Id: I27f714fdf05e521aa8ce25c5683a452ceb35eeb8
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
This is necessary for some Ansible tests which were renamed in Ansible
2.5, including 'version' and 'successful'.
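For example (illustrative snippets; 'result' stands for a previously
registered task result):

  # 'version' is the new-style name of the old 'version_compare' test.
  - name: Fail if Ansible is too old
    fail:
      msg: "Ansible 2.5 or later is required"
    when: ansible_version.full is version('2.5', '<')

  # 'successful' is the new-style test replacing filter-style usage.
  - name: Report success of a previous task
    debug:
      msg: "previous task succeeded"
    when: result is successful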
Change-Id: Iacf88ef5589c7571fcf56ba8b99d3dbe76975195
Otherwise, when the monitoring nodes aren't on the public API network,
I'm seeing:

TASK [monasca : Creating the monasca agent user] *********
fatal: [monitor1]: FAILED! => {"changed": false, "module_stderr": "Shared
connection to 172.16.3.24 closed.\r\n", "module_stdout": "...", "msg":
"MODULE FAILURE", "rc": 1}

where module_stdout contains this traceback:

Traceback (most recent call last):
  File "/tmp/ansible_I0RmxQ/ansible_module_kolla_toolbox.py", line 163, in <module>
    main()
  File "/tmp/ansible_I0RmxQ/ansible_module_kolla_toolbox.py", line 141, in main
    output = client.exec_start(job)
  File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/decorators.py", line 19, in wrapped
    return f(self, resource_id, *args, **kwargs)
  File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/api/exec_api.py", line 165, in exec_start
    return self._read_from_socket(res, stream, tty)
  File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/api/client.py", line 377, in _read_from_socket
    return six.binary_type().join(gen)
  File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py", line 75, in frames_iter
    n = next_frame_size(socket)
  File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py", line 62, in next_frame_size
    data = read_exactly(socket, 8)
  File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py", line 47, in read_exactly
    next_data = read(socket, n - len(data))
  File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py", line 31, in read
    return socket.recv(n)
socket.timeout: timed out
Change-Id: I7a93f69da0e02c9264da0b081d2e60626f899e3a
Previously we sourced this script in tests/deploy.sh, but this was
recently changed. Following that change we lost the errexit setting,
meaning that errors in init-runonce were ignored.
Setting errexit in the script itself means that all callers get error
handling.
Also log the init-runonce output.
TrivialFix
Change-Id: I9b35bd5f0f76eec26ddd968d093a3a5fd55a7ce2
Currently, we have a lot of logic for checking whether a handler should
run, depending on whether config files have changed and whether the
container configuration has changed. As rm_work pointed out during the
recent haproxy refactor, these conditionals are typically unnecessary: we
can rely on Ansible's handler notification system to trigger handlers
only when they need to run. This removes a lot of error-prone code.
This patch removes the conditional handler logic for all services. It is
important to ensure that we do not notify handlers unnecessarily, since
without these checks in place any notification will trigger a restart of
the containers.
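For instance, a config task now simply notifies the handler, and Ansible
runs it only if the task reports changed (service shown is illustrative):

  # Task: notify the restart handler; it runs only when this task changes.
  - name: Copying over haproxy.cfg
    template:
      src: haproxy.cfg.j2
      dest: "{{ node_config_directory }}/haproxy/haproxy.cfg"
    notify:
      - Restart haproxy container

  # Handler: no 'when: config.changed or ...' guard is needed any more.
  - name: Restart haproxy container
    kolla_docker:
      action: restart_container
      common_options: "{{ docker_common_options }}"
      name: haproxy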
Implements: blueprint simplify-handlers
Change-Id: I4f1aa03e9a9faaf8aecd556dfeafdb834042e4cd
Kolla has it already and kolla-ansible should too.
This patch is to be backported as far back as Pike.
Affects only stable branches.
Change-Id: Iecc46b364ad9fc69fe67dd09ee1b4e3c5511f01c
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>