524 Commits

Author SHA1 Message Date
Mark Goddard
67c59b1cf7 Remove stale nova-consoleauth variables
Nova-consoleauth support was removed in
I099080979f5497537e390f531005a517ab12aa7a, but these variables were
left.

Change-Id: I1ce1631119bba991225835e8e409f11d53276550
2019-08-22 12:25:18 +01:00
Zuul
b93e33e78e Merge "Remove nova [DEFAULT]firewall_driver option" 2019-08-19 07:07:01 +00:00
Zuul
aa135e37f7 Merge "Standardize the configuration of "oslo_messaging" section" 2019-08-15 20:04:56 +00:00
Rafael Weingärtner
22a6223b1b Standardize the configuration of "oslo_messaging" section
After all of the discussions we had on
"https://review.opendev.org/#/c/670626/2", I studied all projects that
have an "oslo_messaging" section. Afterwards, I applied the same method
that is already used in the "oslo_messaging" section in Nova, Cinder, and
others. This guarantees that we have a consistent method to
enable/disable notifications across projects based on components (e.g.
Ceilometer) being enabled or disabled. Here follows the list of
components and the respective changes I made; a sketch of the resulting
pattern follows the list.

* Aodh:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.

* Congress:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.

* Cinder:
It was already properly configured.

* Octavia:
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.

* Heat:
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components

* Ceilometer:
Ceilometer publishes some messages to RabbitMQ. However, the
default driver is "messagingv2", and not '' (empty) as defined in Oslo;
these configurations are defined in ceilometer/publisher/messaging.py.
Therefore, we do not need to do anything for the
"oslo_messaging_notifications" section in Ceilometer.

* Tacker:
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components

* Neutron:
It was already properly configured.

* Nova
It was already properly configured. However, we found another issue
with its configuration. Kolla-ansible does not configure nova
notifications as it should. If 'searchlight' is not installed (enabled),
the 'notification_format' should be 'unversioned'. The default is
'both', so nova will send notifications to the 'versioned_notifications'
queue, but that queue has no consumer when 'searchlight' is disabled.
In our case, the queue got 511k messages. The huge amount of "stuck"
messages made the RabbitMQ cluster unstable.

https://bugzilla.redhat.com/show_bug.cgi?id=1478274
https://bugs.launchpad.net/ceilometer/+bug/1665449

* Nova_hyperv:
I added the same configurations as in Nova project.

* Vitrage
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components

* Searchlight
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.

* Ironic
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.

* Glance
It was already properly configured.

* Trove
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components

* Blazar
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components

* Sahara
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components

* Watcher
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.

* Barbican
I created a mechanism similar to what we have in Cinder, Nova,
and others. I also added a configuration to 'keystone_notifications'
section. Barbican needs its own queue to capture events from Keystone.
Otherwise, it has an impact on Ceilometer and other systems that are
connected to the "notifications" default queue.

* Keystone
Keystone is the system that triggered this work with the discussions
that followed on https://review.opendev.org/#/c/670626/2. After a long
discussion, we agreed to apply the same approach that we have in Nova,
Cinder, and other systems in Keystone. That is what we did. Moreover, we
introduced a new topic "barbican_notifications" when barbican is
enabled. We also removed the variable enable_cadf_notifications, as
it is obsolete, and the default in Keystone is CADF.

* Mistral:
The driver was hardcoded to "noop". However, that does not seem like
good practice. Instead, I applied the same standard of using the driver
and pushing to the "notifications" queue if Ceilometer is enabled.

* Cyborg:
I created a mechanism similar to what we have in AODH, Cinder, Nova,
and others.

* Murano
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components

* Senlin
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components

* Manila
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components

* Zun
The section is declared, but it is not used. Therefore, it will
be removed in an upcoming PR.

* Designate
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components

* Magnum
It was already using a similar scheme; I just modified it a little bit
to be the same as we have in all other components
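
As a rough sketch of the pattern described above (variable names here are
illustrative, and the exact names differ per role), each role lists its
candidate notification topics and filters them by the components that are
enabled:

# Candidate topics; each one is enabled only when its consumer is deployed.
nova_notification_topics:
  - name: notifications
    enabled: "{{ enable_ceilometer | bool }}"

# Keep only the enabled topics.
nova_enabled_notification_topics: "{{ nova_notification_topics | selectattr('enabled', 'equalto', true) | list }}"

The config template then sets the notification driver (e.g. 'messagingv2')
only when the resulting list is non-empty, and falls back to a disabled
driver otherwise.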

Closes-Bug: #1838985

Change-Id: I88bdb004814f37c81c9a9c4e5e491fac69f6f202
Signed-off-by: Rafael Weingärtner <rafael@apache.org>
2019-08-15 13:18:16 -03:00
Zuul
daba362f43 Merge "Handle more return codes from nova-status upgrade check" 2019-08-05 08:42:10 +00:00
Radosław Piliszek
6a737b1968 Fix handling of docker restart policy
Docker has no restart policy named 'never'. It has 'no'.
This has bitten us already (see [1]) and might bite us again whenever
we want to change the restart policy to 'no'.

This patch makes our docker integration honor all valid restart policies
and only valid restart policies.
All relevant docker restart policy usages are patched as well.

I added some FIXMEs in the places relevant to the kolla-ansible docker
integration. They are not fixed here so as not to alter behavior.

[1] https://review.opendev.org/667363
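
As a hedged sketch of the kind of task affected (module parameters are
abbreviated and the image variable is illustrative), the valid Docker
policy is 'no', quoted so that YAML does not coerce it into a boolean:

- name: Running nova bootstrap container
  become: true
  kolla_docker:
    action: start_container
    common_options: "{{ docker_common_options }}"
    image: "{{ nova_api_image_full }}"
    name: bootstrap_nova
    # 'no' is a valid Docker restart policy; 'never' is not.
    restart_policy: "no"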

Change-Id: I1c9764fb9bbda08a71186091aced67433ad4e3d6
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2019-07-18 13:39:06 +00:00
Zuul
61a74c0f5b Merge "Do not require valid migration_interface for controllers" 2019-07-16 14:10:09 +00:00
Mark Goddard
d5e5e885d1 During deploy, always sync DB
A common class of problems goes like this:

* kolla-ansible deploy
* Hit a problem, often in ansible/roles/*/tasks/bootstrap.yml
* Re-run kolla-ansible deploy
* Service fails to start

This happens because the DB is created during the first run, but for some
reason we fail before performing the DB sync. This means that on the second run
we don't include ansible/roles/*/tasks/bootstrap_service.yml because the DB
already exists, and therefore still don't perform the DB sync. However this
time, the command may complete without apparent error.

We should be less careful about when we perform the DB sync, and do it whenever
it is necessary. There is an argument for not doing the sync during a
'reconfigure' command, although we will not change that here.

This change makes the DB sync unconditional, but only during the
'deploy' and 'reconfigure' commands.
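
A minimal sketch of the idea, assuming the kolla_action variable and a
bootstrap_service.yml task file that runs the DB sync container:

# Run the DB sync tasks on every deploy/reconfigure, not only when the
# database was just created.
- import_tasks: bootstrap_service.yml
  when: kolla_action in ['deploy', 'reconfigure']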

Change-Id: I82d30f3fcf325a3fdff3c59f19a1f88055b566cc
Closes-Bug: #1823766
Closes-Bug: #1797814
2019-07-12 08:56:54 +00:00
Radosław Piliszek
b166d2550e Do not require valid migration_interface for controllers
Controllers lacking compute should not be required to provide a
valid migration_interface, as it is not used there (and prechecks
do not check it either).

Inclusion of the libvirt conf section is now conditional on the service
type. The libvirt conf section has been moved to a separate included file
to avoid evaluation of the undefined variable (a conditional block did not
prevent it, and using the 'default' filter may hide future issues).
See https://github.com/ansible/ansible/issues/58835
Additionally, this fixes the improper nesting of 'if' blocks for libvirt.

Change-Id: I77af534fbe824cfbe95782ab97838b358c17b928
Closes-Bug: #1835713
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2019-07-10 21:04:14 +02:00
Mark Goddard
5be093ac5a Fix nova deploy with Ansible<2.8
Due to a bug in ansible, kolla-ansible deploy currently fails in nova
with the following error when used with ansible earlier than 2.8:

TASK [nova : Waiting for nova-compute services to register themselves]
*********
task path:
/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml:30
fatal: [primary]: FAILED! => {
    "failed": true,
    "msg": "The field 'vars' has an invalid value, which
        includes an undefined variable. The error was:
        'nova_compute_services' is undefined\n\nThe error
        appears to have been in
        '/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml':
        line 30, column 3, but may\nbe elsewhere in the file
        depending on the exact syntax problem.\n\nThe
        offending line appears to be:\n\n\n- name: Waiting
        for nova-compute services to register themselves\n ^
            here\n"
}

Example:
http://logs.openstack.org/00/669700/1/check/kolla-ansible-centos-source/81b65b9/primary/logs/ansible/deploy

This was caused by
https://review.opendev.org/#/q/I2915e2610e5c0b8d67412e7ec77f7575b8fe9921,
which hits upon an ansible bug described here:
https://github.com/markgoddard/ansible-experiments/tree/master/05-referencing-registered-var-do-until.

We can work around this by not using an intermediary variable.
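
A hedged sketch of the workaround, with a placeholder command and
illustrative names:

- name: Waiting for nova-compute services to register themselves
  # Placeholder for the real CLI call that lists compute services.
  command: "{{ compute_service_list_command }}"
  register: nova_compute_services_result
  changed_when: false
  # Reference the registered variable directly in the retry condition
  # instead of an intermediate variable defined under 'vars'.
  until: (nova_compute_services_result.stdout | from_json) | length > 0
  retries: 20
  delay: 10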

Change-Id: I58f8fd0a6e82cb614e02fef6e5b271af1d1ce9af
Closes-Bug: #1835817
2019-07-08 19:58:51 +00:00
Mariusz
c68ed4dd51 Handle more return codes from nova-status upgrade check
In a single controller scenario, the "Upgrade status check result" task
does nothing, because the previous task can only succeed when
`nova-status upgrade check` returns code 0. This change allows that
command to fail, so that the value of the return code stored in
`nova_upgrade_check_stdout` can then be analysed.

This change also allows for warnings (rc 1) to pass.
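
A hedged sketch of the pattern (the exact command invocation in the role
may differ):

- name: Run nova-status upgrade check
  command: docker exec nova_api nova-status upgrade check
  register: nova_upgrade_check_stdout
  # Let the command return a non-zero code without failing the play.
  failed_when: false
  changed_when: false

- name: Upgrade status check result
  fail:
    msg: "nova-status upgrade check reported an error"
  # rc 0 is success and rc 1 is only a warning; anything else fails.
  when: nova_upgrade_check_stdout.rc not in [0, 1]
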
Closes-Bug: 1834647

Change-Id: I6f5e37832f43f23604920b9d890cc505ca924ff9
2019-07-08 14:13:27 +01:00
Zuul
8daad1abcf Merge "Wait for all compute services before cell discovery" 2019-07-05 10:31:29 +00:00
Mark Goddard
c38dd76711 Wait for all compute services before cell discovery
There is a race condition during nova deploy since we wait for at least
one compute service to register itself before performing cells v2 host
discovery.  It's quite possible that other compute nodes will not yet
have registered and will therefore not be discovered. This leaves them
not mapped into a cell, and results in the following error if the
scheduler picks one when booting an instance:

Host 'xyz' is not mapped to any cell

The problem has been exacerbated by merging a fix [1][2] for a nova race
condition, which disabled the dynamic periodic discovery mechanism in
the nova scheduler.

This change fixes the issue by waiting for all expected compute services
to register themselves before performing host discovery. This includes
both virtualised compute services and bare metal compute services.

[1] https://bugs.launchpad.net/kolla-ansible/+bug/1832987
[2] https://review.opendev.org/665554
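
A rough sketch of the waiting condition, with a placeholder command and
illustrative variable names:

- name: Waiting for nova-compute services to register themselves
  # Placeholder for the real CLI call that lists compute services.
  command: "{{ compute_service_list_command }}"
  register: nova_compute_services
  changed_when: false
  # Succeed only once every expected compute host (virtualised and bare
  # metal) appears in the registered service list.
  until: >-
    expected_compute_hosts | difference(
      nova_compute_services.stdout | from_json | map(attribute='Host') | list)
    | length == 0
  retries: 20
  delay: 10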

Change-Id: I2915e2610e5c0b8d67412e7ec77f7575b8fe9921
Closes-Bug: #1835002
2019-07-04 13:03:12 +00:00
Mark Goddard
177d6c72c2 Remove nova [DEFAULT]firewall_driver option
This is no longer supported, and has been removed from the
nova-dist.conf config file in RDO.

Change-Id: I6bfaf227ed997db0f39c1ff71ca6c73c49d2f060
2019-06-28 11:24:15 +01:00
Mark Goddard
de00bf491d Simplify handler conditionals
Currently, we have a lot of logic for checking if a handler should run,
depending on whether config files have changed and whether the
container configuration has changed. As rm_work pointed out during
the recent haproxy refactor, these conditionals are typically
unnecessary - we can rely on Ansible's handler notification system
to only trigger handlers when they need to run. This removes a lot
of error prone code.

This patch removes conditional handler logic for all services. It is
important to ensure that we no longer notify handlers unnecessarily,
because without these checks in place any such notification will trigger
a restart of the containers.
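
A hedged sketch of the simplified pattern, with illustrative names: the
config task notifies the handler, and the handler itself carries no extra
'when' checks:

- name: Copying over nova.conf
  template:
    src: nova.conf.j2
    dest: /etc/kolla/nova-api/nova.conf
    mode: "0660"
  notify:
    - Restart nova-api container

# handlers/main.yml
- name: Restart nova-api container
  become: true
  kolla_docker:
    action: recreate_or_restart_container
    name: nova_api
    image: "{{ nova_api_image_full }}"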

Implements: blueprint simplify-handlers

Change-Id: I4f1aa03e9a9faaf8aecd556dfeafdb834042e4cd
2019-06-27 15:57:19 +00:00
Zuul
651b983bdb Merge "Restart all nova services after upgrade" 2019-06-27 13:39:12 +00:00
Mark Goddard
e6d2b92200 Restart all nova services after upgrade
During an upgrade, nova pins the version of RPC calls to the minimum
seen across all services. This ensures that old services do not receive
data they cannot handle. After the upgrade is complete, all nova
services are supposed to be reloaded via SIGHUP to cause them to check
again the RPC versions of services and use the new latest version which
should now be supported by all running services.

Due to a bug [1] in oslo.service, sending services SIGHUP is currently
broken. We replaced the HUP with a restart for the nova_compute
container for bug 1821362, but not for the other nova services. It seems
we need to restart all nova services to allow the RPC version pin to be
removed.

Testing in a Queens to Rocky upgrade, we find the following in the logs:

Automatically selected compute RPC version 5.0 from minimum service
version 30

However, the service version in Rocky is 35.

There is a second issue in that it takes some time for the upgraded
services to update the nova services database table with their new
version. We need to wait until all nova-compute services have done this
before the restart is performed, otherwise the RPC version cap will
remain in place. There is currently no interface in nova available for
checking these versions [2], so as a workaround we use a configurable
delay with a default duration of 30 seconds. Testing showed it takes
about 10 seconds for the version to be updated, so this gives us some
headroom.

This change restarts all nova services after an upgrade, after a 30
second delay.

[1] https://bugs.launchpad.net/oslo.service/+bug/1715374
[2] https://bugs.launchpad.net/nova/+bug/1833542
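
A rough sketch of the post-upgrade flow, with an illustrative delay
variable and container list:

- name: Wait for nova-compute services to update their service versions
  pause:
    # Hypothetical variable for the configurable delay described above.
    seconds: "{{ nova_services_restart_delay | default(30) }}"

- name: Restart nova services to remove the RPC version pin
  become: true
  kolla_docker:
    action: restart_container
    name: "{{ item }}"
  loop:
    - nova_api
    - nova_scheduler
    - nova_conductor
    - nova_compute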

Change-Id: Ia6fc9011ee6f5461f40a1307b72709d769814a79
Closes-Bug: #1833069
Related-Bug: #1833542
2019-06-27 09:36:20 +00:00
Radosław Piliszek
cc058f4586 Make nova external ceph key extraction tasks non-changing
They are used only to obtain keys for the next task.
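
A minimal sketch, assuming a hypothetical lookup command; marking the
task as never changed reflects that it only reads the key:

- name: Pulling cephx keyring for nova
  # Placeholder for the real command that extracts the key.
  command: "{{ cephx_key_lookup_command }}"
  register: nova_cephx_key
  changed_when: false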

Change-Id: I2fac22af4710b70e4df8e3a272bcfb6cc8b8532e
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2019-06-26 14:21:20 +02:00
Zuul
03976399f0 Merge "Avoid parallel discover_hosts (nova-related race condition)" 2019-06-24 13:08:23 +00:00
Zuul
6cae4dedfe Merge "Remove nova-consoleauth" 2019-06-17 16:28:45 +00:00
Radosław Piliszek
ce680bcfe2 Avoid parallel discover_hosts (nova-related race condition)
In a rare event both kolla-ansible and nova-scheduler try to do
the mapping at the same time and one of them fails.
Since kolla-ansible runs host discovery on each deployment,
there is no need to change the default of no periodic host discovery.

I added some notes for the future. They are not critical.
I made the decision explicit in the comments.
I changed the task name to satisfy recommendations.
I removed the variable because it is not used (to avoid future doubts).

Closes-Bug: #1832987

Change-Id: I3128472f028a2dbd7ace02abc179a9629ad74ceb
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2019-06-16 20:42:39 +02:00
Jeffrey Zhang
4e032923c0 Remove nova-consoleauth
The nova-consoleauth service was deprecated during the Rocky release [1]
and has not been necessary since then, unless you are using cells v1. As
Kolla has never supported cells v1, which is finally being removed during
Train [2], we can get ahead of the curve and stop deploying
nova-consoleauth immediately.

[1] https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/convert-consoles-to-objects.html
[2] https://blueprints.launchpad.net/nova/+spec/remove-cells-v1/

Change-Id: I099080979f5497537e390f531005a517ab12aa7a
2019-06-16 16:39:07 +08:00
Zuul
ee895413eb Merge "Stop duplicating Nova cells" 2019-06-13 18:56:00 +00:00
Zuul
888e50f01b Merge "Use become for all docker tasks" 2019-06-07 10:47:23 +00:00
Zuul
a4431930c6 Merge "Remove /%(tenant_id)s suffix from v2.1 endpoints" 2019-06-07 09:08:07 +00:00
Zuul
0a1ad98105 Merge "Support multi-region discovery of Nova cells" 2019-06-07 09:08:04 +00:00
Zuul
01f0f2387d Merge "Hide logs when looping over passwords" 2019-06-07 08:53:40 +00:00
Mark Goddard
b123bf6621 Use become for all docker tasks
Many tasks that use Docker have become specified already, but
not all. This change ensures all tasks that use the following
modules have become:

* kolla_docker
* kolla_ceph_keyring
* kolla_toolbox
* kolla_container_facts

It also adds become for 'command' tasks that use docker CLI.
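
For illustration, a docker CLI command task with privilege escalation
could look like this (the command shown is only an example):

- name: Checking nova containers
  become: true
  command: docker ps --filter name=nova_api
  register: container_status
  changed_when: false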

Change-Id: I4a5ebcedaccb9261dbc958ec67e8077d7980e496
2019-06-06 19:04:58 +01:00
Pierre Riteau
19b8dbe460 Stop duplicating Nova cells
Check if a base Nova cell already exists before calling `nova-manage
cell_v2 create_cell`, which would otherwise create a duplicate cell when
the transport URL or database connection change.

If a base cell already exists but the connection values have changed, we
now call `nova-manage cell_v2 update_cell` instead. This is only
possible if a duplicate cell has not yet been created. If one already
exists, we print a warning inviting the operator to perform a manual
cleanup. We don't use a hard fail to avoid an abrupt change of behavior
if this is backported to stable branches.
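
A hedged sketch of the flow (the real check also compares the transport
URL and database connection; the conditions shown are hypothetical):

- name: Listing existing cells
  command: docker exec nova_api nova-manage cell_v2 list_cells --verbose
  register: existing_cells
  changed_when: false
  run_once: true

- name: Creating the base cell
  command: docker exec nova_api nova-manage cell_v2 create_cell
  # Hypothetical condition derived from parsing the listing above.
  when: base_cell_uuid is not defined
  run_once: true

- name: Updating the base cell when connection values changed
  command: >-
    docker exec nova_api nova-manage cell_v2 update_cell
    --cell_uuid {{ base_cell_uuid }}
  # Hypothetical condition; only applies while no duplicate cell exists.
  when: base_cell_uuid is defined
  run_once: true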

Change-Id: I7841ce0cff08e315fd7761d84e1e681b1a00d43e
Closes-Bug: #1734872
2019-06-06 18:10:06 +01:00
Jason
30c619d1bc Hide logs when looping over passwords
When Ansible goes into a loop, by default it prints all the keys for
the item it is looping over. Some roles, when setting up the databases,
iterate over an object that includes the database password.

Override the loop label to hide everything but the database name.
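
A sketch of the approach, with an illustrative database-creation task:

- name: Creating nova databases
  mysql_db:
    login_host: "{{ database_address }}"
    login_user: "{{ database_user }}"
    login_password: "{{ database_password }}"
    name: "{{ item.database }}"
  # Hypothetical list of dicts that also contains passwords.
  loop: "{{ nova_cell_databases }}"
  loop_control:
    # Show only the database name in the task output, not the whole item.
    label: "{{ item.database }}"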

Change-Id: I336a81a5ecd824ace7d40e9a35942a1c853554cd
2019-06-05 08:09:51 -05:00
Jason
328e14253d Support multi-region discovery of Nova cells
In a multi-region environment, each region is deployed separately.
Cell discovery, however, would sometimes fail because it picked a region
different from the one being deployed. Most likely, an internal endpoint
for region A will not be visible from region B. Furthermore, it is not
very useful to discover hosts in a region you're not modifying.

This changes the check to only run against nova compute services located
in the region being deployed.
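
A hedged sketch of the region-scoped check, again with a placeholder
command (the exact mechanism in the role may differ):

- name: Waiting for nova-compute services to register themselves
  # Placeholder for the real CLI call that lists compute services.
  command: "{{ compute_service_list_command }}"
  environment:
    # Scope the query to the region currently being deployed.
    OS_REGION_NAME: "{{ openstack_region_name }}"
  register: nova_compute_services
  changed_when: false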

Change-Id: I21eb1164c2f67098b81edbd5cc106472663b92cb
2019-06-05 08:07:13 -05:00
Zuul
9d5b405328 Merge "nova: Fix DBNotAllowed during compute startup" 2019-06-04 03:58:15 +00:00
Pierre Riteau
82551a2bfb Remove /%(tenant_id)s suffix from v2.1 endpoints
The installation guide [1] uses endpoints ending at the /v2.1 suffix.

[1] https://docs.openstack.org/nova/stein/install/controller-install.html

Change-Id: I92af045da67f9e746fd6e4b94e56bb8aa72850c4
2019-05-31 18:42:49 +01:00
Mark Goddard
002eec951f nova: Fix DBNotAllowed during compute startup
backport: stein, rocky

During startup of nova-compute, we see the following error message:

Error gathering result from cell 00000000-0000-0000-0000-000000000000:
DBNotAllowed: nova-compute

This issue was observed in devstack [1], and fixed [2] by removing
database configuration from the compute service.

This change takes the same approach, removing DB config from nova.conf
in the nova-compute* containers.

[1] https://bugs.launchpad.net/devstack/+bug/1812398
[2] 8253787137

Change-Id: I18c99ff4213ce456868e64eab63a4257910b9b8e
Closes-Bug: #1829705
2019-05-20 10:24:28 +01:00
binhong.hua
12ff28a693 Make kolla-ansible support extra volumes
When integrating a third-party component into OpenStack with
kolla-ansible, it may be necessary to mount extra volumes into a
container.
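
A rough sketch of how such a mechanism can look in a role's defaults
(variable names are illustrative):

nova_compute_default_volumes:
  - "/etc/kolla/nova-compute/:/var/lib/kolla/config_files/:ro"
  - "/lib/modules:/lib/modules:ro"

# Operators can append additional mounts by overriding this variable,
# e.g. in globals.yml.
nova_compute_extra_volumes: "{{ default_extra_volumes }}"

nova_compute_volumes: "{{ nova_compute_default_volumes + nova_compute_extra_volumes }}"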

Change-Id: I69108209320edad4c4ffa37dabadff62d7340939
Implements: blueprint support-extra-volumes
2019-05-17 11:55:04 +08:00
ZhongShengping
41f3a817ac Move to opendev
1. Use opendev.org instead of git.openstack.org.
2. Use review.opendev.org instead of review.openstack.org.

You can see the discussion below:
http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003603.html

Change-Id: Ice4509204df788a1a44a06fb89fb44cfe6b54b94
2019-04-23 13:28:39 +08:00
Mark Goddard
a4bb8567da Fix up config file permissions on the host
Several config file permissions are incorrect on the host. In general,
files should be 0660, and directories and executables 0770.
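
A minimal sketch of the convention, with illustrative paths:

- name: Ensuring config directories exist
  become: true
  file:
    path: "/etc/kolla/{{ item }}"
    state: directory
    # Directories (and executables) get 0770.
    mode: "0770"
  loop:
    - nova-api
    - nova-compute

- name: Copying over nova.conf
  become: true
  template:
    src: nova.conf.j2
    dest: "/etc/kolla/{{ item }}/nova.conf"
    # Plain config files get 0660.
    mode: "0660"
  loop:
    - nova-api
    - nova-compute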

Change-Id: Id276ac1864f280554e98b937f2845bb424d521de
Closes-Bug: #1821579
2019-04-02 17:23:31 +01:00
Zuul
ed5588c934 Merge "Don't pull images during upgrade" 2019-03-28 12:41:22 +00:00
Zuul
4a5d8b0d05 Merge "Add missing handlers for external Ceph." 2019-03-26 06:17:09 +00:00
Zuul
14a52effd9 Merge "Fix booting instances after nova-compute upgrade" 2019-03-25 12:53:38 +00:00
Mark Goddard
192dcd1e1b Fix booting instances after nova-compute upgrade
After upgrading from Rocky to Stein, nova-compute services fail to start
new instances with the following error message:

Failed to allocate the network(s), not rescheduling.

Looking in the nova-compute logs, we also see this:

Neutron Reported failure on event
network-vif-plugged-60c05a0d-8758-44c9-81e4-754551567be5 for instance
32c493c4-d88c-4f14-98db-c7af64bf3324: NovaException: In shutdown, no new
events can be scheduled

During the upgrade process, we send nova containers a SIGHUP to cause
them to reload their object version state. Speaking to the nova team in
IRC, there is a known issue with this, caused by oslo.service performing
a full shutdown in response to a SIGHUP, which breaks nova-compute.
There is a patch [1] in review to address this.

The workaround employed here is to restart the nova compute service.

[1] https://review.openstack.org/#/c/641907

Change-Id: Ia4fcc558a3f62ced2d629d7a22d0bc1eb6b879f1
Closes-Bug: #1821362
2019-03-22 16:26:36 +00:00
Scott Solkhon
c70d806666 Add missing handlers for external Ceph.
When Nova, Glance, or Cinder are deployed alongside an external Ceph
deployment, handlers will fail to trigger if keyring files are updated,
which results in the containers not being restarted.

This change adds the missing 'when' conditions for nova-libvirt, nova-compute,
cinder-volume, cinder-backup, and glance-api containers.

Change-Id: I8e183aac9a72e7a7210f7edc7cdcbaedd4fbcaa9
2019-03-22 11:20:34 +00:00
Mark Goddard
58d6dc3bcf Don't pull images during upgrade
When adding the rolling upgrade support, some upgrade procedures were
modified to pull images explicitly. This is done inconsistently between
services, and is a change in behaviour from Rocky and earlier releases.

This change removes all image pulling from upgrade tasks.

Change-Id: Id0fed17714235e1daed60b83b1f30620f097eb97
2019-03-20 18:51:45 +00:00
Mark Goddard
40497507ee Use endpoint_override for nova-compute-ironic
The api_endpoint option was deprecated, and will be removed by
https://review.openstack.org/643483.

Change-Id: Ie56a8ab07ab21d2e7d678e636c1408099d8ab3aa
2019-03-18 10:27:11 +00:00
Eduardo Gonzalez
2fc6d4cfc5 Split placement from nova
Depends-On: https://review.openstack.org/#/c/642958
Depends-On: https://review.openstack.org/642984
Change-Id: If795a9eb3ec92f75867ce3f755d6b832eba31af9
2019-03-15 15:19:54 +00:00
chenxing
6722e18465 ubuntu: update configuration Stein UCA
Update the WSGI configuration after services migrated to Python 3.

Change-Id: I25d8db36dabd5f148b2ec96a30381c6a86fa710e
Depends-On: https://review.openstack.org/#/c/625298/
Partially Implements: blueprint python3-support
2019-03-13 21:25:51 +08:00
Zuul
534b491a53 Merge "Allow ironic services to use independent hostnames" 2019-03-11 12:34:23 +00:00
Zuul
372609dca3 Merge "Use keystone_*_url var in all configs" 2019-03-07 12:26:26 +00:00
Zuul
cb648f7816 Merge "Restart containers when ceph.conf changed" 2019-03-07 11:34:00 +00:00
Jim Rollenhagen
d1d1837c25 Allow ironic services to use independent hostnames
This allows ironic service endpoints to use custom hostnames, and adds the
following variables:

* ironic_internal_fqdn
* ironic_external_fqdn
* ironic_inspector_internal_fqdn
* ironic_inspector_external_fqdn

These default to the old values of kolla_internal_fqdn or
kolla_external_fqdn.

This also adds ironic_api_listen_port and ironic_inspector_listen_port
options, which default to ironic_api_port and ironic_inspector_port for
backward compatibility.

These options allow the user to differentiate between the port the
service listens on, and the port the service is reachable on. This is
useful for external load balancers which live on the same host as the
service itself.
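
For illustration, an operator could set the new options in globals.yml
like this (the values are made up):

ironic_internal_fqdn: "ironic.internal.example.com"
ironic_external_fqdn: "ironic.example.com"
ironic_inspector_internal_fqdn: "inspector.internal.example.com"
ironic_inspector_external_fqdn: "inspector.example.com"
# Ports the services listen on, which may differ from the ports they are
# reachable on behind an external load balancer.
ironic_api_listen_port: 6385
ironic_inspector_listen_port: 5050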

Change-Id: I45b175e85866b4cfecad8451b202a5a27f888a84
Implements: blueprint service-hostnames
2019-03-06 15:08:28 -05:00