5037 Commits

Author SHA1 Message Date
Mark Goddard
7b9397566a Fix ironic inspector iPXE boot with UEFI
The ironic inspector iPXE configuration includes the following kernel
argument:

initrd=agent.ramdisk

However, the ramdisk is actually called ironic-agent.initramfs, so the
argument should be:

initrd=ironic-agent.initramfs

In BIOS boot mode this mismatch does not cause a problem, but compute
nodes with UEFI enabled appear to be stricter about it, and fail to
boot.

Change-Id: Ic84f3b79fdd3cd1730ca2fb79c11c7a4e4d824de
Closes-Bug: #1836375
2019-07-12 15:09:56 +01:00
Mark Goddard
d5e5e885d1 During deploy, always sync DB
A common class of problems goes like this:

* kolla-ansible deploy
* Hit a problem, often in ansible/roles/*/tasks/bootstrap.yml
* Re-run kolla-ansible deploy
* Service fails to start

This happens because the DB is created during the first run, but for some
reason we fail before performing the DB sync. This means that on the second run
we don't include ansible/roles/*/tasks/bootstrap_service.yml because the DB
already exists, and therefore still don't perform the DB sync. However this
time, the command may complete without apparent error.

We should be less careful about when we perform the DB sync, and do it whenever
it is necessary. There is an argument for not doing the sync during a
'reconfigure' command, although we will not change that here.

This change always performs the DB sync during the 'deploy' and
'reconfigure' commands.
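
A rough sketch of the resulting task flow, with file names taken from the
paths above (the exact wiring is an assumption):

    # ansible/roles/<service>/tasks/deploy.yml (sketch)
    - import_tasks: bootstrap.yml          # creates the database if missing

    - import_tasks: bootstrap_service.yml  # DB sync now runs on every deploy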

Change-Id: I82d30f3fcf325a3fdff3c59f19a1f88055b566cc
Closes-Bug: #1823766
Closes-Bug: #1797814
2019-07-12 08:56:54 +00:00
Michal Nasiadka
4e3054b5da Add 'allow *' to getting ceph mds keyring
* Getting/creating the ceph mds keyring sometimes fails, similar to https://tracker.ceph.com/issues/16255
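
A hedged sketch of the task (the command shape and the mon/osd caps are
assumptions; the newly added piece is the mds 'allow *' capability):

    - name: Getting ceph mds keyring (sketch)
      become: true
      command: >
        docker exec ceph_mon ceph auth get-or-create
        mds.{{ ansible_hostname }}
        mds 'allow *' osd 'allow rwx' mon 'allow rw'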

Change-Id: I47587cbeb8be0e782c13ba7f40367409e2daa8a8
2019-07-10 13:09:38 +02:00
Zuul
8ec3ffc64b Merge "Fix nova deploy with Ansible<2.8" 2019-07-09 09:33:28 +00:00
Zuul
48223fe83c Merge "Deprecate Ceph deployment" 2019-07-08 22:22:57 +00:00
Mark Goddard
5be093ac5a Fix nova deploy with Ansible<2.8
Due to a bug in ansible, kolla-ansible deploy currently fails in nova
with the following error when used with ansible earlier than 2.8:

TASK [nova : Waiting for nova-compute services to register themselves]
*********
task path:
/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml:30
fatal: [primary]: FAILED! => {
    "failed": true,
    "msg": "The field 'vars' has an invalid value, which
        includes an undefined variable. The error was:
        'nova_compute_services' is undefined\n\nThe error
        appears to have been in
        '/home/zuul/src/opendev.org/openstack/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml':
        line 30, column 3, but may\nbe elsewhere in the file
        depending on the exact syntax problem.\n\nThe
        offending line appears to be:\n\n\n- name: Waiting
        for nova-compute services to register themselves\n ^
            here\n"
}

Example:
http://logs.openstack.org/00/669700/1/check/kolla-ansible-centos-source/81b65b9/primary/logs/ansible/deploy

This was caused by
https://review.opendev.org/#/q/I2915e2610e5c0b8d67412e7ec77f7575b8fe9921,
which hits upon an ansible bug described here:
https://github.com/markgoddard/ansible-experiments/tree/master/05-referencing-registered-var-do-until.

We can work around this by not using an intermediate variable.
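
A minimal sketch of the pattern, with task shape and variable names assumed
from the error above:

    # Before: 'vars' references the task's own registered result, which
    # Ansible < 2.8 cannot resolve inside a do-until loop.
    - name: Waiting for nova-compute services to register themselves
      command: openstack compute service list -f json --service nova-compute
      register: result
      vars:
        nova_compute_services: "{{ result.stdout | from_json }}"
      until: nova_compute_services | length > 0
      retries: 20

    # After: inline the expression instead of the intermediate variable.
    - name: Waiting for nova-compute services to register themselves
      command: openstack compute service list -f json --service nova-compute
      register: result
      until: (result.stdout | from_json) | length > 0
      retries: 20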

Change-Id: I58f8fd0a6e82cb614e02fef6e5b271af1d1ce9af
Closes-Bug: #1835817
2019-07-08 19:58:51 +00:00
Zuul
4fc523c3f4 Merge "Fixes for MariaDB bootstrap and recovery" 2019-07-08 09:21:55 +00:00
Zuul
ec78645928 Merge "Bump minimum Ansible version to 2.5" 2019-07-08 09:21:53 +00:00
Zuul
8daad1abcf Merge "Wait for all compute services before cell discovery" 2019-07-05 10:31:29 +00:00
Mark Goddard
86f373a198 Fixes for MariaDB bootstrap and recovery
* Fix wsrep sequence number detection. The log message format is
  'WSREP: Recovered position: <UUID>:<seqno>', but we were picking out
  the UUID rather than the sequence number, which is as good as random
  (see the sketch after this list).

* Add become: true to log file reading and removal since
  I4a5ebcedaccb9261dbc958ec67e8077d7980e496 added become: true to the
  'docker cp' command which creates it.

* Don't run handlers during recovery. If the config files change we
  would end up restarting the cluster twice.

* Wait for wsrep recovery container completion (don't detach). This
  avoids a potential race between wsrep recovery and the subsequent
  'stop_container'.

* Finally, we now wait for the bootstrap host to report that it is in
  an OPERATIONAL state. Without this we can see errors where the
  MariaDB cluster is not ready when used by other services.
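
A sketch of the corrected sequence number extraction (the log path and task
shape are assumptions):

    - name: Find wsrep recovery position (sketch)
      become: true
      # Print the last ':'-separated field, i.e. the <seqno>, not the <UUID>.
      shell: "awk -F: '/WSREP: Recovered position/ { print $NF }' /var/log/kolla/mariadb/mariadb_recovery.log"
      register: wsrep_recovery_seqno
      changed_when: false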

Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583
Closes-Bug: #1834467
2019-07-05 09:20:34 +00:00
Zuul
dfa1a3844d Merge "Add upgrade-bifrost command" 2019-07-05 09:17:16 +00:00
Zuul
70b7cddd2b Merge "Add parameters to configure number of processes and threads of horizon" 2019-07-05 09:17:14 +00:00
Zuul
af8ae0aa41 Merge "Simplify handler conditionals" 2019-07-04 21:34:14 +00:00
Mark Goddard
e6d0e610c5 Deprecate Ceph deployment
There are now several good tools for deploying Ceph, including Ceph
Ansible and ceph-deploy. Maintaining our own Ceph deployment is a
significant maintenance burden, and we should focus on our core mission
of deploying OpenStack. Given that this is currently a significant part
of Kolla Ansible, we will need a long deprecation period and a migration
path to another tool.

Change-Id: Ic603c85c04d8794580a19f9efaa7a8589565f4f6
Partially-Implements: blueprint remove-ceph
2019-07-04 19:05:54 +01:00
Christian Berendt
dc3489df18 Add parameters to configure number of processes and threads of horizon
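
For example, operators might then set something like the following in
globals.yml (the variable names here are assumptions about the final
spelling):

    # /etc/kolla/globals.yml (sketch; variable names assumed)
    horizon_wsgi_processes: 4
    horizon_wsgi_threads: 2
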
Change-Id: Ib5490d504a5b7c9a37dda7babf1257aa661c11de
2019-07-04 17:23:50 +02:00
Mark Goddard
c38dd76711 Wait for all compute services before cell discovery
There is a race condition during nova deploy since we wait for at least
one compute service to register itself before performing cells v2 host
discovery.  It's quite possible that other compute nodes will not yet
have registered and will therefore not be discovered. This leaves them
not mapped into a cell, and results in the following error if the
scheduler picks one when booting an instance:

Host 'xyz' is not mapped to any cell

The problem has been exacerbated by merging a fix [1][2] for a nova race
condition, which disabled the dynamic periodic discovery mechanism in
the nova scheduler.

This change fixes the issue by waiting for all expected compute services
to register themselves before performing host discovery. This includes
both virtualised compute services and bare metal compute services.

[1] https://bugs.launchpad.net/kolla-ansible/+bug/1832987
[2] https://review.opendev.org/665554
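
A hedged sketch of the stronger wait condition (task shape assumed; the
real change must also count bare metal compute services):

    - name: Waiting for nova-compute services to register themselves
      command: openstack compute service list -f json --service nova-compute
      register: result
      # Succeed only once every expected compute host has registered,
      # not merely the first one.
      until: >-
        (result.stdout | from_json) | map(attribute='Host') | list
        is superset(groups['compute'])
      retries: 20
      delay: 10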

Change-Id: I2915e2610e5c0b8d67412e7ec77f7575b8fe9921
Closes-Bug: #1835002
2019-07-04 13:03:12 +00:00
Zuul
26f2aecfa1 Merge "Don't rotate keystone fernet keys during deploy" 2019-07-04 10:18:28 +00:00
Zuul
2ad7b50010 Merge "Cloudkitty InfluxDB Storage backend via Kolla-ansible" 2019-07-04 03:45:40 +00:00
Rafael Weingärtner
97cb30cdd8 Cloudkitty InfluxDB Storage backend via Kolla-ansible
This proposal adds support to Kolla-Ansible for deploying CloudKitty
with the InfluxDB storage backend. InfluxDB support as a CloudKitty
storage backend was introduced with the following commit:
https://github.com/openstack/cloudkitty/commit/c4758e78b49386145309a44623502f8095a2c7ee

Problem Description
===================

With the addition of support for InfluxDB in Cloudkitty, which reached
general availability in the Stein release, we need a method to easily
configure/support this storage backend via Kolla-ansible.

Kolla-ansible is already able to deploy and configure an InfluxDB
system. Therefore, this proposal will use the InfluxDB deployment
configured via Kolla-ansible to connect to CloudKitty and use it as a
storage backend.

If we do not provide a method for users (operators) to manage the
Cloudkitty storage backend via Kolla-ansible, they have to apply these
configurations manually (or via some other set of automated scripts),
which creates a scattered set of configuration files and configuration
scripts with different versioning schemes and life cycles.

Proposed Change
===============

Architecture
------------

We propose a flag that users can use to make Kolla-ansible configure
CloudKitty to use InfluxDB as the storage backend system. When
enabling this flag, Kolla-ansible will also enable the deployment of
the InfluxDB via Kolla-ansible automatically.

CloudKitty will be configured according to [1] and [2]. We will also
externalize the "retention_policy", "use_ssl", and "insecure" options to
allow fine-grained configuration by operators. All of these options are
only applied when explicitly configured; when they are not set, the
default value/behavior defined in Cloudkitty is used. Moreover, when
"use_ssl" is set to "true", the user will be able to set "cafile" to a
custom trusted CA file. Again, if these variables are not set, the
defaults in Cloudkitty are used.

Implementation
--------------
We need to introduce a new variable called
`cloudkitty_storage_backend`. Valid options are `sqlalchemy` or
`influxdb`. The default value in Kolla-ansible is `sqlalchemy` for
backward compatibility. The first step is then to change the definition
of the following variable:
`/ansible/group_vars/all.yml:enable_influxdb: "{{ enable_monasca | bool }}"`

We also need to enable InfluxDB when CloudKitty is configured to use it
as the storage backend. Afterwards, we need to add tasks to the
CloudKitty role that create the InfluxDB schema and populate the
configuration files accordingly.
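
A sketch of the variable wiring described above (the exact condition is an
assumption):

    # ansible/group_vars/all.yml (sketch)
    cloudkitty_storage_backend: "sqlalchemy"  # or "influxdb"

    enable_influxdb: "{{ enable_monasca | bool or (enable_cloudkitty | bool and cloudkitty_storage_backend == 'influxdb') }}"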

Alternatives
------------
The alternative would be to execute the configurations manually or
handle it via a different set of scripts and configurations files,
which can become cumbersome with time.

Security Impact
---------------
None identified by the author of this spec

Notifications Impact
--------------------
Operators that are already deploying CloudKitty with InfluxDB as
storage backend would need to convert their configurations to
Kolla-ansible (if they wish to adopt Kolla-ansible to execute these
tasks).

Also, deployments (OpenStack environments) that were created with
Cloudkitty using storage v1 will need to migrate all of their data to
V2 before enabling InfluxDB as the storage system.

Other End User Impact
---------------------
None.

Performance Impact
------------------
None.

Other Deployer Impact
---------------------
New configuration options will be available for CloudKitty.
* cloudkitty_storage_backend
* cloudkitty_influxdb_retention_policy
* cloudkitty_influxdb_use_ssl
* cloudkitty_influxdb_cafile
* cloudkitty_influxdb_insecure_connections
* cloudkitty_influxdb_name

Developer Impact
----------------
None

Implementation
==============

Assignee
--------
* `Rafael Weingärtner <rafaelweingartne>`

Work Items
----------
 * Extend InfluxDB "enable/disable" variable
 * Add new tasks to configure Cloudkitty according to the new variables
 presented above
 * Write documentation and release notes

Dependencies
============
None

Documentation Impact
====================
New documentation for the feature.

References
==========
[1] `https://docs.openstack.org/cloudkitty/latest/admin/configuration/storage.html#influxdb-v2`
[2] `https://docs.openstack.org/cloudkitty/latest/admin/configuration/collector.html#metric-collection`

Change-Id: I65670cb827f8ca5f8529e1786ece635fe44475b0
Signed-off-by: Rafael Weingärtner <rafael@apache.org>
2019-07-02 11:14:05 -03:00
Mark Goddard
9cac1137d0 Add upgrade-bifrost command
This performs the same steps as deploy-bifrost, but first stops the
bifrost services and container if they are running.

This can help in cases where a plain 'docker stop' may lead to an
ungraceful shutdown, possibly due to running multiple services in one
container.

Change-Id: I131ab3c0e850a1d7f5c814ab65385e3a03dfcc74
Implements: blueprint bifrost-upgrade
Closes-Bug: #1834332
2019-07-02 14:30:14 +01:00
Mark Goddard
0a769dc30b Bump minimum Ansible version to 2.5
This is necessary for some Ansible tests that were renamed in 2.5,
including 'version' and 'successful'.
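
For illustration, the renamed test syntax that requires Ansible >= 2.5:

    - name: Illustrate the renamed tests (sketch)
      command: /bin/true
      register: result
      until: result is successful                         # 'successful' test
      retries: 1
      when: ansible_version.full is version('2.5', '>=')  # 'version' test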

Change-Id: Iacf88ef5589c7571fcf56ba8b99d3dbe76975195
2019-07-01 09:38:01 +01:00
Will Szumski
9074da56a7 Specify endpoint when creating monasca user
otherwise I'm seeing:

TASK [monasca : Creating the monasca agent user] ****************************************************************************************************************************
fatal: [monitor1]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 172.16.3.24 closed.", "msg": "MODULE FAILURE", "rc": 1}

with module_stdout containing:

    Traceback (most recent call last):
      File "/tmp/ansible_I0RmxQ/ansible_module_kolla_toolbox.py", line 163, in <module>
        main()
      File "/tmp/ansible_I0RmxQ/ansible_module_kolla_toolbox.py", line 141, in main
        output = client.exec_start(job)
      File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/decorators.py", line 19, in wrapped
        return f(self, resource_id, *args, **kwargs)
      File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/api/exec_api.py", line 165, in exec_start
        return self._read_from_socket(res, stream, tty)
      File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/api/client.py", line 377, in _read_from_socket
        return six.binary_type().join(gen)
      File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py", line 75, in frames_iter
        n = next_frame_size(socket)
      File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py", line 62, in next_frame_size
        data = read_exactly(socket, 8)
      File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py", line 47, in read_exactly
        next_data = read(socket, n - len(data))
      File "/opt/kayobe/venvs/kolla-ansible/lib/python2.7/site-packages/docker/utils/socket.py", line 31, in read
        return socket.recv(n)
    socket.timeout: timed out

when the monitoring nodes aren't on the public API network.
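
A hedged sketch of the fix (module arguments are assumptions):

    - name: Creating the monasca agent user
      kolla_toolbox:
        module_name: os_user
        module_args:
          name: "{{ monasca_agent_user }}"        # variable name assumed
          password: "{{ monasca_agent_password }}"
          interface: internal                     # assumed: pin to the internal endpoint
          auth: "{{ openstack_auth }}"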

Change-Id: I7a93f69da0e02c9264da0b081d2e60626f899e3a
2019-06-28 18:36:24 +01:00
Mark Goddard
de00bf491d Simplify handler conditionals
Currently, we have a lot of logic for checking if a handler should run,
depending on whether config files have changed and whether the
container configuration has changed. As rm_work pointed out during
the recent haproxy refactor, these conditionals are typically
unnecessary - we can rely on Ansible's handler notification system
to only trigger handlers when they need to run. This removes a lot
of error prone code.

This patch removes conditional handler logic for all services. It is
important that we no longer notify handlers unnecessarily, because
without these checks in place any notification will now trigger a
restart of the containers.
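
The simplified pattern, sketched here for haproxy (task shapes assumed):

    # Tasks simply notify; Ansible runs the handler only if something changed.
    - name: Copying over haproxy.cfg
      template:
        src: haproxy.cfg.j2
        dest: "{{ node_config_directory }}/haproxy/haproxy.cfg"
      notify:
        - Restart haproxy container

    # Handler: no 'when: config.changed or container.changed' guards needed.
    - name: Restart haproxy container
      kolla_docker:
        action: recreate_or_restart_container
        name: haproxy
        # image, volumes and other container args omitted in this sketch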

Implements: blueprint simplify-handlers

Change-Id: I4f1aa03e9a9faaf8aecd556dfeafdb834042e4cd
2019-06-27 15:57:19 +00:00
Zuul
85b9dabcd4 Merge "Add support for neutron custom dnsmasq.conf" 2019-06-27 13:59:42 +00:00
Zuul
651b983bdb Merge "Restart all nova services after upgrade" 2019-06-27 13:39:12 +00:00
Zuul
e8f210a2d4 Merge "Format internal Fluentd logs" 2019-06-27 12:38:14 +00:00
Zuul
01bc357d0b Merge "Don't drop unmatched Kolla service logs" 2019-06-27 12:25:11 +00:00
Zuul
067e40ad32 Merge "Increase log coverage for Monasca" 2019-06-27 12:22:20 +00:00
Zuul
e7c19b7413 Merge "Enable InfluxDB TSI by default" 2019-06-27 11:44:51 +00:00
Christian Berendt
a3f1ded357 Add support for neutron custom dnsmasq.conf
Change-Id: Ia7041be384ac07d0a790c2c5c68b1b31ff0e567a
2019-06-27 12:20:12 +02:00
Mark Goddard
e6d2b92200 Restart all nova services after upgrade
During an upgrade, nova pins the version of RPC calls to the minimum
seen across all services. This ensures that old services do not receive
data they cannot handle. After the upgrade is complete, all nova
services are supposed to be reloaded via SIGHUP, causing them to
re-check the RPC versions of other services and switch to the latest
version, which should now be supported by all running services.

Due to a bug [1] in oslo.service, sending services SIGHUP is currently
broken. We replaced the HUP with a restart for the nova_compute
container for bug 1821362, but not other nova services. It seems we need
to restart all nova services to allow the RPC version pin to be removed.

Testing in a Queens to Rocky upgrade, we find the following in the logs:

Automatically selected compute RPC version 5.0 from minimum service
version 30

However, the service version in Rocky is 35.

There is a second issue in that it takes some time for the upgraded
services to update the nova services database table with their new
version. We need to wait until all nova-compute services have done this
before the restart is performed, otherwise the RPC version cap will
remain in place. There is currently no interface in nova available for
checking these versions [2], so as a workaround we use a configurable
delay with a default duration of 30 seconds. Testing showed it takes
about 10 seconds for the version to be updated, so this gives us some
headroom.

This change restarts all nova services after an upgrade, after a 30
second delay.

[1] https://bugs.launchpad.net/oslo.service/+bug/1715374
[2] https://bugs.launchpad.net/nova/+bug/1833542
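
A sketch of the configurable delay (the variable name is assumed; 'pause'
is a standard Ansible module):

    - name: Wait for nova-compute services to update their service versions
      pause:
        seconds: "{{ nova_service_version_delay | default(30) }}"  # name assumed
    # then restart all nova service containers, not just nova_compute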

Change-Id: Ia6fc9011ee6f5461f40a1307b72709d769814a79
Closes-Bug: #1833069
Related-Bug: #1833542
2019-06-27 09:36:20 +00:00
Mark Goddard
09e29d0db9 Don't rotate keystone fernet keys during deploy
When running deploy or reconfigure for Keystone,
ansible/roles/keystone/tasks/deploy.yml calls init_fernet.yml,
which runs /usr/bin/fernet-rotate.sh, which calls keystone-manage
fernet_rotate.

This means that a token can become invalid if the operator runs
deploy or reconfigure too often.

This change splits out fernet-push.sh from the fernet-rotate.sh
script, then calls fernet-push.sh after the fernet bootstrap
performed in deploy.
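
A sketch of the resulting deploy-time step (the container name is assumed;
the script names are from the message):

    - name: Distribute fernet keys without rotating them
      command: docker exec keystone_fernet /usr/bin/fernet-push.sh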

Change-Id: I824857ddfb1dd026f93994a4ac8db8f80e64072e
Closes-Bug: #1833729
2019-06-27 08:41:27 +00:00
Radosław Piliszek
cc058f4586 Make nova external ceph key extraction tasks non-changing
They are used only to obtain keys for the next task.
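
In Ansible this is typically expressed with changed_when, roughly (task
shape and key name assumed):

    - name: Extract nova cephx key (sketch)
      command: docker exec ceph_mon ceph auth get-key client.nova
      register: nova_cephx_raw_key
      changed_when: false   # read-only lookup, so never report 'changed'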

Change-Id: I2fac22af4710b70e4df8e3a272bcfb6cc8b8532e
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2019-06-26 14:21:20 +02:00
Zuul
100a20769f Merge "Add gnocchi extra volumes" 2019-06-25 18:38:37 +00:00
Zuul
b32ddaa901 Merge "link kolla_logs volume to docker_runtime_directory if docker_runtime_directory variable exists" 2019-06-24 13:35:45 +00:00
Zuul
2ef50535fe Merge "Use 'openstack_region_name' in cloudkitty collectors and fetchers" 2019-06-24 13:08:30 +00:00
Zuul
7cfab57cb9 Merge "Method to override the default ceilometer meters.yaml via Kolla-ansible" 2019-06-24 13:08:28 +00:00
Zuul
417fa831bc Merge "Use 'openstack_service_workers' as the nb of Cloudkitty workers" 2019-06-24 13:08:26 +00:00
Zuul
a956c53181 Merge "Remove `hnas_iscsi` from the supported storage backends list of Cinder" 2019-06-24 13:08:24 +00:00
Zuul
03976399f0 Merge "Avoid parallel discover_hosts (nova-related race condition)" 2019-06-24 13:08:23 +00:00
Zuul
0cbebc4786 Merge "Fix the redis_connection_string for osprofiler and make it generic" 2019-06-24 13:08:21 +00:00
Zuul
4303de142c Merge "Implement Apache WSGI for Qinling" 2019-06-24 11:30:59 +00:00
Zuul
d16b68fc1d Merge "Designate - remove obsolete task" 2019-06-24 11:08:44 +00:00
chenxing
b7ca065edf Remove `hnas_iscsi` from the supported storage backends list of Cinder
The Hitachi NAS Platform iSCSI driver was marked as unsupported by
Cinder in the Ocata release[1].

[1] https://review.opendev.org/#/c/444287/

Change-Id: I1a25789374fddaefc57bc59badec06f91ee6a52a
Closes-Bug: #1832821
2019-06-24 09:04:14 +00:00
ZijianGuo
d23a88d7e8 Add gnocchi extra volumes
In some cases, we can mount extra volumes for gnocchi to facilitate
integration.
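
For example, an operator might set something like this in globals.yml (the
variable name is assumed from the subject):

    gnocchi_extra_volumes:
      - "/etc/ceph:/etc/ceph:ro"   # illustrative mount only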

Change-Id: Ife475ca7d0555562f6e3ef0867835d69d288c8c4
Signed-off-by: ZijianGuo <guozijn@gmail.com>
2019-06-24 16:57:42 +08:00
Radosław Piliszek
957f6dd453 Designate - remove obsolete task
"Check if policies shall be overwritten" already exists in its
newer form. The removed one had no effect on play.

Change-Id: I48ed6c1c71c4162a3ab28ab2b51dc1e02932dfef
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2019-06-21 21:21:15 +02:00
ZijianGuo
d7c4a4f2b9 Replace merge_configs with merge_yaml for merge mongodb.conf
Actually, 'mongodb.conf' is a YAML-format configuration file, so
merge_configs should not be used to merge it.
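
A sketch of the swap (merge_yaml is Kolla Ansible's action plugin for YAML
files; the argument shape is assumed):

    - name: Copying over mongodb.conf
      merge_yaml:   # was: merge_configs
        sources:
          - "{{ role_path }}/templates/mongodb.conf.j2"
          - "{{ node_custom_config }}/mongodb.conf"
        dest: "{{ node_config_directory }}/mongodb/mongodb.conf"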

Change-Id: Id3c006df00c1e2d66472c2195781e01c640cab22
Signed-off-by: ZijianGuo <guozijn@gmail.com>
2019-06-21 22:40:25 +08:00
Doug Szumski
015ddb6e37 Enable InfluxDB TSI by default
The TSI is recommended for all users. Some of the key benefits are
a reduction in memory requirements and an increase in the maximum
number of time series. For more information see this link:

https://docs.influxdata.com/influxdb/v1.7/concepts/tsi-details/

Change-Id: I4b29eb5a4ae82f6c39059d0b6de41debdfd75508
2019-06-21 14:48:12 +01:00
Gaëtan Trellu
f867c471fb Implement Apache WSGI for Qinling
Since this review[1], Qinling supports WSGI execution.

From a production perspective, Qinling should be deployed
using Apache and mod_wsgi.

"api_worker" option is not needed anymore because processes will
be handle by Apache mod_wsgi.

Qinling Docker image review[2] has ben created.

[1] https://review.opendev.org/661851
[2] https://review.opendev.org/666647

Change-Id: I9aaee4c2932f1e4ea9fe780a64e96a28fa6bccfb
Story: 2005920
Task: 34181
2019-06-21 09:46:02 -04:00
Zuul
bc7dea58c2 Merge "Ingest non-standard Monasca logs" 2019-06-20 10:03:11 +00:00