This patch adds the loadbalancer-config role, which is a "wrapper"
around the haproxy-config role and the proxysql-config role; the
latter will be added in follow-up patches.
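A minimal sketch of the wrapper idea, assuming a hypothetical
loadbalancer_backend variable selecting the backend role:

  # loadbalancer_backend is hypothetical, shown for illustration only
  - name: Configure the selected load balancer backend
    include_role:
      name: "{{ loadbalancer_backend }}-config"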
Change-Id: I64d41507317081e1860a94b9481a85c8d400797d
This adds back the ability to configure the RabbitMQ/Erlang kernel
network interface, which was removed in
https://review.opendev.org/#/c/584427/ seemingly by accident.
Closes-Bug: 1900160
Change-Id: I6f00396495853e117429c17fadfafe809e322a31
Follow-up to I91d0e23b22319cf3fdb7603f5401d24e3b76a56e, which fixes a
conditional corner case when removing the ha-all policy.
Change-Id: Iea75551bc6d0da7dd10515dd8bd28c014eed7a5e
When OpenStack is deployed with Kolla-Ansible, by default there
are no durable queues or exchanges created by the OpenStack
services in RabbitMQ. In Rabbit terminology, not being durable
is referred to as `transient`, and this means that the queue
is generally held in memory.
Whether OpenStack services create durable or transient queues is
traditionally controlled by the Oslo Notification config option:
`amqp_durable_queues`. In Kolla-Ansible, this remains set to
the default of `False` in all services. The only `durable`
objects are the `amq*` exchanges which are internal to RabbitMQ.
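For reference, the option lives in the oslo_messaging_rabbit section
of each service's configuration; a global override could be written
with a task along these lines (an illustrative sketch, not the
deployment mechanism itself):

  - name: Enable durable queues for all OpenStack services
    community.general.ini_file:
      # path per Kolla-Ansible's custom config convention
      path: /etc/kolla/config/global.conf
      section: oslo_messaging_rabbit
      option: amqp_durable_queues
      value: "true"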
More recently, Oslo Notification has introduced support for
Quorum queues [7]. These are a successor to durable classic
queues; however, it is not yet clear whether they are a good fit
for OpenStack in general [8].
For clustered RabbitMQ deployments, Kolla-Ansible configures all
queues as `replicated` [1]. Replication occurs over all nodes
in the cluster. RabbitMQ refers to this as 'mirroring of classic
queues'.
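The effect is roughly that of a classic mirroring policy like the
following sketch (using the community.rabbitmq collection; not the
exact mechanism used by Kolla-Ansible):

  - name: Mirror all queues across all nodes in the cluster
    community.rabbitmq.rabbitmq_policy:
      name: ha-all
      pattern: ".*"
      tags:
        ha-mode: all
      state: present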
In summary, this means that a multi-node Kolla-Ansible deployment
will end up with a large number of transient, mirrored queues
and exchanges. However, the RabbitMQ documentation warns against
this, stating that 'For replicated queues, the only reasonable
option is to use durable queues' [2]. This is discussed
further in the following bug report: [3].
Whilst we could try enabling the `amqp_durable_queues` option
for each service (as suggested in [4]), there are
a number of complexities with this approach, including, but not
limited to:
1) RabbitMQ is planning to remove classic queue mirroring in
favor of 'Quorum queues' in a forthcoming release [5].
2) Durable queues will be written to disk, which may cause
performance problems at scale. Note that this includes
Quorum queues which are always durable.
3) Potential for race conditions and other complexity
discussed recently on the mailing list under:
`[ops] [kolla] RabbitMQ High Availability`
The remaining option, proposed here, is to use classic
non-mirrored queues everywhere, and rely on services to recover
if the node hosting a queue or exchange they are using fails.
There is some discussion of this approach in [6]. The downside
of potential message loss needs to be weighed against the real
upsides of increasing the performance of RabbitMQ, and moving
to a configuration which is officially supported and hopefully
more stable. In the future, we can then consider promoting
specific queues to quorum queues, in cases where message loss
can result in failure states which are hard to recover from.
[1] https://www.rabbitmq.com/ha.html
[2] https://www.rabbitmq.com/queues.html
[3] https://github.com/rabbitmq/rabbitmq-server/issues/2045
[4] https://wiki.openstack.org/wiki/Large_Scale_Configuration_Rabbit
[5] https://blog.rabbitmq.com/posts/2021/08/4.0-deprecation-announcements/
[6] https://fuel-ccp.readthedocs.io/en/latest/design/ref_arch_1000_nodes.html#replication
[7] https://bugs.launchpad.net/oslo.messaging/+bug/1942933
[8] https://www.rabbitmq.com/quorum-queues.html#use-cases
Partial-Bug: #1954925
Change-Id: I91d0e23b22319cf3fdb7603f5401d24e3b76a56e
Starting from 3.8.0, RabbitMQ has built-in Prometheus support and
the Prometheus plugins are enabled by default. When
`enable_prometheus` is set to "no", the rabbitmq role will disable
the Prometheus plugins.
Closes-Bug: #1885106
Change-Id: I4d694d6224c813285d228d6bc7eece5731db1078
Role vars have a higher precedence than role defaults. This allows
importing default vars from another role via vars_files without
overriding project_name (see the related bug for details).
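Illustrative layout: with project_name in the role's vars/ rather
than defaults/, a play-level vars_files import of another role's
defaults can no longer override it:

  # roles/rabbitmq/vars/main.yml (hypothetical path)
  project_name: rabbitmq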
Change-Id: I3d919736e53d6f3e1a70d1267cf42c8d2c0ad221
Related-Bug: #1951785
We get a nice optimisation by using a filtered loop instead
of skipping tasks per service with 'when'.
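A generic sketch of the pattern (service names invented for
illustration):

  - name: Configure enabled services only
    ansible.builtin.debug:
      msg: "Configuring {{ item.key }}"
    loop: "{{ services | dict2items | selectattr('value.enabled') | list }}"
    vars:
      services:
        rabbitmq: {enabled: true}
        proxysql: {enabled: false}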
Partially-Implements: blueprint performance-improvements
Change-Id: I8f68100870ab90cb2d6b68a66a4c97df9ea4ff52
As mentioned in the Iced014acee7e590c10848e73feca166f48b622dc
commit message, in Ussuri+ we can use ``+sbwtdcpu none
+sbwtdio none`` as well. This is due to relying on RMQ-provided
erlang in version 23.x.
This change adds the extra arguments by default.
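For illustration, the combined default would then look something
like this (assuming the role's rabbitmq_server_additional_erl_args
variable):

  rabbitmq_server_additional_erl_args: "+S 2:2 +sbwt none +sbwtdcpu none +sbwtdio none"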
It should be backported down to Ussuri before we do a release with
Iced014acee7e590c10848e73feca166f48b622dc.
Change-Id: I32e247a6cb34d7f6763b544f247fd408dce2b3a2
By default, Ansible injects a variable for every fact, prefixed with
ansible_. This can result in a large number of variables for each host,
which at scale can incur a performance penalty. Ansible provides a
configuration option [0] that can be set to False to prevent this
injection of facts. In this case, facts should be referenced via
ansible_facts.<fact>.
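For example, a task that previously used an injected fact variable
is rewritten as follows:

  - name: Show the host distribution
    ansible.builtin.debug:
      msg: "{{ ansible_facts.distribution }}"  # was: {{ ansible_distribution }}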
This change updates all references to Ansible facts within Kolla Ansible
from using individual fact variables to using the items in the
ansible_facts dictionary. This allows users to disable fact variable
injection in their Ansible configuration, which may provide some
performance improvement.
This change disables fact variable injection in the ansible
configuration used in CI, to catch any attempts to use the injected
variables.
[0] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#inject-facts-as-vars
Change-Id: I7e9d5c9b8b9164d4aee3abb4e37c8f28d98ff5d1
Partially-Implements: blueprint performance-improvements
The host list order seen during Ansible handlers may differ from the
usual play host list order, due to race conditions in notifying
handlers. This
means that restart_services.yml for RabbitMQ may be included in a
different order than the rabbitmq group, resulting in a node other than
the 'first' being restarted first. This can cause some nodes to fail to
join the cluster. The include_tasks loop was introduced in [1].
This change fixes the issue by splitting the handler into two tasks, and
restarting the first node before all others.
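A simplified sketch of the resulting shape (the real handlers also
deal with notification wiring, omitted here):

  - name: Restart the first rabbitmq container
    include_tasks: restart_services.yml
    when: inventory_hostname == groups['rabbitmq'] | first

  - name: Restart the remaining rabbitmq containers
    include_tasks: restart_services.yml
    when: inventory_hostname != groups['rabbitmq'] | first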
[1] https://review.opendev.org/c/openstack/kolla-ansible/+/763137
Change-Id: I1823301d5889589bfd48326ed7de03c6061ea5ba
Closes-Bug: #1930293
On machines with many cores, we were seeing excessive CPU load on systems
that were not very busy. With the following Erlang VM argument we saw
RabbitMQ CPU usage drop from about 150% to around 20%, on a system with
40 hyperthreads.
+S 2:2
By default RabbitMQ starts N schedulers where N is the number of CPU
cores, including hyper-threaded cores. This is fine when you assume all
your CPUs are dedicated to RabbitMQ. It's not a good idea in a typical
Kolla Ansible setup. Here we go for two scheduler threads.
More details can be found here:
https://www.rabbitmq.com/runtime.html#scheduling
and here:
https://erlang.org/doc/man/erl.html#emulator-flags
+sbwt none
This stops busy waiting of the scheduler, for more details see:
https://www.rabbitmq.com/runtime.html#busy-waiting
Newer versions of rabbit may need additional flags:
"+sbwt none +sbwtdcpu none +sbwtdio none"
But this patch should be backportable to older versions of RabbitMQ
used in Train and Stein.
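For reference, RabbitMQ reads extra Erlang VM flags from its
standard environment variable, so the tuning amounts to something
like:

  environment:
    RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS: "+S 2:2 +sbwt none"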
Note that information on this tuning was found by looking at data from:
rabbitmq-diagnostics runtime_thread_stats
More details on that can be found here:
https://www.rabbitmq.com/runtime.html#thread-stats
Related-Bug: #1846467
Change-Id: Iced014acee7e590c10848e73feca166f48b622dc
This change enables the use of Docker healthchecks for rabbitmq services.
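A rough sketch of the shape of such a definition (keys and the test
command are illustrative, not necessarily the exact values merged
here):

  rabbitmq:
    healthcheck:
      interval: 30
      retries: 3
      start_period: 5
      test: ["CMD-SHELL", "healthcheck_rabbitmq"]
      timeout: 30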
Implements: blueprint container-health-check
Depends-On: https://review.opendev.org/c/openstack/kolla/+/784562
Change-Id: I23a2c2efab858b9ed39c6ce0ec4a82df10e7f93d
This reverts commit 9cae59be51e8d2d798830042a5fd448a4aa5e7dc.
Reason for revert: This patch was found to introduce issues with
fluentd customisation. The underlying issue is not currently fully
understood, but could be a sign of other obscure issues.
Change-Id: Ia4859c23d85699621a3b734d6cedb70225576dfc
Closes-Bug: #1906288
Main plays are action-redirect stubs, ideal for import_tasks.
This avoids the 'include' penalty and makes logs/ARA look nicer.
Also fixes haproxy and rabbitmq so that they no longer check the
host group.
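The redirect itself is a one-liner; since the file name depends on a
variable, it must remain a dynamic include:

  - include_tasks: "{{ kolla_action }}.yml"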
Change-Id: I46136fc40b815e341befff80b54a91ef431eabc0
Partially-Implements: blueprint performance-improvements
Config plays do not need to check containers. This avoids skipping
tasks during the genconfig action.
Ironic and Glance rolling upgrades are handled specially.
Swift and Bifrost do not use the handlers at all.
Partially-Implements: blueprint performance-improvements
Change-Id: I140bf71d62e8f0932c96270d1f08940a5ba4542a
This change adds support for encryption of communication between
OpenStack services and RabbitMQ. Server certificates are supported, but
currently client certificates are not.
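For illustration, a deployment opts in via a toggle in globals.yml
(assuming a rabbitmq_enable_tls flag):

  rabbitmq_enable_tls: "yes"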
The kolla-ansible certificates command has been updated to support
generating certificates for RabbitMQ for development and testing.
RabbitMQ TLS is enabled in the all-in-one source CI jobs, or when
the Zuul 'tls_enabled' variable is true.
Change-Id: I4f1d04150fb2b5af085b762890092f87ae6076b5
Implements: blueprint message-queue-ssl-support
Including tasks has a performance penalty when compared with importing
tasks. If the include has a condition associated with it, then the
overhead of the include may be lower than the overhead of skipping all
imported tasks. For unconditionally included tasks, switching to
import_tasks provides a clear benefit.
Benchmarking of include vs. import is available at [1].
This change switches from include_tasks to import_tasks where there is
no condition applied to the include.
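That is, occurrences of the unconditional pattern

  - include_tasks: config.yml

become

  - import_tasks: config.yml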
[1] https://github.com/stackhpc/ansible-scaling/blob/master/doc/include-and-import.md#task-include-and-import
Partially-Implements: blueprint performance-improvements
Change-Id: Ia45af4a198e422773d9f009c7f7b2e32ce9e3b97
Previously we mounted /etc/timezone if kolla_base_distro was debian
or ubuntu. This would fail prechecks if debian or ubuntu images were
deployed on CentOS. While this is not a supported combination, for
correctness we should fix the condition to reference the host OS rather
than the container OS, since that is where the /etc/timezone file is
located.
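For example, the condition becomes a check on the host's facts
rather than on kolla_base_distro, along the lines of:

  when: ansible_facts.os_family == "Debian"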
Change-Id: Ifc252ae793e6974356fcdca810b373f362d24ba5
Closes-Bug: #1882553
Including tasks has a performance penalty when compared with importing
tasks. If the include has a condition associated with it, then the
overhead of the include may be lower than the overhead of skipping all
imported tasks. In the case of the check-containers.yml include, the
included file only has a single task, so the overhead of skipping this
task will not be greater than the overhead of the task import. It
therefore makes sense to switch to use import_tasks there.
Partially-Implements: blueprint performance-improvements
Change-Id: I65d911670649960708b9f6a4c110d1a7df1ad8f7
The common role was previously added as a dependency to all other roles.
It would set a fact after running on a host to avoid running twice. This
had the nice effect that deploying any service would automatically pull
in the common services for that host. When using tags, any services with
matching tags would also run the common role. This could be both
surprising and sometimes useful.
When using Ansible at large scale, there is a penalty associated with
executing a task against a large number of hosts, even if it is skipped.
The common role introduces some overhead, just in determining that it
has already run.
This change extracts the common role into a separate play, and removes
the dependency on it from all other roles. New groups have been added
for cron, fluentd, and kolla-toolbox, similar to other services. This
changes the behaviour in the following ways:
* The common role is now run for all hosts at the beginning, rather than
prior to their first enabled service
* Hosts must be in the necessary group for each of the common services
in order to have that service deployed. This is mostly to avoid
deploying on localhost or the deployment host
* If tags are specified for another service e.g. nova, the common role
will *not* automatically run for matching hosts. The common tag must
be specified explicitly
The last of these is probably the largest behaviour change. While it
would be possible to determine which hosts should automatically run the
common role, it would be quite complex, and would introduce some
overhead that would probably negate the benefit of splitting out the
common role.
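A sketch of the extracted play, using the new groups:

  - name: Apply role common
    hosts:
      - cron
      - fluentd
      - kolla-toolbox
    roles:
      - role: common
        tags: common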
Partially-Implements: blueprint performance-improvements
Change-Id: I6a4676bf6efeebc61383ec7a406db07c7a868b2a
Some services look for /etc/timezone on Debian/Ubuntu, so we should
introduce it to the containers.
In addition, added prechecks for /etc/localtime and /etc/timezone.
Closes-Bug: #1821592
Change-Id: I9fef14643d1bcc7eee9547eb87fa1fb436d8a6b3
Both include_role and import_role expect the role's name to be given
via the "name" param instead of "role".
Using "role" worked but caused errors with ansible-lint.
See: https://review.opendev.org/694779
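That is:

  # before
  - include_role:
      role: rabbitmq
  # after
  - include_role:
      name: rabbitmq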
Change-Id: I388d4ae27111e430d38df1abcb6c6127d90a06e0
We assume that all groups are present in the inventory, and quite
obscure errors can result if any are not.
This change adds a precheck that checks for the presence of all expected
groups in the inventory for each service. It also introduces a common
service-precheck role that we can use for other common prechecks.
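A minimal sketch of such a check (group name and message
illustrative):

  - name: Validate that the rabbitmq group exists in the inventory
    ansible.builtin.fail:
      msg: "Group 'rabbitmq' not found in the inventory"
    when: "'rabbitmq' not in groups"
    run_once: true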
Change-Id: Ia0af1e7df4fff7f07cd6530e5b017db8fba530b3
Partially-Implements: blueprint improve-prechecks
Make the hostname resolution precheck require uniqueness of
resolution as well, to avoid later issues with RabbitMQ
misbehaving.
Change-Id: I000ba6c62ab44eac0abdf8d5d1f069adfbc6552f
Closes-bug: #1863363