Kolla Ansible is switching to OpenSearch and is dropping support for
deploying ElasticSearch. This is because the final OSS release of
ElasticSearch has passed its end of life.
Monasca is affected because it uses both Logstash and ElasticSearch.
Whilst it may continue to work with OpenSearch, Logstash remains an
issue.
In the absence of any renewed interest in Monasca, we remove
support for deploying it. This helps to reduce the complexity
of log processing configuration in Kolla Ansible, freeing up
development time.
Change-Id: I6fc7842bcda18e417a3fd21c11e28979a470f1cf
Second part of patchset:
https://review.opendev.org/c/openstack/kolla-ansible/+/799229/
in which it was suggested to split the patch into smaller ones.
This change adds container_engine to the module parameters
so that, when we introduce Podman, kolla_toolbox can be used
with both engines.
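As a rough illustration (not the exact task from this change), a role
could now pass the engine through to the module; kolla_container_engine
is assumed here to default to "docker":

    - name: Run a module on the remote host via the toolbox (illustrative)
      kolla_toolbox:
        container_engine: "{{ kolla_container_engine | default('docker') }}"
        module_name: ping
      register: toolbox_result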
Signed-off-by: Ivan Halomi <i.halomi@partner.samsung.com>
Co-authored-by: Martin Hiner <m.hiner@partner.samsung.com>
Change-Id: Ic2093aa9341a0cb36df8f340cf290d62437504ad
Second part of patchset:
https://review.opendev.org/c/openstack/kolla-ansible/+/799229/
in which it was suggested to split the patch into smaller ones.
This change adds a container_engine variable to the kolla_container_facts
module, preparing the module to be used with both Docker and Podman
without further changes in roles.
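A minimal sketch of how a role might call the module after this change;
the container name and registered variable are just examples:

    - name: Get container facts (illustrative)
      kolla_container_facts:
        container_engine: "{{ kolla_container_engine }}"
        name:
          - monasca_api
      register: container_facts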
Signed-off-by: Ivan Halomi <i.halomi@partner.samsung.com>
Co-authored-by: Martin Hiner <m.hiner@partner.samsung.com>
Change-Id: I9e8fa30646844ab4a288555f3aafdda345b3a118
First part of patchset:
https://review.opendev.org/c/openstack/kolla-ansible/+/799229/
in which it was suggested to split the patch into smaller ones.
This implements the kolla_container_engine variable
in Docker command calls, so that later on it can
also be used for Podman without further changes.
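For illustration, a command that previously hard-coded "docker" can now
render the engine from the variable (the image name below is
hypothetical):

    - name: Pull an image with the configured engine (illustrative)
      command: "{{ kolla_container_engine }} pull example.registry/kolla/example-image:latest"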
Signed-off-by: Ivan Halomi <i.halomi@partner.samsung.com>
Change-Id: Ic30b67daa2e215524096ad1f4385c569e3d41b95
This patch adds the loadbalancer-config role,
which is a "wrapper" around the haproxy-config role
and the proxysql-config role that will be added
in follow-up patches.
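A sketch of how a service role might consume the wrapper; the
project_services variable is an assumption about the interface,
mirroring how haproxy-config is typically included:

    - name: Configure the load balancer for a service (illustrative)
      include_role:
        name: loadbalancer-config
      vars:
        project_services: "{{ example_services }}"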
Change-Id: I64d41507317081e1860a94b9481a85c8d400797d
Rendering {{ openstack_service_workers }} for the workers
of each OpenStack service is not enough. There are
several services which have to have more workers because
more requests are sent to them.
This patch just adds a default value for the workers of
each service and sets {{ openstack_service_workers }} as the
default, so the value can be overridden in hostvars per server.
Nothing changes for a normal user.
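For example, a role default can now point at the global value while an
operator overrides it for a single busy host (the variable and host
names below are illustrative):

    # roles/<service>/defaults/main.yml (illustrative)
    example_api_workers: "{{ openstack_service_workers }}"

    # inventory host_vars/control01 (illustrative override)
    example_api_workers: 10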
Change-Id: Ifa5863f8ec865bbf8e39c9b2add42c92abe40616
Fixes an issue where access rules failed to validate:
Cannot validate request with restricted access rules. Set
service_type in [keystone_authtoken] to allow access rule validation
I've used the values from the endpoint. This was mostly a
straightforward copy and paste, except:
- versioned endpoints, e.g. cinderv3, where I stripped the version
- Monasca has multiple endpoints associated with a single service. For
this, I concatenated logging and monitoring to be logging-monitoring.
Closes-Bug: #1965111
Change-Id: Ic4b3ab60abad8c3dd96cd4923a67f2a8f9d195d7
Following up on [1].
The three variables only introduce noise now that we have removed
the reliance on Keystone's admin port.
[1] I5099b08953789b280c915a6b7a22bdd4e3404076
Change-Id: I3f9dab93042799eda9174257e604fd1844684c1c
Role vars have a higher precedence than role defaults. This allows
importing default vars from another role via vars_files without
overriding project_name (see the related bug for details).
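A minimal sketch of the pattern this enables; the role and file paths
are hypothetical:

    - name: Example play importing another role's defaults (illustrative)
      hosts: all
      vars_files:
        - roles/common/defaults/main.yml
      roles:
        - role: example-role

Because project_name now lives in the role's vars/ (higher precedence)
rather than defaults/, the file imported via vars_files can no longer
unintentionally override it.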
Change-Id: I3d919736e53d6f3e1a70d1267cf42c8d2c0ad221
Related-Bug: #1951785
The admin interface for endpoints never had any real use; the
functionality was the same as for the public or internal endpoints,
except for Keystone. Even for Keystone with API v3 it is no longer
really needed, but it is still required by some libraries that
cannot be changed in order to stay backwards compatible.
Signed-off-by: Dr. Jens Harbott <harbott@osism.tech>
Change-Id: Icf3bf08deab2c445361f0a0124d87ad8b0e4e9d9
Change Ib225d76076782d695c9387e1c2693bae9a4521d7 introduced a new
upgrade task for monasca-thresh. Because the task is not restricted to
the correct group, it fails while trying to stop monasca_thresh on hosts
not running this container.
Change-Id: I33c2c458a98145315b0de0c069f13b83f59622eb
Closes-Bug: #1952408
This service was deprecated in the Wallaby release and we
can now remove it.
Change-Id: I7d825906edc4b78677d839942cba3a158f44b2e2
Updates the default value of 'monasca_ntp_server' from
'external_ntp_servers[0]' to '0.pool.ntp.org'. This is due to the
removal of the 'external_ntp_servers' variable as part of the removal of
Chrony deployment.
Change-Id: I2e7538a2e95c7b8e9280eb051ee634b4313db129
We get a nice optimisation by using a filtered loop instead
of task skipping per service with 'when'.
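As a generic illustration of the pattern (using plain Jinja2 filters
rather than Kolla Ansible's own service filters), a single task can
iterate over enabled services only, instead of one skipped task per
service:

    - name: Copy config only for enabled services (illustrative)
      template:
        src: "{{ item.key }}.conf.j2"
        dest: "/etc/kolla/{{ item.key }}/{{ item.key }}.conf"
      loop: "{{ example_services | dict2items | selectattr('value.enabled') | list }}"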
Partially-Implements: blueprint performance-improvements
Change-Id: I8f68100870ab90cb2d6b68a66a4c97df9ea4ff52
By default, Ansible injects a variable for every fact, prefixed with
ansible_. This can result in a large number of variables for each host,
which at scale can incur a performance penalty. Ansible provides a
configuration option [0] that can be set to False to prevent this
injection of facts. In this case, facts should be referenced via
ansible_facts.<fact>.
This change updates all references to Ansible facts within Kolla Ansible
from using individual fact variables to using the items in the
ansible_facts dictionary. This allows users to disable fact variable
injection in their Ansible configuration, which may provide some
performance improvement.
This change disables fact variable injection in the ansible
configuration used in CI, to catch any attempts to use the injected
variables.
[0] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#inject-facts-as-vars
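For example, a task that previously used the injected
ansible_distribution variable now reads from the facts dictionary,
which keeps working when inject_facts_as_vars = False is set under
[defaults] in ansible.cfg:

    - name: Show the detected distribution (illustrative)
      debug:
        msg: "{{ ansible_facts.distribution }} {{ ansible_facts.distribution_version }}"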
Change-Id: I7e9d5c9b8b9164d4aee3abb4e37c8f28d98ff5d1
Partially-Implements: blueprint performance-improvements
monasca-thresh currently runs a local copy of Storm
to handle the threshold topology. However, it doesn't set up
the environment correctly, and the executable fails, causing
the container to continually restart.
This patch updates the container command to correctly
submit the topology to the running Apache Storm. The
container will exit after it finishes the submission,
so the restart_policy is updated to on-failure; this way,
if Storm is temporarily unavailable, the submission
will be retried. (NOTE: further deploys will see the
container as "changed", as it won't be running.)
Patch uses KOLLA_BOOTSTRAP to trigger the container to
check if the topology is already submitted, and if so skips
the submission command so the container doesn't fail.
The config task now triggers a new reconfigure handler that
spawns a one-shot container to replace any existing topology
if the configuration has changed.
Also, all the storm.* variables in storm.yml.j2 are
removed, as they were only needed for local mode and
make submitted topologies fail to load when Storm
is restarted (the referenced directories are not mounted
on Nimbus).
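A rough sketch of the resulting shape of the deploy task; the container
name, image variable and environment handling shown here are
illustrative rather than copied from the change:

    - name: Submit the thresholder topology to Storm (illustrative)
      kolla_docker:
        action: start_container
        name: monasca_thresh
        image: "{{ monasca_thresh_image_full }}"
        restart_policy: on-failure
        environment:
          # KOLLA_BOOTSTRAP makes the container check for an existing
          # topology and skip re-submission, as described above
          KOLLA_BOOTSTRAP: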
Depends-On: https://review.opendev.org/c/openstack/kolla/+/792751
Closes-Bug: #1808805
Change-Id: Ib225d76076782d695c9387e1c2693bae9a4521d7
In the Xena cycle it was decided to remove the Monasca
Grafana fork due to lack of maintenance. This commit removes
the service and provides a limited workaround using the
Monasca Grafana datasource with vanilla Grafana.
Depends-On: I9db7ec2df050fa20317d84f6cea40d1f5fd42e60
Change-Id: I4917ece1951084f6665722ba9a91d47764d3709a
In services which use the Apache HTTP server to service HTTP requests,
there exists a TimeOut directive [1] which defaults to 60 seconds. APIs
which come under heavy load, such as Cinder, can sometimes exceed this
which results in an HTTP 504 Gateway Timeout, or similar. However, the
request can still be serviced without error. For example, if Nova calls
the Cinder API to detach a volume, and this operation takes longer
than the shortest of the two timeouts, Nova will emit a stack trace
with a 504 Gateway Timeout. At some time later, the request to detach
the volume will succeed. The Nova and Cinder DBs then become
out-of-sync with each other, and frequently DB surgery is required.
Although strictly this category of bugs should be fixed in OpenStack
services, it is not realistic to expect this to happen in the short
term. Therefore, this change makes it easier to set the Apache HTTP
timeout via a new variable.
An example of a related bug is here:
https://bugs.launchpad.net/nova/+bug/1888665
Whilst this timeout can currently be set by overriding the WSGI
config for individual services, this change makes it much easier.
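For instance, assuming the new variable is exposed along the lines of
kolla_httpd_timeout (the exact name here is an assumption), an operator
could raise the timeout globally:

    # /etc/kolla/globals.yml (variable name is an assumption)
    kolla_httpd_timeout: 600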
Change-Id: Ie452516655cbd40d63bdad3635fd66693e40ce34
Closes-Bug: #1917648
The Monasca alerting pipeline provides multi-tenancy alerts and
notifications. It runs as an Apache Storm topology and generally
places a significant memory and CPU burden on monitoring hosts,
particularly when there are a lot of metrics. This is fine if the
alerting service is in use, but sometimes it is not. For example,
you may use Prometheus for monitoring the control plane, and
wish to offer tenants a monitoring service via Monasca without
alerting and notification functionality. In this case it makes
sense to disable this part of the Monasca pipeline and this patch
adds support for that.
If the service is ever re-enabled, all alerts and notifications
should come back automatically, since they are persisted in the
central MySQL database cluster.
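Assuming the new toggle is exposed as something like
monasca_enable_alerting_pipeline (the flag name is an assumption, not
confirmed by this message), disabling it would look like:

    # /etc/kolla/globals.yml (flag name is an assumption)
    monasca_enable_alerting_pipeline: "no"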
Change-Id: I84aa04125c621712f805f41c8efbc92c8e156db9
The Log Metrics service is an admin-only service. We now have
support in Fluentd via the Prometheus plugin to create metrics
from logs. These metrics can be scraped into Monasca or Prometheus.
It therefore makes sense to deprecate this service, starting by
disabling it by default, and then removing it in the Xena release.
This should improve the stability of the Monasca metrics pipeline
by ensuring that all metrics pass via the Monasca API for
validation, and ensure that metrics generated from logs are
available to both Prometheus and Monasca users by default.
Change-Id: I704feb4434c1eece3eb00c19dc5f934fd4bc27b4
Historically Monasca Log Transformer has been for log
standardisation and processing. For example, logs from different
sources may use slightly different error levels such as WARN, 5,
or WARNING. Monasca Log Transformer is a place where these could
be 'squashed' into a single error level to simplify log searches
based on labels such as these.
However, in Kolla Ansible, we do this processing in Fluentd so
that the simpler Fluentd -> Elastic -> Kibana pipeline also
benefits. This helps to avoid spreading out log parsing
configuration over many services, with the Fluentd Monasca output
plugin being yet another potential place for processing (which
should be avoided). It therefore makes sense to remove this
service entirely, and squash any existing configuration which
can't be moved to Fluentd into the Log Persister service. That is,
by removing this pipeline, we don't lose any functionality,
we encourage log processing to take place in Fluentd, or at least
outside of Monasca, and we make significant gains in efficiency
by removing a topic from Kafka which contains a copy of all logs
in transit.
Finally, users forwarding logs from outside the control plane,
e.g. from tenant instances, should be encouraged to process the
logs at the point of sending using whichever framework they are
forwarding them with. This makes sense, because all Logstash
configuration in Monasca is only accessible by control plane
admins. A user can't typically do any processing inside Monasca,
with or without this change.
Change-Id: I65c76d0d1cd488725e4233b7e75a11d03866095c
- Increment retries: waiting 20 seconds (i.e., 10 retries) seems
not to be enough for monasca-grafana to start on the first node.
Increasing to 80 seconds (i.e., 40 retries) fixes the issue.
- Prevent the check from running when kolla_action=config. In that
case, the command would never succeed, as the service is not
deployed yet (similarly to
https://review.opendev.org/c/openstack/kolla-ansible/+/771237).
See the sketch after this list.
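A rough sketch of the adjusted wait task under these assumptions; the
URL, port variable and delay shown are illustrative, not taken from the
change itself:

    - name: Wait for monasca-grafana to start on the first node (illustrative)
      uri:
        url: "http://{{ api_interface_address }}:{{ monasca_grafana_server_port }}/login"
      register: result
      until: result.status == 200
      retries: 40
      delay: 2
      when: kolla_action != 'config'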
Closes-Bug: #1915060
Related-Bug: #1821285
Change-Id: I7b42c51a66caed0eccf118615d841dca97a7af9d
With this patch, Monasca no longer relies on automatic topic creation
in Kafka, and instead pre-creates all topics before bringing up the
containers. If the topic already exists then it will not be
changed, therefore existing users are not affected.
This patch allows per-topic customisations, such as increasing the
number of partitions on particular topics, and also works around
a race condition in automatic topic creation where multiple instances
of the same service could race to create a topic, causing some of the
services to restart and throw an error before resuming normal
operation.
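A hypothetical sketch of what a per-topic definition could look like;
the variable name and fields below are assumptions, not the exact
schema introduced by this patch:

    # Illustrative only - the variable and field names are assumed
    monasca_all_topics:
      - name: metrics
        partitions: 64
        replication_factor: 3
        enabled: true
      - name: log
        partitions: 16
        replication_factor: 3
        enabled: true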
Change-Id: Ib15c95bb72cf79e9e55945d757b248e06f5f4065
This reverts commit 9cae59be51e8d2d798830042a5fd448a4aa5e7dc.
Reason for revert: This patch was found to introduce issues with fluentd customisation. The underlying issue is not currently fully understood, but could be a sign of other obscure issues.
Change-Id: Ia4859c23d85699621a3b734d6cedb70225576dfc
Closes-Bug: #1906288
The task "Stopping all Monasca Grafana instances but the first node"
can fail with:
error while evaluating conditional (monasca_grafana_differs['result']): 'dict object' has no attribute 'result'
This is fixed by running this task on the same set of hosts as the
task defining monasca_grafana_differs, i.e. groups['monasca-grafana'].
Change-Id: I6ad0256fb2a3cdc91dddf441e5e1c41f4ac69017
Closes-Bug: #1907689
These log levels can build up over time and create unnecessarily high metrics cardinality.
Change-Id: Ib1a03772d0bd58758430b37b4f2f67126cf86fa3
Closes-bug: #1906796