By default, Ansible injects a variable for every fact, prefixed with
ansible_. This can result in a large number of variables for each host,
which at scale can incur a performance penalty. Ansible provides a
configuration option [0] that can be set to False to prevent this
injection of facts. In this case, facts should be referenced via
ansible_facts.<fact>.
This change updates all references to Ansible facts within Kolla Ansible
from using individual fact variables to using the items in the
ansible_facts dictionary. This allows users to disable fact variable
injection in their Ansible configuration, which may provide some
performance improvement.
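As a rough illustration of the rewrite applied across the templates and
tasks, the Python sketch below converts injected fact variables into
ansible_facts lookups. The helper name use_facts_dict and the short
KNOWN_FACTS list are assumptions made for this example only; the actual
change edited the files directly.

```python
import re

# Assumed, non-exhaustive list of fact names; real hosts expose many more.
KNOWN_FACTS = {"hostname", "os_family", "distribution", "default_ipv4"}

# Matches any variable that starts with the ansible_ prefix.
_FACT_VAR = re.compile(r"\bansible_([a-z0-9_]+)\b")


def use_facts_dict(expression):
    """Rewrite ansible_<fact> references to ansible_facts.<fact>."""
    def replace(match):
        name = match.group(1)
        # Leave non-fact variables such as ansible_user untouched.
        if name in KNOWN_FACTS:
            return "ansible_facts." + name
        return match.group(0)

    return _FACT_VAR.sub(replace, expression)


print(use_facts_dict("{{ ansible_hostname }} as {{ ansible_user }}"))
```

Connection variables such as ansible_user are not facts, so a rewrite of
this kind has to work from a list of fact names rather than from the
prefix alone.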
This change disables fact variable injection in the Ansible
configuration used in CI, to catch any attempts to use the injected
variables.
[0] https://docs.ansible.com/ansible/latest/reference_appendices/config.html#inject-facts-as-vars
Change-Id: I7e9d5c9b8b9164d4aee3abb4e37c8f28d98ff5d1
Partially-Implements: blueprint performance-improvements
monasca-thresh currently runs a local copy of Storm
to handle the threshold topology. However, it doesn't set up
the environment correctly, and the executable fails, causing
the container to continually restart.
This patch updates the container command to correctly
submit the topology to the running Apache Storm. The
container exits once the submission finishes, so the
restart_policy is updated to on-failure. This way, if
Storm is temporarily unavailable, the submission
will be retried. (NOTE: further deploys will see the
container as "changed" as it won't be running.)
Patch uses KOLLA_BOOTSTRAP to trigger the container to
check if the topology is already submitted, and if so skips
the submission command so the container doesn't fail.
The config task now triggers a new reconfigure handler that
spawns a one-shot container to replace any existing topology
if the configuration has changed.
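The decision described above can be sketched in Python as follows. This
is only an illustration: the function name, its arguments and the
topology name are hypothetical, and the real check runs inside the
container rather than in Python.

```python
def should_submit(topology_name, active_topologies, bootstrap):
    """Return True if the submission command should run.

    bootstrap mirrors KOLLA_BOOTSTRAP being set. In that mode an already
    submitted topology is left alone so the container does not fail.
    Without it (the one-shot reconfigure container) the topology is
    always submitted so that it replaces any existing one.
    """
    if bootstrap and topology_name in active_topologies:
        return False
    return True


# "example-topology" is a placeholder name, not the real topology name.
print(should_submit("example-topology", ["example-topology"], bootstrap=True))
print(should_submit("example-topology", ["example-topology"], bootstrap=False))
```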
Also, all the storm.* variables in storm.yml.j2 are
removed as they were only needed for local mode and
make submitted topologies fail to load when Storm
is restarted (the referenced directories are not mounted
on nimbus).
Depends-On: https://review.opendev.org/c/openstack/kolla/+/792751
Closes-Bug: #1808805
Change-Id: Ib225d76076782d695c9387e1c2693bae9a4521d7
In the Xena cycle it was decided to remove the Monasca
Grafana fork due to lack of maintenance. This commit removes
the service and provides a limited workaround using the
Monasca Grafana datasource with vanilla Grafana.
Depends-On: I9db7ec2df050fa20317d84f6cea40d1f5fd42e60
Change-Id: I4917ece1951084f6665722ba9a91d47764d3709a
In services which use the Apache HTTP server to service HTTP requests,
there exists a TimeOut directive [1] which defaults to 60 seconds. APIs
which come under heavy load, such as Cinder, can sometimes exceed this
which results in an HTTP 504 Gateway timeout, or similar. However, the
request can still be serviced without error. For example, if Nova calls
the Cinder API to detach a volume, and this operation takes longer
than the shorter of the two timeouts, Nova will emit a stack trace
with a 504 Gateway timeout. At some time later, the request to detach
the volume will succeed. The Nova and Cinder DBs then become
out-of-sync with each other, and frequently DB surgery is required.
Although strictly this category of bugs should be fixed in OpenStack
services, it is not realistic to expect this to happen in the short
term. Therefore, this change makes it easier to set the Apache HTTP
timeout via a new variable.
An example of a related bug is here:
https://bugs.launchpad.net/nova/+bug/1888665
Whilst this timeout can currently be set by overriding the WSGI
config for individual services, this change makes it much easier.
Change-Id: Ie452516655cbd40d63bdad3635fd66693e40ce34
Closes-Bug: #1917648
The Monasca alerting pipeline provides multi-tenancy alerts and
notifications. It runs as an Apache Storm topology and generally
places a significant memory and CPU burden on monitoring hosts,
particularly when there are a lot of metrics. This is fine if the
alerting service is in use, but sometimes it is not. For example,
you may use Prometheus for monitoring the control plane, and
wish to offer tenants a monitoring service via Monasca without
alerting and notification functionality. In this case it makes
sense to disable this part of the Monasca pipeline and this patch
adds support for that.
If the service is ever re-enabled, all alerts and notifications
should be restored automatically since they are persisted in the
central MySQL database cluster.
Change-Id: I84aa04125c621712f805f41c8efbc92c8e156db9
Historically Monasca Log Transformer has been for log
standardisation and processing. For example, logs from different
sources may use slightly different error levels such as WARN, 5,
or WARNING. Monasca Log Transformer is a place where these could
be 'squashed' into a single error level to simplify log searches
based on labels such as these.
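A minimal Python sketch of this kind of squashing is shown below. It
uses only the variants mentioned above; the choice of WARNING as the
canonical label and the function name are assumptions, and the real
processing is done with log pipeline configuration rather than Python.

```python
# Illustrative mapping built from the variants named above. Treating
# "WARNING" as the canonical label is an assumption for this example.
LEVEL_ALIASES = {
    "WARN": "WARNING",
    "5": "WARNING",
    "WARNING": "WARNING",
}


def normalise_level(raw):
    """Map a source-specific level to a single canonical label."""
    key = str(raw).strip().upper()
    # Levels without an alias are passed through in upper case.
    return LEVEL_ALIASES.get(key, key)


print([normalise_level(level) for level in ("WARN", 5, "warning")])
```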
However, in Kolla Ansible, we do this processing in Fluentd so
that the simpler Fluentd -> Elastic -> Kibana pipeline also
benefits. This helps to avoid spreading out log parsing
configuration over many services, with the Fluentd Monasca output
plugin being yet another potential place for processing (which
should be avoided). It therefore makes sense to remove this
service entirely, and squash any existing configuration which
can't be moved to Fluentd into the Log Persister service. I.e.
by removing this pipeline, we don't lose any functionality,
we encourage log processing to take place in Fluentd, or at least
outside of Monasca, and we make significant gains in efficiency
by removing a topic from Kafka which contains a copy of all logs
in transit.
Finally, users forwarding logs from outside the control plane,
e.g. from tenant instances, should be encouraged to process the
logs at the point of sending using whichever framework they are
forwarding them with. This makes sense, because all Logstash
configuration in Monasca is only accessible by control plane
admins. A user can't typically do any processing inside Monasca,
with or without this change.
Change-Id: I65c76d0d1cd488725e4233b7e75a11d03866095c
Those loglevels can build up over time and create unnecessarily high metric cardinality.
Change-Id: Ib1a03772d0bd58758430b37b4f2f67126cf86fa3
Closes-bug: #1906796
When the internal VIP is moved in the event of a failure of the active
controller, OpenStack services can become unresponsive as they try to
talk with MariaDB using connections from the SQLAlchemy pool.
It has been argued that OpenStack doesn't really need to use connection
pooling with MariaDB [1]. This commit reduces the use of connection
pooling via two configuration options:
- max_pool_size is set to 1 to allow only a single connection in the
pool (it is not possible to disable connection pooling entirely via
oslo.db, and max_pool_size = 0 means unlimited pool size)
- lower connection_recycle_time from the default of one hour to 10
seconds, which means the single connection in the pool will be
recreated regularly
These settings have shown better reactivity of the system in the event
of a failover.
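For illustration, the resulting [database] section can be sketched with
Python's configparser as below. max_pool_size and
connection_recycle_time are the oslo.db options named above; the helper
name and the connection URL are placeholders, and the real files are
rendered from the service templates.

```python
import configparser
import io


def database_section(connection_url):
    """Render a [database] section with reduced connection pooling."""
    # Interpolation is disabled so that '%' in a URL is kept literally.
    config = configparser.ConfigParser(interpolation=None)
    config["database"] = {
        "connection": connection_url,
        # A single pooled connection (0 would mean an unlimited pool).
        "max_pool_size": "1",
        # Recreate the pooled connection once it is 10 seconds old.
        "connection_recycle_time": "10",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder URL, not a real deployment value.
print(database_section("mysql+pymysql://user:password@db.example.com/nova"))
```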
[1] http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html
Change-Id: Ib6a62d4428db9b95569314084090472870417f3d
Closes-Bug: #1896635
It was found to be useless in [1].
It is one of the usages of distro_python_version.
Note Freezer and Horizon still use python_path (and hence
distro_python_version) for different purposes.
[1] https://review.opendev.org/675822
Change-Id: I6d6d9fdf4c28cb2b686d548955108c994b685bb1
Partially-Implements: blueprint drop-distro-python-version
This updates the Elasticsearch template used by Monasca to
persist logs so that it uses the 'new' string types [1]. As
an aside, it helps to make the template clearer: full text
search for log messages, and keyword searches for everything
else.
[1] https://www.elastic.co/blog/strings-are-dead-long-live-strings
Closes-Bug: #1892376
Change-Id: I0cd6bf22d4695d88d93241da4364d170d8d8c80e
This patch introduces a global keep alive timeout value for services
that leverage httpd + wsgi to handle http/https requests. The default
value is one minute.
Change-Id: Icf7cb0baf86b428a60a7e9bbed642999711865cd
Partially-Implements: blueprint add-ssl-internal-network
Switch to the Confluent Kafka client in all remaining Python based
Monasca services. This should allow us to later un-pin the Kafka
messaging version for Monasca.
Change-Id: I42bc78ffe304ba21c448c2e08b025e93a70ddb44
I9b6bf5b6690f4b4b3445e7d15a40e45dd42d2e84 was updated to use the original
config file name during review, but the config file was not renamed
accordingly. The result is that an empty config file is written out.
TrivialFix
Change-Id: I5d0384b38ddb38133e5e11df85d8cf76f4044a64
The Monasca Log API has been removed and in this change we switch
to using the unified API. If dedicated log APIs are required then
this can be supported through configuration. Out of the box the
Monasca API is used for both logs and metrics which is envisaged to
work for most use cases.
In order to use the unified API for logs, we need to disable the
legacy Kafka client. We also rename the Monasca API config file
to remove a warning about using the old style name.
Depends-On: https://review.opendev.org/#/c/728638
Change-Id: I9b6bf5b6690f4b4b3445e7d15a40e45dd42d2e84
Monasca deployment fails on master due to an invalid variable reference
(monasca_log_dir) in the config.json for monasca API and monasca log
API.
This change fixes the issue by correcting the variable definition.
Change-Id: I2ec497fa430c2f301dca6a7653ac988e49007469
Closes-Bug: #1864181
The use of default(omit) is for module parameters, not templates. We
define a default value for openstack_cacert, so it should never be
undefined anyway.
Change-Id: Idfa73097ca168c76559dc4f3aa8bb30b7113ab28
Currently the WSGI configuration for binary images uses python2.7
site-packages in some places. This change uses distro_python_version to
select the correct python path.
Change-Id: Id5f3f0ede106498b9264942fa0399d7c7862c122
Partially-Implements: blueprint python-3
Include a reference to the globally configured Certificate Authority in
all services. Services use the CA to verify HTTPS connections.
Change-Id: I38da931cdd7ff46cce1994763b5c713652b096cc
Partially-Implements: blueprint support-trusted-ca-certificate-file
Currently we don't put global Apache error logs into /var/log/kolla;
this change adds statements that redirect those logs there.
The logfile names are adapted so that they are picked up by the
OpenStack WSGI logging Fluentd input config and the existing logrotate
cron entries.
Change-Id: I21216e688a1993239e3e81411a4e8b6f13e138c2
Introduce kolla_address filter.
Introduce put_address_in_context filter.
Add AF config to vars.
Address contexts:
- raw (default): <ADDR>
- memcache: inet6:[<ADDR>]
- url: [<ADDR>]
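The address contexts listed above can be sketched in Python as follows.
This is an illustrative approximation and not the filter added by this
change: passing IPv4 addresses and hostnames through unchanged is an
assumption, as is the exact signature.

```python
import ipaddress


def put_address_in_context(address, context="raw"):
    """Wrap an IPv6 address as required by the given context."""
    if context not in ("raw", "memcache", "url"):
        raise ValueError("unknown context: " + context)
    try:
        is_ipv6 = ipaddress.ip_address(address).version == 6
    except ValueError:
        # Not an IP literal (for example a hostname): assumed unchanged.
        is_ipv6 = False
    if not is_ipv6 or context == "raw":
        return address
    if context == "memcache":
        return "inet6:[" + address + "]"
    # Remaining context is "url".
    return "[" + address + "]"


for ctx in ("raw", "memcache", "url"):
    print(ctx, put_address_in_context("fd00::1", ctx))
```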
Other changes:
- globals.yml - mention just IP in comment
- prechecks/port_checks (api_intf) - kolla_address handles validation
- 3x interface conditional (swift configs: replication/storage)
- 2x interface variable definition with hostname
  (haproxy listens; api intf)
- 1x interface variable definition with hostname with bifrost exclusion
  (baremetal pre-install /etc/hosts; api intf)
- neutron's ml2 'overlay_ip_version' set to 6 for IPv6 on tunnel network
- basic multinode source CI job for IPv6
- prechecks for rabbitmq and qdrouterd use proper NSS database now
- MariaDB Galera Cluster WSREP SST mariabackup workaround
  (socat and IPv6)
- Ceph naming workaround in CI
  TODO: probably needs documenting
- RabbitMQ IPv6-only proto_dist
- Ceph ms switch to IPv6 mode
- Remove neutron-server ml2_type_vxlan/vxlan_group setting
  as it is not used (let's avoid any confusion)
  and could break setups without proper multicast routing
  if it started working (also IPv4-only)
- haproxy upgrade checks for slaves based on ipv6 addresses
TODO:
- ovs-dpdk grabs ipv4 network address (w/ prefix len / submask)
  not supported, invalid by default because neutron_external has no address
  No idea whether ovs-dpdk works at all atm.
- ml2 for xenapi
  Xen is not supported too well.
  This would require working with XenAPI facts.
- rp_filter setting
  This would require meddling with ip6tables (there is no sysctl param).
  By default nothing is dropped.
  Unlikely we really need it.
- ironic dnsmasq is configured IPv4-only
  dnsmasq needs DHCPv6 options and testing in vivo.
KNOWN ISSUES (beyond us):
- One cannot use IPv6 address to reference the image for docker like we
  currently do, see: https://github.com/moby/moby/issues/39033
  (docker_registry; docker API 400 - invalid reference format)
  workaround: use hostname/FQDN
- RabbitMQ may fail to bind to IPv6 if hostname resolves also to IPv4.
  This is due to old RabbitMQ versions available in images.
  IPv4 is preferred by default and may fail in the IPv6-only scenario.
  This should be no problem in real life as IPv6-only is indeed IPv6-only.
  Also, when new RabbitMQ (3.7.16/3.8+) makes it into images, this will
  no longer be relevant as we supply all the necessary config.
  See: https://github.com/rabbitmq/rabbitmq-server/pull/1982
- For reliable runs, at least Ansible 2.8 is required (2.8.5 confirmed
  to work well). Older Ansible versions are known to miss IPv6 addresses
  in interface facts. This may affect redeploys, reconfigures and
  upgrades which run after VIP address is assigned.
  See: https://github.com/ansible/ansible/issues/63227
- Bifrost Train does not support IPv6 deployments.
  See: https://storyboard.openstack.org/#!/story/2006689
Change-Id: Ia34e6916ea4f99e9522cd2ddde03a0a4776f7e2c
Implements: blueprint ipv6-control-plane
Signed-off-by: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Monasca log transformer currently throws exceptions on encountering a
time offset other than UTC (+0000):
"""
"exception": "Invalid format: \"2019-08-08 17:39:45 +0100\" is malformed at \" +0100\"",
"config_parsers":"yyyy-MM-dd HH:mm:ss +0000,ISO8601"}
"""
This fix allows logstash to interpret any valid ISO8601 offset.
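The difference between a hard-coded +0000 suffix and a real offset
pattern can be reproduced with Python's strptime, as sketched below.
This only mirrors the behaviour for illustration: Logstash uses its own
date pattern syntax, and the parse helper is a name invented for this
example.

```python
from datetime import datetime, timedelta

# Timestamp taken from the exception message quoted above.
STAMP = "2019-08-08 17:39:45 +0100"


def parse(stamp, pattern):
    """Return the parsed datetime, or None if the pattern does not match."""
    try:
        return datetime.strptime(stamp, pattern)
    except ValueError:
        return None


# A literal "+0000" in the pattern only matches UTC timestamps.
fixed = parse(STAMP, "%Y-%m-%d %H:%M:%S +0000")
# "%z" accepts any numeric UTC offset.
flexible = parse(STAMP, "%Y-%m-%d %H:%M:%S %z")

print(fixed)
print(flexible.utcoffset() == timedelta(hours=1))
```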
Change-Id: Id70c3dd9cdcf681e955931f18a054e19cc284c0a
Closes-Bug: #1839597
A user may want to define and use Logstash patterns. This
commit adds support to copy them into the Monasca Log
Transformer container. In the future support could be
added for other Logstash containers.
Change-Id: Id8cde14af6dc7f49714f6b1cb878882d0048d293
This prevents the container's root filesystem from filling up.
Change-Id: Icc5a08c82312d6688edf2ef36562967ac94e8ac9
Depends-On: https://review.opendev.org/#/c/674779
Closes-Bug: #1839149
This plugin is useful for monitoring host clock synchronisation with
an NTP reference. If the delta becomes too large, the metrics from
this plugin can be used to trigger an alarm.
Change-Id: Id1fe6d7c823f8404c19c81ccdeb8b311bcb46e47
Change I0ca38f2cc7d63b9b47eedb304ba7b00a94816f9a removed the roles
middleware from the example paste pipeline.
Change-Id: Ie9a3b0fef395aaf414407f6bae1ac4bca158240d
When using the default domain name there are issues authenticating
with Keystone. For example, you can only log in on the second attempt
and the Monasca datasource fails to authenticate. Switching to the
default domain id resolves these issues.
Change-Id: I2cb4b2608c74dd853c97e4fc27078930bc72fdf8
backport: stein
If I deploy monasca by setting enable_monasca to true, the monasca_notification
restarts with the following error:
ERROR:__main__:MissingRequiredSource: /var/lib/kolla/config_files/notification_templates/* file is not found
These templates are optional, so we need to mark this directory as optional in
config.json.
Change-Id: Ia2dd835daa7ab1153617cc92f17c2d8d498c73e0
Closes-Bug: #1823726
By parsing the creation_time timestamp in Logstash, Elasticsearch
can parse it correctly. This closes a bug where the creation_time
timestamp was shown as a date shortly after the epoch (1970) when
viewed in Kibana.
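One plausible way to end up with a date shortly after the epoch is for a
value in seconds to be read as milliseconds. Whether that was the exact
cause here is an assumption; the Python sketch below only shows the
effect, using a made-up creation_time value.

```python
from datetime import datetime, timezone

# Hypothetical creation_time, expressed in seconds since the epoch.
creation_time = 1550000000

# Interpreted as seconds, the value lands in 2019.
as_seconds = datetime.fromtimestamp(creation_time, tz=timezone.utc)
# Interpreted as milliseconds, the same value lands in January 1970.
as_millis = datetime.fromtimestamp(creation_time / 1000, tz=timezone.utc)

print(as_seconds.year, as_millis.year)
```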
Closes-Bug: #1816585
Change-Id: I00decfe94607845ef0eae9bec631a0e729aac3fa
Use <project>_install_type instead of kolla_install_type
to set python_path. For example, the general kolla_install_type
may be 'binary' while the user wants to deploy Horizon from 'source'.
In that case the Horizon templates still use
python_path=/usr/share/openstack-dashboard, which is wrong.
Change-Id: Ide6a24e17b1f8ab6506aa5e53f70693706830418
Some Monasca services support sending StatsD metrics to
allow monitoring those services. This commit connects
these services to the StatsD service provided by the
Monasca Agent.
Partially-Implements: blueprint monasca-roles
Change-Id: I1da376384a31b89fea1b8a6f907aea35282909a4
The Monasca Grafana fork allows users to log into Grafana with their
OpenStack user credentials and see metrics associated with their
OpenStack project. The long term goal is to enable Keystone support
in upstream Grafana, but this work seems to have stalled.
Partially-Implements: blueprint monasca-grafana
Change-Id: Icc04613b2571c094ae23b66d0bcc38b58c0ee4e1
This change allows the user to configure a Monasca database
which may be different from the default database.
Partially-Implements: blueprint monasca-roles
Change-Id: Ia905190b8037ecb1782a758c0b65581fe9024bf6